Coding with Multiple AI Agents to Build Scalable Rate-Limiting Infrastructure

At Ayrshare, we provide scalable social media APIs that serve developers and AI platforms worldwide. Some of our customers grow extremely fast, so their usage scales quickly, and so do our costs.

We needed a rate-limiting system that could:

  • Scale automatically with customer growth.
  • Protect against runaway costs.
  • Support enterprise overrides and policy flexibility.
  • Eliminate manual plan adjustments.

Our dev team had been experimenting individually with Claude Code (mostly via Cursor), but we wanted to do multi-agent work to accelerate our roadmap. This project seemed like a perfect opportunity to take our AI development to the next level by using multiple agents and subagents to build a major new feature.

This article is the first in a series documenting what we learned. We’ll cover the beginning of the process, including using multiple AIs to craft our prompt, using a Ralph loop for testing, and creating production-ready code as a pull request in GitHub.

Step 1: Using ChatGPT as our Prompt Engineer

The initial step was to clearly define the problem and establish the ground rules for the refactor. We fed our high-level requirement into ChatGPT: refactor rate limits for scaling, cost protection, and enterprise flexibility.

Original Prompt

For the Ayrshare API, I want to write a prompt for Claude Code that will guide it in a task to research all rate limits and maximums in the system and help us to refactor them to allow customers to scale without manual intervention, while also allowing us to set maximum limits and caching for enterprise accounts that keep them running profitably. For example, when we use X, we pay per post or per API call, so it's vital that we have caching or system limits that scale with the number of social profiles a customer manages. We use Redis for caching, our platform is coded in Node, and the plans live in a Firestore DB.

ChatGPT responded with a detailed but generic three-phase plan. At over 1,000 words, it was impressive to see the AI turn such a short prompt into a comprehensive plan.

  1. Build “Limits and Maximums Inventory”: a discovery phase.
  2. Define a scalable policy model: designing the policy system in Firestore.
  3. Implement policy engine, limiter, and integration: the core code refactor.

This gave us a structured foundation, though it leaned on abstract concepts such as tenantId and generic file paths. Our team reviewed the plan collaboratively on a live video call. Unlike much AI-generated content (which often requires heavy rewriting), most of the technical planning was strong from the first version.

Step 2: Using Claude’s Knowledge of our Codebase

Because ChatGPT doesn’t have access to our codebase, its suggestions were correct in an academic sense but would have required massive rewrites of our software. The terminology, variable names, and conventions were all wrong, and the more we had to change, the greater the risk of error.

So, we fed the original plan from ChatGPT into Claude Code, which has knowledge of our private GitHub repositories. Claude’s task was to refine the plan, making it specific to our codebase.


For this step, we used Claude Code in the Terminal directly, but any IDE, such as Cursor, would work just as well.

Claude transformed the plan by:

  • Identifying and documenting our existing limits.
  • Addressing our actual tech stack and specific services.
  • Defining specific paths that match our code.
  • Using variable names and conventions from our existing codebase.

These changes looked good, but before implementing anything, we wanted to know what had changed. So we used multiple tabs in a Google Doc and asked Gemini to review the changes.

Step 3: Reviewing Changes with Gemini

To validate the value of the multi-stage AI process, we pasted the various prompts into a Google Doc, one per tab, to use Gemini. We asked Gemini to perform a diff analysis between ChatGPT’s initial output and Claude Code’s revised plan.

Gemini’s analysis highlighted the changed items, which we reviewed live as a team:

| Feature | Tab 1: General Prompt (ChatGPT) | Tab 2: Ayrshare-Specific (Claude Code) |
| --- | --- | --- |
| Technology Stack | Generic Redis and Firestore | Firebase, Node-cache, Redis bottleneck |
| Data Model Scope | Abstract tenant (tenants/{tenantId}) | Specific user (users/{uid}) |
| Codebase & File Paths | Generic sections and controllers | Exact file locations, e.g. /middleware/auth/ (rateLimiter.js), /src/networks/ (twitter.js) |
| Existing Limits | Generic limits | Specific constants, e.g. TWITTER_USAGE_MONTHLY_MAX_BASIC=100, plus existing error codes |

This multi-AI workflow allowed us to leverage each model’s strengths and let them check each other’s work.

Step 4: Implementing with Claude

Using the plan provided by Claude Code, we proceeded with the refactor, focusing on a single source of truth for every customer’s limits, documented in docs/limit-policy-design.md.

Profile-Aware Scaling

A key requirement was scaling. The effective limit for a user is now computed dynamically from the plan’s base limit and the number of social profiles the customer manages. This model ensures a customer with a Business Plan for 50 profiles gets significantly higher limits than a Launch plan customer with 10 profiles, allowing them to grow without manual plan changes.
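The real formula lives in our policy design doc, but the idea can be sketched in a few lines of Node. The plan names, base limits, and per-profile increments below are made-up illustrative values, not our production numbers:

```javascript
// Illustrative sketch of profile-aware limit scaling. Plan names and numbers
// are invented for the example; the production values live in Firestore.
const PLAN_DEFAULTS = {
  launch:   { baseMonthlyPosts: 500,  postsPerProfile: 50 },
  business: { baseMonthlyPosts: 2000, postsPerProfile: 200 },
};

// Effective limit = plan base + (per-profile increment * profile count),
// with an optional enterprise override that wins outright.
function effectiveMonthlyPosts(plan, profileCount, override) {
  if (override != null) return override; // enterprise override
  const p = PLAN_DEFAULTS[plan];
  if (!p) throw new Error(`unknown plan: ${plan}`);
  return p.baseMonthlyPosts + p.postsPerProfile * profileCount;
}
```

With these example numbers, a Business customer managing 50 profiles would get 2000 + 200 × 50 = 12,000 posts per month, while a Launch customer with 10 profiles would get 1,000; neither requires a manual plan change as profiles are added.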

Implementation: The Policy Engine and Limiter

The core implementation involved creating new modules: /src/policy/ and /src/limits/.

  1. The Policy Engine: the getTenantPolicy(uid) module retrieves a user’s computed limits, applies scaling and overrides, and caches the result in Redis (key: policy:{uid}:v{version}) with single-flight protection to prevent database stampedes.
  2. The Limiter Interface: we consolidated all rate-limiting logic into a unified API (checkRate(), consumeQuota(), checkConcurrency()) backed by atomic Redis Lua scripts for safety. This system replaced the existing Firebase rate limiter.
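The single-flight protection in the policy engine is what stops a cache miss from turning into a Firestore stampede: concurrent requests for the same user join one in-flight fetch instead of each hitting the database. Here is a simplified in-memory sketch of that pattern; in production the cache is Redis with versioned, expiring keys, and computePolicy is a hypothetical stand-in for the Firestore read plus scaling logic:

```javascript
// Simplified single-flight cache. Production uses Redis (key:
// policy:{uid}:v{version}) with TTLs; a Map stands in here so the
// pattern is runnable. computePolicy is a hypothetical loader.
const policyCache = new Map(); // uid -> resolved policy
const inFlight = new Map();    // uid -> pending Promise

async function getTenantPolicy(uid, computePolicy) {
  if (policyCache.has(uid)) return policyCache.get(uid); // cache hit
  if (inFlight.has(uid)) return inFlight.get(uid);       // join existing fetch
  const p = (async () => {
    try {
      const policy = await computePolicy(uid); // e.g. Firestore read + scaling
      policyCache.set(uid, policy);
      return policy;
    } finally {
      inFlight.delete(uid); // allow a fresh fetch after resolution
    }
  })();
  inFlight.set(uid, p);
  return p;
}
```

If two requests for the same uid arrive before the first fetch resolves, the second simply awaits the first one’s promise, so the expensive load runs once.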
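The reason checkRate() is backed by a Redis Lua script is atomicity: the read-check-decrement must happen as one operation so two concurrent requests can’t both spend the last token. As a runnable illustration of the logic such a script encodes, here is an in-memory token-bucket analogue (function shape and parameters are illustrative, not our production API):

```javascript
// In-memory analogue of an atomic token-bucket rate check. In production the
// same refill-then-spend logic runs inside a single Redis Lua script, which is
// what makes it race-free across Node processes.
const buckets = new Map(); // uid -> { tokens, lastMs }

function checkRate(uid, { capacity, refillPerSec }, nowMs = Date.now()) {
  let b = buckets.get(uid);
  if (!b) {
    b = { tokens: capacity, lastMs: nowMs };
    buckets.set(uid, b);
  }
  // Refill tokens for elapsed time, capped at bucket capacity.
  const elapsedSec = (nowMs - b.lastMs) / 1000;
  b.tokens = Math.min(capacity, b.tokens + elapsedSec * refillPerSec);
  b.lastMs = nowMs;
  if (b.tokens < 1) {
    return { allowed: false, retryAfterSec: (1 - b.tokens) / refillPerSec };
  }
  b.tokens -= 1;
  return { allowed: true };
}
```

Note that this plain-JavaScript version is only safe within one process; the moment multiple Node instances share state, the check and the decrement must move into Redis together, which is exactly what the Lua script achieves.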

After reviewing the diff analysis and correcting the Claude prompt, we sent the final prompt to Claude Code to write a PR.

Security side note: we enabled branch protection on our repo to prevent force pushes. Even though Claude is designed to ask first at every step, this prevents any bug on Claude’s end from changing code directly in our repo without our knowledge.

The new code was delivered as a PR in about 10 minutes. Crafting the prompts took far more time in this process than the AI coding, but that mirrors how the best engineers often spend more time architecting a feature than coding it.

Step 5: Testing and Polishing using Ralph Loops

One of our goals for this effort was to try out Ralph loops (named after Ralph Wiggum), so we had set up our environments with Ali Shaheen’s Claude Code Bootstrap. A Ralph loop runs the same process over and over until you reach the final result you’re looking for.

In this case, we used our Ralph loop to make our Claude agent write tests and test the new code. Millions of tokens later, we had a tested and submitted Pull Request.

What’s Next: Subagents, Deeper Orchestration, and Scaling Further

This project changed how we think about AI-assisted development. The coding itself was fast, but the real leverage came from orchestration: choosing the right model for each stage, validating outputs across systems, and tightening prompts until the architecture was correct.

In the next article, we’re breaking down how we used subagents inside Claude Code to divide responsibilities and manage larger refactors across long-running sessions. At Ayrshare, this kind of thinking directly shapes how we build and scale our social media APIs for developers and AI platforms.

If you’re building AI-powered products or developer-facing APIs and thinking about how to scale your systems intelligently, Part 2 unpacks the orchestration layer in more detail. And if you’re looking for scalable social media APIs that are built with this level of architectural thinking behind the scenes, explore what we’ve created at Ayrshare.