
Rate Limiter System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (Target: 25 min)
We designed a rate limiter handling 2.9M checks/sec using a Sliding Window Counter (400 bytes/user at 99.7% accuracy, versus the roughly 6 TB a Sliding Log would require). Redis counters with Lua-script atomicity eliminate race conditions. On Redis failure, we fail open with local fallback counters. Placing the limiter as API Gateway middleware avoids the extra network hop of a separate microservice.
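The memory gap between the two algorithms can be sanity-checked with back-of-envelope math. The 2.9M checks/sec and 400 bytes/user come from the summary above; the user count, per-entry overhead, and retention horizon below are illustrative assumptions, not figures from the design.

```python
# Back-of-envelope memory math for the two algorithms.
CHECKS_PER_SEC = 2_900_000      # peak rate from the capacity estimate
USERS = 100_000_000             # assumed active-user count (illustrative)
SWC_BYTES_PER_USER = 400        # Sliding Window Counter state per user

# Sliding Window Counter: fixed state per user, independent of traffic volume.
swc_total_gb = USERS * SWC_BYTES_PER_USER / 1e9
print(f"Sliding Window Counter: {swc_total_gb:.0f} GB")

# Sliding Log: one timestamp entry per request. Assuming ~24 bytes/entry
# (timestamp plus sorted-set overhead) retained over 24h of peak traffic:
LOG_BYTES_PER_ENTRY = 24
log_total_tb = CHECKS_PER_SEC * 86_400 * LOG_BYTES_PER_ENTRY / 1e12
print(f"Sliding Log: {log_total_tb:.1f} TB")
```

Under these assumptions the counter approach fits in tens of gigabytes, while the log approach lands in the terabyte range, which is where the "6 TB impossibility" figure comes from.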
1. What Is a Rate Limiter?

A rate limiter controls how many requests a client can make in a given time window. Cross the threshold and the response is HTTP 429 Too Many Requests.
Without rate limiting, a single bot can exhaust database connections, memory, and CPU. Real-world examples: 3 failed logins trigger a lockout, 100 API calls/min per developer key, 20 account creations/day per IP.
The system sounds simple, but the real challenge is enforcing per-user limits across distributed servers at sub-5ms latency, because requests hit different gateway nodes.
How do we share counter state without adding latency? And when Redis goes down, do we fail closed (block everything, causing an outage) or fail open (allow everything, risking abuse)?
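The fail-open option can be sketched as a thin wrapper: try the shared store, and if it is unreachable, fall back to a loose per-process counter rather than rejecting all traffic. This is a minimal illustration; the class and parameter names (`FailOpenLimiter`, `local_limit`) are hypothetical, and `redis_check` stands in for the real shared-store call.

```python
import time
from collections import defaultdict

class FailOpenLimiter:
    """Sketch of fail-open with a local fallback counter (names illustrative)."""

    def __init__(self, redis_check, local_limit=200, window=60):
        self.redis_check = redis_check   # callable: user_id -> bool; may raise
        self.local_limit = local_limit   # looser per-node limit for degraded mode
        self.window = window
        self.local = defaultdict(int)    # (user_id, window_index) -> count

    def allow(self, user_id):
        try:
            return self.redis_check(user_id)
        except ConnectionError:
            # Fail open, but keep a coarse local cap so a single node
            # can't be abused without any limit at all during the outage.
            bucket = (user_id, int(time.time()) // self.window)
            self.local[bucket] += 1
            return self.local[bucket] <= self.local_limit

def broken_redis(user_id):
    # Simulates the shared store being down.
    raise ConnectionError("redis unreachable")

limiter = FailOpenLimiter(broken_redis, local_limit=3)
print([limiter.allow("u1") for _ in range(5)])  # first 3 pass, rest blocked
```

The fallback limit is deliberately looser than the real quota: during an outage we prefer letting some abuse through over blocking legitimate traffic.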
Cloudflare blocks 72 billion threats per day across its network, and the first line of defense is rate limiting. A rate limiter caps the number of requests an entity (user, IP, API key) can make within a time window. Exceed the limit and you get HTTP 429 Too Many Requests. The system sounds simple, but the real challenge is making the allow/deny decision in under 5ms at p99 while staying consistent across dozens of distributed nodes processing millions of requests per second.
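The allow/deny decision itself is small. Below is a minimal in-memory sketch of the Sliding Window Counter check, which estimates the request count in a window sliding over two fixed buckets; in the real design these counters live in Redis, and the class and parameter names here are illustrative.

```python
import time

class SlidingWindowCounter:
    """Illustrative single-node Sliding Window Counter (state is in Redis in practice)."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window = window_secs
        self.counts = {}             # window_index -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        idx = int(now // self.window)
        prev = self.counts.get(idx - 1, 0)
        curr = self.counts.get(idx, 0)
        # Weight the previous window by how much of it still overlaps the
        # sliding window ending at `now` -- this is the accuracy trade-off.
        overlap = 1 - (now % self.window) / self.window
        estimated = prev * overlap + curr
        if estimated >= self.limit:
            return False             # caller responds 429 Too Many Requests
        self.counts[idx] = curr + 1
        return True

rl = SlidingWindowCounter(limit=5, window_secs=60)
t0 = 120.0                           # fixed clock for a deterministic example
print([rl.allow(now=t0 + i) for i in range(7)])
# → [True, True, True, True, True, False, False]
```

The weighted estimate is why the counter approach is approximate rather than exact: it assumes requests in the previous window were evenly spread, which is where the ~99.7% accuracy figure comes from.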
  • Why not scale up instead? Because one abusive client generating 10K RPS costs you ~$2,000/day in compute. We rate limit because it is cheaper than scaling to absorb abuse.
  • We chose token bucket for burst-tolerant APIs and sliding window counter for smooth throttling. The algorithm decision depends on whether our API tolerates spikes or requires even traffic shaping.
  • A single Redis node handles 100K+ rate-check operations per second with sub-millisecond latency. Two nodes with failover give us the 99.99% availability the system needs.
  • This topic tests your ability to reason about distributed counters, atomicity without locks, and graceful degradation. The patterns transfer directly to distributed counting, leaderboards, and quota management.
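For contrast with the sliding window counter, the burst-tolerant option mentioned above can be sketched as a token bucket: tokens refill at a steady rate up to a capacity, and each request spends one, so short bursts up to the capacity are absorbed. A minimal illustration with assumed names and parameters:

```python
class TokenBucket:
    """Illustrative token bucket: steady refill `rate`, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate             # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = capacity       # start full
        self.last = 0.0              # timestamp of the last check

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=1.0, capacity=3)          # 1 req/sec steady, bursts of 3
print([tb.allow(now=0.0) for _ in range(4)])    # burst: [True, True, True, False]
print(tb.allow(now=2.0))                        # after 2s of refill: True
```

This is the shape of the trade-off in the algorithm bullet above: token bucket tolerates spikes up to the bucket size, while the sliding window counter shapes traffic toward an even rate.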