
Rate Limiter System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (Target: 25 min)
We designed a rate limiter handling 2.9M checks/sec using a Sliding Window Counter (400 bytes/user at 99.7% accuracy, versus the roughly 6 TB a Sliding Log would require). Redis counters with Lua-script atomicity eliminate race conditions. On Redis failure, we fail open with local fallback counters. Placing the limiter as API Gateway middleware avoids the extra network hop of a separate microservice.
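The memory gap between the two algorithms can be sanity-checked with back-of-envelope math. The 2.9M checks/sec and 400 bytes/user come from the summary above; the user count, per-entry overhead, and retention horizon below are illustrative assumptions, not figures from the design.

```python
# Back-of-envelope memory math for the two algorithms.
CHECKS_PER_SEC = 2_900_000      # peak rate from the capacity estimate
USERS = 100_000_000             # assumed active-user count (illustrative)
SWC_BYTES_PER_USER = 400        # Sliding Window Counter state per user

# Sliding Window Counter: fixed state per user, independent of traffic volume.
swc_total_gb = USERS * SWC_BYTES_PER_USER / 1e9
print(f"Sliding Window Counter: {swc_total_gb:.0f} GB")

# Sliding Log: one timestamp entry per request. Assuming ~24 bytes/entry
# (timestamp plus sorted-set overhead) retained over 24h of peak traffic:
LOG_BYTES_PER_ENTRY = 24
log_total_tb = CHECKS_PER_SEC * 86_400 * LOG_BYTES_PER_ENTRY / 1e12
print(f"Sliding Log: {log_total_tb:.1f} TB")
```

Under these assumptions the counter approach fits in tens of gigabytes, while the log approach lands in the terabyte range, which is where the "6 TB impossibility" figure comes from.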
1. What Is a Rate Limiter?

A rate limiter controls how many requests a client can make in a given time window. Cross the threshold and the response is HTTP 429 Too Many Requests.
Without rate limiting, a single bot can exhaust database connections, memory, and CPU. Real-world examples: 3 failed logins trigger a lockout, 100 API calls/min per developer key, 20 account creations/day per IP.
The system sounds simple, but the real challenge is enforcing per-user limits across distributed servers at sub-5ms latency, because requests hit different gateway nodes.
How do we share counter state without adding latency? And when Redis goes down, do we fail closed (block everything, causing an outage) or fail open (allow everything, risking abuse)?
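The fail-open option can be sketched as a thin wrapper: try the shared store, and if it is unreachable, fall back to a loose per-process counter rather than rejecting all traffic. This is a minimal illustration; the class and parameter names (`FailOpenLimiter`, `local_limit`) are hypothetical, and `redis_check` stands in for the real shared-store call.

```python
import time
from collections import defaultdict

class FailOpenLimiter:
    """Sketch of fail-open with a local fallback counter (names illustrative)."""

    def __init__(self, redis_check, local_limit=200, window=60):
        self.redis_check = redis_check   # callable: user_id -> bool; may raise
        self.local_limit = local_limit   # looser per-node limit for degraded mode
        self.window = window
        self.local = defaultdict(int)    # (user_id, window_index) -> count

    def allow(self, user_id):
        try:
            return self.redis_check(user_id)
        except ConnectionError:
            # Fail open, but keep a coarse local cap so a single node
            # can't be abused without any limit at all during the outage.
            bucket = (user_id, int(time.time()) // self.window)
            self.local[bucket] += 1
            return self.local[bucket] <= self.local_limit

def broken_redis(user_id):
    # Simulates the shared store being down.
    raise ConnectionError("redis unreachable")

limiter = FailOpenLimiter(broken_redis, local_limit=3)
print([limiter.allow("u1") for _ in range(5)])  # first 3 pass, rest blocked
```

The fallback limit is deliberately looser than the real quota: during an outage we prefer letting some abuse through over blocking legitimate traffic.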
Cloudflare blocks 72 billion threats per day across its network, and the first line of defense is rate limiting. A rate limiter caps the number of requests an entity (user, IP, API key) can make within a time window. Exceed the limit and you get HTTP 429 Too Many Requests. The system sounds simple, but the real challenge is making the allow/deny decision in under 5ms at p99 while staying consistent across dozens of distributed nodes processing millions of requests per second.
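The allow/deny decision itself is small. Below is a minimal in-memory sketch of the Sliding Window Counter check, which estimates the request count in a window sliding over two fixed buckets; in the real design these counters live in Redis, and the class and parameter names here are illustrative.

```python
import time

class SlidingWindowCounter:
    """Illustrative single-node Sliding Window Counter (state is in Redis in practice)."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window = window_secs
        self.counts = {}             # window_index -> request count

    def allow(self, now=None):
        now = time.time() if now is None else now
        idx = int(now // self.window)
        prev = self.counts.get(idx - 1, 0)
        curr = self.counts.get(idx, 0)
        # Weight the previous window by how much of it still overlaps the
        # sliding window ending at `now` -- this is the accuracy trade-off.
        overlap = 1 - (now % self.window) / self.window
        estimated = prev * overlap + curr
        if estimated >= self.limit:
            return False             # caller responds 429 Too Many Requests
        self.counts[idx] = curr + 1
        return True

rl = SlidingWindowCounter(limit=5, window_secs=60)
t0 = 120.0                           # fixed clock for a deterministic example
print([rl.allow(now=t0 + i) for i in range(7)])
# → [True, True, True, True, True, False, False]
```

The weighted estimate is why the counter approach is approximate rather than exact: it assumes requests in the previous window were evenly spread, which is where the ~99.7% accuracy figure comes from.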
  • Why not scale up instead? Because one abusive client generating 10K RPS costs you ~$2,000/day in compute. We rate limit because it is cheaper than scaling to absorb abuse.
  • We chose token bucket for burst-tolerant APIs and sliding window counter for smooth throttling. The algorithm decision depends on whether our API tolerates spikes or requires even traffic shaping.
  • A single Redis node handles 100K+ rate-check operations per second with sub-millisecond latency. Two nodes with failover give us the 99.99% availability the system needs.
  • This topic tests your ability to reason about distributed counters, atomicity without locks, and graceful degradation. The patterns transfer directly to distributed counting, leaderboards, and quota management.
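For contrast with the sliding window counter, the burst-tolerant option mentioned above can be sketched as a token bucket: tokens refill at a steady rate up to a capacity, and each request spends one, so short bursts up to the capacity are absorbed. A minimal illustration with assumed names and parameters:

```python
class TokenBucket:
    """Illustrative token bucket: steady refill `rate`, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate             # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = capacity       # start full
        self.last = 0.0              # timestamp of the last check

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

tb = TokenBucket(rate=1.0, capacity=3)          # 1 req/sec steady, bursts of 3
print([tb.allow(now=0.0) for _ in range(4)])    # burst: [True, True, True, False]
print(tb.allow(now=2.0))                        # after 2s of refill: True
```

This is the shape of the trade-off in the algorithm bullet above: token bucket tolerates spikes up to the bucket size, while the sliding window counter shapes traffic toward an even rate.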