Distributed Rate Limiting
One API gateway, one Redis node, rate limiting works. But our production setup has 8 API gateway instances behind a load balancer, and that breaks single-node assumptions.
The constraint: two requests from the same user hit two different gateways at the same millisecond. Both gateways read the counter as 99 (limit is 100), both increment to 100, and both allow the request. The user made 101 requests because of a race condition.
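The lost update is easy to reproduce deterministically. This toy in-process sketch (a plain dict standing in for Redis, names ours) interleaves two gateways' read-check-increment sequences the worst way:

```python
# Two "gateways" race on a shared counter. Limit is 100, counter is at 99.
counter = {"user:42": 99}
LIMIT = 100

# Both gateways read BEFORE either writes -- the race window.
read_a = counter["user:42"]   # gateway A sees 99
read_b = counter["user:42"]   # gateway B also sees 99

allowed_a = read_a < LIMIT    # True: 99 < 100
allowed_b = read_b < LIMIT    # True: 99 < 100

# Both increment from their stale read; the second write clobbers the first.
counter["user:42"] = read_a + 1
counter["user:42"] = read_b + 1

# Result: counter shows 100, but 101 requests were allowed in total.
```

Both `allowed_a` and `allowed_b` come out `True` while the stored count is only 100, which is exactly the off-by-one the user exploits.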
The race window equals the round-trip time from gateway to Redis, roughly 1-5ms. We fix this with Lua scripting in Redis (not distributed locks, not application-level CAS).
A Lua script that reads, checks, and increments the counter runs atomically on the Redis server, eliminating the read-check-increment race entirely. We chose Lua scripts (not Redis transactions with MULTI/EXEC) because MULTI/EXEC does not support conditional logic: we cannot check the counter value and decide whether to increment within a transaction.
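A sketch of such a script, assuming a fixed-window counter; the key layout and argument names are illustrative, not from the source. The pure-Python function below mirrors the script's logic so the check-then-increment shape is visible side by side:

```python
# Atomic fixed-window limiter as a Redis Lua script (illustrative sketch).
# KEYS[1] = per-user counter key, ARGV[1] = limit, ARGV[2] = window TTL (s).
RATE_LIMIT_LUA = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= tonumber(ARGV[1]) then
    return 0
end
current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
return 1
"""

def allow_request(counters: dict, key: str, limit: int) -> bool:
    """Pure-Python mirror of the script's read-check-increment logic.

    On the Redis server this whole sequence runs atomically; here it is
    only a reference for what the Lua above computes.
    """
    current = counters.get(key, 0)
    if current >= limit:
        return False
    counters[key] = current + 1
    return True
```

With redis-py, the script would be loaded once via `client.register_script(RATE_LIMIT_LUA)` and then called with `keys=[...]` and `args=[limit, ttl]`; Redis guarantees no other command interleaves while it runs.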
Trade-off: we gave up the ability to spread rate limit logic across application code in exchange for atomic enforcement. Cloudflare processes over 50 million HTTP requests per second using exactly this Lua-in-Redis approach.
For multi-region deployments, Redis Cluster partitions rate limit keys by user ID across shards. The trade-off: a user roaming between regions gets slightly different limits because each region's Redis cluster maintains its own counters. Stripe solves this by using a single primary region for rate limit state and accepting 20-50ms of cross-region latency on the check.
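One practical detail when partitioning by user ID on Redis Cluster: hash tags. The cluster computes a key's hash slot from only the `{...}` segment of the key name, so naming counters like this (key format is our assumption) keeps every window's counter for one user on the same shard:

```python
def rate_limit_key(user_id: str, window: str) -> str:
    # Redis Cluster hashes only the {...} portion of the key, so
    # "rl:{user-123}:minute" and "rl:{user-123}:hour" land in the
    # same hash slot -- and therefore on the same shard.
    return f"rl:{{{user_id}}}:{window}"
```

That co-location matters for the Lua approach too: a script may only touch keys in one slot, so all keys it reads and writes must share a hash tag.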
What if the interviewer asks, "What if Redis itself becomes a bottleneck?" We shard rate limit keys across multiple Redis instances by hashing user_id, giving us horizontal write throughput.
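Shard selection can be as simple as a stable hash modulo the shard count (a minimal sketch; the shard addresses are placeholders):

```python
import hashlib

# Hypothetical shard addresses -- in production this list comes from config.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def shard_for(user_id: str) -> str:
    # A stable hash maps the same user to the same instance every time,
    # so one shard holds the single authoritative counter for that user.
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(REDIS_SHARDS)
    return REDIS_SHARDS[index]
```

The trade-off of plain modulo is that changing the shard count remaps most keys at once; consistent hashing limits that churn at the cost of a more complex ring, and either way each user's counter stays on exactly one shard, so the atomicity argument above is unchanged.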