Distributed Rate Limiting
One API gateway, one Redis node, rate limiting works. But our production setup has 8 API gateway instances behind a load balancer, and that breaks single-node assumptions.
The constraint: two requests from the same user hit two different gateways at the same millisecond. Both gateways read the counter as 99 (limit is 100), both increment to 100, and both allow the request. The user made 101 requests because of a race condition.
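The lost update is easy to reproduce deterministically. This toy in-process sketch (a plain dict standing in for Redis, names ours) interleaves two gateways' read-check-increment sequences the worst way:

```python
# Two "gateways" race on a shared counter. Limit is 100, counter is at 99.
counter = {"user:42": 99}
LIMIT = 100

# Both gateways read BEFORE either writes -- the race window.
read_a = counter["user:42"]   # gateway A sees 99
read_b = counter["user:42"]   # gateway B also sees 99

allowed_a = read_a < LIMIT    # True: 99 < 100
allowed_b = read_b < LIMIT    # True: 99 < 100

# Both increment from their stale read; the second write clobbers the first.
counter["user:42"] = read_a + 1
counter["user:42"] = read_b + 1

# Result: counter shows 100, but 101 requests were allowed in total.
```

Both `allowed_a` and `allowed_b` come out `True` while the stored count is only 100, which is exactly the off-by-one the user exploits.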
The race window equals the round-trip time from gateway to Redis, roughly 1-5ms. We fix this with Lua scripting in Redis (not distributed locks, not application-level CAS).
A Lua script that reads, checks, and increments the counter runs atomically on the Redis server, eliminating the read-check-increment race entirely. We chose Lua scripts (not Redis transactions with MULTI/EXEC) because MULTI/EXEC does not support conditional logic: we cannot check the counter value and decide whether to increment within a transaction.
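A sketch of such a script, assuming a fixed-window counter; the key layout and argument names are illustrative, not from the source. The pure-Python function below mirrors the script's logic so the check-then-increment shape is visible side by side:

```python
# Atomic fixed-window limiter as a Redis Lua script (illustrative sketch).
# KEYS[1] = per-user counter key, ARGV[1] = limit, ARGV[2] = window TTL (s).
RATE_LIMIT_LUA = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= tonumber(ARGV[1]) then
    return 0
end
current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
return 1
"""

def allow_request(counters: dict, key: str, limit: int) -> bool:
    """Pure-Python mirror of the script's read-check-increment logic.

    On the Redis server this whole sequence runs atomically; here it is
    only a reference for what the Lua above computes.
    """
    current = counters.get(key, 0)
    if current >= limit:
        return False
    counters[key] = current + 1
    return True
```

With redis-py, the script would be loaded once via `client.register_script(RATE_LIMIT_LUA)` and then called with `keys=[...]` and `args=[limit, ttl]`; Redis guarantees no other command interleaves while it runs.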
Trade-off: we gave up the ability to spread rate limit logic across application code in exchange for atomic enforcement. Cloudflare processes over 50 million HTTP requests per second using exactly this Lua-in-Redis approach.
For multi-region deployments, Redis Cluster partitions rate limit keys by user ID across shards. The trade-off: a user roaming between regions gets slightly different limits because each region's Redis cluster maintains its own counters. Stripe solves this by using a single primary region for rate limit state and accepting 20-50ms of cross-region latency on the check.
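One practical detail when partitioning by user ID on Redis Cluster: hash tags. The cluster computes a key's hash slot from only the `{...}` segment of the key name, so naming counters like this (key format is our assumption) keeps every window's counter for one user on the same shard:

```python
def rate_limit_key(user_id: str, window: str) -> str:
    # Redis Cluster hashes only the {...} portion of the key, so
    # "rl:{user-123}:minute" and "rl:{user-123}:hour" land in the
    # same hash slot -- and therefore on the same shard.
    return f"rl:{{{user_id}}}:{window}"
```

That co-location matters for the Lua approach too: a script may only touch keys in one slot, so all keys it reads and writes must share a hash tag.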
What if the interviewer asks, "What if Redis itself becomes a bottleneck?" We shard rate limit keys across multiple Redis instances by hashing user_id, giving us horizontal write throughput.
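Shard selection can be as simple as a stable hash modulo the shard count (a minimal sketch; the shard addresses are placeholders):

```python
import hashlib

# Hypothetical shard addresses -- in production this list comes from config.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379", "redis-3:6379"]

def shard_for(user_id: str) -> str:
    # A stable hash maps the same user to the same instance every time,
    # so one shard holds the single authoritative counter for that user.
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(REDIS_SHARDS)
    return REDIS_SHARDS[index]
```

The trade-off of plain modulo is that changing the shard count remaps most keys at once; consistent hashing limits that churn at the cost of a more complex ring, and either way each user's counter stays on exactly one shard, so the atomicity argument above is unchanged.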