Rate Limiting

Without rate limiting, a single bot can create 1 million short URLs per hour. At 500 bytes per row, that is 500 MB of junk data filling our database every hour, consuming counter ranges that legitimate users need and inflating storage costs.
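The storage estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the figures from the paragraph (1 million URLs per hour, 500 bytes per row); the variable names and the per-day extrapolation are illustrative additions.

```python
# Figures taken from the text above.
urls_per_hour = 1_000_000
bytes_per_row = 500

bytes_per_hour = urls_per_hour * bytes_per_row

# Decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes).
mb_per_hour = bytes_per_hour / 1_000_000
gb_per_day = bytes_per_hour * 24 / 1_000_000_000  # illustrative extrapolation

print(f"{mb_per_hour:.0f} MB per hour, {gb_per_day:.0f} GB per day")
```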
We chose token bucket (not sliding window log) because token bucket handles bursts gracefully. Tokens accumulate during idle periods, so a legitimate user who creates 10 URLs in a burst is not penalized if they were idle for the previous minute.
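A minimal in-memory sketch of a token bucket shows how the burst tolerance arises. The class name, the capacity of 10 tokens, and the injectable `now` parameter are assumptions made for illustration; a production limiter would keep this state in a shared store rather than in process memory.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)  # an idle client starts with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; return True when the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        # Tokens accumulate while the client is idle, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Assumed settings: burst of 10, refilled at 100 tokens per minute.
bucket = TokenBucket(capacity=10, refill_rate=100 / 60, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(11)]
```

With these assumed settings, the first 10 requests of the burst pass immediately and the 11th is rejected until enough time has elapsed for a new token to accrue.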
The constraint: we must protect both infrastructure and keyspace without blocking legitimate burst traffic.
Sliding window log gives more accurate per-second enforcement but stores a timestamp for every request, using significantly more memory at scale. Trade-off: we accept slightly less precise per-second rate enforcement in exchange for lower memory overhead and better burst tolerance.
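For comparison, a minimal sliding window log might look like the sketch below. The class name and the explicit `now` argument are illustrative assumptions. The point to notice is that the deque holds one timestamp per accepted request, so memory grows with the limit, whereas a token bucket needs only a token count and a last-refill time per key.

```python
from collections import deque


class SlidingWindowLog:
    """Allows at most `limit` requests in any trailing `window_seconds` interval."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now):
        """Record and allow the request at time `now` if the window has room."""
        cutoff = now - self.window_seconds
        # Evict timestamps that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


log = SlidingWindowLog(limit=100, window_seconds=60)
accepted = sum(log.allow(now=0.0) for _ in range(150))
stored = len(log.timestamps)  # memory cost: one timestamp per accepted request
```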
For URL shortener writes, we set a threshold of 100 URLs per minute per API key, enforced via a per-key token count stored in Redis so that every gateway node sees the same state. Read endpoints allow 1,000 requests per second per IP because reads are cheaper and served from cache.
We enforce rate limiting at the API gateway layer (NGINX, Kong, or Cloudflare Workers) rather than in application code because gateway enforcement prevents requests from consuming application server resources at all. Cloudflare's rate limiting product, for example, enforces limits at the edge across its 300+ Points of Presence (PoPs) and processes billions of requests daily.
What if the interviewer asks: what happens to rate-limited requests? We return HTTP 429 with a Retry-After header telling the client exactly when to retry, rather than silently dropping requests.
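As a rough sketch of the rejection path, the helper below builds a 429 response with a Retry-After value in whole seconds. The function name, the JSON body shape, and the `seconds_until_token` input are hypothetical; how the wait time is obtained depends on the limiter in use.

```python
import math


def rate_limited_response(seconds_until_token):
    """Build a (status, headers, body) tuple for a rate-limited request."""
    # Retry-After takes whole seconds; round up so the client never retries early.
    retry_after = max(1, math.ceil(seconds_until_token))
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    # Hypothetical body shape, for illustration only.
    body = '{"error": "rate_limited", "retry_after_seconds": %d}' % retry_after
    return 429, headers, body


status, headers, body = rate_limited_response(12.2)
```

Rounding up is a deliberate choice here: a client that waits slightly too long succeeds, while one that retries too early just receives another 429.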

Formula & tradeoffs

Formula
$$\text{Sliding window} = \frac{\text{count in window}}{\text{window size}}$$
Why it matters in interviews
Rate limiting protects both our infrastructure and our keyspace. Interviewers expect us to name a specific algorithm and explain why we chose it over the alternative, rather than hand-waving "add rate limiting" without justifying the approach.