
Token Bucket

How do we allow traffic bursts without losing long-term control? We need an algorithm that separates burst allowance from steady-state rate, and the token bucket gives us both with two independent knobs.
We maintain a bucket that fills at a fixed refill rate (say 10 tokens per second) and holds up to a burst capacity (say 50 tokens).
Each request removes one token; if the bucket is empty, we reject the request.
During idle periods, tokens accumulate up to the maximum, so a user who sent nothing for 5 seconds has 50 tokens banked and can burst 50 requests instantly before falling back to the steady refill rate. We chose token bucket (not leaky bucket) because leaky bucket processes requests at a fixed output rate regardless of burst.
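The mechanism above can be sketched in a few lines. This is a minimal, single-threaded illustration (the class name and lazy-refill structure are our own choices, not from the text): tokens are replenished lazily based on elapsed time rather than by a background timer.

```python
import time

class TokenBucket:
    """Sketch of a token bucket: refills at refill_rate tokens/sec,
    holds at most capacity tokens."""

    def __init__(self, refill_rate: float, capacity: float):
        self.refill_rate = refill_rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so idle users can burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request costs one token
            return True
        return False                    # bucket empty: reject
```

With `refill_rate=10` and `capacity=50`, a user idle long enough to fill the bucket can fire 50 requests back-to-back, then settles into roughly 10 requests per second; the two knobs stay independently tunable.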
Under a leaky bucket, a user sending 5 quick API calls sees 4 of them delayed, even if they sent nothing for the previous minute. Trade-off: we gave up a smooth output rate in exchange for burst tolerance. Stripe uses a token bucket for API rate limiting because it lets merchants handle checkout spikes without hitting 429 errors during normal traffic. Cloudflare uses the same approach across their 300+ POPs to enforce per-zone rate limits.
What if the interviewer asks: 'When would you pick leaky bucket instead?' We pick leaky bucket when downstream services cannot tolerate any burst, such as a payment settlement queue that must process exactly N transactions per second to match bank API limits.
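For contrast, the leaky bucket can be sketched as a bounded queue that drains at a fixed rate. This is an illustrative sketch only; the parameter names (`drain_rate`, `queue_size`) and the separate `drain_one` step are our assumptions, not from the text.

```python
from collections import deque

class LeakyBucket:
    """Sketch of a leaky bucket: requests queue up and are processed
    at a fixed drain rate, so downstream never sees a burst."""

    def __init__(self, drain_rate: float, queue_size: int):
        self.drain_rate = drain_rate    # requests processed per second
        self.queue_size = queue_size    # bound on pending requests
        self.queue = deque()

    def submit(self, request) -> bool:
        # Reject when the queue is full; otherwise enqueue for steady draining.
        if len(self.queue) >= self.queue_size:
            return False
        self.queue.append(request)
        return True

    def drain_one(self):
        # A scheduler would call this every 1/drain_rate seconds,
        # forwarding one request to the downstream service.
        return self.queue.popleft() if self.queue else None
```

The key difference from the token bucket is visible in the shape: bursts are absorbed into the queue (or rejected), and the downstream side only ever sees one request per drain interval, which is exactly what a bank API with a hard per-second limit requires.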
Why it matters in interviews
Interviewers expect us to explain why token bucket beats leaky bucket for most API rate limiting. Showing that burst capacity and refill rate are independently tunable proves we understand the algorithm beyond its name.