
Throttling Types

Not all rate limiting is binary allow-or-reject. The constraint: different use cases have different tolerance for overshoot, and a single throttling strategy cannot serve all of them.
We choose from three strategies depending on what we are protecting. Hard throttling is the strictest: once the limit is hit, we reject every subsequent request with HTTP 429 (Too Many Requests) until the window resets. We chose hard throttling for payment APIs (not soft) because allowing even one extra transaction could cause financial inconsistency. Trade-off: we gave up user experience during burst spikes in exchange for strict enforcement.
Soft throttling allows a configurable overshoot, typically 10% above the limit. If the limit is 1,000 requests per minute, soft throttling allows up to 1,100 before rejecting. This absorbs natural traffic jitter without penalizing users for minor spikes.
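The difference between hard and soft throttling can be sketched with a single fixed-window counter. This is a minimal illustration, not a production limiter: the class name `WindowThrottle`, the `overshoot_pct` parameter, and the fixed-window approach are all assumptions made for the example, and the caller passes the clock value in explicitly so the behavior is easy to reason about.

```python
from dataclasses import dataclass


@dataclass
class WindowThrottle:
    """Hypothetical fixed-window throttle; overshoot_pct=0 means hard throttling."""

    limit: int                 # nominal requests allowed per window
    window_s: float = 60.0     # window length in seconds
    overshoot_pct: int = 0     # extra headroom for soft throttling, in percent
    window_start: float = 0.0  # start time of the current window
    count: int = 0             # requests admitted in the current window

    def ceiling(self) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        return self.limit + self.limit * self.overshoot_pct // 100

    def allow(self, now: float) -> int:
        """Return an HTTP status: 200 if admitted, 429 if throttled."""
        if now - self.window_start >= self.window_s:
            # A new window has begun, so the counter starts over.
            self.window_start = now
            self.count = 0
        if self.count >= self.ceiling():
            return 429
        self.count += 1
        return 200


hard = WindowThrottle(limit=1000)                    # rejects request 1,001
soft = WindowThrottle(limit=1000, overshoot_pct=10)  # rejects request 1,101

hard_ok = sum(hard.allow(now=0.0) == 200 for _ in range(1200))
soft_ok = sum(soft.allow(now=0.0) == 200 for _ in range(1200))
print(hard_ok, soft_ok)  # 1000 1100
```

With the same burst of 1,200 requests in one window, the hard limiter admits exactly 1,000 and the soft limiter admits 1,100, which matches the numbers above.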
Cloudflare uses soft throttling at its edge nodes because CDN traffic is inherently bursty.
Elastic throttling goes further by adjusting limits dynamically based on system load. When the cluster is at 40% CPU, the limit might be 5,000 RPS per client. When CPU hits 80%, it drops to 2,000 RPS. We chose elastic throttling for autoscaling control planes (not hard) because fixed limits during a scaling storm would reject the very API calls needed to bring up new capacity. Trade-off: we gave up predictable client limits in exchange for system survival. AWS Auto Scaling uses elastic throttling internally for exactly this reason.
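One way to turn the two data points above (5,000 RPS at 40% CPU, 2,000 RPS at 80% CPU) into a limit function is sketched below. The function name `elastic_limit`, the linear interpolation between the two points, and the clamping outside that range are assumptions for illustration; a real system might instead use step functions, smoothing, or a different load signal.

```python
def elastic_limit(
    cpu_pct: float,
    low: tuple[float, int] = (40.0, 5000),   # (CPU %, per-client RPS) when lightly loaded
    high: tuple[float, int] = (80.0, 2000),  # (CPU %, per-client RPS) when heavily loaded
) -> int:
    """Hypothetical per-client RPS limit that shrinks as cluster CPU rises."""
    low_cpu, low_limit = low
    high_cpu, high_limit = high
    if cpu_pct <= low_cpu:
        return low_limit   # clamp: never exceed the generous limit
    if cpu_pct >= high_cpu:
        return high_limit  # clamp: never go below the floor
    # Assumed linear interpolation between the two anchor points.
    frac = (cpu_pct - low_cpu) / (high_cpu - low_cpu)
    return round(low_limit + frac * (high_limit - low_limit))


for cpu in (30, 40, 60, 80, 95):
    print(cpu, elastic_limit(cpu))
```

Under these assumptions the limit stays at 5,000 RPS up to 40% CPU, falls to 3,500 RPS at 60%, and bottoms out at 2,000 RPS from 80% upward. Clients see a moving limit, which is the predictability we traded away.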
The choice maps to what we prioritize: strict enforcement (hard), user experience (soft), or system survival (elastic).
Why it matters in interviews
Saying 'add rate limiting' is incomplete. Interviewers want to know which throttling type we would choose and why. Matching the strategy to the use case (payments vs CDN vs autoscaling) shows production judgment.