
URL Shortener

VERY COMMON

The URL shortener comes up in almost every system design round. It is the design behind how Bitly serves 10 billion redirects per month with sub-50ms latency. You will design a counter-based ID generator, a Redis cache layer for 289K reads per second, and a sharded MySQL backend that grows at 5 TB per year.

  • Design a counter-based ID generator with range pre-allocation
  • Build a cache-first read path handling 867K peak RPS
  • Avoid the cache stampede that kills most URL shortener designs
Asked at: Google, Meta, Amazon, Microsoft, Uber, Stripe
Elevator Pitch (3-minute interview summary)

I would design a URL shortener for 500M DAU using counter-based ID generation with range pre-allocation. Each app server grabs 10,000 IDs from ZooKeeper and increments locally with zero coordination. Base62 encoding gives us 62^7 = 3.5 trillion codes, enough for 96 years at 100M URLs per day. The 10:1 read-to-write ratio (289K vs 29K RPS) makes this cache-first. Redis absorbs 95%+ of reads. Each record averages 131 bytes, so storage grows at 5 TB per year. The urls table is sharded by short_code hash across 16 shards. Click analytics flow through Kafka so redirects stay under 50ms at p99.
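The pitch hinges on range pre-allocation: each app server pays one coordination round trip per 10,000 IDs and increments a purely local counter in between. A minimal sketch of that pattern; the `fetch_range` callback stands in for the ZooKeeper call, and all names here are illustrative:

```python
import itertools
import threading

class RangeCounter:
    """Hands out unique IDs from locally pre-allocated ranges.

    `fetch_range(batch)` stands in for a coordinator call (e.g. ZooKeeper)
    that atomically advances a shared counter by `batch` and returns the
    range's starting value. Everything between fetches is local.
    """

    def __init__(self, fetch_range, batch=10_000):
        self._fetch_range = fetch_range
        self._batch = batch
        self._lock = threading.Lock()
        self._next = 0
        self._limit = 0  # starts exhausted, so the first call fetches a range

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._limit:
                start = self._fetch_range(self._batch)  # rare coordination
                self._next, self._limit = start, start + self._batch
            self._next += 1
            return self._next - 1

# Toy coordinator: a process-local counter advanced by one batch per fetch.
_ranges = itertools.count(0, 10_000)
counter = RangeCounter(lambda batch: next(_ranges), batch=10_000)
```

Crossing a range boundary (ID 10,000 here) triggers exactly one more coordinator call; everything else is a lock-protected increment.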

Concepts Unlocked (8 concepts in this topic)

Base62 Encoding

EASY

We chose 7 characters because 62^7 = 3.5 trillion codes. At 100M/day, that lasts 96 years before we need to extend the code length.

Core Feature Design
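The 7-character code is just the counter value written in base 62. A minimal sketch of the encoding; the alphabet ordering is an arbitrary choice, any fixed ordering works as long as encode and decode agree:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int, width: int = 7) -> str:
    """Encode a non-negative counter value as a fixed-width short code."""
    chars = []
    while n:
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    # Left-pad so every code is exactly `width` characters.
    return "".join(reversed(chars)).rjust(width, "0")

def base62_decode(code: str) -> int:
    """Invert base62_encode back to the counter value."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Sanity check on capacity: `62 ** 7` is 3,521,614,606,208, the 3.5 trillion codes quoted above.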

301 vs 302 Redirects

EASY

We default to 302 (not 301) because we need analytics. 301 cuts server load by 60%, but the browser caches the redirect and we lose click visibility.

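The two responses differ only in the status code, but a 301 invites the browser to cache the hop and skip our servers on every later click. A hypothetical helper to make the tradeoff concrete; this is not any specific framework's API:

```python
def redirect_response(long_url: str, cacheable: bool = False):
    """Build the (status, headers) pair for a short-code hit.

    302 keeps every click flowing through us, so analytics see it.
    Flipping `cacheable` to True trades that visibility for load.
    """
    status = 301 if cacheable else 302
    headers = {"Location": long_url}
    if not cacheable:
        # Some clients cache 302s anyway unless told not to.
        headers["Cache-Control"] = "no-store"
    return status, headers
```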

Consistent Hashing

STANDARD

We chose consistent hashing (not modular hashing) for Redis because when a node dies, only 1/N of the keys need to remap. Without it, the entire cache misses at once.

High Level Design
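A minimal hash-ring sketch showing the 1/N property: a key maps to the first node clockwise from its hash, so removing a node only remaps the keys that pointed at it. MD5 and 100 virtual nodes per server are illustrative choices:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother distribution."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, node: str):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """First node at or clockwise past the key's hash."""
        h = self._hash(key)
        i = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[i][1]
```

With modular hashing (`hash(key) % N`), dropping from 3 nodes to 2 would remap roughly two-thirds of keys; here only the dead node's share moves.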

Rate Limiting

STANDARD

We cap at 100 URLs/min/user via a sliding window in Redis (not fixed window, to avoid burst at boundaries). Without it, one bot can burn through our counter range in hours.

System APIs
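The Redis version of this is typically a sorted set per user: drop entries older than the window, count what remains, then add the new timestamp. An in-memory stand-in that mirrors that logic, with illustrative names:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sliding-window rate limiter: at most `limit` hits per `window` seconds.

    Unlike a fixed window, there is no boundary where a user can fire
    2x `limit` requests back to back.
    """

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # user -> timestamps, oldest first

    def allow(self, user: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[user]
        while hits and hits[0] <= now - self.window:
            hits.popleft()  # evict timestamps that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```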

Read Replica

STANDARD

We run 3 replicas that could split 289K RPS to ~96K each. But we chose replicas for redundancy, not throughput, because Redis already handles 95% of reads.

High Level Design

Write-Ahead Log (WAL)

STANDARD

We persist each counter range to a WAL before acknowledging. If the counter crashes, the service skips ahead on restart instead of reissuing IDs that cause collisions.

Fault Tolerance
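A minimal sketch of the idea: append the new range's upper bound to the log before handing out any ID from it, and on restart resume from the last logged bound. At most one range of IDs is wasted; none is ever reissued. File layout and names are illustrative:

```python
import os

class WalCounter:
    """Counter that persists each range's high-water mark before use."""

    def __init__(self, path: str, batch=10_000):
        self.path = path
        self.batch = batch
        self.limit = self._recover()
        self.next = self.limit  # forces a fresh range on the first call

    def _recover(self) -> int:
        """Skip ahead to the last acknowledged range boundary."""
        try:
            with open(self.path) as f:
                lines = f.read().split()
                return int(lines[-1]) if lines else 0
        except FileNotFoundError:
            return 0

    def next_id(self) -> int:
        if self.next >= self.limit:
            self.limit = self.next + self.batch
            with open(self.path, "a") as f:  # log the bound, then fsync,
                f.write(f"{self.limit}\n")   # only then serve from the range
                f.flush()
                os.fsync(f.fileno())
        self.next += 1
        return self.next - 1
```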

Cache Stampede

TRICKY

One expired key, 50K concurrent cache misses, database on fire. We chose request coalescing so one thread refills the cache while the rest wait, instead of all 50K hitting MySQL.

Fault Tolerance
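A minimal in-process sketch of request coalescing using a per-key lock; in production this would be a distributed lock or a singleflight-style library, and the names here are illustrative:

```python
import threading

class CoalescingCache:
    """On a miss, one thread runs the loader; concurrent callers for the
    same key block on its lock and then read the freshly cached value."""

    def __init__(self, loader):
        self._loader = loader  # e.g. the MySQL lookup
        self._cache = {}
        self._locks = {}
        self._mu = threading.Lock()

    def get(self, key):
        if key in self._cache:  # fast path: cache hit, no locking
            return self._cache[key]
        with self._mu:  # one lock object per missing key
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # first thread loads; the rest queue up here
            if key not in self._cache:  # losers re-check and skip the load
                self._cache[key] = self._loader(key)
            return self._cache[key]
```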

Bloom Filter

TRICKY

We chose a Bloom filter (not a hash set) for dedup because 12 GB answers 'is this URL already shortened?' for 10B URLs. A hash set would need ~80 GB. Eliminates 85% of unnecessary DB lookups on writes.

Core Feature Design
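The 12 GB figure falls out of the standard Bloom filter sizing formula, m = -n * ln(p) / (ln 2)^2, which works out to about 9.6 bits per URL at a 1% false-positive rate. A quick sizing sketch:

```python
import math

def bloom_size(n_items: int, fp_rate: float):
    """Optimal Bloom filter size in bits (m) and hash count (k)."""
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k

# 10B URLs at a 1% false-positive rate: ~9.6 bits per item.
bits, hashes = bloom_size(10_000_000_000, 0.01)
```

A hash set, by contrast, must store the keys (or at least full-width fingerprints) rather than shared bits, which is where the ~80 GB estimate comes from.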