URL Shortener Anti-Patterns

Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.

Using MD5/SHA Hash for Short Codes

Very Common

Hashing the long URL to generate the short code creates collision risk and wastes most of the hash output. We chose a counter instead because it guarantees uniqueness by construction.

Why: Candidates reach for hash functions because they sound "distributed." But MD5 produces 128 bits of output, and 7 Base62 characters can hold only about 42 of them. Once the rest is discarded, the birthday bound says collisions become likely after roughly sqrt(62^7), about 1.9 million URLs, which we would hit within minutes at 29K writes per second. A counter gives us the same uniqueness guarantee with zero waste.

WRONG: Hash the long URL with MD5, take the first 7 characters. When two URLs collide, append a salt and re-hash. This turns a simple O(1) operation into an unbounded retry loop.
RIGHT: We use a monotonically increasing counter encoded in Base62. Every ID is unique by construction. Zero collisions, no retries, deterministic 7-character output. Trade-off: counter-based codes are guessable (sequential), but we accept this because URL shorteners are not a security boundary.
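
A minimal sketch of the encoding step, assuming a 0-9/a-z/A-Z alphabet and 7-character zero-padded output; the function name encode_base62 is illustrative, not from the design above:

    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def encode_base62(counter_id: int, length: int = 7) -> str:
        # Repeatedly divide by 62, mapping each remainder to a symbol.
        code = ""
        n = counter_id
        while n > 0:
            n, rem = divmod(n, 62)
            code = ALPHABET[rem] + code
        # Left-pad so every code is exactly `length` characters (62^7 ~ 3.5T IDs).
        return code.rjust(length, ALPHABET[0])

    # Example: counter ID 125 encodes to "0000021"; uniqueness follows from the counter.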

Single Global Counter Without Ranges

Very Common

A single counter node becomes a bottleneck and single point of failure at 29K writes per second. We chose range pre-allocation to eliminate this contention.

Why: It works fine at low scale. One Redis INCR per URL creation is straightforward. But at 29K RPS, every write serializes through one node. Latency spikes, and if that node goes down, the entire write path stops.

WRONG: One Redis INCR command per URL creation. All app servers contend on the same key, creating a serialization bottleneck that caps throughput around 10K RPS.
RIGHT: We pre-allocate ranges of 10,000 IDs per app server from ZooKeeper (not etcd, because ZooKeeper's sequential node guarantees make range allocation atomic). Each server increments locally with no coordination on the hot path. When a range runs out, fetch the next one. Trade-off: a crashed server wastes its remaining range, creating keyspace gaps, but the 3.5T keyspace makes this negligible.
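
A sketch of the per-server allocator; fetch_next_range stands in for the ZooKeeper call that atomically reserves the next block of 10,000 IDs, which is assumed here rather than shown:

    import threading

    RANGE_SIZE = 10_000  # IDs handed to each app server per allocation

    class RangeAllocator:
        """Hands out IDs from a locally held range; refills from the coordinator."""

        def __init__(self, fetch_next_range):
            # fetch_next_range() must atomically reserve the next block start
            # (e.g. via ZooKeeper); its implementation is assumed, not shown.
            self._fetch = fetch_next_range
            self._lock = threading.Lock()
            self._next = 0
            self._end = 0  # exclusive upper bound of the current range

        def next_id(self) -> int:
            with self._lock:
                if self._next >= self._end:
                    start = self._fetch()  # the only coordination point, hit once per 10K IDs
                    self._next, self._end = start, start + RANGE_SIZE
                self._next += 1
                return self._next - 1

A crashed server simply abandons whatever remains between _next and _end, which is the keyspace gap the trade-off above accepts.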

No Cache Layer for Reads

Very Common

Sending 289K redirect lookups per second directly to MySQL will overwhelm the database within seconds. We put Redis in front to absorb 95%+ of read traffic.

Why: Candidates focus on the write path and forget that reads dominate 10:1. A single MySQL node handles roughly 100K key lookups per second. At 289K RPS, we need either three read replicas or a cache. We chose the cache because it is cheaper, adds less operational complexity, and gives sub-millisecond latency.

WRONG: Query MySQL directly for every redirect request. Rely on connection pooling to absorb the load. This works until traffic doubles, then the database falls over.
RIGHT: We put Redis in front with a 95%+ hit ratio. Only ~14K requests per second fall through to MySQL, well within a single node's capacity. We add read replicas for redundancy, not for throughput. Trade-off: we now have a cache invalidation problem, but URL mappings are immutable after creation, so invalidation is only needed on delete.
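
A cache-aside sketch of the read path using redis-py; the key format, TTL, and db_lookup placeholder are illustrative assumptions:

    import redis

    cache = redis.Redis(host="localhost", port=6379)

    def resolve(short_code: str, db_lookup) -> str | None:
        """Cache-aside: serve from Redis, fall back to MySQL only on a miss."""
        cached = cache.get(f"url:{short_code}")
        if cached is not None:
            return cached.decode()
        long_url = db_lookup(short_code)  # placeholder for the MySQL primary-key lookup
        if long_url is not None:
            # Mappings are immutable after creation, so caching is safe;
            # the key only needs to be removed if the URL is deleted.
            cache.set(f"url:{short_code}", long_url, ex=24 * 3600)
        return long_url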

Wrong HTTP Redirect Status Code

Common

Using 301 when analytics are required, or 302 when they are not, either breaks tracking or wastes server resources. We default to 302 and offer 301 as an opt-in.

Why: Candidates pick one without weighing the trade-off. 301 tells the browser to cache the redirect permanently, so the browser never contacts our server again. 302 forces the browser to check every time. The wrong choice either kills our analytics or doubles our server load.

WRONG: Always use 301 Moved Permanently. Browsers cache the redirect and skip the server, which means click events disappear from our analytics pipeline.
RIGHT: We default to 302 Found for any URL where we track clicks. We offer 301 as an opt-in for high-traffic links where the owner does not need tracking. Trade-off: 302 means every click hits our server, increasing cost by up to 60% for repeat visitors, but we retain full analytics visibility.
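
A Flask-style sketch of the status-code decision; the in-memory URLS mapping stands in for the cache/database lookup and track_clicks is an assumed per-link setting:

    from flask import Flask, redirect, abort

    app = Flask(__name__)

    # Toy mapping standing in for the lookup described in the previous section.
    URLS = {"abc1234": {"target": "https://example.com/landing", "track_clicks": True}}

    @app.route("/<short_code>")
    def follow(short_code: str):
        entry = URLS.get(short_code)
        if entry is None:
            abort(404)
        # 302 keeps every click on our servers so the analytics pipeline sees it;
        # links whose owners opt out of tracking get a cacheable 301 instead.
        status = 302 if entry["track_clicks"] else 301
        return redirect(entry["target"], code=status)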

No Rate Limiting on URL Creation

Common

Without rate limits, one abusive client can exhaust counter ranges and fill storage at 50 GB per day. We cap creation at 100 URLs per minute per user.

Why: Rate limiting feels like a nice-to-have. But at 29K writes per second capacity, a single bot running unchecked can generate millions of junk URLs per hour, burning through our counter space and inflating storage costs.

WRONG: Accept unlimited URL creation requests. Plan to detect abuse after the fact with batch analysis. By then, millions of garbage URLs are already in our database.
RIGHT: We cap at 100 URLs per minute per user at the API gateway. We chose a sliding window counter in Redis (not fixed window, because fixed windows allow burst at the boundary). Return 429 Too Many Requests with a Retry-After header. Trade-off: legitimate bulk users need a separate enterprise endpoint with higher limits.
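
One way to sketch the sliding window with a Redis sorted set; only the 100-per-minute limit comes from the design above, while the key format and one-member-per-request scheme are assumptions:

    import time
    import uuid
    import redis

    r = redis.Redis()
    LIMIT = 100          # URL creations allowed per window
    WINDOW_SECONDS = 60  # sliding one-minute window

    def allow_creation(user_id: str) -> bool:
        """Sliding-window counter: one sorted-set member per request, scored by time."""
        key = f"ratelimit:create:{user_id}"
        now = time.time()
        pipe = r.pipeline()
        pipe.zremrangebyscore(key, 0, now - WINDOW_SECONDS)  # drop requests outside the window
        pipe.zadd(key, {str(uuid.uuid4()): now})             # record this request
        pipe.zcard(key)                                      # count requests still in the window
        pipe.expire(key, WINDOW_SECONDS)                     # let idle keys disappear
        _, _, in_window, _ = pipe.execute()
        return in_window <= LIMIT  # caller returns 429 with Retry-After when this is False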

Sharding by User ID

Common

User-based sharding creates hot shards when power users generate millions of URLs. We shard by short_code hash because the hot path is redirect lookup, not user lookup.

Why: User ID feels like a natural partition key. But the redirect path looks up by short_code, not user_id. Sharding by user means every redirect requires a scatter-gather across all shards to find the right record.

WRONG: Shard the urls table by user_id. All URLs from one user land on the same shard. A marketing team that creates 10M URLs in an hour overwhelms that single shard.
RIGHT: We shard by short_code hash (not user_id). Counter-based Base62 codes distribute evenly regardless of which user created them. Redirects hit exactly one shard with no scatter-gather. Trade-off: listing a user's URLs requires a fan-out across all 16 shards, but we accept this because listing is a low-frequency operation compared to redirects.
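
A sketch of the shard routing function, assuming the 16 shards mentioned above and a stable (non-randomized) hash:

    import zlib

    NUM_SHARDS = 16

    def shard_for(short_code: str) -> int:
        # A stable hash, unlike Python's per-process randomized hash(),
        # so every app server routes the same code to the same shard.
        return zlib.crc32(short_code.encode()) % NUM_SHARDS

    # Redirects touch exactly one shard; listing a user's URLs fans out across all NUM_SHARDS.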

No Expiry or Cleanup Strategy

Common

Without TTL, the database grows at 5 to 18 TB per year indefinitely. We set a default 1-year TTL with lazy deletion on reads and weekly batch cleanup.

Why: Candidates assume all URLs live forever. In practice, 70%+ of short URLs become inactive within 30 days. Storing them permanently wastes disk, slows index scans, and bloats backups.

WRONG: Store every URL permanently. Hope that disk is cheap enough that it never matters. Five years later, we have 90 TB of mostly dead links.
RIGHT: We set a default TTL of 1 year. Lazy deletion on read: check expires_at before redirecting. Weekly batch cleanup job purges expired rows from the partitioned urls table. Trade-off: we lose the ability to resurrect expired URLs, but 70%+ were already inactive.
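
A sketch of the lazy check plus the weekly purge; the urls table and expires_at column come from the design above, while the chunked DELETE is an assumption made to keep lock times bounded:

    from datetime import datetime, timezone

    def is_expired(expires_at: datetime | None) -> bool:
        """Lazy deletion on read: refuse to redirect once expires_at has passed."""
        return expires_at is not None and expires_at <= datetime.now(timezone.utc)

    # Weekly batch cleanup run by a scheduled job, deleting in bounded chunks
    # so the purge never holds long locks on the partitioned urls table:
    CLEANUP_SQL = """
    DELETE FROM urls
    WHERE expires_at < UTC_TIMESTAMP()
    LIMIT 10000;
    """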

Synchronous Analytics on Redirect

Common

Writing analytics data synchronously on every redirect adds 5 to 15ms of latency to the hot path. We chose fire-and-forget to Kafka to keep redirect latency under 10ms.

Why: It is simpler to implement. One request handler does the redirect and the analytics write in sequence. But at 289K RPS, the analytics INSERT becomes the bottleneck, and redirect latency climbs above 50ms.

WRONG: INSERT into click_analytics in the same request handler that returns the 302. Every redirect now waits for a database write to complete before responding.
RIGHT: We fire-and-forget to Kafka (not a background thread pool, because Kafka gives us durability and replay if the consumer falls behind). A separate consumer writes to the analytics store asynchronously. Redirect latency stays under 10ms at p50. Trade-off: analytics data is eventually consistent, delayed by 1 to 5 seconds, but redirect latency is unaffected.
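
A fire-and-forget sketch using kafka-python; the click-events topic name, broker address, and event fields are illustrative assumptions:

    import json
    import time
    from kafka import KafkaProducer

    # The redirect handler never waits on the analytics write.
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",
        value_serializer=lambda event: json.dumps(event).encode(),
        acks=1,       # leader ack only; durability comes from Kafka replication and replay
        linger_ms=5,  # small batching window, still well under the redirect latency budget
    )

    def record_click(short_code: str, user_agent: str, referrer: str | None) -> None:
        # send() returns immediately with a future we deliberately do not await.
        producer.send("click-events", {
            "short_code": short_code,
            "ts": time.time(),
            "user_agent": user_agent,
            "referrer": referrer,
        })

A separate consumer drains click-events into the analytics store, which is where the 1 to 5 second eventual-consistency delay shows up.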