Using MD5/SHA for Short Codes
Very Common | CONCEPT
Generating short codes by hashing the long URL with MD5 or SHA-256, then truncating to 7 characters.
Why: MD5 feels like the 'standard' approach for generating unique strings. Candidates default to what they know from other contexts.
WRONG: Hash the long URL with MD5 (128-bit output) and take the first 7 characters as the short code. Truncation discards most of the hash, so distinct URLs can collide and every insert needs a collision check.
RIGHT: Use an auto-incrementing counter + Base62 encoding. Each ID maps to exactly one short code, so the scheme is simpler, faster, and collision-free.
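The counter + Base62 approach can be sketched as below. This is a minimal illustration, not a reference implementation: the alphabet order (digits, lowercase, uppercase) and the function names `encode_base62` and `decode_base62` are assumptions made for the example, and the counter itself is assumed to come from elsewhere (for instance a database sequence).

```python
import string

# Assumed alphabet order: digits, then lowercase, then uppercase.
# Any fixed order works as long as encode and decode agree.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Convert a non-negative integer ID into its Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Convert a Base62 short code back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the mapping from ID to code is one-to-one, two different IDs can never produce the same code. Seven Base62 characters cover 62^7 (about 3.5 trillion) IDs.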
See pattern: Collision Handling
Ignoring Read/Write Ratio
Very Common | CONCEPT
Designing the system as if reads and writes are equally frequent, missing the 100:1 ratio.
Why: Candidates jump straight to write-path design (URL creation) without considering that redirects dominate traffic.
WRONG: Optimize the write path with complex distributed ID generation, while leaving the read path with direct DB lookups.
RIGHT: Recognize the 100:1 read:write ratio first. Prioritize read-path optimization: aggressive caching (Redis), CDN for popular URLs, and read replicas.
Single Point of Failure Database
Common | CONCEPT
Using a single database instance with no replication or failover strategy.
Why: Candidates forget to discuss high availability. A single DB works fine in development but fails in production at scale.
WRONG: Single MySQL instance stores all URLs. If it goes down, the entire service is unavailable.
RIGHT: Use primary-replica replication for reads. Add automatic failover. Consider multi-region deployment for global latency.
No Cache Layer
Common | TIME WASTE
Every redirect request hits the database directly, even for the most popular URLs.
Why: Candidates design only the basic lookup path (request -> DB) and forget that roughly 80% of traffic goes to 20% of URLs.
WRONG: GET /:code -> DB lookup -> redirect. Every single request queries the database.
RIGHT: Add Redis/Memcached as a cache-aside layer. Check cache first, fall back to DB on miss, populate cache on read. 80%+ hit rate is achievable.
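A minimal sketch of the cache-aside read path follows. To keep it runnable without external services, plain dictionaries stand in for Redis/Memcached and for the database; the class name `RedirectResolver` and the `db_reads` counter are hypothetical, introduced only to make the behavior observable.

```python
from typing import Dict, Optional


class RedirectResolver:
    """Cache-aside lookup: check cache, fall back to DB, populate cache."""

    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db                      # stand-in for the URL database
        self.cache: Dict[str, str] = {}   # stand-in for Redis/Memcached
        self.db_reads = 0                 # counts how often the DB is hit

    def resolve(self, code: str) -> Optional[str]:
        # 1. Check the cache first.
        url = self.cache.get(code)
        if url is not None:
            return url
        # 2. Cache miss: fall back to the database.
        self.db_reads += 1
        url = self.db.get(code)
        # 3. Populate the cache so the next read skips the database.
        if url is not None:
            self.cache[code] = url
        return url
```

Repeated lookups of the same popular code touch the database only once. A real cache would also set an eviction policy or TTL on each entry, which this sketch leaves out.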
See pattern: Cache Strategy Selection
Synchronous Analytics Writes
Common | TIME WASTE
Logging click analytics synchronously in the redirect path, adding latency to every redirect.
Why: It seems natural to log the click in the same request handler that performs the redirect.
WRONG: On redirect: write click event to analytics DB, wait for acknowledgment, then send 301/302 response.
RIGHT: Fire-and-forget: publish click event to Kafka/SQS asynchronously. The redirect response returns immediately. A separate consumer processes analytics.
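The fire-and-forget pattern can be sketched with an in-process queue standing in for Kafka/SQS. Everything named here (`click_events`, `handle_redirect`, `analytics_consumer`, `start_consumer`, the `processed` list) is hypothetical, and the 302 status is an assumption; the point is only that the handler enqueues the event and returns without waiting for analytics storage.

```python
import queue
import threading
import time
from typing import List, Optional, Tuple

# Stand-in for a Kafka topic or SQS queue.
click_events: "queue.Queue[Optional[dict]]" = queue.Queue()
# Stand-in for the analytics store written by the consumer.
processed: List[dict] = []


def handle_redirect(code: str, long_url: str) -> Tuple[int, str]:
    """Enqueue the click event and return the redirect immediately."""
    click_events.put_nowait({"code": code, "ts": time.time()})
    return 302, long_url


def analytics_consumer() -> None:
    """Drain events in the background; a None event stops the loop."""
    while True:
        event = click_events.get()
        try:
            if event is None:
                return
            processed.append(event)
        finally:
            click_events.task_done()


def start_consumer() -> threading.Thread:
    """Run the consumer on a daemon thread, separate from request handling."""
    thread = threading.Thread(target=analytics_consumer, daemon=True)
    thread.start()
    return thread
```

The redirect latency no longer depends on the analytics write. With a real broker, the trade-off is that a click event can be lost or delayed if publishing fails, which is usually acceptable for analytics.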
No TTL or Expiration
Common | DOMAIN
URLs live forever with no cleanup mechanism, causing unbounded storage growth.
Why: It is easy to skip the 'what happens to old URLs' question. The system works fine initially, but storage grows forever.
WRONG: Every URL is permanent. After 5 years, you have 30 billion records with no way to reclaim space.
RIGHT: Add an optional expires_at field. Run a background cleanup job to delete expired URLs and free their short codes. Default TTL of 2 years if not specified.
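A minimal sketch of the `expires_at` field and the cleanup job is shown below. A dictionary stands in for the URL table, 730 days approximates the 2-year default, and the helper names `make_record` and `purge_expired` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumption: 2 years approximated as 730 days.
DEFAULT_TTL = timedelta(days=730)


def make_record(long_url: str, now: datetime,
                expires_at: Optional[datetime] = None) -> dict:
    """Build a URL record, applying the default TTL when none is given."""
    return {
        "long_url": long_url,
        "expires_at": expires_at if expires_at is not None else now + DEFAULT_TTL,
    }


def purge_expired(store: Dict[str, dict], now: datetime) -> List[str]:
    """Delete expired records and return the short codes that were freed."""
    expired = [code for code, rec in store.items() if rec["expires_at"] <= now]
    for code in expired:
        del store[code]
    return expired
```

In a real deployment the job would likely delete in batches against an index on `expires_at`, and the redirect path would also check `expires_at` so an expired link stops resolving before the job runs.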
Not Handling Custom Aliases
Occasional | CASE MISS
Only supporting auto-generated short codes, ignoring the common requirement for vanity URLs.
Why: Candidates focus on the auto-generation algorithm and forget that users often want custom aliases (e.g., bit.ly/my-brand).
WRONG: API only accepts long URL, always auto-generates the short code. No way for users to choose their own.
RIGHT: Add an optional 'customAlias' field to the create API. Check uniqueness against existing codes. Validate format (alphanumeric plus hyphens, as in my-brand, and a reasonable length).
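The alias checks can be sketched as follows. The exact rules are assumptions for illustration: letters, digits, and hyphens are allowed (the example bit.ly/my-brand contains a hyphen), and the length limits of 3 to 30 characters are arbitrary. A dictionary stands in for the code table, and `reserve_alias` is a hypothetical helper.

```python
import re
from typing import Dict

# Assumed format: letters, digits, and hyphens, 3 to 30 characters long.
ALIAS_PATTERN = re.compile(r"[A-Za-z0-9-]{3,30}")


def reserve_alias(store: Dict[str, str], long_url: str, custom_alias: str) -> str:
    """Validate a custom alias, check uniqueness, and store the mapping."""
    if not ALIAS_PATTERN.fullmatch(custom_alias):
        raise ValueError("invalid alias format")
    if custom_alias in store:
        raise ValueError("alias already taken")
    store[custom_alias] = long_url
    return custom_alias
```

With a real database, the uniqueness check is better enforced by a unique constraint on the code column, since a separate check followed by an insert can race under concurrent requests.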
Ignoring Analytics Requirements
Occasional | CASE MISS
Building only the shorten/redirect functionality without any click tracking or reporting.
Why: Analytics seems like a 'nice to have' rather than a core feature. But interviewers expect it.
WRONG: Only implement POST /shorten and GET /:code. No tracking of who clicked, when, from where.
RIGHT: Track click count, timestamp, referrer, user agent, and geo-location. Store in a separate analytics store (not the main DB). Expose GET /api/v1/stats/:code endpoint.
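One possible shape for the click record and the stats response is sketched below. The `ClickEvent` fields mirror the list above, with `country` as a simplified stand-in for geo-location; the `stats_for` helper and the response keys are hypothetical, not a defined API contract.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class ClickEvent:
    code: str
    timestamp: datetime
    referrer: str
    user_agent: str
    country: str  # simplified stand-in for geo-location


def stats_for(code: str, events: Iterable[ClickEvent]) -> dict:
    """Aggregate clicks for one short code, e.g. for a stats endpoint."""
    clicks = [e for e in events if e.code == code]
    return {
        "code": code,
        "total_clicks": len(clicks),
        "by_country": dict(Counter(e.country for e in clicks)),
        "by_referrer": dict(Counter(e.referrer for e in clicks)),
    }
```

At scale the aggregation would run inside the analytics store rather than in application memory; the sketch only shows which dimensions the endpoint might expose.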