
Base62 Encoding

We need to convert a numeric counter into a short, URL-safe string. We chose Base62 encoding (not Base64 or Base10) because Base62 uses only 62 symbols (a-z, A-Z, 0-9), all of which are safe inside URLs without percent-encoding.
Base10 would work mathematically, but a 13-digit decimal string is harder to type and remember than a 7-character alphanumeric code. With 7 characters, we get $62^7 \approx 3.5$ trillion unique codes.
Base64 includes + and / characters that break inside query strings and need escaping.
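The URL-safety claim can be checked with a minimal sketch using Python's standard `urllib.parse.quote`. The variable names are illustrative, and passing `safe=""` is an assumption chosen so that every reserved character gets escaped.

```python
import string
from urllib.parse import quote

# The 62 symbols named above: a-z, A-Z, 0-9.
BASE62_SYMBOLS = string.ascii_lowercase + string.ascii_uppercase + string.digits

# safe="" asks quote() to escape everything except unreserved characters.
base62_escaped = quote(BASE62_SYMBOLS, safe="")
base64_extras_escaped = quote("+/", safe="")

print(base62_escaped == BASE62_SYMBOLS)  # Base62 symbols pass through unchanged
print(base64_extras_escaped)             # the two extra Base64 symbols get percent-encoded
```

Every Base62 symbol survives untouched, while `+` and `/` come back as percent-escapes, which is the extra handling the Base64 option would force on us.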
At 100 million new URLs per day, that keyspace lasts 3.5 trillion / 100 million ≈ 35,000 days, or about 96 years, before exhaustion. Implication: a single monotonic counter never needs to roll over in our system's lifetime, so we avoid the complexity of counter reset logic or multi-epoch schemes.
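The exhaustion arithmetic is easy to get wrong by a unit (days versus years), so here is a short sketch of it. The constant names are illustrative, and a 365-day year is assumed.

```python
# Keyspace for 7-character codes under each candidate base.
KEYSPACE_BASE62 = 62 ** 7
KEYSPACE_BASE64 = 64 ** 7

URLS_PER_DAY = 100_000_000  # write rate stated in the text

# Dividing codes by codes-per-day yields days, not years.
days_to_exhaustion = KEYSPACE_BASE62 / URLS_PER_DAY
years_to_exhaustion = days_to_exhaustion / 365  # assumes a 365-day year

print(f"{KEYSPACE_BASE62:,} codes")
print(f"{days_to_exhaustion:,.0f} days, about {years_to_exhaustion:.1f} years")
print(f"decimal digits needed for the same range: {len(str(KEYSPACE_BASE62))}")
```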
The encoding is deterministic and reversible: given a short code, we can decode it back to the original counter value without a database lookup. This means debugging is trivial and we can extract creation order from any short code.
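A minimal sketch of the encode/decode pair follows, written as repeated division by 62. The function names and the symbol order (digits, then lowercase, then uppercase) are assumptions for illustration; any fixed ordering of the 62 symbols works as long as encoder and decoder agree.

```python
import string

# Assumed symbol order: 0-9, a-z, A-Z. Hypothetical choice, not mandated by the text.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(counter: int) -> str:
    """Convert a non-negative counter value into its Base62 string."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    if counter == 0:
        return ALPHABET[0]
    chars = []
    while counter > 0:
        counter, remainder = divmod(counter, BASE)
        chars.append(ALPHABET[remainder])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the counter value from a Base62 string, with no lookup table beyond the alphabet."""
    value = 0
    for ch in code:
        value = value * BASE + INDEX[ch]
    return value


code = encode_base62(123_456_789)
print(code, decode_base62(code))
```

Because decoding returns the counter itself, comparing two decoded values tells us which short code was issued first.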
Trade-off: we accept a slightly smaller keyspace than Base64 (which gives $64^7 \approx 4.4$ trillion) in exchange for zero URL-encoding issues. Instagram uses a similar numeric-to-alphanumeric encoding for photo IDs, converting a 64-bit Snowflake ID into a compact URL path.
What if the interviewer asks: why not use MD5 or SHA-256 hashes instead? Because a hash has to be truncated to fit in 7 characters, and truncated hashes can collide. Collisions require detection and retry logic, adding complexity to the write path.
A monotonic counter guarantees zero collisions by construction.
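To put a rough number on the collision argument, the sketch below applies the standard birthday approximation. It assumes a hash truncated to a 7-character Base62 code behaves like a uniform random draw from the $62^7$ keyspace; the function names are hypothetical.

```python
import math

KEYSPACE = 62 ** 7  # number of distinct 7-character Base62 codes


def expected_colliding_pairs(n: int, keyspace: int = KEYSPACE) -> float:
    """Approximate number of colliding pairs among n uniformly random codes."""
    return n * (n - 1) / (2 * keyspace)


def collision_probability(n: int, keyspace: int = KEYSPACE) -> float:
    """Birthday approximation: P(at least one collision) = 1 - exp(-n(n-1) / 2N)."""
    return -math.expm1(-expected_colliding_pairs(n, keyspace))


for n in (1_000_000, 100_000_000):
    print(f"n={n:,}: p={collision_probability(n):.3f}, "
          f"expected pairs={expected_colliding_pairs(n):,.1f}")
```

Under these assumptions, a single day of writes at 100 million URLs would already be expected to produce over a thousand colliding pairs, whereas the counter produces none.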

Formula & tradeoffs

Formula
$$62^7 = 3{,}521{,}614{,}606{,}208 \approx 3.5 \text{ trillion codes}$$
Why it matters in interviews
Interviewers expect us to justify why 7 characters is enough. Showing the exhaustion math and comparing Base62 against Base64 and Base10 proves we chose the encoding deliberately rather than copying it from a blog post.