
Base62 Encoding



We need to convert a numeric counter into a short, URL-safe string. We chose Base62 encoding (not Base64 or Base10) because Base62 uses only 62 symbols (a-z, A-Z, 0-9), all of which are safe inside URLs without percent-encoding.

Base10 would work mathematically, but a 13-digit decimal string is harder to type and remember than a 7-character alphanumeric code. With 7 characters, we get 62^7 ≈ 3.5 trillion unique codes.
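A quick Python check of this arithmetic computes 62^7 and the length of the largest counter value written in decimal. That length is where the 13-digit Base10 figure comes from. The constant name is illustrative.

```python
# Keyspace of 7-character Base62 codes, and how many decimal digits
# a Base10 code needs to cover the same range of counter values.
CODE_LENGTH = 7
keyspace = 62 ** CODE_LENGTH
decimal_digits = len(str(keyspace - 1))  # largest counter value, written in base 10

print(f"{keyspace:,}")  # 3,521,614,606,208
print(decimal_digits)   # 13
```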

“Base64 includes + and / characters: / is a URL path separator and + can be decoded as a space in query strings, so both need percent-encoding.”
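We can check the URL-safety claim against Python's standard `urllib.parse.quote`. With `safe=""`, it percent-encodes every character outside the RFC 3986 unreserved set, so a string that comes back unchanged needs no escaping. The alphabet ordering below is an assumption; any fixed permutation of the same 62 characters behaves identically here.

```python
import string
from urllib.parse import quote

# Assumed ordering: digits, uppercase, lowercase (62 characters in total).
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Every Base62 symbol is unreserved, so quoting leaves the alphabet untouched.
print(quote(BASE62_ALPHABET, safe="") == BASE62_ALPHABET)  # True

# The two extra Base64 symbols are escaped.
print(quote("+", safe=""), quote("/", safe=""))  # %2B %2F
```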

At 100 million new URLs per day, the 3.5 trillion-code keyspace lasts 3.5 trillion / 100 million ≈ 35,000 days, or about 96 years, before exhaustion. Implication: a single monotonic counter never needs to roll over in our system's lifetime, so we avoid the complexity of counter-reset logic or multi-epoch schemes.
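The exhaustion estimate, worked in Python. It assumes a flat 100 million writes per day and a 365-day year, matching the figures in the text.

```python
KEYSPACE = 62 ** 7          # 3,521,614,606,208 codes
URLS_PER_DAY = 100_000_000  # write volume assumed in the text

days_until_exhaustion = KEYSPACE / URLS_PER_DAY
years_until_exhaustion = days_until_exhaustion / 365

print(round(days_until_exhaustion))      # 35216
print(round(years_until_exhaustion, 1))  # 96.5
```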

The encoding is deterministic and reversible: given a short code, we can decode it back to the original counter value without a database lookup. This simplifies debugging, and because the counter is monotonic, comparing decoded values recovers the creation order of any two short codes.
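Here is a minimal sketch of the encode/decode pair. It makes two assumptions that are not from the text. First, the alphabet is ordered digits, uppercase, lowercase, which matches ASCII order. Second, codes are left-padded to a fixed width of 7. Together, these make equal-width codes sort lexicographically in the same order as their counters. The names `encode` and `decode` are illustrative, not a specific library's API.

```python
import string

# Assumed alphabet: digits, then uppercase, then lowercase (ASCII order), so
# equal-width codes sort in the same order as their counter values.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(n: int, width: int = 7) -> str:
    """Convert a non-negative counter value to Base62, left-padded to `width` characters."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Most significant digit first; "0" pads short codes to a fixed width.
    return "".join(reversed(chars)).rjust(width, ALPHABET[0])


def decode(code: str) -> int:
    """Recover the counter value from a Base62 code using arithmetic only."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n


code = encode(125)
print(code)          # 0000021
print(decode(code))  # 125
```

Decoding needs only the alphabet, never the database. Counters at or above 62^7 simply produce an 8-character code, because `rjust` pads but never truncates.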

Trade-off: we accept a keyspace about 20% smaller than Base64's (64^7 ≈ 4.4 trillion) in exchange for zero URL-encoding issues. Instagram uses a similar numeric-to-text encoding for photo IDs, converting a 64-bit Snowflake-style ID into a compact URL path.
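To put the trade-off in numbers, this snippet compares the 7-character keyspace of each base. Base64 yields about 25% more codes than Base62, which is the same gap as Base62 holding about 20% fewer.

```python
# Codes available at a fixed length of 7 characters in each base.
for base in (10, 62, 64):
    print(base, f"{base ** 7:,}")

# How much larger Base64's keyspace is than Base62's at the same length.
ratio = 64 ** 7 / 62 ** 7
print(f"{ratio:.3f}")  # 1.249
```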

What if the interviewer asks: why not use MD5 or SHA-256 hashes instead? Because truncating a hash to 7 characters makes collisions likely once a few million URLs exist, so the write path needs collision detection and retry logic.

A monotonic counter guarantees zero collisions by construction.
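The snippet below illustrates why truncated hashes collide and counters do not. It deliberately uses a tiny 3-character code space (62^3 = 238,328 codes) so the effect appears with only 10,000 URLs. The URL list, the helper `to_base62`, and the choice to keep the low-order Base62 digits of the SHA-256 value are all hypothetical choices made for the demo.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def to_base62(n: int, width: int) -> str:
    """Hypothetical helper: keep only the lowest `width` Base62 digits of n (truncation)."""
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


CODE_LENGTH = 3  # deliberately tiny so collisions show up quickly
N_URLS = 10_000
urls = [f"https://example.com/page/{i}" for i in range(N_URLS)]

# Hash-based codes: SHA-256 digest, truncated to CODE_LENGTH Base62 characters.
hash_codes = [
    to_base62(int.from_bytes(hashlib.sha256(u.encode()).digest(), "big"), CODE_LENGTH)
    for u in urls
]
# Counter-based codes: one distinct counter value per URL.
counter_codes = [to_base62(i, CODE_LENGTH) for i in range(N_URLS)]

hash_collisions = N_URLS - len(set(hash_codes))
counter_collisions = N_URLS - len(set(counter_codes))
print(hash_collisions)     # nonzero; the birthday bound predicts roughly 200
print(counter_collisions)  # 0
```

The same birthday effect applies at the real length of 7 characters. Collisions become likely after roughly 2 million hashed URLs, on the order of the square root of 62^7. At 100 million URLs per day, that point arrives within the first hour.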

Formula & tradeoffs

Formula

62^7 = 3{,}521{,}614{,}606{,}208 \approx 3.5 \text{ trillion codes}

Why it matters in interviews

Interviewers expect us to justify why 7 characters is enough. Showing the exhaustion math and comparing Base62 against Base64 and Base10 demonstrates that we chose the encoding deliberately rather than copying it from a blog post.
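One way to answer "why 7?" in an interview is to solve for the shortest code length that covers the total IDs ever issued. The function name `min_code_length` is hypothetical. It loops over integer powers to avoid floating-point `log` edge cases at exact powers of the base.

```python
def min_code_length(total_ids: int, base: int = 62) -> int:
    """Smallest k such that base**k >= total_ids, using integer math only."""
    k, capacity = 1, base
    while capacity < total_ids:
        k += 1
        capacity *= base
    return k


daily = 100_000_000
print(min_code_length(daily * 365 * 90))   # 7
print(min_code_length(daily * 365 * 100))  # 8
```

The jump to 8 characters at the 100-year mark is consistent with the roughly 96-year exhaustion estimate. It also shows that the estimate assumes flat write volume: sustained growth would shorten it.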
