
Why Not UUIDs: The Index Locality Tax



The obvious answer to unique IDs is UUIDv4: 122 random bits, generate anywhere, collision probability negligible, zero infrastructure. So why did Twitter, Discord, and Instagram all build something else?

The short answer: UUIDv4 pays three taxes.

Tax one: size. A UUID is 128 bits, stored as 16 bytes (or, worse, as a 36-character string).
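Here is a quick back-of-the-envelope check of the size tax. It assumes a hypothetical one-billion-row table and counts raw key bytes only. Per-entry overhead, page fill factor, and compression are ignored.

```python
import struct
import uuid

ROWS = 1_000_000_000  # hypothetical table size: one billion rows

bigint_key = struct.pack(">q", 1_234_567_890)  # BIGINT: signed 64-bit integer
uuid_binary = uuid.uuid4().bytes               # UUID stored as 16 raw bytes
uuid_text = str(uuid.uuid4()).encode("ascii")  # UUID stored as 36 ASCII chars

for label, key in [
    ("BIGINT", bigint_key),
    ("UUID binary", uuid_binary),
    ("UUID text", uuid_text),
]:
    gib = len(key) * ROWS / 2**30
    # Raw key bytes only: ignores per-entry overhead, page fill, and compression.
    print(f"{label:<12}{len(key):>3} bytes/key  ~{gib:5.1f} GiB per index on this key")
```

This prints roughly 7.5, 14.9, and 33.5 GiB. In InnoDB, every secondary index entry also carries the primary key. A wide primary key is therefore paid again in each secondary index.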

Every index entry, every foreign key, and every join column pays double the 8 bytes of a BIGINT. Across billions of rows, that is real RAM pushed out of the buffer pool.

Tax two, the killer: index locality. B-tree indexes (InnoDB, PostgreSQL) keep keys sorted. Sequential-ish keys append to the rightmost page, which means the page stays hot in cache and page splits are minimal.

Random keys land on a random leaf page on every insert. As a result, the working set becomes the entire index, pages split constantly, and insert throughput collapses once the index outgrows RAM. Percona's classic benchmark shows UUIDv4 primary-key inserts degrading by an order of magnitude at scale compared with sequential keys.
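A toy simulation makes the working-set effect visible. The model covers only the sorted leaf level of a B-tree, with a hypothetical capacity of 100 keys per page. It splits a full page down the middle. When an ascending key lands past the end of a full rightmost page, it simply opens a new page. That is a simplification of how engines treat ascending inserts, and real split policies differ per engine. The number of distinct pages touched by the last 1,000 inserts stands in for the cache working set. This is an illustrative sketch, not InnoDB or PostgreSQL code.

```python
import bisect
import random

PAGE_CAPACITY = 100  # keys per leaf page; a toy value, not a real engine setting


class ToyLeafLevel:
    """Sorted leaf pages of a toy B-tree (no internal nodes, no I/O)."""

    def __init__(self) -> None:
        self.pages = [[]]    # each page is a sorted list of keys
        self.mid_splits = 0  # splits that move half a page's keys to a new page

    def insert(self, key: int) -> int:
        """Insert key and return an identifier for the page that received it."""
        # The first key of every page after page 0 acts as its separator.
        separators = [page[0] for page in self.pages[1:]]
        idx = bisect.bisect_right(separators, key)
        page = self.pages[idx]
        if len(page) >= PAGE_CAPACITY:
            if idx == len(self.pages) - 1 and key > page[-1]:
                # Ascending key past the rightmost key: open a fresh page
                # and leave the full one untouched.
                new_page = [key]
                self.pages.append(new_page)
                return id(new_page)
            # Otherwise split the full page down the middle.
            self.mid_splits += 1
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
            self.pages.insert(idx + 1, right)
            if key >= right[0]:
                page = right
        bisect.insort(page, key)
        return id(page)


def simulate(keys, window=1_000):
    """Return (pages, mid-page splits, distinct pages touched by the last `window` inserts)."""
    leaf = ToyLeafLevel()
    touched = [leaf.insert(k) for k in keys]
    return len(leaf.pages), leaf.mid_splits, len(set(touched[-window:]))


N_KEYS = 20_000
sequential = list(range(N_KEYS))
rng = random.Random(42)
random_keys = [rng.getrandbits(62) for _ in range(N_KEYS)]

print("sequential:", simulate(sequential))
print("random:    ", simulate(random_keys))
```

With sequential keys, the last 1,000 inserts touch 10 pages, and no page is ever split down the middle. With random 62-bit keys, they touch most of the leaf level, and nearly every new page comes from a mid-page split. That is the pattern that turns into random I/O once the index no longer fits in RAM.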

Tax three: no ordering signal. UUIDv4 tells you nothing about when a row was created; sorting by creation time needs a second column and a second index.

The fixes exist on both sides. UUIDv7 (standardized in RFC 9562, 2024) puts a 48-bit Unix-millisecond timestamp in front of the randomness. The IDs become sortable and cache-friendly while staying 128 bits and infrastructure-free, which makes UUIDv7 the right modern default when you do not control both ends. ULID does the same trick (a 48-bit timestamp plus 80 random bits) with a friendlier 26-character Crockford Base32 string encoding. Snowflake-class IDs remain the answer when you want 64-bit compactness plus explicit control of worker and sequence semantics.
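Both timestamp-first layouts fit in a few lines of Python. The sketch follows the RFC 9562 bit layout for UUIDv7. For Snowflake, it uses Twitter's original split: a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence. Several pieces are illustrative assumptions, not a library API: the names `uuid7`, `uuid7_unix_ms`, and `SnowflakeSketch`, the plain-random (non-monotonic) UUIDv7 fill, and the injectable clock.

```python
import os
import time
import uuid
from typing import Callable, Optional


def now_ms() -> int:
    return time.time_ns() // 1_000_000


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """UUIDv7 using the RFC 9562 bit layout, filled with plain random bits (no counter).

    Layout: 48-bit Unix ms | 4-bit version (7) | 12 random | 2-bit variant | 62 random.
    """
    ts = now_ms() if unix_ms is None else unix_ms
    if not 0 <= ts < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (ts << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return uuid.UUID(int=value)


def uuid7_unix_ms(u: uuid.UUID) -> int:
    """Recover the creation time: the ordering signal UUIDv4 lacks."""
    return u.int >> 80


EPOCH_MS = 1_288_834_974_657  # Twitter's original Snowflake epoch; any fixed past epoch works


class SnowflakeSketch:
    """64-bit ID: 41-bit ms since EPOCH_MS | 10-bit worker ID | 12-bit sequence.

    Assumes the clock stays between EPOCH_MS and EPOCH_MS + 2**41 ms.
    """

    def __init__(self, worker_id: int, clock_ms: Callable[[], int] = now_ms) -> None:
        if not 0 <= worker_id < 1 << 10:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id  # must be unique among live generators
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0

    def next_id(self) -> int:
        now = self.clock_ms()
        if now < self.last_ms:
            # Clock went backward: refuse rather than risk reissuing an ID.
            raise RuntimeError("clock moved backward")
        if now == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF
            if self.sequence == 0:
                # All 4096 sequence values used this ms: wait for the next ms.
                while now <= self.last_ms:
                    now = self.clock_ms()
        else:
            self.sequence = 0
        self.last_ms = now
        return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

In both layouts the timestamp occupies the most significant bits, so IDs from different milliseconds sort in creation order. `uuid7_unix_ms` also recovers the creation time, which UUIDv4 cannot provide. Within a single millisecond, this UUIDv7 sketch orders IDs randomly; RFC 9562 describes optional counter-based methods for strict monotonicity. The `RuntimeError` branch is the simplest possible form of clock discipline, and refusing to issue IDs is only one of several policies.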

The honest comparison:

- UUIDv4 wins on zero infrastructure but loses on locality and size.
- UUIDv7 and ULID fix locality but keep 128 bits.
- Snowflake wins on size and control, but it costs you a worker-ID assignment scheme and clock discipline.

What if the interviewer asks whether UUIDv4 collision risk is real? No. With $2^{122}$ possible values, the birthday bound puts a likely collision only after roughly $2^{61} \approx 2.3 \times 10^{18}$ IDs, so collisions are astronomically unlikely. The objection is performance and semantics, never collisions.
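The birthday approximation behind that claim can be checked directly. For $n$ IDs drawn uniformly from $N = 2^{122}$ values, the collision probability is approximately $1 - e^{-n(n-1)/(2N)}$. The helper names below are illustrative.

```python
import math

N = 2 ** 122  # number of distinct random UUIDv4 values (6 of 128 bits are fixed)


def collision_probability(n: int, space: int = N) -> float:
    """Birthday approximation: P(any collision among n random IDs)."""
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


def ids_for_probability(p: float, space: int = N) -> float:
    """Invert the approximation: how many IDs until collision probability reaches p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


print(f"{collision_probability(10**12):.2e}")  # one trillion IDs
print(f"{ids_for_probability(0.5):.2e}")       # IDs needed for a 50% chance
```

A trillion UUIDv4s give a collision probability of about $9.4 \times 10^{-14}$, and a 50% chance needs roughly $2.7 \times 10^{18}$ IDs.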

Why it matters in interviews

"Why not UUID" is the first follow-up in every ID interview. Quantifying the locality tax (random inserts fragment the B-tree; Percona's 10x degradation) and knowing UUIDv7 exists shows current, production-grade understanding rather than a cached 2015 answer.
