Why Not UUIDs: The Index Locality Tax
The obvious answer to unique IDs is UUIDv4: 122 random bits, generate anywhere, collision probability negligible, zero infrastructure. So why did Twitter, Discord, and Instagram all build something else?
Three taxes.
Tax one: size. A UUID is 128 bits, stored as 16 bytes (or worse, a 36-character string). Every index entry, every foreign key, every join column pays double the 8 bytes of a BIGINT; across billions of rows, that is real RAM pushed out of the buffer pool.
Tax two, the killer: index locality.
B-tree indexes (InnoDB, PostgreSQL) keep keys sorted. Sequential-ish keys append to the rightmost page: hot in cache, minimal page splits.
Random keys land on a random page every insert: the working set becomes the entire index, pages split constantly, and insert throughput collapses once the index outgrows RAM. Percona's classic benchmark shows UUIDv4 primary-key inserts degrading an order of magnitude at scale versus sequential keys.
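The locality argument can be illustrated with a toy model. The sketch below is not a database benchmark: it assumes an index of fixed-size leaf pages and an LRU buffer pool that holds only a fraction of them, and it simply counts cache misses. The function names, page counts, and cache size are all hypothetical choices made for illustration.

```python
import random
from collections import OrderedDict


def cache_misses(page_accesses, cache_pages):
    """Count misses for a sequence of page ids under an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for page in page_accesses:
        if page in cache:
            cache.move_to_end(page)          # mark as most recently used
        else:
            misses += 1
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict least recently used
    return misses


def simulate(total_pages=10_000, cache_pages=1_000,
             inserts=50_000, keys_per_page=100, seed=1):
    """Return (random_key_misses, sequential_key_misses) for the toy model."""
    rng = random.Random(seed)
    # Random keys (UUIDv4-like): every insert lands on a uniformly random leaf.
    random_pages = [rng.randrange(total_pages) for _ in range(inserts)]
    # Sequential keys: every insert goes to the rightmost leaf, which only
    # advances to a fresh page once the current one is full.
    sequential_pages = [total_pages + i // keys_per_page for i in range(inserts)]
    return (cache_misses(random_pages, cache_pages),
            cache_misses(sequential_pages, cache_pages))


random_misses, sequential_misses = simulate()
print(f"random keys:     {random_misses} page misses")
print(f"sequential keys: {sequential_misses} page misses")
```

With these assumed parameters, sequential inserts miss only once per new page, while random inserts miss on most accesses because the cache covers a tenth of the index. Real engines add page splits, write-ahead logging, and background flushing on top, so treat the ratio as a direction, not a measurement.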
Tax three: no ordering signal. UUIDv4 tells you nothing about when a row was created; sorting by creation time needs a second column and a second index.
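Before moving on to the fixes, the size tax from earlier can be put into rough numbers. This is back-of-envelope arithmetic only: the row count and the assumption that each key is stored three times (primary key plus two secondary index or foreign-key copies) are made up for illustration, and per-row overhead is ignored.

```python
BIGINT_BYTES = 8        # 64-bit integer key
UUID_BINARY_BYTES = 16  # UUID stored as raw 128 bits
UUID_TEXT_BYTES = 36    # UUID stored as the canonical hyphenated string


def key_storage_gb(rows, key_bytes, copies_per_row):
    """Raw bytes spent on key values alone, in decimal gigabytes."""
    return rows * key_bytes * copies_per_row / 1e9


ROWS = 1_000_000_000  # assumption: one billion rows
COPIES = 3            # assumption: primary key + two index/FK copies

for label, size in [("BIGINT", BIGINT_BYTES),
                    ("UUID (binary)", UUID_BINARY_BYTES),
                    ("UUID (text)", UUID_TEXT_BYTES)]:
    print(f"{label:14s} {key_storage_gb(ROWS, size, COPIES):6.1f} GB")
```

Under these assumptions the key values alone take 24 GB as BIGINT, 48 GB as binary UUIDs, and 108 GB as text UUIDs.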
The fixes exist on both sides. UUIDv7 (standardized in 2024 as part of RFC 9562) puts a 48-bit Unix-ms timestamp in front of the randomness. It is sortable, cache-friendly, still 128 bits, and still infrastructure-free, which makes it the right modern default when you do not control both ends. ULID does the same trick with a friendlier string encoding. Snowflake-class IDs stay the answer when you want 64-bit compactness plus explicit control of worker and sequence semantics.
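The two layouts can be sketched side by side. The UUIDv7 function below follows the RFC 9562 field layout (48-bit millisecond timestamp, version, 12 random bits, variant, 62 random bits) but is a hand-rolled illustration, not a library API. The Snowflake class assumes the commonly described 41-bit timestamp / 10-bit worker / 12-bit sequence split; the epoch constant, class name, and error handling are hypothetical, and the class is not thread-safe.

```python
import os
import time
import uuid

EPOCH_MS = 1_577_836_800_000  # hypothetical custom epoch: 2020-01-01T00:00:00Z


def uuid7_sketch(now_ms=None):
    """Build a UUIDv7-shaped value: timestamp first, randomness after."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (now_ms & ((1 << 48) - 1)) << 80       # 48-bit Unix ms timestamp
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


class SnowflakeSketch:
    """64-bit ID: 41-bit ms timestamp | 10-bit worker | 12-bit sequence."""

    def __init__(self, worker_id, epoch_ms=EPOCH_MS):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.sequence = 0

    def next_id(self, now_ms=None):
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        if now_ms < self.last_ms:
            # Clock discipline: refuse to issue IDs if time went backwards.
            raise RuntimeError("clock moved backwards")
        if now_ms == self.last_ms:
            if self.sequence == 0xFFF:
                # A production generator would typically wait for the next ms.
                raise RuntimeError("sequence exhausted for this millisecond")
            self.sequence += 1
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return ((now_ms - self.epoch_ms) << 22) | (self.worker_id << 12) | self.sequence


print(uuid7_sketch())
print(SnowflakeSketch(worker_id=5).next_id())
```

Both values sort by creation time at millisecond granularity. The difference is what you pay for it: the UUIDv7 sketch needs nothing but a clock and a random source, while the Snowflake sketch needs a unique `worker_id` per generator and a clock that never runs backwards.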
The honest comparison: UUIDv4 wins on zero-infrastructure, loses on locality and size; UUIDv7/ULID fix locality, keep 128 bits; Snowflake wins on size and control, costs you a worker-ID assignment scheme and clock discipline.
What if the interviewer asks: is UUIDv4 collision risk real?
No: 122 random bits make birthday collisions astronomically unlikely; the objection is performance and semantics, never collisions.
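The collision claim can be checked with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2N), where N = 2^122 is the number of possible UUIDv4 values. The helper names below are illustrative, and the model assumes a sound random source; a broken RNG is a different problem.

```python
import math


def collision_probability(n_ids, random_bits=122):
    """Approximate chance of at least one collision among n_ids random IDs."""
    space = 2 ** random_bits
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))


def ids_for_probability(p, random_bits=122):
    """Roughly how many IDs are needed to reach collision probability p."""
    space = 2 ** random_bits
    return int(math.sqrt(2 * space * math.log(1 / (1 - p))))


print(f"1 trillion IDs: p = {collision_probability(10**12):.2e}")
print(f"IDs for a 50% chance: {ids_for_probability(0.5):.2e}")
```

A trillion UUIDv4 values give a collision probability on the order of 10^-13, and reaching even odds takes on the order of 10^18 IDs, which is why the interview answer focuses on locality and size instead.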