ID Generator Anti-Patterns

Common design mistakes candidates make. Learn what goes wrong and how to avoid each trap in your interview.

UUIDv4 as a Clustered Primary Key

Very Common

Using random UUIDs as the primary key in a B-tree-indexed store (InnoDB's clustered primary key, PostgreSQL's default B-tree indexes) at write-heavy scale.

Why: UUIDs need zero infrastructure and every language generates them in one line; the locality damage only appears when the index outgrows RAM.

WRONG: Orders table keyed by UUIDv4. Every insert lands on a random index page; page splits cascade, the buffer pool thrashes once the index no longer fits in RAM, and insert throughput can drop by an order of magnitude at a few hundred million rows.

RIGHT: K-sortable keys: Snowflake IDs (8 bytes, append-locality) or UUIDv7 if 128 bits and zero infra are required. Random UUIDs remain fine as non-clustered unique columns or external references.
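
To make "K-sortable" concrete, here is a minimal sketch of a UUIDv7-style value: a 48-bit millisecond timestamp in the high bits, followed by random bits, with the version and variant fields set. The function name uuid7_sketch is hypothetical, and the sketch omits the optional same-millisecond monotonicity tricks a production library may add.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80 | rand
    # Overwrite the version nibble with 7 and the variant bits with 0b10.
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(1_700_000_000_000)
    later = uuid7_sketch(1_700_000_000_001)
    # Values minted in a later millisecond sort after earlier ones,
    # so inserts cluster at the right-hand edge of the index.
    print(earlier, later, earlier < later)
```

Because the timestamp occupies the most significant bits, both the integer and the string form sort by creation millisecond; ordering within one millisecond is random in this sketch.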

No Backward-Clock Policy

Very Common

Reading the wall clock every generation with no comparison against the last-used timestamp.

Why: Clocks move forward in every test; NTP steps, VM migrations, and leap-second adjustments happen in production at 3 AM.

WRONG: NTP steps the clock back 300ms. The generator re-mints 300ms worth of timestamps; duplicate IDs flow into every downstream table and queue, corrupting silently: no error is thrown anywhere.

RIGHT: Track last-used timestamp. Small regression: spin-wait. Large: refuse and page. Persist the timestamp across restarts, run NTP slew-only, and keep a duplicate canary (SETNX sampling) so "impossible" is verified continuously.
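
The wait-or-refuse policy can be sketched as a small guard around the clock read. This is a minimal illustration, not a definitive implementation: GuardedClock, ClockRegressionError, and the 5 ms threshold are hypothetical, and last_ms is assumed to be loaded by the caller from wherever the last-used timestamp is persisted.

```python
import time


class ClockRegressionError(RuntimeError):
    """Raised when the clock is too far behind the last-used timestamp."""


class GuardedClock:
    def __init__(self, now_ms=None, last_ms=0, max_wait_ms=5):
        # now_ms is injectable so the policy can be tested with a fake clock.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = last_ms          # assumed restored from durable storage
        self.max_wait_ms = max_wait_ms  # hypothetical "small regression" bound

    def next_timestamp(self):
        now = self._now_ms()
        lag = self.last_ms - now
        if lag > self.max_wait_ms:
            # Large regression: refuse to mint; the caller should alert.
            raise ClockRegressionError(f"clock is {lag} ms behind last-used timestamp")
        while now < self.last_ms:
            # Small regression: wait until the clock catches up.
            time.sleep(0.0005)
            now = self._now_ms()
        self.last_ms = now
        return now
```

A generator would call next_timestamp() before every mint and persist last_ms periodically; the sequence field still handles multiple IDs inside one millisecond.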

Hardcoded Worker IDs

Very Common

Worker IDs set in config files or environment variables, maintained by hand or copy-paste.

Why: It is the fastest way to boot the first three workers, and duplication requires two specific machines to overlap: invisible until it is not.

WRONG: A deployment manifest is duplicated for a new region; two pods run as worker 7. Every overlapping millisecond can mint identical IDs, and nothing detects it because both workers are individually healthy.

RIGHT: Assignment with single-ownership guarantees: ephemeral leases (ZooKeeper/etcd) with fencing deadlines, or StatefulSet ordinals where the orchestrator enforces at-most-one pod-N. Uniqueness by construction, not by discipline.
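
For the StatefulSet option, a worker can derive its ID from the pod hostname, which Kubernetes sets to "<statefulset-name>-<ordinal>". The sketch below assumes a 10-bit worker field; the function name and the WORKER_BITS constant are hypothetical.

```python
import re
import socket

WORKER_BITS = 10  # assumed width of the worker field


def worker_id_from_hostname(hostname=None):
    """Derive the worker ID from a StatefulSet-style hostname such as 'idgen-7'."""
    hostname = hostname or socket.gethostname()
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        # Fail loudly instead of falling back to a default worker ID.
        raise ValueError(f"hostname {hostname!r} has no trailing ordinal")
    ordinal = int(match.group(1))
    if ordinal >= 1 << WORKER_BITS:
        raise ValueError(f"ordinal {ordinal} does not fit in {WORKER_BITS} bits")
    return ordinal
```

Refusing to start when the ordinal is missing or out of range matters as much as the parsing: a silent default would recreate the duplicated-worker problem.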

Timestamp-Only IDs

Common

Using the current time (ms or ns) alone as the identifier because collisions "seem unlikely".

Why: It works in the demo where one process mints occasionally; concurrency makes same-tick collisions a certainty at scale.

WRONG: Two requests in the same millisecond, or two servers with aligned clocks, mint the same ID. At 10K writes/sec, every millisecond carries about 10 writes on average, so nearly every ID has colliding candidates.

RIGHT: Timestamp is one field, never the whole ID: add worker bits (who minted) and sequence bits (which one this millisecond). That is the entire Snowflake insight.
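
The three fields can be packed into one 64-bit integer as sketched below. The 41/10/12 bit split is the commonly cited Snowflake layout; the epoch value and the class name SnowflakeSketch are assumptions for illustration, and the backward-clock handling here is deliberately minimal.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # hypothetical custom epoch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


class SnowflakeSketch:
    def __init__(self, worker_id, now_ms=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: the sequence field disambiguates.
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            # timestamp | worker (who minted) | sequence (which one this ms)
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self._seq
            )
```

Two workers reading the same millisecond differ in the worker field, and two mints on one worker in the same millisecond differ in the sequence field, so the timestamp never has to be unique on its own.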

Exposing Dense Sequential IDs Publicly

Common

Auto-increment or ticket-range IDs used directly in public URLs and APIs (/orders/10452).

Why: The ID already exists and is convenient; the information it leaks is invisible to the team shipping the feature.

WRONG: Competitors sample /orders/n weekly and read your order volume off the counter (the German tank problem); users enumerate adjacent IDs and probe other people's resources; both are silent.

RIGHT: Internal keys can be dense; public identifiers get a non-enumerable form: Snowflake IDs leak only coarse timing, or add an opaque slug/hashid for public surfaces. Authorization is still required: obscurity is a rate limiter, not a lock.
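
One way to picture the opaque-slug option is a lookup table from random public tokens to dense internal keys, with the ownership check kept in the resolve path. PublicIdMap and owner_of are hypothetical names, and the in-memory dicts stand in for what would be a unique indexed column in a real store.

```python
import secrets


class PublicIdMap:
    """Maps random public slugs to dense internal IDs (in-memory stand-in)."""

    def __init__(self):
        self._by_slug = {}
        self._by_internal = {}

    def slug_for(self, internal_id):
        slug = self._by_internal.get(internal_id)
        if slug is None:
            slug = secrets.token_urlsafe(12)  # 96 random bits, 16 URL-safe chars
            while slug in self._by_slug:      # vanishingly unlikely, but cheap to check
                slug = secrets.token_urlsafe(12)
            self._by_slug[slug] = internal_id
            self._by_internal[internal_id] = slug
        return slug

    def resolve(self, slug, requesting_user, owner_of):
        internal_id = self._by_slug.get(slug)
        # Unknown slug and "not yours" look identical to the caller.
        if internal_id is None or owner_of(internal_id) != requesting_user:
            return None
        return internal_id
```

The slug removes the counter from public view; the owner_of comparison is what actually protects the resource.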

One Database Auto-Increment for Everything

Common

Routing all ID generation through a single database's AUTO_INCREMENT because it guarantees uniqueness and density.

Why: It is what the database gives you for free, and it genuinely works until the write path or the availability story outgrows one box.

WRONG: Every insert across 40 services serializes through one counter row. The ID database becomes the platform's write ceiling and its single point of failure: an outage there stops EVERY service that creates anything.

RIGHT: Break the per-ID dependency: range allocation (grant 10K-ID blocks per worker, Flickr-style) keeps density with 1/10,000th the coordination, or Snowflake removes the central box entirely at the cost of clock/worker discipline.
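
Range allocation can be sketched with two small classes: a central allocator that hands out whole blocks, and a worker that mints from its block locally. BlockAllocator and RangeWorker are hypothetical names; in a real system the allocator's counter would live in the database, and RangeWorker would need its own lock if shared across threads.

```python
import threading


class BlockAllocator:
    """Stand-in for the central counter: one call grants a whole block."""

    def __init__(self, block_size=10_000, start=1):
        self._block_size = block_size
        self._next = start
        self._lock = threading.Lock()
        self.grants = 0  # number of coordination round trips so far

    def grant(self):
        with self._lock:
            first = self._next
            self._next += self._block_size
            self.grants += 1
            return first, first + self._block_size  # half-open range


class RangeWorker:
    """Mints IDs locally and contacts the allocator only when its block runs out."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._next = 0
        self._end = 0

    def next_id(self):
        if self._next >= self._end:
            self._next, self._end = self._allocator.grant()
        value = self._next
        self._next += 1
        return value
```

With a block size of 10,000, a worker touches the central counter once per 10,000 IDs. IDs left in a block when a worker dies are simply skipped, so the sequence stays unique but is only approximately dense.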

Worker Leases Without a Fencing Deadline

Occasional

Claiming worker IDs via ephemeral leases but generating right up until the lease loss is externally observed.

Why: The happy path (lease expires, worker is truly dead) works; the unhappy path needs a GC pause and a reassignment to line up.

WRONG: Worker 7 GC-pauses for 12s; its 10s lease expires; a new node claims 7; old-7 wakes and keeps generating as worker 7 for 4 more seconds, until it finally observes the lease loss. Two live workers, same ID: duplicates with no failure signal.

RIGHT: Fence by time: generation deadline = lease TTL minus safety margin, checked before EVERY mint. A worker that cannot renew stops generating while its lease is still valid: the same fencing discipline as lease-based range splits in the control-plane topic.
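
The deadline check can be sketched as follows, using a monotonic clock so that wall-clock adjustments cannot extend the lease. FencedWorker, LeaseExpiredError, and the 10 s / 2 s values are hypothetical; the sketch also assumes the caller records the time at which each renewal request was sent.

```python
import time


class LeaseExpiredError(RuntimeError):
    """Raised when the local fencing deadline has passed."""


class FencedWorker:
    def __init__(self, lease_ttl_s=10.0, safety_margin_s=2.0, clock=time.monotonic):
        self._ttl = lease_ttl_s
        self._margin = safety_margin_s
        self._clock = clock
        self._deadline = float("-inf")  # no lease held yet

    def on_renewed(self, request_sent_at):
        # Count from when the renewal was sent, not when the reply arrived,
        # so network delay can only shorten the local deadline.
        self._deadline = request_sent_at + self._ttl - self._margin

    def mint(self, make_id):
        # Checked before every mint: stop while the lease is still valid.
        if self._clock() >= self._deadline:
            raise LeaseExpiredError("fencing deadline passed; refusing to mint")
        return make_id()
```

A pause that lands between the check and make_id() is still possible; the safety margin is what absorbs it, so it should exceed the longest pause the team is willing to tolerate.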

Business Logic Parsing ID Internals

Occasional

Downstream services extracting timestamps, worker IDs, or shard hints from ID bits and building behavior on the layout.

Why: The bits are right there and decoding saves a column; every decoder feels harmless individually.

WRONG: Analytics decodes creation time from bit 22, a router derives shard from worker bits, a partner sorts by raw ID for causality. The bit layout is now a public API: re-tuning the budget (or fixing an epoch) breaks unknown consumers silently.

RIGHT: Treat IDs as opaque outside the platform team. Expose created_at as a field, shard hints via API, and keep the decode capability as an internal debugging tool with the epoch documented as a versioned contract.
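
An internal decode tool with a versioned layout might look like the sketch below. The Layout record, the LAYOUTS table, the epoch value, and the 10/12 bit split are all assumptions for illustration; the point is that the layout is looked up by version in one place instead of being hardcoded in every consumer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Layout:
    epoch_ms: int
    worker_bits: int
    seq_bits: int


# Hypothetical versioned contract owned by the platform team.
LAYOUTS = {
    1: Layout(epoch_ms=1_577_836_800_000, worker_bits=10, seq_bits=12),
}


def debug_decode(id_value, layout_version=1):
    """Internal debugging aid only; downstream services should use created_at."""
    layout = LAYOUTS[layout_version]
    sequence = id_value & ((1 << layout.seq_bits) - 1)
    worker_id = (id_value >> layout.seq_bits) & ((1 << layout.worker_bits) - 1)
    unix_ms = (id_value >> (layout.seq_bits + layout.worker_bits)) + layout.epoch_ms
    return {
        "created_at": datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc),
        "worker_id": worker_id,
        "sequence": sequence,
    }
```

If the bit budget is re-tuned, a new entry is added to LAYOUTS and only this tool changes, because no other service depends on the layout.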