
Sequence Overflow and Burst Handling

Twelve sequence bits allow 4,096 IDs per worker per millisecond. What happens on ID 4,097?
The generator has spent its budget for this millisecond. The correct behavior is to spin-wait until the next millisecond, then reset the sequence to zero.
That wait is at most 1 ms and invisible to callers, unless a worker sustains more than about 4M IDs/sec. In that case every millisecond saturates and the generator becomes a hard rate limiter. The design question is what saturation means and how to stay out of it.
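The spin-wait-and-reset behavior can be sketched as below. This is a minimal, single-threaded illustration rather than a reference implementation: the class name, the `EPOCH_MS` value, the 10-bit worker field, and the injectable `clock_ms` parameter are all assumptions made for the example. Only the 12 sequence bits come from the text.

```python
import time

EPOCH_MS = 1_600_000_000_000    # hypothetical custom epoch (assumption)
WORKER_BITS = 10                # assumed worker field width
SEQ_BITS = 12                   # 4,096 sequence values per millisecond
MAX_SEQ = (1 << SEQ_BITS) - 1   # 4095


class SnowflakeGenerator:
    """Single-threaded sketch; a real generator would guard next_id with a lock."""

    def __init__(self, worker_id, clock_ms=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # Injectable clock so the overflow path can be exercised in tests.
        self.clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.seq = 0

    def next_id(self):
        now = self.clock_ms()
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            if self.seq == MAX_SEQ:
                # Budget for this millisecond is spent: spin until the
                # clock ticks over, then restart the sequence at zero.
                while now <= self.last_ms:
                    now = self.clock_ms()
                self.seq = 0
            else:
                self.seq += 1
        else:
            # First ID in a new millisecond.
            self.seq = 0
        self.last_ms = now
        return (
            ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
            | (self.worker_id << SEQ_BITS)
            | self.seq
        )
```

Calling `next_id()` 4,097 times within one millisecond makes the last call block until the clock advances. The returned ID then carries the next millisecond's timestamp with sequence 0.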
First, know your real ceiling. The bit ceiling is 4.096M IDs/sec/worker, but the practical ceiling is the serving path: a gRPC ID service tops out around 100-200K IDs/sec/worker on RPC overhead alone, roughly 20 to 40 times below what the bits allow. The bits are not the bottleneck; the network is.
That drives the two standard optimizations.
Batching: a caller asks for 500 IDs in one RPC, and the worker reserves a sequence run and returns the batch. One network round trip is amortized across 500 IDs, so a single worker can serve millions of IDs/sec of demand again.
Client-side embedding: skip the service entirely and link the generator as a library inside each application process, so that each app instance is its own worker. This puts zero network on the hot path, at the cost of spending worker IDs faster and pushing clock discipline onto every app host.
The failure mode to name is the burst-loop spike: a buggy client requesting IDs in a tight loop saturates one worker's sequence while the rest of the fleet idles. Per-caller rate limits, and batch-only APIs above a threshold, keep one bad citizen from making the generator look slow.
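The batching idea, reserving a run of sequence values in one call, might look like the sketch below. The class name, `reserve` method, `EPOCH_MS`, and the 10-bit worker field are hypothetical names and values chosen for illustration. The sketch is single-threaded and leaves out the RPC layer entirely.

```python
import time

EPOCH_MS = 1_600_000_000_000   # hypothetical custom epoch (assumption)
WORKER_BITS = 10               # assumed worker field width
SEQ_BITS = 12
SEQ_PER_MS = 1 << SEQ_BITS     # 4096


class BatchingGenerator:
    """Sketch of batch reservation; not thread-safe."""

    def __init__(self, worker_id, clock_ms=None):
        self.worker_id = worker_id
        self.clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        self.ms = -1        # millisecond the current sequence run belongs to
        self.next_seq = 0   # first unreserved sequence value within self.ms

    def _compose(self, ms, seq):
        return (
            ((ms - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
            | (self.worker_id << SEQ_BITS)
            | seq
        )

    def reserve(self, n):
        """Return n IDs, spilling into later milliseconds if a run is exhausted."""
        ids = []
        while len(ids) < n:
            now = self.clock_ms()
            if now > self.ms:
                # New millisecond: a fresh budget of 4,096 sequence values.
                self.ms, self.next_seq = now, 0
            elif self.next_seq == SEQ_PER_MS:
                # Current millisecond is exhausted; spin until the clock advances.
                continue
            # Take as much of the remaining run as the batch still needs.
            take = min(n - len(ids), SEQ_PER_MS - self.next_seq)
            start = self.next_seq
            ids.extend(self._compose(self.ms, s) for s in range(start, start + take))
            self.next_seq += take
        return ids
```

A caller asking for `reserve(500)` gets 500 consecutive sequence values from a single clock read, which is what lets one round trip stand in for 500.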
Monitoring: track the sequence saturation ratio, the fraction of milliseconds in which the sequence reaches 4,095. If it stays above ~10%, add workers or push batching before latency appears.
What if the interviewer asks: why not more sequence bits?
The budget again: 13 sequence bits double the per-ms burst but halve either your worker count or your epoch lifetime. Every answer in this topic routes back to the 64-bit budget.
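The trade-off can be checked with a few lines of arithmetic. The sketch assumes the classic split of 1 sign bit, 41 timestamp bits, 10 worker bits, and 12 sequence bits. Only the 12 sequence bits and the 64-bit total come from the text, and the function name `budget` is made up for the example.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def budget(timestamp_bits, worker_bits, seq_bits):
    """Capacity of each field, assuming 1 sign bit plus 63 usable bits."""
    if timestamp_bits + worker_bits + seq_bits != 63:
        raise ValueError("fields must sum to 63 bits")
    return {
        "ids_per_ms": 1 << seq_bits,
        "workers": 1 << worker_bits,
        "epoch_years": (1 << timestamp_bits) / MS_PER_YEAR,
    }


baseline = budget(41, 10, 12)        # assumed classic layout
fewer_workers = budget(41, 9, 13)    # extra sequence bit taken from workers
shorter_epoch = budget(40, 10, 13)   # extra sequence bit taken from the timestamp

for name, b in [("baseline", baseline),
                ("fewer_workers", fewer_workers),
                ("shorter_epoch", shorter_epoch)]:
    print(f"{name}: {b['ids_per_ms']} IDs/ms, {b['workers']} workers, "
          f"{b['epoch_years']:.1f} years")
```

Under these assumptions, the 13-bit variants each reach 8,192 IDs per millisecond. One drops to 512 workers, and the other cuts the usable epoch from roughly 70 years to roughly 35.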
Why it matters in interviews
Sequence overflow is the interview probe for whether you know your real bottleneck. Candidates quote 4M IDs/sec/worker, but the practical ceiling is RPC overhead, and the fix is batching or embedding the generator as a library. The saturation-ratio metric shows operational thinking.
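As a rough illustration, the saturation ratio could be computed offline from a sample of issued IDs. The function name and the 10-bit worker field (which sets the timestamp shift) are assumptions. A production generator would more likely increment a counter whenever the sequence reaches 4,095 than scan IDs after the fact.

```python
SEQ_BITS = 12
WORKER_BITS = 10               # assumed worker field width
MAX_SEQ = (1 << SEQ_BITS) - 1  # 4095


def saturation_ratio(ids, window_ms):
    """Fraction of milliseconds in the window whose sequence reached 4,095.

    ids: IDs issued by ONE worker during the window.
    window_ms: length of the observation window in milliseconds.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    # A millisecond counts as saturated if the last sequence value was issued in it.
    saturated = {i >> (WORKER_BITS + SEQ_BITS) for i in ids if i & MAX_SEQ == MAX_SEQ}
    return len(saturated) / window_ms
```

A sustained result above about 0.10 would correspond to the ~10% threshold for adding workers or pushing batching.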