TRICKYwalkthrough

The Top-of-Minute Herd

5 of 8
3 related
Humans schedule in round numbers. Reports at 9:00, backups at midnight, syncs "every hour": nobody schedules anything for 9:03:47.
This is not an anomaly to absorb: it is the workload's signature, and the design must treat it as such. Defense one: scheduler-side smoothing.
The consequence is a workload with structural spikes: the top of every minute carries 10x the average firing rate, the top of the hour 50x, and midnight UTC is the single most violent second of the day: our 11.6K/sec average becomes a 500K/sec instant while the seconds around it sit nearly idle.
The firing path is already decoupled (fire = enqueue), so the dispatcher can sustain bursts: timing wheels make firing O(1): and Kafka absorbs the pulse as queue depth; the pressure lands on workers, which autoscale on queue depth with the spike being perfectly predictable: scale-ahead on the clock, not reactively on the lag. Defense two: offered jitter.
For tasks whose owners do not truly need :00 precision (most of them), the API offers: and defaults new crons to: a stable per-task jitter: hash(task_id) % window spreads "hourly" tasks uniformly across the hour while keeping each task's phase consistent run over run. Defaulting matters: opt-in jitter goes unused; opt-out jitter flattens the curve for free.
Defense three: priority lanes at the pulse. When midnight fires 500K tasks into the queue, a medication reminder must not sit behind 400K cache refreshes: execution queues split by priority class (the notification topic's isolation lesson), so latency-sensitive tasks ride an empty lane through the herd.
The monitoring angle: firing lateness percentiles per second-of-minute: a system healthy on average but drowning at :00 shows a sawtooth that per-minute aggregates hide. What if the interviewer asks: why not reject :00 scheduling?
Because the herd is customer intent: the design's job is to make round numbers cheap (jitter where permission exists, capacity where it does not), not to forbid how humans think.
Why it matters in interviews
Every scheduler interview eventually asks about midnight. Knowing the herd is structural (10-50x pulses), and answering with default-on stable jitter + clock-based scale-ahead + priority lanes, turns a gotcha into a prepared design section.
Related concepts