
Time-Bucket Partitioning: Finding What Fires Now

The store holds 10 billion pending tasks (reminders for next Tuesday, payment retries in 90 seconds, report jobs for the 1st of the month), and every second the scheduler must answer one question: which of these fire right now? The naive answer is a database poll: SELECT * WHERE fire_at <= now() ORDER BY fire_at, every second, forever. At small scale it works; at ours it collapses three ways: the index range scan competes with millions of inserts on the same B-tree, every scheduler instance polls the same rows (locking or duplicating), and the hot end of the index, the next few seconds, becomes the most contended set of pages in the database.
The fix is to make time itself the partition key.
Divide the future into fixed buckets, say one per 5 minutes, and write every task into the bucket containing its fire time: task_buckets/2026-07-05T19:35 holds everything firing in that window. Now "what fires soon" is not a query but a bucket read: sequential, bounded, contention-free, touching none of the other 9.99 billion tasks.
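A minimal sketch of the bucket-key computation, assuming 5-minute buckets and the task_buckets/<window start> naming used in the example above. The function name bucket_key and the constant BUCKET_WIDTH are hypothetical, and fire times are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

BUCKET_WIDTH = timedelta(minutes=5)  # assumed width, matching the 5-minute example
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_key(fire_at: datetime, width: timedelta = BUCKET_WIDTH) -> str:
    """Return the key of the bucket whose window contains fire_at.

    fire_at is assumed to be timezone-aware; it is normalized to UTC.
    """
    fire_at = fire_at.astimezone(timezone.utc)
    # Floor the fire time to the start of its window.
    window_start = fire_at - ((fire_at - EPOCH) % width)
    return "task_buckets/" + window_start.strftime("%Y-%m-%dT%H:%M")


if __name__ == "__main__":
    print(bucket_key(datetime(2026, 7, 5, 19, 37, 42, tzinfo=timezone.utc)))
```

Because the key is a pure function of the fire time, the writer and the scheduler agree on a task's location without any lookup.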
Each scheduler instance leases whole buckets (with the fencing discipline from the ID-generator topic), loads its bucket's tasks into an in-memory structure a few minutes ahead, and fires them from RAM at the right instant. Within a bucket, a second-level hash spread (bucket + shard) keeps any single window's load divisible across instances when a popular minute holds millions of tasks.
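The in-memory side can be sketched as follows. The text does not name the structure, so a binary min-heap keyed on fire time is an assumption here (a timing wheel would be another option), and shard_key, BucketTimer, and NUM_SHARDS are hypothetical names. Leasing and fencing are not shown.

```python
import heapq
import zlib

NUM_SHARDS = 16  # hypothetical number of shards per bucket


def shard_key(bucket: str, task_id: str, num_shards: int = NUM_SHARDS) -> str:
    """Spread one bucket's tasks over num_shards sub-buckets."""
    # crc32 is stable across processes, unlike the built-in hash() on str.
    shard = zlib.crc32(task_id.encode("utf-8")) % num_shards
    return f"{bucket}/{shard:02d}"


class BucketTimer:
    """Holds one leased bucket shard's tasks in RAM, ordered by fire time."""

    def __init__(self):
        self._heap = []  # entries are (fire_at_epoch_seconds, task_id)

    def load(self, tasks):
        """Bulk-load (fire_at, task_id) pairs read from the bucket."""
        for fire_at, task_id in tasks:
            heapq.heappush(self._heap, (fire_at, task_id))

    def add(self, fire_at, task_id):
        """Insert a single task that arrives after the bucket was loaded."""
        heapq.heappush(self._heap, (fire_at, task_id))

    def pop_due(self, now):
        """Remove and return the ids of all tasks with fire_at <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


if __name__ == "__main__":
    timer = BucketTimer()
    timer.load([(30, "b"), (10, "a"), (20, "c")])
    print(timer.pop_due(20))
```

Each call to pop_due costs time proportional to the number of tasks that are actually due, not to the size of the bucket, which is what makes firing from RAM cheap.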
The numbers: 1B firings/day is ~11.6K/sec on average; a 5-minute bucket averages ~3.5M tasks at ~200 B each, or about 700 MB, which is loadable into memory with headroom. Cancellations and reschedules either write a tombstone, which is checked at fire time, or move the task to its new bucket.
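The sizing arithmetic above can be reproduced in a few lines. The inputs are the figures quoted in the text (1B firings/day, 5-minute buckets, ~200 bytes per task); the variable names are illustrative.

```python
FIRINGS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 86_400
BUCKET_SECONDS = 5 * 60
BYTES_PER_TASK = 200

# Average firing rate across the whole day.
rate_per_sec = FIRINGS_PER_DAY / SECONDS_PER_DAY
# Average number of tasks landing in one 5-minute bucket.
tasks_per_bucket = rate_per_sec * BUCKET_SECONDS
# Memory needed to hold one average bucket.
bucket_bytes = tasks_per_bucket * BYTES_PER_TASK

print(f"{rate_per_sec:,.0f} firings/sec")
print(f"{tasks_per_bucket:,.0f} tasks per bucket")
print(f"{bucket_bytes / 1e6:,.0f} MB per bucket")
```

These are averages; a popular minute can hold far more than its share, which is why the per-bucket shard spread matters.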
The trade-offs: bucket granularity is a dial (small buckets = more lease churn; big ones = more RAM per load), and a task scheduled inside the current bucket must bypass the batch path and go straight to the in-memory structure. That is the one special case every implementation must handle.
What if the interviewer asks: why not just shard the DB poll?
Sharded polling is still polling: every shard still burns its hot index end; bucketing removes the query entirely.
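The current-bucket special case mentioned in the trade-offs can be sketched as a routing decision at schedule time. This is an illustration under assumptions: route, bucket_start, and the preload_seconds parameter are hypothetical, times are epoch seconds, and treating an already-preloaded future bucket like the current one is an extension of the rule stated above. How the task is also made durable is not shown.

```python
BUCKET_SECONDS = 300  # assumed 5-minute buckets


def bucket_start(ts: float) -> int:
    """Floor an epoch timestamp to the start of its bucket."""
    return int(ts // BUCKET_SECONDS) * BUCKET_SECONDS


def route(fire_at: float, now: float, preload_seconds: float = 0) -> str:
    """Decide where a newly scheduled task must go.

    Returns "memory" when the task's bucket has already been loaded
    (the current bucket, or one inside the preload horizon), and
    "bucket_store" when the normal batch path will pick it up later.
    """
    # Newest bucket a scheduler may already have loaded into RAM.
    newest_loaded = bucket_start(now + preload_seconds)
    if bucket_start(fire_at) <= newest_loaded:
        return "memory"
    return "bucket_store"


if __name__ == "__main__":
    print(route(fire_at=1000, now=950))   # same bucket as now
    print(route(fire_at=1300, now=950))   # a later bucket
```

Without this check, a task written into a bucket after that bucket was loaded would sit in storage and never fire.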
Why it matters in interviews
This is the topic's KEY INSIGHT: converting a 10B-row time query into a bounded bucket read. The lease-per-bucket ownership model and the current-bucket special case are the details that show real design rather than "use a cron library".