Task Scheduler Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for your interview prep.

Time Buckets: Kill the Poll

Polling with SELECT ... WHERE fire_at <= now() dies at scale: hot index end, poll contention, scan-vs-insert war. Instead, time is the partition key: buckets are fixed at 5 minutes, and every task is written into the bucket containing its fire time. "What fires soon" becomes a bounded sequential bucket read touching none of the other 10B rows. Scheduler instances lease whole buckets (fencing discipline from the ID topic), load ~700 MB into memory a few minutes ahead, and fire from RAM. Special case: a task scheduled inside the CURRENT bucket bypasses the batch path and goes straight into the wheel.

💡 Sharded polling is still polling. Bucketing removes the query, not just spreads it.
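
A minimal sketch of the bucket arithmetic described above, assuming epoch-second timestamps. The function names and the "wheel" / "bucket_store" labels are hypothetical, and overdue tasks are deliberately not handled here.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # fixed 5-minute buckets, as in the card above


def bucket_start(fire_at: datetime) -> int:
    """Epoch second at which the bucket containing fire_at begins."""
    epoch = int(fire_at.timestamp())
    return epoch - (epoch % BUCKET_SECONDS)


def route(fire_at: datetime, now: datetime) -> str:
    """Pick the write path for a new task (path names are illustrative)."""
    if bucket_start(fire_at) == bucket_start(now):
        # The current bucket is already loaded: go straight into the wheel.
        return "wheel"
    # Any other bucket is picked up later by the batch load.
    return "bucket_store"


now = datetime(2024, 1, 1, 12, 2, 0, tzinfo=timezone.utc)
print(route(datetime(2024, 1, 1, 12, 4, 30, tzinfo=timezone.utc), now))
print(route(datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc), now))
```

The bucket key is pure arithmetic on the fire time, so "what fires soon" never needs a range query over all pending rows.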

Timing Wheel: O(1) In, O(1) Fire

The in-memory window structure: one slot per second (300 slots for a 5-min window), each slot a plain list. Insert = index + append, O(1). Tick = fire the current slot's list, advance. No heap rebalancing: Kafka's purgatory and Netty's timers work this way. Hierarchical wheels cascade hours -> minutes -> seconds for long horizons. A min-heap is the simpler alternative below ~100K timers: name the load, then choose. The wheel is volatile BY DESIGN: durable truth stays in the bucket store; crashes reload and replay.

💡 Derive the wheel (slots, tick, cascade), don't name-drop it. Granularity is fixed by the tick: a 1s wheel means an SLO measured in seconds.
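
The slots-and-tick derivation can be sketched as a single-level wheel. This is an illustrative toy, not Kafka's or Netty's implementation: the class name and API are assumptions, and the caller is assumed to call tick() once per second.

```python
class TimingWheel:
    """Single-level wheel: one slot per second over a fixed window."""

    def __init__(self, slots: int = 300, start: int = 0) -> None:
        self.slots = [[] for _ in range(slots)]
        self.now = start  # the second that the next tick() will fire

    def insert(self, fire_at: int, task_id: str) -> None:
        """O(1): index by second, append to that slot's list."""
        if not self.now <= fire_at < self.now + len(self.slots):
            raise ValueError("fire_at is outside the wheel's window")
        self.slots[fire_at % len(self.slots)].append(task_id)

    def tick(self) -> list:
        """Fire everything in the current slot, then advance one second."""
        idx = self.now % len(self.slots)
        due, self.slots[idx] = self.slots[idx], []
        self.now += 1
        return due


wheel = TimingWheel(slots=300, start=1000)
wheel.insert(1002, "a")
wheel.insert(1002, "b")
wheel.insert(1000, "c")
print(wheel.tick())  # tasks due at second 1000
print(wheel.tick())  # tasks due at second 1001
print(wheel.tick())  # tasks due at second 1002
```

Nothing here is durable, which matches the card: after a crash the wheel is rebuilt by reloading the bucket.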

At-Least-Once: Checkpoint AFTER Enqueue

Fire = enqueue to Kafka, never execute inline. Checkpoint (bucket high-water mark) after the enqueue succeeds: a crash between the two re-fires a small suffix, so the failure mode is duplicates, never losses. Takeover: new lease holder reloads the bucket, skips below checkpoint, re-fires the uncertain tail. Downstream dedupe: idempotency key = hash(task_id, scheduled_fire_time), so Tuesday's and Wednesday's cron runs differ, but two firings of Tuesday's run collapse. Fencing: can't renew the lease -> stop firing before expiry.

💡 Exactly-once across a crash boundary is a myth here too. At-most-once drops payment retries: choose the loud failure (duplicates) over the silent one (drops).
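
The enqueue-then-checkpoint ordering and the dedupe key can be sketched as follows. The enqueue and save_checkpoint callables are hypothetical stand-ins for the Kafka producer and the checkpoint store, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def idempotency_key(task_id: str, scheduled_fire_time: int) -> str:
    """Same task and same scheduled time give the same key on every firing."""
    raw = f"{task_id}:{scheduled_fire_time}".encode()
    return hashlib.sha256(raw).hexdigest()


def fire_bucket(tasks, checkpoint, enqueue, save_checkpoint):
    """Fire every task above the checkpoint, in position order.

    tasks: list of (position, task_id, scheduled_fire_time), sorted by position.
    Returns the final checkpoint.
    """
    for position, task_id, fire_time in tasks:
        if position <= checkpoint:
            continue  # already enqueued before the crash or takeover
        enqueue(idempotency_key(task_id, fire_time), task_id)
        # Checkpoint only AFTER the enqueue succeeded. A crash between the
        # two lines re-fires this task on restart: a duplicate, not a loss.
        save_checkpoint(position)
        checkpoint = position
    return checkpoint


sent = []
final = fire_bucket(
    [(1, "t1", 100), (2, "t2", 100)],
    checkpoint=0,
    enqueue=lambda key, task_id: sent.append((key, task_id)),
    save_checkpoint=lambda position: None,
)
print(final, len(sent))
```

Swapping the two calls would turn the crash window into a lost task, which is the at-most-once behavior the card warns against.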

Cron: Materialize Only the Next Instance

A recurring schedule is a factory, not a task. Store the template (cron expr, timezone, handler, policies); materialize exactly one pending instance into its bucket. On fire, compute the next occurrence and materialize it: a self-perpetuating chain, one row per cron forever. Edits = update template + re-materialize one row. The senior details: DST gaps and overlaps (a 2:30am that is missing or doubled needs a policy), misfire policy per template (fire-all / fire-once / skip: Quartz vocabulary), drift anchor (schedule-relative grid vs completion-relative gap).

💡 "Store cron in UTC" is how you fail the timezone follow-up. Next-occurrence math runs in the template's zone.
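
A rough sketch of the misfire policies and the schedule-relative grid. Two assumptions are baked in: a fixed interval in seconds stands in for real cron next-occurrence math (so no timezone or DST handling here), and "fire-once" is read as firing only the most recent missed instance. Function names are hypothetical.

```python
def instances_to_fire(next_due: int, now: int, interval: int, policy: str) -> list:
    """Scheduled times to fire when the scheduler wakes up at `now`.

    next_due is the single materialized pending instance (epoch seconds).
    """
    due = list(range(next_due, now + 1, interval))  # every grid point <= now
    if policy == "fire-all":
        return due
    if policy == "fire-once":
        return due[-1:]  # most recent missed instance only (assumption)
    if policy == "skip":
        return []
    raise ValueError(f"unknown misfire policy: {policy}")


def next_instance(next_due: int, now: int, interval: int) -> int:
    """Schedule-relative anchor: first grid point strictly after now."""
    if next_due > now:
        return next_due
    periods_elapsed = (now - next_due) // interval + 1
    return next_due + periods_elapsed * interval


# A cron due at t=1000 every 100s, scheduler back online at t=1250.
print(instances_to_fire(1000, 1250, 100, "fire-all"))
print(instances_to_fire(1000, 1250, 100, "fire-once"))
print(next_instance(1000, 1250, 100))
```

Whatever the policy, only one future row is re-materialized afterwards, which keeps the "one row per cron" invariant.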

The Top-of-Minute Herd Is the Workload

Humans schedule at :00. The shape: 10x average at each minute top, 50x hourly, ~500K/sec at midnight UTC vs 11.6K/sec average. Defenses: default-on stable jitter (hash(task_id) % window: spreads tasks, keeps per-task phase consistent; opt-in jitter goes unused), clock-based scale-ahead (the spike is on the calendar: autoscale workers BEFORE :00, not on queue lag), priority lanes (a medication reminder never queues behind 400K cache refreshes). Watch lateness per second-of-minute: per-minute averages hide the sawtooth.

💡 The herd is customer intent, not abuse. Make round numbers cheap; don't forbid them.
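
Stable jitter can be sketched like this. It assumes SHA-256 as the hash (Python's built-in hash() is salted per process for strings, so it would not be stable) and a hypothetical 30-second window. The offset is only ever added, so jitter makes a task later, never earlier.

```python
import hashlib


def stable_jitter(task_id: str, window_s: int) -> int:
    """Deterministic offset in [0, window_s): same task, same phase, every run."""
    digest = hashlib.sha256(task_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % window_s


def jittered_fire_time(task_id: str, scheduled: int, window_s: int = 30) -> int:
    """Shift the fire time later by the task's stable offset."""
    return scheduled + stable_jitter(task_id, window_s)


offsets = {stable_jitter(f"task-{i}", 30) for i in range(1000)}
print(len(offsets))  # how many distinct offsets 1000 tasks landed on
```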

Retry = Reschedule Through Yourself

A retry is a new scheduled task (same idempotency lineage, attempt+1) written into a future bucket: retries inherit bucketing, leases, at-least-once, and monitoring for free. Exponential backoff + jitter (1m, 4m, 16m) prevents the self-inflicted herd after downstream outages. Budgets: per-task max attempts; per-destination circuit breakers (10K tasks retrying one dead webhook must not eat the fleet). Poison tasks: crash-count threshold (worker deaths, distinct from clean failures) -> DLQ with full context + alerting by class. Retries add 10-20% volume steady-state, multiples during outages.

💡 A DLQ nobody reads is a black hole with good intentions. Alert on arrival, by task class.
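
A sketch of retry-as-reschedule with the 1m / 4m / 16m ladder. The task dict shape, the 0-20% additive jitter, and max_attempts=5 are illustrative assumptions, not values from the card.

```python
import random


def backoff_delay(failed_attempt: int, base_s: float = 60.0, factor: int = 4,
                  jitter_frac: float = 0.2) -> float:
    """Delay after the Nth failed attempt: ~1m, ~4m, ~16m, plus random jitter."""
    nominal = base_s * factor ** (failed_attempt - 1)
    return nominal * (1 + random.uniform(0, jitter_frac))


def schedule_retry(task: dict, now: float, max_attempts: int = 5):
    """Return the retry as a NEW scheduled task, or None to route to the DLQ."""
    next_attempt = task["attempt"] + 1
    if next_attempt > max_attempts:
        return None  # per-task budget exhausted
    return {
        **task,  # same lineage, handler, and payload
        "attempt": next_attempt,
        "fire_at": now + backoff_delay(task["attempt"]),
    }


failed = {"task_id": "t1", "lineage": "L1", "attempt": 1}
print(schedule_retry(failed, now=1000.0))
```

Because the result is an ordinary task with a fire_at, it lands in a future bucket and needs no separate retry machinery.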

Late Is a Budget, Early Is a Bug

Lateness is an SLO: p99 <5s, p99.9 <60s (honest about pulses and takeovers), with remedies (capacity, lanes, catch-up ordering). Earliness is a correctness violation, never acceptable: early payment retries hammer struggling processors; pre-market tasks act on closed markets. The invariant costs clock discipline: slew-only NTP, skew fencing past 100ms (a fast clock fires early while feeling punctual), and handoffs carrying the prior holder's high-water timestamp. Catch-up after outages interleaves (fresh tasks ride lanes, backlog drains at a controlled rate) and re-checks misfire policy: a 90-minute-old cache-warm task should be skipped because its moment has passed.

💡 Measure lateness honestly: dispatch delta + queue wait + execution start. Dashboarding only the first is lying.
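
One way to make the honest lateness measurement concrete, assuming four timestamps are recorded per firing. The field names are hypothetical, and all timestamps are assumed to come from comparable clocks.

```python
from dataclasses import dataclass


@dataclass
class FiringTimestamps:
    scheduled: float  # when the task was supposed to fire
    enqueued: float   # when the dispatcher published it
    dequeued: float   # when a worker pulled it off the queue
    started: float    # when the handler actually began executing


def lateness_breakdown(t: FiringTimestamps) -> dict:
    """Split end-to-end lateness into the three components named above."""
    return {
        "dispatch_delta": t.enqueued - t.scheduled,
        "queue_wait": t.dequeued - t.enqueued,
        "execution_start": t.started - t.dequeued,
        "end_to_end": t.started - t.scheduled,
    }


def is_early(t: FiringTimestamps) -> bool:
    """An early dispatch is a correctness bug, not an SLO miss."""
    return t.enqueued < t.scheduled


sample = FiringTimestamps(scheduled=100.0, enqueued=100.5, dequeued=103.0, started=103.25)
print(lateness_breakdown(sample), is_early(sample))
```

A dashboard showing only dispatch_delta would report 0.5s for this sample while the task really started 3.25s late.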

Tasks vs Workflows: The Boundary

Scheduler: time-triggered single-shot execution; state per task is tiny and terminal (pending -> fired -> done/dead). Workflow engines (Airflow, Temporal, Conductor) hold living DAG state (steps, outputs, joins, human waits), which is a different storage and failure model. The relationship is layered: workflow engines are the scheduler's best customers (DAG kickoffs, durable timers for 30-day sleeps). The scheduler supports chaining (a completed task schedules a successor, as retry already does) without touching if/join/wait. Extension questions get a layering answer, not DAG fields on the task row.

💡 Durable timers + at-least-once + idempotent handlers ARE the primitives Temporal stands on.

Capacity: 10B Pending, 1B Firings/Day

Pending store: 10B tasks x ~200B = 2 TB (sharded PG or Cassandra by bucket+hash). Firings: 1B/day = 11.6K/sec average, 500K/sec midnight pulse. A 5-min bucket averages ~3.5M tasks = 700 MB loaded per lease. Dispatcher cost per firing: microseconds (wheel pop + Kafka publish): a dozen dispatchers idle outside pulses. Workers scale on queue depth with clock-based pre-scaling; execution (seconds each) needs thousands of workers at pulses: the fleet cost lives in execution, never in scheduling.

💡 The scheduler is cheap; the herd's execution is expensive. Size workers for :00, dispatchers for boredom.
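
The card's numbers can be re-derived in a few lines. The inputs are the ones stated above (10B pending, ~200 B per task, 1B firings/day, 5-minute buckets, 500K/sec midnight pulse); everything else is arithmetic.

```python
PENDING_TASKS = 10_000_000_000
BYTES_PER_TASK = 200
FIRINGS_PER_DAY = 1_000_000_000
BUCKET_SECONDS = 300
SECONDS_PER_DAY = 86_400
MIDNIGHT_PULSE_PER_SEC = 500_000

pending_bytes = PENDING_TASKS * BYTES_PER_TASK            # pending store size
avg_firings_per_sec = FIRINGS_PER_DAY / SECONDS_PER_DAY   # average fire rate
buckets_per_day = SECONDS_PER_DAY // BUCKET_SECONDS       # 5-minute buckets
tasks_per_bucket = FIRINGS_PER_DAY / buckets_per_day      # average bucket size
bucket_bytes = tasks_per_bucket * BYTES_PER_TASK          # loaded per lease
pulse_ratio = MIDNIGHT_PULSE_PER_SEC / avg_firings_per_sec

print(f"pending store:    {pending_bytes / 1e12:.1f} TB")
print(f"average rate:     {avg_firings_per_sec:,.0f} firings/sec")
print(f"tasks per bucket: {tasks_per_bucket / 1e6:.2f} M")
print(f"bytes per bucket: {bucket_bytes / 1e6:.0f} MB")
print(f"midnight pulse:   {pulse_ratio:.0f}x average")
```

The per-bucket figures are averages: a bucket covering the top of an hour will be well above them, which is why the pulse gets its own line.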

The Metrics That Matter

Firing lateness p99/p99.9 split by second-of-minute (the sawtooth detector). Early-fire counter: must read zero forever; any increment is a clock incident. Bucket lease health: unowned buckets mean tasks silently not firing, the scheduler's deadliest failure. Queue depth + oldest-message age per priority lane. Retry amplification ratio (firings/original): >1.3 sustained means a downstream is failing. DLQ arrival rate by task class, alerting on arrival. Checkpoint lag per bucket (replay cost on takeover). And the synthetic check: a canary task scheduled every minute, timed end to end.

💡 An unowned bucket fails silently: nothing errors, tasks just don't fire. Lease coverage is availability.
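
The sawtooth detector can be sketched as a p99 grouped by second-of-minute. The input shape, a list of (scheduled_epoch_s, lateness_s) pairs, and the nearest-rank percentile method are assumptions; a real pipeline would more likely use histograms in the metrics backend.

```python
from collections import defaultdict


def p99_by_second_of_minute(samples) -> dict:
    """Group lateness by the scheduled second-of-minute; return p99 per group."""
    groups = defaultdict(list)
    for scheduled, lateness in samples:
        groups[int(scheduled) % 60].append(lateness)
    result = {}
    for second, values in groups.items():
        values.sort()
        rank = (99 * len(values) + 99) // 100  # nearest rank: ceil(0.99 * n)
        result[second] = values[rank - 1]
    return result


# 100 tasks scheduled at :00 with lateness 1..100s, 10 tasks at :30 all 0.1s late.
samples = [(1_700_000_040 + 60 * i, float(i + 1)) for i in range(100)]
samples += [(1_700_000_070 + 60 * i, 0.1) for i in range(10)]
print(p99_by_second_of_minute(samples))
```

A per-minute average over the same data would blend the two groups and hide the :00 spike entirely.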