Notifications System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (Target: 25 min)
We designed a notification platform delivering 10 billion sends per day across 1 billion device tokens. Three physically isolated priority tiers keep OTPs under 5 seconds while 100M-recipient campaigns burst at 167K sends/sec. At-least-once delivery with deterministic idempotency keys makes crash-replays invisible. Coalescing and collapse keys cut engagement volume by 60%, per-user budgets cap marketing at 2 pushes/day, and provider gateways sized by concurrency (580K/sec × 50 ms = 29K requests in flight, ~100 servers) terminate at APNs and FCM behind circuit breakers.
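The gateway sizing in the summary is an application of Little's Law (average concurrency = arrival rate × time in system). A minimal sketch of that arithmetic follows. The per-server concurrency of 290 is not stated in the walkthrough; it is an assumption back-derived from the "~100 servers" figure, and the function names are illustrative.

```python
import math


def in_flight_requests(rate_per_sec: int, latency_ms: int) -> float:
    """Little's Law: average concurrency = arrival rate x time in system."""
    return rate_per_sec * latency_ms / 1000


def servers_needed(in_flight: float, per_server_concurrency: int) -> int:
    """Round up, since a fractional server still has to be provisioned."""
    return math.ceil(in_flight / per_server_concurrency)


PEAK_SENDS_PER_SEC = 580_000   # from the walkthrough (5x peak)
PROVIDER_LATENCY_MS = 50       # from the walkthrough
PER_SERVER_CONCURRENCY = 290   # ASSUMPTION: implied by "~100 servers"

concurrent = in_flight_requests(PEAK_SENDS_PER_SEC, PROVIDER_LATENCY_MS)
fleet = servers_needed(concurrent, PER_SERVER_CONCURRENCY)
print(f"{concurrent:,.0f} requests in flight, {fleet} gateway servers")
```

Sizing by concurrency rather than by requests per second matters because the gateways spend most of their time waiting on provider responses: if provider latency doubles, the in-flight count doubles even though the send rate is unchanged.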
1. What is Notifications?

It is 6:00 PM on Super Bowl Sunday. A sports app fires a score alert to 80 million devices, a bank sends a login OTP to one, and both must arrive: the score within a minute to beat the neighbor's cheering, the OTP within 5 seconds to beat the login timeout.
A notification system is the platform that makes both true at once. The naive mental model is "a queue with an HTTP client".
The accurate model is three problems wearing one name. First, a reliability problem: an OTP or a fraud alert must arrive, must arrive once as far as the user can tell, and must arrive now; the system runs at-least-once delivery with deterministic idempotency keys because exactly-once is impossible across Apple's and Google's APIs. Second, a throughput problem: 10 billion sends per day across 1 billion device tokens, with campaign waves that add 167K sends/sec on top of a 116K/sec baseline, terminating at rate-limited third-party providers we do not control. Third, and least respected, a respect problem: attention is a spendable budget. Past roughly 2 marketing pushes per day, opt-out rates climb sharply, and a user who disables push is unreachable forever, including for the OTPs.
The design differentiator from adjacent systems is worth naming: a chat system (topic 9) optimizes for symmetric, low-latency conversation between users; a news feed (topic 3) optimizes fanout for content users pull when they choose. A notification system is asymmetric interruption: the platform decides to demand a human's attention. That is why priority tiers, coalescing, budgets, and quiet hours are not features bolted onto a queue; they ARE the system.
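To make the per-user budget concrete, here is a minimal sketch of a daily marketing cap that lets higher-priority traffic through untouched. The class name, the tier label "marketing", and the in-memory counter are all assumptions for illustration; a production system would keep these counts in a shared store and would also need to define the user's day boundary (for example, by local time zone).

```python
from collections import defaultdict
from datetime import date

MARKETING_DAILY_CAP = 2  # from the walkthrough: ~2 marketing pushes/day


class MarketingBudget:
    """Hypothetical per-user daily budget; only marketing sends are counted."""

    def __init__(self, daily_cap: int = MARKETING_DAILY_CAP) -> None:
        self.daily_cap = daily_cap
        self._sent = defaultdict(int)  # (user_id, day) -> marketing sends so far

    def try_spend(self, user_id: str, day: date, tier: str) -> bool:
        """Return True if the send may proceed, recording it if it is marketing."""
        if tier != "marketing":
            return True  # OTPs and security alerts never consume the budget
        key = (user_id, day)
        if self._sent[key] >= self.daily_cap:
            return False  # budget exhausted: drop or defer the push
        self._sent[key] += 1
        return True


budget = MarketingBudget()
today = date(2025, 2, 9)
results = [budget.try_spend("user-1", today, "marketing") for _ in range(3)]
print(results)
```

The third marketing attempt is rejected, while a transactional send for the same user on the same day would still be allowed.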
Three problems in one platform: reliability (OTP in <5s, at-least-once + idempotency), throughput (10B/day, 1B tokens, campaign bursts), respect (budgets, coalescing, quiet hours: opt-out is forever). Interruption is the product; restraint is the design.
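The "at-least-once + idempotency" pairing can be sketched as follows. The key is deterministic because it is derived only from stable inputs, so a worker that crashes and replays the same event computes the same key and the duplicate is suppressed. The choice of fields (event ID, user ID, channel), the function names, and the in-memory set are assumptions for illustration; the walkthrough does not specify them here.

```python
import hashlib
from typing import Callable


def idempotency_key(event_id: str, user_id: str, channel: str) -> str:
    """Derive a stable key from stable inputs; no timestamps, no randomness."""
    raw = "\x1f".join((event_id, user_id, channel))  # unit separator avoids ambiguity
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class Deduper:
    """Hypothetical in-memory stand-in for a shared dedupe store with a TTL."""

    def __init__(self) -> None:
        self._seen = set()

    def already_sent(self, key: str) -> bool:
        return key in self._seen

    def record(self, key: str) -> None:
        self._seen.add(key)


def deliver(event_id: str, user_id: str, channel: str,
            deduper: Deduper, send: Callable[[str], None]) -> bool:
    """Send unless this key was already recorded. Returns True if a send happened."""
    key = idempotency_key(event_id, user_id, channel)
    if deduper.already_sent(key):
        return False  # replay of a completed send: suppress it
    send(key)            # send first, record second: a crash in between causes
    deduper.record(key)  # a retry (at-least-once), never a silently lost message
    return True


outbox = []
store = Deduper()
first = deliver("evt-42", "user-1", "push", store, outbox.append)
replay = deliver("evt-42", "user-1", "push", store, outbox.append)
print(first, replay, len(outbox))
```

Recording after the send leaves a small window in which a crash produces one duplicate at the provider; that is the price of at-least-once, and it is why the guarantee is "once as far as the user can tell" rather than exactly-once.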
A notification system is the delivery platform that turns product events into messages on user devices across push (APNs, FCM), email, SMS, and in-app channels. It sounds like a queue with an HTTP client attached. At scale it is three distinct engineering problems wearing one name: a reliability problem (an OTP must arrive, exactly once as far as the user can tell, in seconds), a throughput problem (10 billion sends per day with 100M-recipient campaign bursts), and a respect problem (every unnecessary buzz spends user trust, and a user who disables push is unreachable forever).
  • Scale: 10B notifications/day = 116K/sec average, 580K/sec at 5x peak, across 1B device tokens
  • Latency tiers: OTPs and security alerts in under 5 seconds p99; marketing can lag minutes by design
  • Delivery: at-least-once with deterministic idempotency keys; exactly-once is impossible past the provider boundary
  • The quiet product constraint: past ~2 marketing pushes/day, opt-out rates climb and users become permanently unreachable
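The throughput figures above follow from simple arithmetic, reproduced here as a sanity check. The 600-second campaign window is an assumption: it is the window that makes a 100M-recipient campaign come out at about 167K sends/sec, not a number quoted in this section.

```python
SECONDS_PER_DAY = 86_400
DAILY_SENDS = 10_000_000_000        # from the walkthrough: 10B/day
PEAK_MULTIPLIER = 5                 # from the walkthrough: 5x peak
CAMPAIGN_RECIPIENTS = 100_000_000   # from the walkthrough: 100M recipients
CAMPAIGN_WINDOW_SEC = 600           # ASSUMPTION: inferred from the 167K/sec burst

average_per_sec = DAILY_SENDS / SECONDS_PER_DAY
peak_per_sec = average_per_sec * PEAK_MULTIPLIER
campaign_per_sec = CAMPAIGN_RECIPIENTS / CAMPAIGN_WINDOW_SEC

print(f"average:  {average_per_sec:,.0f}/sec")   # rounds to ~116K/sec
print(f"5x peak:  {peak_per_sec:,.0f}/sec")      # ~579K; the text rounds to 580K
print(f"campaign: {campaign_per_sec:,.0f}/sec")  # rounds to ~167K/sec
```

The document's 580K/sec peak comes from multiplying the already-rounded 116K average by 5; the unrounded product is slightly lower, which does not change any sizing decision.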