
Ad Click Aggregator

Click aggregation is asked at Google, Meta, and Amazon because it is the streaming interview with dollar amounts attached: 8.6 billion clicks a day feeding dashboards that must be fresh in seconds, budget enforcement where every second of lag is priced in cash, and invoices that must be exact under audit. You will count twice (a Flink stream path for decisions, a nightly batch over the immutable raw log for money), survive the viral ad with two-stage salted aggregation, walk the three-link exactly-once chain, and separate duplicates (identity) from fraud (judgment).
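The claim that every second of lag is priced in cash can be made concrete with one multiplication. The sketch below is illustrative only: the function name and the $50-per-second spend rate are assumptions, not figures from this topic.

```python
def overspend_exposure(lag_seconds: float, spend_per_second: float) -> float:
    """Worst-case dollars served after a budget runs out but before
    the enforcer's (stale) count shows it."""
    return lag_seconds * spend_per_second


# Hypothetical campaign burning $50 per second at its peak.
SPEND_PER_SECOND = 50.0

for lag in (1, 2, 60):
    exposure = overspend_exposure(lag, SPEND_PER_SECOND)
    print(f"lag={lag:>2}s  exposure=${exposure:,.2f}")
```

The exposure grows linearly with lag, which is why the budget path is held to a tighter freshness target than the dashboards.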

  • Count twice: stream for dashboards and budgets, batch over the raw log for billing, with reconciliation between the two
  • Survive the Super Bowl ad: two-stage salted aggregation bounds the merge at 16 msg/sec regardless of volume
  • Exactly-once as a three-link chain: event identity, atomic state+offsets, and the sink everyone forgets
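
The two-stage salted aggregation in the second bullet can be sketched in a few lines. This is a minimal in-memory illustration, not Flink code. It assumes 16 salt buckets per ad and one partial count emitted per bucket per one-second window, which is one way to arrive at the 16 msg/sec bound; the names `stage_one`, `stage_two`, and `NUM_SALTS` are hypothetical.

```python
import random
from collections import Counter

NUM_SALTS = 16  # assumption: 16 salt buckets, one partial per bucket per second


def stage_one(clicks, num_salts=NUM_SALTS, rng=None):
    """Pre-aggregate under (ad_id, salt) keys so a hot ad spreads over num_salts workers."""
    rng = rng or random.Random(0)
    partials = Counter()
    for ad_id in clicks:
        salt = rng.randrange(num_salts)  # random salt; any even spread would do
        partials[(ad_id, salt)] += 1
    return partials


def stage_two(partials):
    """Drop the salt and merge the partial counts into one total per ad."""
    totals = Counter()
    for (ad_id, _salt), count in partials.items():
        totals[ad_id] += count
    return totals


# One second of traffic: a viral ad at 100K clicks plus a quiet ad.
clicks = ["viral_ad"] * 100_000 + ["quiet_ad"] * 3
partials = stage_one(clicks)
totals = stage_two(partials)

# Messages the merge stage receives for the viral ad in this window.
merge_messages = sum(1 for (ad_id, _salt) in partials if ad_id == "viral_ad")
print(totals["viral_ad"], totals["quiet_ad"], merge_messages)
```

The merge stage sees at most `NUM_SALTS` partials per ad per window, so its load depends on the number of salts and the window length, not on how many clicks the ad received.
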
Google, Meta, Amazon, TikTok, Pinterest, Uber
Elevator Pitch: 3-minute interview summary

I would design a click aggregator for 8.6 billion clicks a day, 500K per second at peak, serving three masters on different points of the speed-certainty curve: dashboards, budget enforcement, and billing. The key move: count twice. A Flink stream path feeds dashboards in seconds and a budget lane fresh within 1-2 seconds, because staleness times spend velocity is dollars the platform eats. It uses event-time windows with 15-minute allowed lateness, two-stage salted aggregation (a viral ad costs the merge layer 16 messages a second, not 100K), checkpoint-atomic state, and an idempotent upsert sink. A nightly batch over the immutable raw log (1.3 TB/day, about $30 of S3) recomputes truth with exhaustive dedup and mature fraud verdicts, and reconciliation pages when the stream and batch counts diverge by more than 0.1%.
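
The reconciliation step at the end of the pitch amounts to a relative-difference check against the 0.1% threshold. The sketch below is one possible shape for it. It assumes the check runs per ad and treats the batch count as truth. The pitch does not say whether divergence is measured per ad or in aggregate, and the function names are hypothetical.

```python
DIVERGENCE_THRESHOLD = 0.001  # 0.1%, the paging threshold from the pitch


def divergence(stream_count: int, batch_count: int) -> float:
    """Relative gap between stream and batch counts, with batch taken as truth."""
    if batch_count == 0:
        return 0.0 if stream_count == 0 else float("inf")
    return abs(stream_count - batch_count) / batch_count


def ads_to_page(stream_counts, batch_counts, threshold=DIVERGENCE_THRESHOLD):
    """Return the ad ids whose stream and batch counts diverge past the threshold."""
    ad_ids = set(stream_counts) | set(batch_counts)
    return sorted(
        ad_id
        for ad_id in ad_ids
        if divergence(stream_counts.get(ad_id, 0), batch_counts.get(ad_id, 0)) > threshold
    )


# Illustrative counts only, not real data.
stream = {"ad_a": 1_000, "ad_b": 10_020, "ad_c": 100_005}
batch = {"ad_a": 1_000, "ad_b": 10_000, "ad_c": 100_000}
print(ads_to_page(stream, batch))
```

Here `ad_b` is off by 0.2% and would page, while `ad_c` is off by 0.005% and would not.
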

Concepts Unlocked: 8 concepts in this topic