Click Aggregator Cheat Sheet
Key concepts, trade-offs, and quick-reference notes for your interview prep.
Count Twice: The Lambda Answer, Defended
#1💡 Fast path for decisions, slow path for money. Two codebases is a real cost: name it before the interviewer does.
Exactly-Once: The Three-Link Chain
#2💡 The sink is where exactly-once dies quietly. Upsert-by-window makes re-emission naturally safe.
Event Time, Watermarks, Late Clicks
#3💡 Watermark -> fire -> lateness -> correction -> side-output: know the full lifecycle, not just the word.
Two-Stage Aggregation: The Viral Ad
#4💡 This is MapReduce's combiner in streaming clothes. Write hot spots salt; read hot spots cache.
The Raw Log: Views Are Disposable, Truth Is Not
#5💡 Event sourcing, applied narrowly: aggregates are snapshots, the log is the events, replay is the recovery story.
The Budget Race: Latency Priced in Dollars
#6💡 Under-delivery costs goodwill; over-delivery costs cash. That asymmetry decides every tie.
Duplicates Need Identity; Fraud Needs Judgment
#7💡 Marks, never deletes: the fraudulent click stays in the log, flagged in the verdict stream. Evidence and judgment stay separate.
OLAP Serving: The Right Read Path
#8💡 KV stores die on dimensions you did not key. Columnar IS the index for ad-hoc slicing.
Capacity: 8.6B Clicks/Day
#9💡 The raw log dwarfs everything downstream: aggregation is a 500x compression that makes serving cheap.
The Metrics That Matter
#10💡 Budget staleness is the metric to show an interviewer: latency literally denominated in dollars.