TRICKYwalkthrough

Count Twice: Stream for Speed, Batch for Truth

1 of 8
3 related
Two customers need the same number at different speeds. The advertiser dashboard needs click counts within seconds: a campaign manager watching a launch adjusts bids in real time.
One pipeline cannot excel at both, because speed and certainty trade against each other at every layer: fast counting tolerates duplicates and drops late arrivals; correct counting waits for stragglers, deduplicates exhaustively, and applies fraud rulings that take hours to mature. The answer is the topic's key insight: count twice.
The invoice needs click counts that are exactly right: money changes hands on them, auditors examine them, disputes cite them.
The stream path (Flink-class processor over Kafka) maintains per-(ad, minute) counters with second-level freshness: it powers dashboards and, critically, budget enforcement: and it is allowed to be approximately right: a duplicate here or a late click there moves a dashboard pixel, not a dollar. The batch path recomputes every count nightly from the immutable raw log: full dedup, mature fraud verdicts, late events included: and its output is the only number billing ever sees.
Between them runs reconciliation: the batch result diffs against the stream's totals, and a divergence beyond threshold (ours: 0.1%) pages someone, because it means one of the paths is lying. This is the lambda architecture, named honestly: two codebases computing the same thing is a real cost, and the modern kappa counter-argument (one stream path with exactly-once state and long-retention replay) is worth presenting: but when the number on the invoice is legally binding, the batch recomputation over raw truth is not redundancy: it is the audit trail.
What if the interviewer asks: why can billing not just wait for the stream's final windows? Because fraud verdicts arrive hours later, disputes require recomputation months later, and "we replayed the stream job" convinces an auditor less than "we re-ran arithmetic over the immutable log".
Why it matters in interviews
This is the KEY INSIGHT and the topic's architecture in one sentence: fast path for decisions, slow path for money, reconciliation between them. Presenting kappa as the honest counter-argument, then defending lambda on audit grounds, is the senior version of the answer.
Related concepts