
Windows, Watermarks, and the Late Click

"Clicks per ad per minute" sounds like a GROUP BY until you ask: which minute? A click happens at 12:00:59 on a phone in a subway tunnel, uploads at 12:03:30, and reaches Kafka at 12:03:31. Event time (when it happened: 12:00) and processing time (when we saw it: 12:03) disagree by about two and a half minutes, and billing obviously owes the click to the 12:00 window. So the aggregator keys windows by event time, which immediately raises the harder question: when is the 12:00 window finished? Waiting forever is not an option; closing at 12:01 sharp drops every tunnel click.
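The subway click can be worked through in a few lines. This is a minimal sketch, not the aggregator's actual code: the calendar date and the helper name `window_start` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its one-minute tumbling window."""
    return ts.replace(second=0, microsecond=0)


# The date is arbitrary; only the times of day come from the example.
event_time = datetime(2024, 1, 1, 12, 0, 59)       # when the click happened
processing_time = datetime(2024, 1, 1, 12, 3, 31)  # when it reached Kafka

by_event = window_start(event_time)            # the 12:00 window (billing wants this)
by_processing = window_start(processing_time)  # the 12:03 window (wrong minute)
skew = processing_time - event_time            # how far the two clocks disagree

print(by_event.time(), by_processing.time(), skew)
```

Keying by `event_time` puts the click in the 12:00 window no matter when it arrives, which is exactly why the window cannot be declared finished by the wall clock alone.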
The stream-processing answer is the watermark: a moving assertion that says "I believe I have now seen (nearly) all events up to time T", computed from observed event-time progress minus a tolerance.
When the watermark passes 12:01, the 12:00 window fires (emits its aggregate downstream) but does not immediately die. It lingers for an allowed lateness (ours: 15 minutes), during which stragglers trigger corrections: updated aggregates that overwrite the earlier emission, which is why the OLAP sink upserts by (ad, window) rather than appending. Events later than even that go to a side output: too late for the dashboard, never too late for billing, because the nightly batch over the raw log has no deadline at all. The lateness problem is another quiet argument for counting twice.
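The lifecycle (watermark, fire, allowed lateness, correction, side output) can be sketched as a toy in-memory aggregator. Everything here is illustrative: the class and field names are hypothetical, the 30-second tolerance is an assumed value (only the 15-minute allowed lateness comes from the text), timestamps are seconds since midnight, and a real engine would also purge expired window state.

```python
from collections import defaultdict

WINDOW = 60                 # one-minute tumbling windows, in seconds
TOLERANCE = 30              # assumed watermark tolerance, not from the text
ALLOWED_LATENESS = 15 * 60  # the 15 minutes quoted in the walkthrough


class Aggregator:
    def __init__(self):
        self.max_event_ts = 0
        self.counts = defaultdict(int)  # (ad, window_start) -> clicks
        self.fired = set()              # windows that have already emitted
        self.sink = {}                  # stand-in for the OLAP sink, upsert by key
        self.side_output = []           # events too late even for corrections

    @property
    def watermark(self):
        # Observed event-time progress minus a tolerance.
        return self.max_event_ts - TOLERANCE

    def on_event(self, ad, ts):
        self.max_event_ts = max(self.max_event_ts, ts)
        start = ts - ts % WINDOW
        key = (ad, start)
        if self.watermark >= start + WINDOW + ALLOWED_LATENESS:
            self.side_output.append((ad, ts))   # left for the nightly batch
        else:
            self.counts[key] += 1
            if key in self.fired:
                self.sink[key] = self.counts[key]  # correction overwrites
        self._fire_ready()

    def _fire_ready(self):
        for key, n in self.counts.items():
            if key not in self.fired and self.watermark >= key[1] + WINDOW:
                self.fired.add(key)
                self.sink[key] = n  # first emission for this window


T0 = 12 * 3600  # 12:00:00
agg = Aggregator()
agg.on_event("ad1", T0 + 10)       # on-time click at 12:00:10
agg.on_event("ad1", T0 + 95)       # 12:01:35 pushes the watermark past 12:01
first_emission = agg.sink[("ad1", T0)]
agg.on_event("ad1", T0 + 59)       # the tunnel click: late, but within lateness
corrected = agg.sink[("ad1", T0)]
agg.on_event("ad1", T0 + 20 * 60)  # time moves on to 12:20
agg.on_event("ad1", T0 + 30)       # far too late: goes to the side output
```

Because the sink is keyed by (ad, window), the correction replaces the first emission instead of adding a second row, and the side output preserves the event without touching the dashboard number.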
The knobs are honest trade-offs: a tight watermark gives fresh dashboards and more corrections; a loose one gives stable numbers and stale dashboards. And mobile traffic's long tail (offline apps syncing hours later) means some lateness always escapes the stream path.
What if the interviewer asks: why tumbling one-minute windows and not sliding?
Billing and budgets consume discrete intervals. Tumbling gives each click exactly one home; sliding windows double-count by construction and belong to trend detection, not accounting.
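The "exactly one home" claim is easy to check by listing the windows a single click falls into. A small sketch, assuming a hypothetical 20-second slide for the sliding case and timestamps expressed as seconds since 12:00:00:

```python
def tumbling_windows(ts, size=60):
    """Start of the single tumbling window that contains ts."""
    return [ts - ts % size]


def sliding_windows(ts, size=60, slide=20):
    """Starts of every sliding window [s, s + size) that contains ts."""
    last = ts - ts % slide  # latest window start at or before ts
    return list(range(last, ts - size, -slide))


click = 59  # the 12:00:59 tunnel click
print(tumbling_windows(click))  # one window: billed once
print(sliding_windows(click))   # several overlapping windows: counted in each
```

Summing per-window counts over the sliding output would bill this click once per overlapping window, which is fine for a trend line and wrong for an invoice.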
Why it matters in interviews
Event-time vs processing-time is THE conceptual hurdle of streaming interviews, and the subway click makes it concrete. Watermark -> fire -> allowed lateness -> correction -> side-output is the full lifecycle; candidates who stop at "use watermarks" get the follow-up they were not ready for.
Related concepts