STANDARDwalkthrough

The Raw Log Is the Source of Truth

5 of 8
3 related
Every aggregate in this system: stream counters, OLAP rollups, invoices: is a derived view, and the discipline that keeps a money system honest is refusing to ever confuse a view with the truth. The truth is the raw click log: every captured click, immutable, append-only, exactly as it arrived: landed in Kafka (7-day hot retention) and archived to object storage (S3, partitioned by hour, compressed, effectively forever: 8.6B clicks/day x 150B is ~1.3 TB/day raw, a rounding error next to its value).
The disciplines that keep it trustworthy: capture writes the log before anything else acknowledges the click (durable-first, the scheduler's lesson); the log is never edited: fraud does not delete clicks, it marks them in a separate verdict stream, so the evidence and the judgment stay distinct; and schema evolves additively (new fields, never repurposed ones), because a log you cannot parse in five years is a log you do not have. What if the interviewer asks: is this just event sourcing?
What the log buys, concretely. Replayability: a bug ships in the aggregation logic and miscounts Tuesday: with the log, the fix is redeploy-and-replay: recompute Tuesday from truth: without it, Tuesday's numbers are wrong forever and the apology is permanent. Reprocessing for change: fraud rules improve monthly; new attribution logic arrives; a court demands recomputation under different rules: all are new jobs over the same log, no schema migration, no backfill archaeology. Audit: when an advertiser disputes an invoice, the answer is not "our counter says so": it is the specific clicks, with timestamps, IPs, and user agents, that produced the number: the log is the evidence locker. Bootstrap: every new consumer (a data-science model, a new OLAP store, next year's redesign) starts life by replaying history it never witnessed.
Yes: applied narrowly and profitably: aggregates are snapshots, the log is the events, and rebuilding state from events is the recovery story for every failure this topic has.
Why it matters in interviews
Views are disposable, the log is not reframes the whole architecture: stream and batch become two readers of one truth. The fraud-marks-not-deletes detail and additive-schema rule are the operational habits that separate systems that survive audits from systems that settle disputes.
Related concepts