Device Token Lifecycle

A user upgrades their phone, reinstalls the app twice, and enables push on their tablet. Each event mints a new device token, the opaque identifier APNs and FCM use to route a push to one physical device.

The hard part is not storage, it is decay. Tokens die constantly: app uninstalls, OS reinstalls, provider-side rotation.

“Our 500M users hold roughly 1 billion active tokens (about 2 devices each), stored at 150 bytes per row: $10^9 \text{ tokens} \times 150\text{ B/token} = 150\text{ GB}$.”

Churn runs about 1.5% per week, which is 15 million tokens going stale every week. What happens if we never clean them up?

There are two costs. First, wasted throughput: 1.5% a week compounds, so within a year roughly half our sends would target dead devices, and a 580K/sec peak becomes about 290K/sec of useful delivery.
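The "roughly half" figure can be sanity-checked with a few lines of arithmetic. This is a sketch under stated assumptions: the inputs are the numbers quoted in the text, while the two decay models (a single cohort with no replacements, and a constant live population with dead rows piling up) are illustrative choices, not something the walkthrough specifies.

```python
# Back-of-the-envelope decay math using the figures quoted in the text.
ACTIVE_TOKENS = 1_000_000_000   # ~1B live tokens
WEEKLY_CHURN = 0.015            # 1.5% of tokens go stale each week
PEAK_SENDS_PER_SEC = 580_000
WEEKS = 52

stale_per_week = ACTIVE_TOKENS * WEEKLY_CHURN

# Model A (assumption): follow one cohort of tokens, no replacements.
cohort_dead = 1 - (1 - WEEKLY_CHURN) ** WEEKS

# Model B (assumption): the live population stays at ~1B because new tokens
# replace dead ones, while the dead rows accumulate unpruned in the table.
dead_rows = stale_per_week * WEEKS
table_dead = dead_rows / (ACTIVE_TOKENS + dead_rows)

for label, dead in (("cohort", cohort_dead), ("unpruned table", table_dead)):
    useful = PEAK_SENDS_PER_SEC * (1 - dead)
    print(f"{label}: {dead:.0%} dead, ~{useful:,.0f} useful sends/sec")
```

Both models land in the 40-55% range, which is why "roughly half" is a fair summary for interview purposes.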

Second, and worse, sender reputation: Apple and Google track the fraction of pushes sent to unregistered tokens, and a persistently high rate gets a sender throttled or blocked, hurting delivery for every user. The fix is a feedback loop.

When APNs returns HTTP 410 with reason Unregistered, or FCM returns UNREGISTERED, the gateway publishes a token-invalidation event, and a pruning consumer deletes the token within minutes. On the registration side, the client sends its current token on every app launch; the server upserts it with a last_seen timestamp and, as a backstop, expires tokens not seen for 270 days.
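A minimal sketch of that loop, assuming an in-memory dictionary in place of the token table and a `deque` in place of the event bus. `TokenStore`, `on_send_result`, and `prune` are hypothetical names introduced for illustration; a real gateway would sit behind the provider SDKs and a durable queue.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=270)  # backstop window from the text


@dataclass
class TokenStore:
    """Hypothetical in-memory stand-in for the device-token table."""
    last_seen: dict = field(default_factory=dict)  # token -> datetime

    def upsert(self, token: str, now: datetime) -> None:
        # Called on every app launch: insert the token or refresh last_seen.
        self.last_seen[token] = now

    def delete(self, token: str) -> None:
        # Idempotent, so duplicate invalidation events are harmless.
        self.last_seen.pop(token, None)

    def expire_stale(self, now: datetime) -> list:
        # Backstop sweep: drop tokens not seen within the 270-day window.
        cutoff = now - STALE_AFTER
        stale = [t for t, seen in self.last_seen.items() if seen < cutoff]
        for token in stale:
            self.delete(token)
        return stale


def is_unregistered(provider: str, status: int, reason: str) -> bool:
    # APNs signals a dead token with HTTP 410 and reason "Unregistered";
    # for FCM this sketch matches only on the UNREGISTERED error code.
    if provider == "apns":
        return status == 410 and reason == "Unregistered"
    if provider == "fcm":
        return reason == "UNREGISTERED"
    return False


def on_send_result(events: deque, token: str, provider: str,
                   status: int, reason: str) -> None:
    # Gateway side: publish an invalidation event, never delete inline.
    if is_unregistered(provider, status, reason):
        events.append(token)


def prune(events: deque, store: TokenStore) -> int:
    # Consumer side: drain the queue and delete each invalidated token.
    pruned = 0
    while events:
        store.delete(events.popleft())
        pruned += 1
    return pruned


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
store = TokenStore()
store.upsert("tok-phone", now)
store.upsert("tok-reinstalled", now - timedelta(days=3))
store.upsert("tok-old-tablet", now - timedelta(days=300))

events = deque()
on_send_result(events, "tok-phone", "apns", 200, "")
on_send_result(events, "tok-reinstalled", "apns", 410, "Unregistered")

pruned = prune(events, store)
expired = store.expire_stale(now)
```

Keeping the gateway's job to "publish an event" and leaving deletion to a separate consumer means the send path never blocks on a database write, and the idempotent delete tolerates the same token being reported more than once.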

The trade-off: aggressive pruning can delete a token during a transient provider bug, silently unsubscribing a device until next app open. We tolerate that because the alternative, reputation damage, degrades delivery for everyone.

What if the interviewer asks: why not verify tokens before every send? There is no verify API, and probing with silent pushes doubles traffic.

The 410 feedback IS the verification, delivered for free on the send path.

Why it matters in interviews

Candidates who only say "store tokens in a table" miss the operational reality: 1.5% weekly churn and the 410 feedback loop. Explaining sender-reputation risk shows we understand that hygiene is a delivery-rate feature, not a cleanup chore.
