Webhook Delivery with Retry
Merchants need to know when a charge succeeds, fails, or is refunded. We deliver these events via webhooks: HTTP POST callbacks to the merchant's configured URL.
The challenge: merchant servers are unreliable. A merchant's endpoint might return a 500, time out, or be down entirely.
“At 432M transactions/day with ~3 state changes each, we send roughly 1.3B webhooks/day, or about 15K/sec on average.”
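The throughput figure above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the constant names are illustrative, and the result is a daily average, not a peak rate.

```python
# Inputs taken from the text; "3 state changes" is the text's approximation.
TRANSACTIONS_PER_DAY = 432_000_000
STATE_CHANGES_PER_TXN = 3
SECONDS_PER_DAY = 24 * 60 * 60

webhooks_per_day = TRANSACTIONS_PER_DAY * STATE_CHANGES_PER_TXN
webhooks_per_sec = webhooks_per_day / SECONDS_PER_DAY

print(f"{webhooks_per_day:,} webhooks/day")     # 1,296,000,000
print(f"{webhooks_per_sec:,.0f} webhooks/sec")  # 15,000
```

Real traffic is bursty, so the pipeline would need headroom above this average.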
We guarantee at-least-once delivery using a Kafka-backed retry pipeline. On failure, we retry with escalating backoff delays of 1 min, 5 min, 30 min, 2 hours, and 24 hours.
After 5 failed retries, spanning about 26.6 hours in total, we move the event to a dead-letter queue (DLQ) and notify the merchant via email. The merchant can replay events from the DLQ via the dashboard.
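A minimal sketch of the retry schedule and DLQ cutoff described above. The delays come from the text; the function names (`next_retry_delay`, `total_retry_window`) are hypothetical, and the Kafka plumbing is left out entirely.

```python
from datetime import timedelta
from typing import Optional

# Wait before each retry, as listed in the text.
RETRY_DELAYS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=24),
]


def next_retry_delay(failed_retries: int) -> Optional[timedelta]:
    """Return the wait before the next retry, or None once retries are exhausted.

    failed_retries counts retries that have already failed; it is 0 right
    after the initial delivery attempt fails.
    """
    if failed_retries < 0:
        raise ValueError("failed_retries must be non-negative")
    if failed_retries < len(RETRY_DELAYS):
        return RETRY_DELAYS[failed_retries]
    return None  # caller should route the event to the DLQ


def total_retry_window() -> timedelta:
    """Total time between the first failure and the last retry."""
    return sum(RETRY_DELAYS, timedelta())


print(next_retry_delay(0))   # 0:01:00
print(next_retry_delay(5))   # None
print(total_retry_window())  # 1 day, 2:36:00
```

The total of 26 hours 36 minutes matches the retry window quoted above. A `None` return is the signal to stop retrying and hand the event to the DLQ.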
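Why at-least-once and not exactly-once? Because exactly-once delivery cannot be enforced from the sender's side over an unreliable network: if the merchant's acknowledgment is lost, we cannot tell whether the webhook was processed and have to resend it. The merchant has to implement idempotency anyway.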
We include an event_id and a timestamp in every webhook so merchants can dedup. For comparison, Stripe retries webhooks up to 30 times over 3 days.
Each webhook also includes a signature (HMAC-SHA256 of the payload, keyed with the merchant's webhook secret) so the merchant can verify authenticity.
Trade-off: at-least-once means merchants may receive duplicate events, possibly out of order. The event_id lets them drop duplicates, and the timestamp lets them process only the latest state.
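On the merchant side, signature verification and deduplication might look like the sketch below. It makes several assumptions: the signature is the hex-encoded HMAC-SHA256 of the raw request body, the payload is JSON, and the `charge_id` and `status` fields are hypothetical (only `event_id` and the timestamp are named in the text). The in-memory set and dict stand in for a durable store.

```python
import hashlib
import hmac
import json


def sign(payload: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw payload (encoding is an assumption)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, secret: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid leaking timing information."""
    expected = sign(payload, secret)
    return hmac.compare_digest(expected.encode(), signature.encode())


class WebhookReceiver:
    """Illustrative merchant-side handler; not a production implementation."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_event_ids = set()   # a real system would persist this
        self.latest_by_charge = {}    # charge_id -> (timestamp, status)

    def handle(self, payload: bytes, signature: str) -> str:
        # Verify against the raw bytes before parsing anything.
        if not verify(payload, self.secret, signature):
            return "rejected"
        event = json.loads(payload)

        # Drop redeliveries of an event we have already processed.
        if event["event_id"] in self.seen_event_ids:
            return "duplicate"
        self.seen_event_ids.add(event["event_id"])

        # Ignore events older than the state we already hold.
        current = self.latest_by_charge.get(event["charge_id"])
        if current is not None and current[0] >= event["timestamp"]:
            return "stale"
        self.latest_by_charge[event["charge_id"]] = (
            event["timestamp"],
            event["status"],
        )
        return "applied"
```

Returning a status string keeps the sketch easy to test. A real endpoint would respond with a 2xx for "applied", "duplicate", and "stale" so the sender stops retrying, and would reject bad signatures with a 4xx.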
Related concepts