
Music Streaming System Design Walkthrough

Complete design walkthrough with animated diagrams, capacity math, API design, schema, and failure modes.

Solution Path (target: 30 min)
We designed a music streaming platform for 300M DAU that delivers 139K concurrent streams from a catalog of 100M tracks. It uses byte-range streaming rather than HLS segments because audio files are roughly 1,000x smaller than video files. The CDN achieves a 99%+ cache hit ratio because music is replayed repeatedly. Gapless playback uses a dual-buffer design with a 50 ms crossfade. Play events (12B/day) flow through an exactly-once Kafka pipeline so that royalties are penny-accurate. Recommendations come from a hybrid of ALS and an audio CNN, cached in Redis at 480 GB.
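As a quick sanity check on the headline numbers, the sketch below derives a few per-second and per-user figures from the totals quoted above. The variable names are illustrative, and two interpretations are assumptions rather than statements from the design: that the 12B daily events are spread evenly over the day, and that the 480 GB recommendation cache is divided across the 300M daily active users.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 300_000_000
play_events_per_day = 12_000_000_000
recs_cache_bytes = 480_000_000_000  # 480 GB, using decimal gigabytes

# Average event rate, assuming an even spread over the day (real traffic peaks).
events_per_second = play_events_per_day / SECONDS_PER_DAY

# Average number of play events each daily active user generates.
plays_per_user_per_day = play_events_per_day / daily_active_users

# Cache budget per user, assuming one cached entry per daily active user.
recs_bytes_per_user = recs_cache_bytes / daily_active_users

print(f"{events_per_second:,.0f} play events per second")   # 138,889
print(f"{plays_per_user_per_day:.0f} plays per user per day")  # 40
print(f"{recs_bytes_per_user:,.0f} bytes of cache per user")   # 1,600
```

The average event rate comes out at roughly 139K per second, the same magnitude as the 139K figure in the summary.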
1. What is Music Streaming?

The system sounds simple: press play, hear music. Spotify delivers 100 million tracks to 300 million daily active users across 180 countries. But the real challenge has three dimensions.
First, gapless playback across variable networks: listeners on trains, in elevators, and on congested Wi-Fi expect zero silence between tracks. This requires a dual-buffer architecture and adaptive bitrate that works differently from video, because audio files are 1,000 times smaller.
Second, 12 billion daily play events for penny-accurate royalty payments: every 30-second play triggers a fraction-of-a-cent payment to rights holders, so the counting pipeline must be exactly-once, with zero double-counts and zero lost events. A single day of miscounting means millions of dollars in incorrect royalty distribution.
Third, personalized discovery across 100 million tracks: with 40,000 new tracks arriving daily, collaborative filtering alone cannot surface new music. We need audio-feature-based recommendations for cold-start tracks that have zero play history.
The design differs from video streaming in one fundamental way: music uses single-file byte-range delivery (not HLS/DASH segments) because a 3.5 MB audio file does not benefit from segmentation the way a 6 GB video file does.
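To make single-file byte-range delivery concrete, here is a minimal sketch of how a client might split one track into HTTP `Range` header values. The function name `plan_byte_ranges` and the 512 KiB chunk size are assumptions chosen for illustration, not values from the design; only the 3.5 MB file size comes from the text.

```python
def plan_byte_ranges(file_size: int, chunk_size: int) -> list[str]:
    """Return HTTP Range header values that cover a file in fixed-size chunks."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    headers = []
    for start in range(0, file_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends, hence the "- 1".
        end = min(start + chunk_size, file_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# A 3.5 MB track fetched in 512 KiB chunks (chunk size is an assumption).
ranges = plan_byte_ranges(file_size=3_500_000, chunk_size=512 * 1024)
print(len(ranges))   # 7
print(ranges[0])     # bytes=0-524287
print(ranges[-1])    # bytes=3145728-3499999
```

Each value would be sent as the `Range` header of a GET request for the same URL. Because every request targets one file, the CDN caches one object per track and encoding, with no manifest or segment index to fetch first.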
Spotify: 100M tracks, 300M DAU, 180 countries. Three challenges: gapless playback on variable networks, 12B daily play events for penny-accurate royalties, personalized discovery with 40K new tracks/day. Key differentiator from video: byte-range delivery, not HLS segments.
A music streaming platform delivers audio content to hundreds of millions of listeners with gapless playback, adaptive quality, and penny-accurate royalty accounting. Spotify serves 100 million tracks to 300 million daily users. The hard parts are delivering audio with zero playback gaps across variable networks, counting 12 billion play events per day for royalty payments that must be penny-accurate, and recommending from 100 million tracks when 40,000 new ones arrive daily.
  • Byte-range streaming (not HLS segments) because a 3.5 MB audio file is roughly 1,000x smaller than a 6 GB video, so the overhead of segmentation (manifests and extra requests) outweighs its benefit
  • 99%+ CDN cache hit ratio (versus video's 95%) because songs are replayed thousands of times; top 1% of tracks (1M songs) fit in 10 TB per edge POP
  • Dual-buffer gapless playback: Buffer A plays the current track while Buffer B decodes the next track 10 seconds early, then the player crossfades in under 50 ms at the track boundary
  • Exactly-once Kafka pipeline for 12B daily play events because every 30-second play triggers a royalty payment; double-counting means incorrect financial distribution
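The exactly-once requirement in the last bullet can be illustrated from the consumer side: if the same play event is delivered twice, the count must not change. The sketch below is a simplified in-memory illustration of that idempotent counting and does not use Kafka at all. `PlayEvent`, `RoyaltyCounter`, and the event fields are hypothetical names; only the 30-second threshold comes from the text. A production pipeline would need the deduplication state to be durable and updated atomically with the count.

```python
from collections import Counter
from dataclasses import dataclass

ROYALTY_THRESHOLD_SECONDS = 30  # a play is payable once it reaches 30 seconds


@dataclass(frozen=True)
class PlayEvent:
    event_id: str          # unique per play, assigned by the client (assumption)
    track_id: str
    seconds_played: float


class RoyaltyCounter:
    """Counts payable plays per track, ignoring redelivered events."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.plays: Counter = Counter()

    def apply(self, event: PlayEvent) -> bool:
        """Process one event; return True only if it added a payable play."""
        if event.event_id in self._seen:
            return False  # duplicate delivery: already processed
        self._seen.add(event.event_id)
        if event.seconds_played >= ROYALTY_THRESHOLD_SECONDS:
            self.plays[event.track_id] += 1
            return True
        return False  # too short to trigger a royalty


counter = RoyaltyCounter()
first = PlayEvent("evt-1", "track-42", 45.0)
counter.apply(first)
counter.apply(first)                                  # redelivered duplicate
counter.apply(PlayEvent("evt-2", "track-42", 12.0))   # under 30 s, not payable
print(counter.plays["track-42"])  # 1
```

Deduplicating on a stable event ID makes redelivery harmless, which is the property the royalty pipeline depends on regardless of how the transport layer achieves its delivery guarantees.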