
Music Streaming Platform

COMMON

Music streaming comes up in Spotify, Apple, and Amazon interviews because it tests adaptive bitrate delivery, CDN economics, and financial-grade event pipelines. The target scale is Spotify-like: 100 million tracks delivered to 300 million daily users. You will design byte-range audio streaming (not HLS segments, because audio files are more than 1,000x smaller than video files), a dual-buffer gapless playback engine, and a play count pipeline in which every 30-second play triggers a royalty payment and must be counted exactly once across 12 billion daily events.

  • Design byte-range audio streaming with 5-tier adaptive bitrate and 99% CDN cache hit ratio
  • Build a dual-buffer gapless playback engine with sub-50ms crossfade
  • Architect an exactly-once play counting pipeline for 12B daily royalty events
Spotify, Apple, Amazon, YouTube Music, SoundCloud, Tidal
Elevator Pitch: 3-minute interview summary

I would design a music streaming platform for 300M DAU that handles roughly 139K play events per second from a catalog of 100M tracks. Each track is encoded at 5 quality tiers (24-320 kbps OGG Vorbis), totaling 1.75 PB of catalog storage. CDN edge caching achieves a 99%+ hit ratio because music is replayed thousands of times, unlike video. The player uses a dual-buffer architecture for gapless playback and prefetches the next 2-3 tracks. Play events (12 billion per day) flow through Kafka with exactly-once semantics because every 30-second play triggers a royalty payment. Recommendations combine ALS collaborative filtering with an audio-feature CNN for cold-start tracks, with results cached in 480 GB of Redis.
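
The headline numbers in the pitch can be sanity-checked with a few lines of arithmetic. This is a back-of-envelope sketch: the 3.5 MB figure is the per-file size quoted later in this topic, and treating it as the average across all five tiers is an assumption made here for illustration.

```python
# Back-of-envelope check of the elevator-pitch numbers.
SECONDS_PER_DAY = 86_400

daily_play_events = 12_000_000_000
daily_active_users = 300_000_000
tracks = 100_000_000
tiers = 5
avg_file_mb = 3.5  # assumption: average encoded file size across all tiers

events_per_sec = daily_play_events / SECONDS_PER_DAY
plays_per_user = daily_play_events / daily_active_users
catalog_pb = tracks * tiers * avg_file_mb / 1_000_000_000  # MB -> PB (decimal units)

print(f"{events_per_sec:,.0f} play events/sec")
print(f"{plays_per_user:.0f} plays per user per day")
print(f"{catalog_pb:.2f} PB of catalog storage")
```

The event rate works out to about 139K per second, the same figure used for the royalty pipeline and the rights check.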

Concepts Unlocked (8 concepts in this topic)

Audio Codec Selection and Adaptive Bitrate

STANDARD

Why OGG Vorbis and not AAC? Because a royalty-free codec saves millions in licensing fees at 100M+ tracks. The 5-tier ABR switches quality between tracks, not mid-track, because audio artifacts from a mid-track switch are more noticeable than a drop in video quality.

Core Streaming Design
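
The between-track switching rule from the codec card can be sketched as follows. The tier ladder is an assumption (the source gives only the 24-320 kbps range and the tier count), and `pick_tier`, `Player`, and the 1.5x headroom factor are hypothetical names and values chosen for illustration.

```python
# Assumed ladder: only the 24-320 kbps range and the count of 5 come from the text.
TIERS_KBPS = [24, 64, 96, 160, 320]

def pick_tier(measured_kbps: float, headroom: float = 1.5) -> int:
    """Return the highest tier whose bitrate, padded by headroom, fits the throughput."""
    affordable = [t for t in TIERS_KBPS if t * headroom <= measured_kbps]
    return max(affordable) if affordable else TIERS_KBPS[0]

class Player:
    """Records throughput continuously but changes tier only at track boundaries."""

    def __init__(self) -> None:
        self.current_tier = TIERS_KBPS[0]
        self.throughput_kbps = 0.0

    def on_throughput_sample(self, kbps: float) -> None:
        # Recorded only; the tier is deliberately not changed mid-track.
        self.throughput_kbps = kbps

    def on_track_boundary(self) -> int:
        self.current_tier = pick_tier(self.throughput_kbps)
        return self.current_tier
```

Keeping the switch at the boundary means a throughput dip during a song affects only the next track's quality.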

CDN Edge Caching with Track Prefetch

STANDARD

Why a 99% cache hit ratio and not 95% as with video? Because songs are replayed thousands of times. The top 1% of the catalog (1M tracks) fits in 10 TB per POP. The player prefetches the next 2-3 tracks 30 seconds before the current track ends.

High Level System Design
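
A minimal sketch of the client-side prefetch trigger described in the CDN card, assuming the player knows its upcoming queue and which tracks are already cached locally. The function name and parameters are hypothetical.

```python
def tracks_to_prefetch(upcoming, position_s, duration_s,
                       cached=frozenset(), lead_s=30.0, depth=3):
    """Return upcoming track IDs to fetch once the current track is within
    lead_s seconds of ending, skipping anything already cached."""
    if duration_s - position_s > lead_s:
        return []
    return [track_id for track_id in upcoming[:depth] if track_id not in cached]
```

For example, 25 seconds before the end of a 200-second track, `tracks_to_prefetch(["a", "b", "c", "d"], 175, 200)` returns the first three queued tracks.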

Gapless Playback and Crossfade Engine

TRICKY

Why dual-buffer and not single-buffer? Because decoder initialization takes 200-500 ms. Buffer A plays while Buffer B decodes the next track 10 seconds early; the handoff then crossfades in under 50 ms.

Core Streaming Design
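
The handoff between the two buffers comes down to mixing the tail of Buffer A with the already-decoded head of Buffer B. The sketch below shows only that mixing step, assuming mono float samples at 44.1 kHz, a 50 ms window as the upper bound, and an equal-power (cosine/sine) curve; none of these choices is stated in the source.

```python
import math

SAMPLE_RATE_HZ = 44_100  # assumption: CD-quality sample rate
FADE_MS = 50             # upper bound taken from the "under 50 ms" target
FADE_SAMPLES = SAMPLE_RATE_HZ * FADE_MS // 1000

def crossfade(tail_a, head_b):
    """Equal-power crossfade of two equal-length lists of mono samples."""
    if len(tail_a) != len(head_b):
        raise ValueError("fade windows must have equal length")
    n = len(tail_a)
    mixed = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0     # fade progress from 0.0 to 1.0
        gain_out = math.cos(t * math.pi / 2)  # outgoing track fades 1 -> 0
        gain_in = math.sin(t * math.pi / 2)   # incoming track fades 0 -> 1
        mixed.append(tail_a[i] * gain_out + head_b[i] * gain_in)
    return mixed
```

Because Buffer B is decoded ahead of time, the 200-500 ms decoder start-up cost never lands inside this window.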

Play Count Pipeline for Royalty Accounting

TRICKY

Why exactly-once Kafka and not at-least-once? Because at 139K events/sec, every duplicate play means an incorrect royalty payment. The 30-second threshold is the industry standard.

Monitoring and Complete System
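
One way to see what exactly-once counting has to guarantee is an idempotent consumer: each play event carries a unique ID, and a redelivered event is ignored. This is an illustrative simplification and not Kafka's transactional API; the event fields (`event_id`, `track_id`, `played_s`) and the in-memory set are assumptions, and a real pipeline would keep the deduplication state in a durable store.

```python
from collections import Counter

ROYALTY_THRESHOLD_S = 30  # plays shorter than this do not count

class PlayCounter:
    def __init__(self) -> None:
        self.seen_event_ids = set()  # assumption: stands in for a durable dedup store
        self.counts = Counter()

    def handle(self, event: dict) -> bool:
        """Count a qualifying play once; return True only when it was counted."""
        if event["played_s"] < ROYALTY_THRESHOLD_S:
            return False
        if event["event_id"] in self.seen_event_ids:
            return False  # redelivered duplicate, already counted
        self.seen_event_ids.add(event["event_id"])
        self.counts[event["track_id"]] += 1
        return True
```

With this rule, at-least-once delivery upstream still yields one counted play per event ID.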

Personalized Recommendation Engine

TRICKY

Why ALS plus an audio CNN and not pure collaborative filtering? Because the 40K tracks added each day have zero play history. The CNN extracts acoustic embeddings for cold-start recommendations.

High Level System Design
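
The cold-start fallback in the recommendation card can be sketched as a scoring function that uses the collaborative-filtering vector when a track has one and the audio embedding otherwise. The field names (`cf_vec`, `audio_vec`), the per-user audio taste vector, and the use of cosine similarity for both paths are assumptions for illustration; a production ALS model would typically score with a raw dot product.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score(user_cf_vec, user_audio_vec, track):
    """Prefer the ALS factor; fall back to the CNN audio embedding for new tracks."""
    if track.get("cf_vec") is not None:
        return cosine(user_cf_vec, track["cf_vec"])
    return cosine(user_audio_vec, track["audio_vec"])
```

A track with no play history simply lacks `cf_vec`, so it is still rankable on its first day in the catalog.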

Audio File Chunking and Byte-Range Seeking

STANDARD

Why byte-range requests and not HLS segments? Because a 3.5 MB audio file is more than 1,000x smaller than a 6 GB video. Each track is delivered as a single file, with a seek table for random access.

Core Streaming Design
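
Seeking with a seek table reduces to a binary search followed by an HTTP Range request. The sketch below assumes the table is a sorted list of `(time_s, byte_offset)` pairs; that layout and the function name are hypothetical.

```python
import bisect

def range_header_for_seek(seek_table, target_s):
    """Return a Range header value that starts at the last seek point
    at or before target_s. seek_table is sorted by time."""
    times = [time_s for time_s, _ in seek_table]
    index = max(bisect.bisect_right(times, target_s) - 1, 0)
    byte_offset = seek_table[index][1]
    return f"bytes={byte_offset}-"

# Usage: seek to 0:15 in a track with a seek point every 10 seconds.
table = [(0, 0), (10, 400_000), (20, 800_000)]
header = range_header_for_seek(table, 15)
```

The client then sends `Range: <header>` and discards decoded audio up to the exact target time.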

Offline Sync and Download Management

STANDARD

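Why AES-128-CTR and not AES-CBC? Because CTR can decrypt from any byte offset: the keystream for a block depends only on the nonce and that block's counter, so a seek needs no padding and no neighboring ciphertext. 30-day DRM license windows balance convenience against rights protection.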

Replication and Fault Tolerance
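
The random-seek property of CTR mode comes down to computing which counter block covers a given byte offset. The sketch below shows only that arithmetic, with no cipher calls; the counter layout (8-byte nonce followed by an 8-byte big-endian block index) is an assumption, since real DRM schemes define their own.

```python
AES_BLOCK_BYTES = 16

def ctr_seek(byte_offset: int):
    """Return (block_index, skip_bytes) for a byte offset into the ciphertext."""
    return divmod(byte_offset, AES_BLOCK_BYTES)

def counter_block(nonce: bytes, block_index: int) -> bytes:
    """Build the 16-byte counter input for one block (assumed layout)."""
    if len(nonce) != 8:
        raise ValueError("this sketch assumes an 8-byte nonce")
    return nonce + block_index.to_bytes(8, "big")

# Usage: seek to byte 1,000,005 of an encrypted offline file.
block_index, skip_bytes = ctr_seek(1_000_005)
```

Decryption starts at that block, and the first `skip_bytes` bytes of its keystream are discarded.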

Rights Management and Licensing Metadata

TRICKY

Why denormalize rights by (track_id, country_code)? Because the hot path is a single-key authorization check at 139K/sec. A normalized schema would require 3 joins per play request.

Database Schema
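
The single-key check from the rights card can be sketched with a dictionary keyed by `(track_id, country_code)`, standing in for a key-value store. The record fields (`streamable`, `valid_until`) and the sample rows are hypothetical.

```python
from datetime import date

# Assumption: one denormalized record per (track_id, country_code) pair.
RIGHTS = {
    ("trk_1", "US"): {"streamable": True, "valid_until": date(2030, 1, 1)},
    ("trk_1", "DE"): {"streamable": False, "valid_until": date(2030, 1, 1)},
}

def can_stream(track_id: str, country_code: str, today: date) -> bool:
    """One lookup, no joins: a missing key means no rights in that country."""
    entry = RIGHTS.get((track_id, country_code))
    if entry is None:
        return False
    return entry["streamable"] and today <= entry["valid_until"]
```

The cost of this layout is write amplification: a licensing change for one track touches one record per country.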