Music Streaming Platform

COMMON

Music streaming comes up in Spotify, Apple, and Amazon interviews because it tests adaptive bitrate delivery, CDN economics, and financial-grade event pipelines. The target scale is Spotify-sized: 100 million tracks delivered to 300 million daily users. You will design byte-range audio streaming (not HLS segments, because audio files are over 1,000x smaller than video files), a dual-buffer gapless playback engine, and a play count pipeline where every 30-second play triggers an exactly-once royalty payment across 12 billion daily events.

Design byte-range audio streaming with 5-tier adaptive bitrate and 99% CDN cache hit ratio
Build a dual-buffer gapless playback engine with sub-50ms crossfade
Architect an exactly-once play counting pipeline for 12B daily royalty events

Spotify, Apple, Amazon, YouTube Music, SoundCloud, Tidal

Visual Solutions

Step-by-step animated walkthroughs with capacity estimation, API design, database schema, and failure modes built in.

Cheat Sheet

Key concepts, trade-offs, and quick-reference notes for Music Streaming. Everything you need at a glance.

Anti-Patterns

Common design mistakes candidates make. Wrong approaches vs correct approaches for each trap.

8 anti-patterns

Failure Modes

What breaks in production, how to detect it, and how to fix it. Detection metrics, mitigations, and severity ratings.

5 failure modes

Difficulty Ladder

Start simple. Build to staff-level.

Level 1

Junior / Basics

Core concepts, single-service design, straightforward requirements

Level 2

Mid-Level Interview

Multi-service architecture, trade-off discussions, standard scaling

Level 3

Senior / Deep Dive

Complex distributed systems, failure modes, consistency guarantees

Level 4

Staff+ / FAANG Hard

Planet-scale design, novel architectures, cross-cutting concerns

Elevator Pitch

3-minute interview summary

“I would design a music streaming platform for 300M DAU and a catalog of 100M tracks, sized for 12 billion play events per day (about 139K per second). Each track is encoded at 5 quality tiers (24-320 kbps OGG Vorbis), totaling 1.75 PB of catalog storage. CDN edge caching achieves a 99%+ hit ratio because music is replayed thousands of times, unlike video. The player uses a dual-buffer architecture for gapless playback and prefetches the next 2-3 tracks. Play events flow through Kafka with exactly-once semantics because every 30-second play triggers a royalty payment. Recommendations combine ALS collaborative filtering with an audio-feature CNN for cold-start tracks, cached at 480 GB in Redis.”
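The two headline numbers in the pitch can be checked with a few lines of arithmetic. This is a rough sketch: the 3.5 MB average file size is taken from the byte-range concept further down and is assumed here to hold for every tier, which is a simplification.

```python
SECONDS_PER_DAY = 86_400


def events_per_second(daily_events: int) -> float:
    """Average event rate implied by a daily total."""
    return daily_events / SECONDS_PER_DAY


def catalog_petabytes(tracks: int, tiers: int, avg_file_bytes: float) -> float:
    """Total catalog size in PB, assuming one file per track per tier."""
    return tracks * tiers * avg_file_bytes / 1e15


# 12 billion play events per day, as stated in the pitch.
rate = events_per_second(12_000_000_000)

# 100M tracks x 5 tiers x 3.5 MB (assumed average across tiers).
storage_pb = catalog_petabytes(100_000_000, 5, 3.5e6)

print(f"{rate:,.0f} events/sec, {storage_pb:.2f} PB")
```

The rate is a daily average; peak traffic would be higher and should be sized separately.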

Concepts Unlocked

8 concepts in this topic

Audio Codec Selection and Adaptive Bitrate

STANDARD

Why OGG Vorbis and not AAC? Because a royalty-free codec saves millions at 100M+ tracks. The 5-tier ABR switches quality between tracks, not mid-track, because audio artifacts are more noticeable than video quality drops.

Core Streaming Design
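A minimal sketch of between-track tier selection follows. Only the 24 and 320 kbps endpoints come from the text; the three middle tiers, the `headroom` factor, and the `Player` class are illustrative assumptions.

```python
# Assumed 5-tier ladder in kbps; only the 24 and 320 endpoints are from the text.
TIERS_KBPS = (24, 96, 160, 256, 320)


def pick_tier(measured_kbps: float, headroom: float = 0.7) -> int:
    """Return the highest tier that fits within a fraction of measured throughput."""
    budget = measured_kbps * headroom
    eligible = [t for t in TIERS_KBPS if t <= budget]
    return max(eligible) if eligible else TIERS_KBPS[0]


class Player:
    """Hypothetical player that re-evaluates quality only at track boundaries."""

    def __init__(self) -> None:
        self.current_tier = TIERS_KBPS[0]

    def on_track_boundary(self, measured_kbps: float) -> None:
        # The tier is fixed for the whole next track; no mid-track switching.
        self.current_tier = pick_tier(measured_kbps)


player = Player()
player.on_track_boundary(measured_kbps=250)
print(player.current_tier)
```

Because the decision runs once per track, a throughput dip mid-track is absorbed by the buffer instead of causing an audible quality change.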

CDN Edge Caching with Track Prefetch

STANDARD

Why a 99% cache hit ratio and not 95% like video? Because songs are replayed thousands of times. The top 1% of the catalog (1M tracks) fits in 10 TB per POP. The player prefetches the next 2-3 tracks 30 seconds before the current track ends.

High Level System Design
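The prefetch rule in the CDN caching concept can be sketched as a small client-side check. The function name, the queue representation, and the reading of "30 seconds early" as "30 seconds before the current track ends" are assumptions.

```python
def tracks_to_prefetch(queue, current_index, position_s, duration_s,
                       lead_s=30.0, depth=3):
    """Return the upcoming track IDs to prefetch, or [] if it is too early.

    queue: ordered list of track IDs in the play queue.
    lead_s: how long before the end of the current track prefetch begins.
    depth: maximum number of upcoming tracks to fetch.
    """
    remaining_s = duration_s - position_s
    if remaining_s > lead_s:
        return []
    start = current_index + 1
    return queue[start:start + depth]


queue = ["a", "b", "c", "d", "e"]
print(tracks_to_prefetch(queue, 0, position_s=175, duration_s=200))
```

Near the end of the queue the slice simply returns fewer tracks, so no bounds check is needed.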

Gapless Playback and Crossfade Engine

TRICKY

Why dual-buffer and not single-buffer? Because decoder initialization takes 200-500ms. Buffer A plays, Buffer B decodes the next track 10 seconds early, crossfade in under 50ms.

Core Streaming Design
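The handoff from Buffer A to Buffer B can be illustrated with a short mixing routine. The text does not name a fade curve; this sketch assumes an equal-power (sine/cosine) curve and a 44.1 kHz sample rate, and all names are hypothetical.

```python
import math


def fade_samples(sample_rate_hz: int = 44_100, fade_ms: int = 50) -> int:
    """Number of samples covered by the crossfade window."""
    return sample_rate_hz * fade_ms // 1000


def crossfade(tail, head):
    """Mix the end of the outgoing track with the start of the incoming one.

    tail: last N decoded samples from Buffer A (outgoing track).
    head: first N decoded samples from Buffer B (incoming track).
    Uses an equal-power curve: gains follow cos/sin over a quarter period.
    """
    if len(tail) != len(head):
        raise ValueError("tail and head must have the same length")
    n = len(tail)
    mixed = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0
        gain_out = math.cos(t * math.pi / 2)  # 1 -> 0
        gain_in = math.sin(t * math.pi / 2)   # 0 -> 1
        mixed.append(tail[i] * gain_out + head[i] * gain_in)
    return mixed


print(fade_samples())  # samples in a 50 ms window at 44.1 kHz
```

The mix only works if Buffer B already holds decoded samples when the window starts, which is why the next track is decoded well ahead of the transition.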

Play Count Pipeline for Royalty Accounting

TRICKY

Why exactly-once Kafka and not at-least-once? Because every duplicate play at 139K events/sec means an incorrect royalty payment. 30-second threshold is the industry standard.

Monitoring and Complete System
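The counting rule can be sketched as an idempotent consumer. This is not Kafka's exactly-once mechanism itself (that relies on idempotent producers and transactions); it is an in-memory illustration of why a redelivered event must not count twice. The `PlayEvent` fields and the client-generated `event_id` are assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field

ROYALTY_THRESHOLD_S = 30  # plays shorter than this earn no royalty


@dataclass(frozen=True)
class PlayEvent:
    event_id: str      # assumed to be unique per play, generated by the client
    track_id: str
    listened_s: float


@dataclass
class PlayCounter:
    seen: set = field(default_factory=set)
    counts: Counter = field(default_factory=Counter)

    def process(self, event: PlayEvent) -> bool:
        """Count the play once; return True only if it was newly counted."""
        if event.event_id in self.seen:
            return False  # duplicate delivery, already handled
        self.seen.add(event.event_id)
        if event.listened_s < ROYALTY_THRESHOLD_S:
            return False  # below the 30-second threshold
        self.counts[event.track_id] += 1
        return True


counter = PlayCounter()
event = PlayEvent("e1", "trk_1", 45.0)
counter.process(event)
counter.process(event)  # redelivered: ignored
print(counter.counts["trk_1"])
```

At production scale the `seen` set would live in a durable store with a retention window, since an unbounded in-memory set does not survive restarts.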

Personalized Recommendation Engine

TRICKY

Why ALS plus audio CNN and not pure collaborative filtering? Because 40K new tracks/day have zero play history. CNN extracts acoustic embeddings for cold-start recommendations.

High Level System Design

Audio File Chunking and Byte-Range Seeking

STANDARD

Why byte-range requests and not HLS segments? Because a 3.5 MB audio file is over 1,000x smaller than a 6 GB video. Single-file delivery with seek tables for random access.

Core Streaming Design
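Seeking with a seek table reduces to a lookup followed by an HTTP Range request. In this sketch the table layout, a sorted list of (time in seconds, byte offset) pairs, and the function name are assumptions; the `bytes=N-` form is standard HTTP range syntax.

```python
import bisect


def range_header_for_seek(seek_table, seek_s):
    """Build a Range header that starts at the last seek point <= seek_s.

    seek_table: list of (time_s, byte_offset) pairs sorted by time_s,
                with the first entry at time 0.
    """
    times = [t for t, _ in seek_table]
    i = max(bisect.bisect_right(times, seek_s) - 1, 0)
    offset = seek_table[i][1]
    return {"Range": f"bytes={offset}-"}


table = [(0.0, 0), (10.0, 200_000), (20.0, 400_000)]
print(range_header_for_seek(table, 15.0))
```

The player then decodes from the returned offset and discards audio up to the exact requested position.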

Offline Sync and Download Management

STANDARD

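Why AES-128-CTR and not AES-CBC? Because CTR derives the keystream for any block directly from the nonce and block counter, so the player can start decrypting at any seek offset with no padding. CBC needs the preceding ciphertext block and block-aligned, padded data. 30-day DRM license windows balance convenience against rights protection.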

Replication and Fault Tolerance
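The seek property of CTR mode comes down to computing the right counter block for a byte offset. The sketch below shows only that computation and performs no encryption. The counter layout (8-byte nonce followed by an 8-byte big-endian block index) is an assumption; real DRM schemes define their own layout.

```python
AES_BLOCK_BYTES = 16


def ctr_seek(byte_offset: int, nonce: bytes):
    """Return (counter_block, skip) for decrypting from byte_offset.

    counter_block: the 16-byte input to AES for the block containing the offset.
    skip: how many keystream bytes of that block to discard before use.
    Assumed layout: 8-byte nonce || 8-byte big-endian block index.
    """
    if len(nonce) != 8:
        raise ValueError("this sketch assumes an 8-byte nonce")
    block_index, skip = divmod(byte_offset, AES_BLOCK_BYTES)
    counter_block = nonce + block_index.to_bytes(8, "big")
    return counter_block, skip


block, skip = ctr_seek(35, b"\x01" * 8)
print(block.hex(), skip)
```

Nothing before the target block is read or decrypted, which is what makes seeking inside an encrypted offline file cheap.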

Rights Management and Licensing Metadata

TRICKY

Why denormalized rights by (track_id, country_code)? Because the hot path is a single-key authorization check at 139K/sec. Normalized schema requires 3 joins per play request.

Database Schema
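The single-key authorization check can be illustrated with a dictionary standing in for the denormalized rights store. The record contents, the sample keys, and the default-deny behavior are assumptions.

```python
# Stand-in for a key-value store keyed by (track_id, country_code).
RIGHTS = {
    ("trk_1", "US"): True,
    ("trk_1", "DE"): False,
}


def can_play(track_id: str, country_code: str) -> bool:
    """One lookup per play request; missing entries are treated as not licensed."""
    return RIGHTS.get((track_id, country_code.upper()), False)


print(can_play("trk_1", "US"), can_play("trk_1", "FR"))
```

The trade-off is write amplification: a licensing change for one track touches one row per country, which is acceptable because rights change far less often than tracks are played.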