Audio Codec Selection and Adaptive Bitrate
A listener on a morning commute enters a subway tunnel, and bandwidth drops from 20 Mbps to 200 kbps. With a fixed 320 kbps stream, the player stalls for 10 to 15 seconds until the signal recovers.
“The constraint: we cannot predict network conditions, so we must adapt in real time.”
We chose OGG Vorbis at 5 quality tiers (24, 96, 128, 160, 320 kbps) over AAC or MP3 because Vorbis is royalty-free and delivers perceptual quality comparable to AAC at the same bitrate, which saves millions in licensing fees at a scale of 100M+ tracks. Why not HLS segments like video streaming?
Because a 3.5-minute song at 128 kbps is only about 3.4 MB, roughly 1,000 times smaller than a typical video file. Splitting a file that small into 2-second HLS chunks (105 segments for a 210-second track) adds per-request HTTP overhead without any meaningful benefit.
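To make the overhead concrete, here is a back-of-the-envelope sketch comparing how many requests each approach needs for one 3.5-minute track at 128 kbps. It assumes file size follows from bitrate alone (no container or metadata overhead). The 512 KB range size and the helper names (`file_size_bytes`, `hls_segment_count`, `byte_range_headers`) are hypothetical choices for illustration, not part of the described system.

```python
import math

TRACK_SECONDS = 3.5 * 60        # 3.5-minute song, as in the text
BITRATE_KBPS = 128              # mid quality tier
SEGMENT_SECONDS = 2             # HLS segment length used in the text
RANGE_CHUNK_BYTES = 512 * 1024  # hypothetical byte-range request size


def file_size_bytes(seconds: float, kbps: int) -> int:
    """Encoded size ignoring container overhead: kbit/s * s * 1000 bits / 8 bits per byte."""
    return int(seconds * kbps * 1000 / 8)


def hls_segment_count(seconds: float, segment_seconds: float) -> int:
    """Number of fixed-length segments needed to cover the track."""
    return math.ceil(seconds / segment_seconds)


def byte_range_headers(total_bytes: int, chunk: int) -> list[str]:
    """HTTP Range header values covering the file in fixed-size pieces (end offsets inclusive)."""
    return [
        f"bytes={start}-{min(start + chunk, total_bytes) - 1}"
        for start in range(0, total_bytes, chunk)
    ]


size = file_size_bytes(TRACK_SECONDS, BITRATE_KBPS)
segments = hls_segment_count(TRACK_SECONDS, SEGMENT_SECONDS)
ranges = byte_range_headers(size, RANGE_CHUNK_BYTES)
print(size, segments, len(ranges))  # 3360000 105 7
```

A handful of large range requests replaces more than a hundred small segment fetches, and each of those fetches would otherwise carry its own headers and round trip.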
Instead, we stream the entire file via byte-range requests and let the client switch quality tiers between tracks, not mid-track. The client maintains a 20-second lookahead buffer: if the buffer drops below 5 seconds, the next track is fetched at the next lower tier; if it exceeds 15 seconds, the next track is fetched at the next higher tier.
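The switching rule can be sketched as a small state machine. This is a minimal illustration that assumes the tier is chosen once per track, from the buffer level sampled when the next track is requested, and that playback starts at the 128 kbps tier. `TierSelector` and its method are hypothetical names. The gap between the 5-second and 15-second thresholds acts as hysteresis, so a buffer hovering around a single value does not flip the tier on every track.

```python
TIERS_KBPS = [24, 96, 128, 160, 320]  # quality tiers from the text
LOW_WATER_S = 5.0    # below this, step down one tier
HIGH_WATER_S = 15.0  # above this, step up one tier


class TierSelector:
    """Hypothetical buffer-based tier picker, consulted only at track boundaries."""

    def __init__(self, start_index: int = 2) -> None:
        # Index into TIERS_KBPS; 2 -> 128 kbps (assumed starting tier).
        self.index = start_index

    def tier_for_next_track(self, buffered_seconds: float) -> int:
        """Return the bitrate (kbps) to request for the upcoming track."""
        if buffered_seconds < LOW_WATER_S and self.index > 0:
            self.index -= 1  # buffer draining: next lower tier
        elif buffered_seconds > HIGH_WATER_S and self.index < len(TIERS_KBPS) - 1:
            self.index += 1  # comfortable buffer: next higher tier
        # Between the thresholds the tier stays unchanged (hysteresis).
        return TIERS_KBPS[self.index]


selector = TierSelector()
# Buffer levels sampled at five consecutive track boundaries.
readings = [18.0, 12.0, 3.0, 2.0, 16.0]
print([selector.tier_for_next_track(b) for b in readings])  # [160, 160, 128, 96, 128]
```

Because the tier only changes when a new track is requested, a dip in the tunnel costs quality on the following track rather than an audible switch inside the current one.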
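Spotify uses OGG Vorbis for the free tier (128 kbps) and Premium (320 kbps). An average 3.5-minute song at 128 kbps: $210\,\text{s} \times 128\,\text{kbps} \div 8 = 3{,}360\,\text{kB} \approx 3.4\,\text{MB}$.
At 320 kbps: $210\,\text{s} \times 320\,\text{kbps} \div 8 = 8{,}400\,\text{kB} = 8.4\,\text{MB}$. Trade-off: we accept per-track switching granularity (not mid-track like video ABR) in exchange for simpler delivery with fewer HTTP requests.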
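The same arithmetic extends to every tier and ties back to the tunnel scenario. The sketch below assumes a 210-second track and a steady 200 kbps link with no protocol overhead. It prints each tier's file size and how long the full track takes to download. Any tier whose bitrate exceeds the link rate downloads more slowly than it plays, which is why a fixed 320 kbps stream stalls. The function names are illustrative.

```python
TRACK_SECONDS = 210                   # 3.5-minute song
TIERS_KBPS = [24, 96, 128, 160, 320]  # quality tiers from the text
LINK_KBPS = 200                       # tunnel bandwidth from the opening scenario (assumed steady)


def size_mb(kbps: int, seconds: float = TRACK_SECONDS) -> float:
    """Encoded size in decimal megabytes: kbit/s * s * 1000 bits / 8 bits per byte."""
    return kbps * 1000 * seconds / 8 / 1_000_000


def download_seconds(kbps: int, link_kbps: int = LINK_KBPS, seconds: float = TRACK_SECONDS) -> float:
    """Time to fetch the whole track over the link, ignoring protocol overhead."""
    return kbps * seconds / link_kbps


for kbps in TIERS_KBPS:
    keeps_up = kbps < LINK_KBPS  # downloads faster than real-time playback
    print(
        f"{kbps:>3} kbps: {size_mb(kbps):.2f} MB, "
        f"{download_seconds(kbps):6.1f} s to download, "
        f"{'keeps up' if keeps_up else 'falls behind playback'}"
    )
```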
What if the interviewer asks why we don't switch quality mid-track like video? Because audio files are small enough to buffer entirely.
Mid-track switching causes audible artifacts at the transition point. That is unacceptable for music listeners, who notice quality changes far more than video viewers do.
Related concepts