Personalized Recommendation Engine

A new user signs up and has zero listening history. We have 100M tracks and 40,000 new ones arrive daily.
The constraint: collaborative filtering (which powers 80% of recommendations for existing users) fails completely for new users and new tracks because it requires historical interaction data. We solve this with a hybrid model.
How do we recommend music when we know nothing about this listener, and how do we surface new tracks that have no play data?
For existing users, we run Alternating Least Squares (ALS) collaborative filtering on the user-track interaction matrix. With 300M users and 100M tracks, the full matrix has $3 \times 10^{16}$ entries, so we factorize it into two low-rank matrices of dimension 128, producing a 128-dimensional embedding per user and per track.
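To make the factorization step concrete, here is a minimal sketch of ALS on a tiny dense matrix using NumPy. It is an illustration under stated assumptions, not the production pipeline: the function name `als`, the boolean `mask` of observed entries, and the `rank`, `reg`, and `iters` defaults are all hypothetical, and a real system at this scale would work on sparse data across many machines.

```python
import numpy as np

def als(R, mask, rank=8, reg=0.1, iters=10, seed=0):
    """Factorize R (users x tracks) as U @ V.T, fitting only entries where mask is True.

    Hypothetical toy version: dense arrays, explicit ratings, L2 regularization.
    """
    rng = np.random.default_rng(seed)
    n_users, n_tracks = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))   # one embedding per user
    V = rng.normal(scale=0.1, size=(n_tracks, rank))  # one embedding per track
    ridge = reg * np.eye(rank)
    history = []
    for _ in range(iters):
        # Fix V: each user row is an independent ridge regression.
        for u in range(n_users):
            seen = mask[u]
            Vs = V[seen]
            U[u] = np.linalg.solve(Vs.T @ Vs + ridge, Vs.T @ R[u, seen])
        # Fix U: each track row is an independent ridge regression.
        for t in range(n_tracks):
            seen = mask[:, t]
            Us = U[seen]
            V[t] = np.linalg.solve(Us.T @ Us + ridge, Us.T @ R[seen, t])
        # Regularized squared error on observed entries, recorded once per sweep.
        err = (R - U @ V.T)[mask]
        history.append(float(err @ err + reg * ((U ** 2).sum() + (V ** 2).sum())))
    return U, V, history
```

Because every row update inside a half-sweep depends only on the other factor matrix, the two inner loops can be split across workers without coordination, which is the property the design relies on.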
For the cold-start problem (new users and new tracks), we run a convolutional neural network (CNN) on the raw audio waveform to extract acoustic features: tempo, key, energy, timbre. This CNN-derived embedding lets us recommend new tracks based on sonic similarity to tracks the user already likes, bypassing the need for play history.
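The sonic-similarity lookup can be sketched as a cosine-similarity ranking over audio embeddings. This assumes the CNN has already produced one embedding vector per track; the function name `recommend_new_tracks`, the array names, and the choice to average a user's liked-track embeddings into a single profile are illustrative assumptions, not details from the design above.

```python
import numpy as np

def recommend_new_tracks(liked_audio, new_audio, k=3):
    """Rank new tracks by cosine similarity to the mean audio embedding of liked tracks.

    liked_audio: (n_liked, d) array of hypothetical CNN embeddings for tracks the user likes.
    new_audio:   (n_new, d) array of hypothetical CNN embeddings for tracks with no play data.
    Returns the indices of the top-k new tracks and their similarity scores.
    """
    profile = liked_audio.mean(axis=0)
    profile = profile / np.linalg.norm(profile)
    normed = new_audio / np.linalg.norm(new_audio, axis=1, keepdims=True)
    scores = normed @ profile          # cosine similarity per new track
    top = np.argsort(-scores)[:k]      # highest similarity first
    return top, scores[top]
```

A brute-force scan like this is only reasonable for small candidate sets; ranking against 40,000 new tracks per day for many users would likely call for an approximate nearest-neighbor index instead.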
Spotify's Discover Weekly combines collaborative filtering with audio CNN features, and Spotify reports that 30% of all plays come from algorithmic recommendations. We cache recommendation results in Redis: $300\text{M users} \times 200\text{ track IDs} \times 8\text{ B each} = 480\text{ GB}$, refreshed every 24 hours.
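The cache sizing is simple enough to check in a few lines. The sketch below reproduces the 480 GB figure (decimal gigabytes, raw ID payload only) and then estimates a node count; the 48 GB of usable memory per node is a purely hypothetical number chosen for illustration, and the payload figure ignores Redis's per-key overhead.

```python
def cache_size_bytes(users, tracks_per_user, bytes_per_id):
    """Raw payload size of the recommendation cache, ignoring per-key overhead."""
    return users * tracks_per_user * bytes_per_id

def nodes_needed(total_bytes, usable_bytes_per_node):
    """Ceiling division: how many cache nodes are needed to hold total_bytes."""
    return -(-total_bytes // usable_bytes_per_node)

size = cache_size_bytes(users=300_000_000, tracks_per_user=200, bytes_per_id=8)
print(size // 10**9, "GB")  # 480 GB

# Hypothetical assumption: each node offers 48 GB of usable memory.
print(nodes_needed(size, 48 * 10**9), "nodes")  # 10 nodes
```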
Trade-off: we accept a 24-hour refresh cycle (not real-time) because generating 300M recommendation sets is computationally expensive, and listening preferences shift slowly.
What if the interviewer asks: why ALS over deep learning for collaborative filtering?
ALS is embarrassingly parallel within each half-step (with one factor matrix fixed, every user or track row is an independent least-squares solve), scales linearly with the number of non-zero entries, and produces interpretable embeddings. Deep learning recommendation models (like DLRM) are more accurate but 10x more expensive to train at this matrix size.
Why it matters in interviews
Interviewers test whether we can handle the cold-start problem for both new users and new tracks. Explaining the hybrid ALS plus audio CNN approach with concrete cache sizing shows we think about recommendation as an engineering problem, not just an ML concept.