Walkthrough
Personalized Recommendation Engine
A new user signs up and has zero listening history. We have 100M tracks and 40,000 new ones arrive daily.
The constraint: collaborative filtering (which powers 80% of recommendations for existing users) fails completely for new users and new tracks because it requires historical interaction data. We solve this with a hybrid model.
“How do we recommend music when we know nothing about this listener, and how do we surface new tracks that have no play data?”
For existing users, we run Alternating Least Squares (ALS) collaborative filtering on the user-track interaction matrix. With 300M users and 100M tracks, the full matrix has 3 × 10^16 potential entries, so we factorize it into two low-rank factor matrices with 128 latent dimensions, producing a 128-dimensional embedding per user and per track.
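A minimal sketch of the ALS idea on a toy matrix (dense solves, no confidence weighting, and 4 latent dimensions instead of 128 — all simplifications for illustration): alternately re-solve the user embeddings with tracks fixed, then the track embeddings with users fixed, and score by dot product.

```python
import numpy as np

def als_step(R, X, Y, reg=0.1):
    """One half-sweep of ALS: re-solve every row of X holding Y fixed.
    Each row's solve is an independent ridge regression, which is why
    ALS parallelizes so well."""
    k = Y.shape[1]
    YtY = Y.T @ Y + reg * np.eye(k)        # shared across all users
    for u in range(R.shape[0]):
        X[u] = np.linalg.solve(YtY, Y.T @ R[u])
    return X

rng = np.random.default_rng(0)
R = (rng.random((6, 8)) < 0.3).astype(float)   # toy 6-user x 8-track play matrix
k = 4                                          # 128 dims in the text; 4 for the toy
X = rng.normal(size=(6, k))
Y = rng.normal(size=(8, k))
for _ in range(20):                            # alternate user / track solves
    X = als_step(R, X, Y)
    Y = als_step(R.T, Y, X)
scores = X @ Y.T                               # predicted user-track affinity
```

After a few sweeps, `scores` approximates the observed interaction matrix, and each row of `X` / `Y` is the low-rank embedding the text describes.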
For the cold-start problem (new users and new tracks), we run a convolutional neural network (CNN) on the raw audio waveform to extract acoustic features: tempo, key, energy, timbre. This CNN-derived embedding lets us recommend new tracks based on sonic similarity to tracks the user already likes, bypassing the need for play history.
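Once the CNN has produced a fixed-length acoustic embedding per track, the cold-start recommendation step is a nearest-neighbor search in that embedding space. A sketch with cosine similarity, assuming the embeddings are already computed (the random vectors here are placeholders for real CNN output):

```python
import numpy as np

def cosine_sim(a, B):
    """Cosine similarity between one embedding and a matrix of embeddings."""
    a = a / np.linalg.norm(a)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return B @ a

rng = np.random.default_rng(1)
liked = rng.normal(size=128)               # embedding of a track the user likes
new_tracks = rng.normal(size=(1000, 128))  # CNN embeddings of tracks with no play data
sims = cosine_sim(liked, new_tracks)
top10 = np.argsort(sims)[::-1][:10]        # most sonically similar new tracks
```

At 40,000 new tracks per day, a brute-force scan like this is fine; at full catalog scale you would swap in an approximate nearest-neighbor index.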
Spotify's Discover Weekly combines collaborative filtering with audio CNN features, and Spotify reports that 30% of all plays come from algorithmic recommendations. We cache each user's recommendation set in Redis, refreshed every 24 hours.
Trade-off: we accept a 24-hour refresh cycle (not real-time) because generating 300M recommendation sets is computationally expensive, and listening preferences shift slowly. What if the interviewer asks: why ALS over deep learning for collaborative filtering?
ALS is embarrassingly parallel, scales linearly with the number of non-zero entries, and produces interpretable embeddings. Deep learning recommendation models (like DLRM) are more accurate but 10x more expensive to train at this matrix size.
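The parallelism and nnz-scaling claims follow from the shape of the per-user solve: each user's embedding depends only on the track vectors that user interacted with. A simplified sketch (plain ridge solve, ignoring the confidence weighting of full implicit-feedback ALS; toy sizes):

```python
import numpy as np

def solve_user(track_ids, Y, reg=0.1):
    """Solve one user's embedding from only the tracks they played.
    No other user's data is touched, so all user solves can run in
    parallel, and the work per user is proportional to that user's
    non-zero interactions."""
    Ys = Y[track_ids]                      # just this user's interacted rows
    k = Y.shape[1]
    A = Ys.T @ Ys + reg * np.eye(k)        # k x k, independent of catalog size
    b = Ys.sum(axis=0)                     # implicit feedback: each play counts 1
    return np.linalg.solve(A, b)

Y = np.random.default_rng(2).normal(size=(50, 8))   # toy track matrix, k=8
x = solve_user([3, 17, 29], Y)                      # a user who played three tracks
```

Note that the linear system is only k × k (128 × 128 in the text) regardless of how many tracks exist, which is what keeps the per-user cost bounded.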
Related concepts