I am, mortifyingly, the kind of Taylor Swift listener who has opinions about which 2019 demos should have stayed unreleased.¹ So when I built a recommender for her catalogue I was the worst possible audience: any model that surfaced something obvious would feel patronising, and any model that surfaced something stupid would feel personal.
1. The original Lover demos. Don't @ me.
The constraint, then, was: build something whose recommendations themselves would be the test. If I rolled it out to friends and they said "yeah, that one too," good. If they said "why this," bad. The architecture had to earn that, not the other way around.
§01 Six engines, one query
I ended up with six recommendation models, each looking at a different facet of a song. Five base engines plus an ensemble that learned how to weight them.
- Lyrics-Transformer. Sentence-BERT embeddings over Genius-scraped lyrics. Cosine similarity in the 384-D embedding space.
- Audio VAE. A variational autoencoder over Spotify audio features, compressed to a 16-dimensional latent.² Distance in latent space picks up sonic neighbours.
- Graph node2vec. Build a song-co-listen graph, run node2vec for 64-D embeddings, retrieve nearest neighbours.
- Neural Collab Filtering. Classic two-tower model on user-song interactions.
- Contrastive SSL. Self-supervised pretext: same song, two augmentations of its audio, pull together; different songs, push apart.
- Ensemble. Weighted blend of the five, with a consensus boost for songs surfaced by ≥2 engines.
2. Spotify's track-level audio features — acousticness, danceability, energy, instrumentalness, liveness, loudness, speechiness, tempo, valence, and a handful more — feed into the VAE and get compressed to 16 latent dimensions. Anything that survives that bottleneck is doing real work.
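The embedding-based engines above all end in the same retrieval step: rank every other song by its similarity to the query. Here is a minimal pure-Python sketch of that step using cosine similarity, as the lyrics engine does. The function names and the toy 3-D vectors are hypothetical stand-ins for the real 384-D, 16-D, and 64-D embeddings, and the metric used by the other engines is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def nearest_neighbours(query_id, embeddings, k=10):
    """Return the k most similar songs to query_id, best first."""
    query_vec = embeddings[query_id]
    scored = [
        (song_id, cosine(query_vec, vec))
        for song_id, vec in embeddings.items()
        if song_id != query_id  # never recommend the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings; real ones would come from the trained models.
toy = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [-1.0, 0.0, 0.0],
}
top = nearest_neighbours("song_a", toy, k=2)
```

With a catalogue of a few hundred songs, a brute-force scan like this is probably fast enough that no approximate-nearest-neighbour index is needed.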
§02 The dataset that made it tractable
The engines run on Taylor's full discography (every era, including Vault tracks), with editorial bridges to neighbouring artists.³ The bridges are metadata, not embeddings: the six engines compute similarity within Taylor's catalogue, and the editorial layer maps outward to artists who share lyrical or sonic DNA. Without that layer, every recommendation would be intra-catalogue: an album shuffle dressed up as intelligence.
3. Editorial bridges to Kacey Musgraves, Maggie Rogers, Olivia Rodrigo, Gracie Abrams, Lana Del Rey, HAIM, Phoebe Bridgers, Carly Rae Jepsen, Paramore, and others. The mapping is hand-curated: each bridge has a reason.
The original was an R/Shiny app I'd written in 2023: pretty, slow, single-user. The rebuild is a TypeScript + Next.js frontend on Vercel with a FastAPI backend on Render, serving real recommendations in under 200 ms.
§02b Why the ensemble works
Each engine has a failure mode that the others compensate for. Lyrics-only will surface "songs that mention rain" when you wanted "songs that feel like rain." Audio-only will surface acoustic ballads when you wanted lyrical heartbreak. Collaborative filtering will surface whatever's popular this week. The ensemble is less a fusion of strengths than an averaging-out of weaknesses.
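To make the averaging concrete, here is a rough sketch of a weighted blend with a consensus boost for songs surfaced by two or more engines. It is an illustration under assumptions, not the deployed code: the weights are fixed by hand here (the real ensemble learns them), the additive form of the boost and its 0.1 size are guesses, and every name and score is hypothetical.

```python
from collections import defaultdict

def blend(engine_scores, weights, consensus_boost=0.1, min_engines=2, k=10):
    """Blend per-engine scores into one ranking.

    engine_scores: {engine_name: {song_id: score}}
    weights:       {engine_name: weight}; missing engines count as 0.
    """
    totals = defaultdict(float)
    votes = defaultdict(int)
    for engine, scores in engine_scores.items():
        weight = weights.get(engine, 0.0)
        for song_id, score in scores.items():
            totals[song_id] += weight * score
            votes[song_id] += 1  # how many engines surfaced this song
    # Assumed boost: a flat bonus per extra engine that agrees on the song.
    for song_id, count in votes.items():
        if count >= min_engines:
            totals[song_id] += consensus_boost * (count - 1)
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical scores from three of the base engines.
engine_scores = {
    "lyrics": {"song_b": 0.9, "song_c": 0.4},
    "audio": {"song_c": 0.8, "song_d": 0.7},
    "graph": {"song_c": 0.5, "song_b": 0.2},
}
weights = {"lyrics": 0.4, "audio": 0.3, "graph": 0.3}
ranked = blend(engine_scores, weights)
```

In this toy run the lyrics engine's favourite is song_b, but song_c wins the blend because all three engines rate it at least moderately. That is the averaging-out effect in miniature.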
§03 What the demo actually feels like
Type a song. Six engines return their picks in parallel. The UI shows you the ensemble's top 10, but lets you toggle "show me what each engine thought" so you can see the disagreements. The disagreements are usually the interesting part: the lyrics engine wants you to listen to "The Manuscript"; the audio VAE thinks you'd rather have "You're On Your Own, Kid"; the graph engine sends you somewhere unexpected. They're all defensible, in their own register.
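One plausible way to get the parallel fan-out and the per-engine toggle is to run each engine as a coroutine and keep the results keyed by engine name. The sketch below uses only the standard library's asyncio. The engine names, delays, and returned picks are placeholders, and nothing here is taken from the actual backend.

```python
import asyncio

async def run_engine(name, query, delay):
    """Stand-in for a real model call; the sleep simulates inference time."""
    await asyncio.sleep(delay)
    return name, [f"{name}_pick_for_{query}"]

async def recommend(query):
    """Query every engine concurrently and keep each engine's own picks."""
    # Hypothetical engines with made-up latencies, in seconds.
    engines = {"lyrics": 0.03, "audio": 0.02, "graph": 0.01}
    results = await asyncio.gather(
        *(run_engine(name, query, delay) for name, delay in engines.items())
    )
    # gather returns results in submission order, not completion order.
    return dict(results)

per_engine = asyncio.run(recommend("song_a"))
```

Keeping the per-engine dictionary around, rather than only the blended ranking, is what would let a UI show the disagreements on demand.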
The site is at shubz-taylor-recommendation-engine.vercel.app. Bring your own opinions about the Lover demos.
— written on the train back from a Tortoise live show.