Six engines for one songbook

I am, mortifyingly, the kind of Taylor Swift listener who has opinions about which 2019 demos should have stayed unreleased.[1] So when I built a recommender for her catalogue I was the worst possible audience: any model that surfaced something obvious would feel patronising, and any model that surfaced something stupid would feel personal.

[1] The original Lover demos. Don't @ me.

§01 Where it started: an undergrad free-rein brief

This was my favourite undergraduate project. The brief was unusually generous: build pretty much anything you like, provided it is written in R. The original was a single collaborative filtering engine wrapped in a Shiny app. It was slow (two or three seconds per query), single-user, and opinionated in ways I didn't fully understand yet. But my friends and I had fun with it, and three years later I rebuilt it from scratch. You can find both repos on my GitHub and below.

[Figure: two repository panels. Left, the 2023 original: 1 engine (collab filtering), response ~2–3 s, single-user local deploy. Right, ~/taylor-rec-v2 (v2 · 2026), rebuilt 3 years later in Python + TypeScript.]

→ Left: the original R/Shiny project. Right: the TypeScript + FastAPI rebuild. Same problem, three years of thought in between.

§02 Six engines, one query

I ended up with six recommendation models: five base engines, each looking at a different facet of a song, plus an ensemble that blends them with tuned weights.

Lyrics-Transformer. Sentence-BERT embeddings over Genius-scraped lyrics. Cosine similarity in the 384-D embedding space.
Audio VAE. A variational autoencoder over Spotify audio features, compressed to a 16-dimensional latent.[2] Distance in latent space picks up sonic neighbours.
Graph node2vec. Build a song-co-listen graph, run node2vec for 64-D embeddings, retrieve nearest neighbours.
Neural Collab Filtering. Classic two-tower model on user-song interactions.
Contrastive SSL. Self-supervised pretext: same song, two augmentations of its audio, pull together; different songs, push apart.
Ensemble. Weighted blend of the five, with a consensus boost for songs surfaced by ≥2 engines.
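
Several of the base engines above reduce to the same retrieval step: embed every song, then return the query's nearest neighbours. Here is a minimal sketch of that step using cosine similarity, as described for the lyrics engine. The function names (`cosine`, `nearest_songs`) and the toy 3-D vectors are assumptions for illustration, not the project's code; the real embeddings would be 384-D, 16-D, or 64-D depending on the engine, and the other engines may well use a different distance.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def nearest_songs(query, embeddings, k=3):
    """Return the k (title, similarity) pairs closest to `query`, excluding itself."""
    q = embeddings[query]
    scored = [
        (title, cosine(q, vec))
        for title, vec in embeddings.items()
        if title != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for real model output.
toy = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [-1.0, 0.0, 0.0],
}

print(nearest_songs("song_a", toy, k=2))
```

A brute-force scan like this is plausible at the scale of a single artist's discography; a larger catalogue would probably want an approximate nearest-neighbour index instead.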

[Figure: a query song q fanned out to the engines: Lyrics-Transformer, Audio VAE (384→16D), Graph node2vec, Neural Collab, Contrastive SSL, Ensemble (weighted). Filled songs are those surfaced by ≥2 enabled engines; switching an engine off shifts the consensus.]

→ A query song q routes to six engines; each surfaces ~3 candidates. Songs touched by multiple engines (filled) get a consensus boost in the final ranking.

§03 The dataset that made it tractable

The engines run on Taylor's full discography (every era, including Vault tracks), with editorial bridges to neighbouring artists.[3] The bridges are metadata, not embeddings: the six engines compute similarity within Taylor's catalogue, and the editorial layer maps outward to artists who share lyrical or sonic DNA.

[3] Editorial bridges to Kacey Musgraves, Maggie Rogers, Olivia Rodrigo, Gracie Abrams, Lana Del Rey, HAIM, Phoebe Bridgers, Carly Rae Jepsen, Paramore, and others. The mapping is hand-curated, and each bridge has a reason.

§04 Why the ensemble works

Each engine has a failure mode that the others compensate for. Lyrics-only will surface "songs that mention rain" when you wanted "songs that feel like rain." Audio-only will surface acoustic ballads when you wanted lyrical heartbreak. Collaborative filtering will surface whatever's popular this week. The ensemble is less a fusion of strengths than an averaging-out of weaknesses.

[Figure: engine weights and the resulting ranking.]

Engine weights

Lyrics      0.30
Audio VAE   0.20
Graph       0.20
Collab      0.15
SSL         0.15

final_score = Σ w_i · s_i + consensus_boost

Ranking

1   'tis the damn season   0.21
2   33 "GOD"               0.20
3   Falling                0.14
4   Misery Business        0.14
5   I Almost Do (TV)       0.12
6   Style                  0.11

→ Final score is a fixed-weight blend with a small consensus boost for cross-engine agreement. The weights are tuned by leave-one-out on a small held-out playlist.
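
To make the formula final_score = Σ w_i · s_i + consensus_boost concrete, here is a minimal sketch of the blend. The weights are the ones listed above; everything else is an assumption for illustration: the function name `final_scores`, the flat boost of 0.05, the engine keys, and the toy scores. The post does not say how large the boost is or whether it scales with the number of agreeing engines.

```python
# Weights as listed in the post; the dictionary keys are hypothetical names.
WEIGHTS = {
    "lyrics": 0.30,
    "audio_vae": 0.20,
    "graph": 0.20,
    "collab": 0.15,
    "ssl": 0.15,
}


def final_scores(engine_scores, weights=WEIGHTS, boost=0.05, min_engines=2):
    """Blend per-engine scores into one ranking.

    engine_scores maps engine name -> {song: score}. A song surfaced by at
    least `min_engines` engines gets a flat `boost` (an assumed form).
    Returns (song, score) pairs sorted from best to worst.
    """
    totals = {}
    votes = {}
    for engine, scores in engine_scores.items():
        w = weights.get(engine, 0.0)
        for song, s in scores.items():
            totals[song] = totals.get(song, 0.0) + w * s
            votes[song] = votes.get(song, 0) + 1
    ranked = [
        (song, total + (boost if votes[song] >= min_engines else 0.0))
        for song, total in totals.items()
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Toy scores: song_a is loved by one engine, song_c is liked by two.
scores = {
    "lyrics": {"song_a": 0.9, "song_b": 0.5},
    "audio_vae": {"song_b": 0.9, "song_c": 0.7},
    "graph": {"song_c": 0.6},
}

print(final_scores(scores))             # with the consensus boost
print(final_scores(scores, boost=0.0))  # plain weighted sum
```

In this toy case the boost lifts song_c above song_a: one engine's strong opinion loses to two engines' moderate agreement. Leaving an engine out of `engine_scores` mimics switching it off.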


§05 What the demo actually feels like

Type a song. Six engines return their picks in parallel. The UI shows you the ensemble's top 10, but lets you toggle "show me what each engine thought" so you can see the disagreements. The disagreements are fun! The lyrics engine wants you to listen to The Manuscript; the audio VAE thinks you'd rather have You're On Your Own, Kid; the graph engine sends you somewhere else.
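
The fan-out and the "what each engine thought" view can be sketched roughly as below. This is an illustration under assumptions, not the site's backend: `run_engine`, `recommend`, and `disagreements` are hypothetical names, the engines are stubbed with lookup tables, and `asyncio.gather` gives concurrency on one thread rather than true parallelism.

```python
import asyncio


async def run_engine(name, query, picks):
    """Stand-in for one engine call; a real engine would query a model here."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound call would
    return name, picks.get(query, [])


async def recommend(query, engines):
    """Query every engine concurrently and return {engine: picks}."""
    results = await asyncio.gather(
        *(run_engine(name, query, picks) for name, picks in engines.items())
    )
    return dict(results)


def disagreements(per_engine):
    """For each engine, the songs that no other engine picked."""
    counts = {}
    for picks in per_engine.values():
        for song in set(picks):
            counts[song] = counts.get(song, 0) + 1
    return {
        name: [song for song in picks if counts[song] == 1]
        for name, picks in per_engine.items()
    }


# Hypothetical stub engines: query -> list of picks.
engines = {
    "lyrics": {"query_song": ["song_a", "song_b"]},
    "audio_vae": {"query_song": ["song_b", "song_c"]},
}

per_engine = asyncio.run(recommend("query_song", engines))
print(disagreements(per_engine))
```

Here both engines agree on song_b, so the disagreement view shows only song_a for the lyrics engine and song_c for the audio engine.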

The site is at shubz-taylor-recommendation-engine.vercel.app. This is where I've got to so far. I'm still thinking about what a seventh engine would do — and how I can get the editorial bridges to be learned rather than hand-curated. Bring your own opinions about the Lover demos.

— written on the train back from a Tortoise live show.
