Tuner: A web app for finding people with similar music tastes

2024-12-31

Tuner is a web app I’ve been working on as a side project for the past 3 months. Many years ago, I wanted to use Spotify data to match people with similar music tastes, and to improve discovery of new artists by using connections with other people’s music tastes. Essentially, this is how Spotify works under the hood (at least as far as I’m aware): the recommendation system leverages the listening activity of all users in order to make suggestions.

But I wanted to add a more personal touch to these recommendations. A majority of my music library has come from artists I’ve discovered through my friends. I wondered about how this process could be done over the web, but it seemed to be out of reach.

That is, until I learned about vector databases.

After pursuing another Flask project, I learned enough about web hosting and Flask to finally put the pieces together.

How it works

I learned about embeddings at the start of the year, but didn’t really understand them until I started getting my hands dirty in a new job.

Embeddings are produced by a neural network model and provide a way to quantify semantic information. First a string input is tokenized and mapped to a sequence of numbers, then this sequence is embedded into a high-dimensional vector space. Word2vec is one of the earlier embedding models that gained a lot of attention. Embeddings are now used pretty widely in natural language processing.

For this project I used an off-the-shelf embedding model from sentence_transformers (all-MiniLM-L6-v2) to embed music genres in a vector space.

Tuner loads the top 20 artists for a user, then records the genres each artist is tagged as. Each of these genres can be embedded into the vector space, and these vectors can be added together to form a “music taste” vector for each user. Then by calculating the similarity of vectors (by measuring the cosine of the angle between them), users with similar music tastes can be matched and ranked.
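
To make the matching step concrete, here is a minimal sketch in plain Python. It is not Tuner's actual code: the genre vectors are invented 3-dimensional toy values (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the names taste_vector, cosine_similarity and rank_matches are hypothetical.

```python
import math

# Toy genre vectors, invented for illustration only. Real embeddings from
# all-MiniLM-L6-v2 would have 384 dimensions instead of 3.
GENRE_VECTORS = {
    "indie rock": [0.9, 0.1, 0.0],
    "shoegaze": [0.8, 0.3, 0.1],
    "techno": [0.0, 0.2, 0.9],
    "house": [0.1, 0.1, 0.8],
}


def taste_vector(genres):
    """Sum the vectors of every genre tag; a repeated genre counts each time."""
    dims = len(next(iter(GENRE_VECTORS.values())))
    total = [0.0] * dims
    for genre in genres:
        for i, value in enumerate(GENRE_VECTORS[genre]):
            total[i] += value
    return total


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(user_vector, other_users):
    """Return (name, similarity) pairs, most similar first."""
    scores = [
        (name, cosine_similarity(user_vector, vector))
        for name, vector in other_users.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because cosine similarity only looks at the angle between vectors, a user whose artists carry many genre tags is not favoured over a user with few tags, even though the summed vector is longer.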

What it was like building it

It was a very fun experience building this tool. It started off as a basic command line prototype, then eventually became a Flask app.

There were some interesting challenges regarding speed that had to be solved. One major speedup was achieved by pre-computing the embeddings for the genres, then looking up the vectors in a table instead of running the model on the backend.
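
The precompute-then-look-up idea can be sketched roughly as follows. This assumes the table is stored as a JSON file, which may not be how Tuner stores it. The embed argument is a stand-in for whatever function runs the model, and build_genre_table, load_genre_table and lookup are hypothetical names.

```python
import json
from pathlib import Path


def build_genre_table(genres, embed, path):
    """Run the slow embedding function once per genre and save the results.

    `embed` is a placeholder for the model call. Values are converted to
    plain floats so that the table can be serialised as JSON.
    """
    table = {genre: [float(x) for x in embed(genre)] for genre in genres}
    Path(path).write_text(json.dumps(table))
    return table


def load_genre_table(path):
    """Load the precomputed table at startup; no model is needed for this."""
    return json.loads(Path(path).read_text())


def lookup(table, genre):
    """Return the stored vector, or None for a genre that was never precomputed."""
    return table.get(genre)
```

With this split, the model only has to run in an offline build step, and the web backend does a dictionary lookup per genre. The trade-off is that any genre missing from the table needs a fallback, which is why lookup returns None here rather than raising.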

I also had to think more about making this a more enjoyable tool to use, so I decided to use the Spotify API to recommend a playlist for the user. This playlist would be based on the artists that user A and user B had in common, as well as artists that user B liked that user A didn’t have listed in their top artists.
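
The artist selection described above boils down to splitting one user's top artists by whether the other user already has them. A small sketch, where playlist_seed_artists is a hypothetical name and the artist names in the usage note are placeholders:

```python
def playlist_seed_artists(top_a, top_b):
    """Split user B's top artists into those shared with A and those new to A.

    Both results keep the order of user B's list, so B's favourite
    artists come first.
    """
    known_to_a = set(top_a)
    shared = [artist for artist in top_b if artist in known_to_a]
    new_for_a = [artist for artist in top_b if artist not in known_to_a]
    return shared, new_for_a
```

For example, playlist_seed_artists(["Artist 1", "Artist 2"], ["Artist 2", "Artist 3"]) gives (["Artist 2"], ["Artist 3"]): one artist in common, and one that would be new to user A.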

What was the setback?

Frustratingly, Spotify suddenly announced that some of their endpoints would be closed, effective immediately. This announcement came out on Thanksgiving, with no prior indication that it was coming. While it was frustrating for me, it was heartbreaking to read other developers’ stories of their tools and scripts suddenly ceasing to work as I trawled through the developer support forums.

Thankfully, this only broke the playlist generation part of my web app, not the core concept. I had spent a week on the playlist feature, and learned a lot of basic HTML and JavaScript to make it work, so I was determined to salvage it.

I came up with a basic solution that would query the LastFM API for related artists and songs, but this meant I had to individually match each song to a URI on Spotify in order to create a playlist for the user. This solution took over 40 seconds to load a playlist, which is unacceptably slow for a web page.

So I got my hands dirty with multithreading, and improved the speed of this playlist generation to around 5 seconds (once I factored in rate limits for the respective APIs). I might write a more detailed post in the future on how I used asyncio in Python to achieve this.
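
As a taste of what that could look like, here is a minimal asyncio sketch of running many lookups concurrently while capping how many are in flight at once. It is not Tuner's implementation: fetch_uri is a hypothetical stand-in that sleeps instead of calling a real API, the returned strings are fake, and the cap of 5 is an arbitrary assumption.

```python
import asyncio


async def fetch_uri(track, semaphore):
    """Hypothetical lookup: stands in for one network request per track."""
    async with semaphore:  # wait here if too many requests are in flight
        await asyncio.sleep(0.01)  # simulated network latency
        return f"fake-uri:{track}"  # placeholder, not a real Spotify URI


async def match_tracks(tracks, max_concurrent=5):
    """Look up every track concurrently, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_uri(track, semaphore) for track in tracks]
    # gather returns results in the same order as the input tracks
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(match_tracks(["song a", "song b", "song c"])))
```

A semaphore limits concurrency rather than requests per second, so it is only a crude way of staying under an API's rate limit. The speedup comes from overlapping the time spent waiting on the network, instead of waiting for each request to finish before starting the next.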

What I learned

I learned some incredible ideas and skills in basic web design and backend processing. I learned about the basics of OAuth and how API authentication works. JavaScript is a language I haven’t spent much time in, so the front end was particularly challenging.

This was a great opportunity to learn about embeddings, and it was a joy to bring a far-off idea from several years ago to life.

What I want to add to it

When I return to the project, I want to learn about fine-tuning embeddings. For now, a pretrained model is sufficient, and a fine-tuned model would probably only produce a marginal gain. But it seems like a good way to dive deeper into the world of embeddings and natural language processing.

So is it a good recommendation system? Honestly, probably not better than conventional approaches. There’s a more human element to these recommendations, but the quality and range of matches depends on how many people engage with the service, since I can only match people who have given permission to Tuner. For now, I’ve filled the database with mock data based on popular public playlists so early users can still experience Tuner as it was meant to be experienced. As more people use Tuner, I’ll be able to remove these entries. Tuner is an open source project, so feel free to check out the repo and get in touch if you have any questions or comments!

Many thanks to the developers of spotipy, sentence_transformers, pinecone, flask, render, everynoise, and pylast. This fun little project would not have been possible without all the contributions made to these projects!
