After over a decade of writing software professionally… sometimes things get boring. I’m thankful that my job at SBLive is actually pretty interesting, but at some level my day-to-day work follows a consistent rhythm. Not long ago, though, a coworker hit me up about a tiny project that sounded like fun.
I have a very random, non work related, question for you lol. I downloaded my Spotify listening history and I’m searching for a Chinese song that I found while I was living there. I have no idea what the title of the song is, but I know it’s in Mandarin. How could I take my data set and search for either Mandarin or non-English language titles? 😂
My initial response was that I bet ChatGPT could answer the question for him. I tried adding the file to the Assistants playground and asking it to do the heavy lifting… but the results weren’t great.
I poked a little bit more, but at the step where I was trying to figure out how to vectorize and search across it more easily, I realized that I was drastically overcomplicating things. A quick bit of Googling later, I ended up slapping together this quick little script:
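For anyone who wants to try the same trick, here is a minimal Python sketch of the general idea, not necessarily the exact script I ran. It assumes the Spotify export is a JSON list of records with a `trackName` field (the field name and the helper names `find_cjk_titles` and `load_history` are assumptions for illustration), and it flags any title containing a character in the CJK Unified Ideographs range. That range also covers Japanese kanji, so expect a few non-Mandarin hits.

```python
import json
import re

# CJK Unified Ideographs block (U+4E00 to U+9FFF). This covers most Chinese
# characters in common use, but also matches Japanese kanji.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def find_cjk_titles(records, title_key="trackName"):
    """Return unique titles containing at least one CJK ideograph.

    `title_key` is an assumption about the export format; adjust it to
    whatever key your file actually uses for the track title.
    """
    seen = set()
    matches = []
    for record in records:
        # Missing or null titles are treated as empty strings.
        title = record.get(title_key) or ""
        if title not in seen and CJK_PATTERN.search(title):
            seen.add(title)
            matches.append(title)
    return matches


def load_history(path):
    """Load a listening-history export that is a JSON list of records."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

To use it, pass the result of `load_history(...)` on your own export file to `find_cjk_titles(...)` and print each title it returns. Duplicates are dropped so that a song played a hundred times only shows up once.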
I didn’t try to optimize it at all; it’s the quickest and dirtiest solution I could make happen. And it even works! Here’s the output:
It was a nice little reminder that sometimes it’s okay to just hack something together and call it a day.
The Robots Probably Still Win
OpenAI has made some upgrades since I initially played around with this, and it’s worth pointing out that it definitely can solve the problem now without me needing to lift a finger. I’m not sure whether my script or OpenAI is more correct.