Simple Song Finder

After over a decade of writing software professionally... sometimes things get boring. I'm thankful that my job at SBLive is actually pretty interesting, but at some level my day-to-day work follows a pretty consistent theme. Not long ago, though, a coworker hit me up about a tiny project that sounded pretty fun.

I have a very random, non work related, question for you lol. I downloaded my Spotify listening history and I’m searching for a Chinese song that I found while I was living there. I have no idea what the title of the song is, but I know it’s in Mandarin. How could I take my data set and search for either Mandarin or non-English language titles? 😂

My initial response was that I bet ChatGPT could answer the question for him. I tried just adding the file to the Assistants playground and asking it to do the heavy lifting... but the results weren't great.

I poked a little bit more, but at the step where I was trying to figure out how to vectorize and search across it more easily, I realized that I was drastically overcomplicating things. A quick bit of Googling later, I ended up slapping together this quick little script:

import json
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

def load_and_map_music_history(file_path):
    # Read as UTF-8 explicitly so non-ASCII titles load regardless of platform default.
    with open(file_path, 'r', encoding='utf-8') as file:
        music_data = json.load(file)

    unique_tracks = {}

    for entry in music_data:
        track_name = entry["master_metadata_track_name"]
        artist_name = entry["master_metadata_album_artist_name"]
        album_name = entry["master_metadata_album_album_name"]

        # Skip entries with no track name and tracks that were already matched.
        if not track_name or track_name in unique_tracks:
            continue

        # Detect each field on its own so one failure doesn't discard the others.
        languages = []
        for text in (track_name, artist_name, album_name):
            try:
                languages.append(detect(text))
            except (LangDetectException, TypeError):
                # detect() raises on missing values or text it can't analyze.
                languages.append("unknown")

        # langdetect reports Chinese as 'zh-cn' or 'zh-tw'.
        if any(lang.startswith('zh') for lang in languages):
            unique_tracks[track_name] = {
                "artist_name": artist_name,
                "album_name": album_name
            }

    return unique_tracks

track_info_dict = load_and_map_music_history('music_history.json')

for track, info in track_info_dict.items():
    print(f"{track} by {info['artist_name']} from the album {info['album_name']}")

I didn't try to optimize it at all. It's the quickest and dirtiest solution that I could make happen. And it even works! Here's the output:

Why Nobody Fights by Hua Chen Yu from the album 卡西莫多的禮物
二愣子 by Jordan Chan from the album 抱一抱
經典 (feat. 懂伯 & DJ Premier) by 大支 from the album 不聽
還是覺得你最好 b

It was a nice little reminder that sometimes it's okay to just hack something together and call it a day.
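
In the same hack-it-together spirit, an even lazier variant is possible. To be clear, this is not what the script above does: the sketch below skips language detection entirely and just checks each field for Han characters with a Unicode range. The helper names contains_han and find_han_tracks are made up for illustration, and the chosen range is an assumption that also matches Japanese kanji, so some false positives should be expected.

```python
import re

# Assumption: the main CJK Unified Ideographs blocks are a good enough proxy
# for "Chinese". The same characters also show up in Japanese text.
HAN_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def contains_han(text):
    """Return True if text contains at least one Han character."""
    return bool(text) and HAN_PATTERN.search(text) is not None


def find_han_tracks(entries):
    """Map track name -> artist/album for entries with Han characters in any field."""
    matches = {}
    for entry in entries:
        track = entry.get("master_metadata_track_name")
        artist = entry.get("master_metadata_album_artist_name")
        album = entry.get("master_metadata_album_album_name")
        # Skip entries with no track name and tracks that were already matched.
        if not track or track in matches:
            continue
        if any(contains_han(field) for field in (track, artist, album)):
            matches[track] = {"artist_name": artist, "album_name": album}
    return matches
```

It takes the already-loaded list of history entries, so it could be fed the result of json.load directly. Whether it beats langdetect probably depends on the data: it can't tell Chinese from Japanese, but it also doesn't have to guess a language from a three-word title.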

The Robots Probably Still Win

OpenAI has made some upgrades since I initially played around with this, and it's worth pointing out that it can now definitely solve the problem without me needing to lift a finger. I'm not sure whether my script or OpenAI is more correct.