I’ve recently created a search and recommendation engine for RPGs on Itch.io. You can see the engine here: https://liverealproductions.com/gamesearch/
I did this because Itch’s search engine is…less than ideal and discovering new games is difficult. So, I pulled all the data I could, indexed the games for search, and built a scoring model to compare games.
This gives me a natural language search, which means you can search for “Tarot games for two players” or “Lumen games with pirates” and get results.
It also gives me the ability to compare games, which means you can see “If you like game X, maybe you should check out game Y.”
Search and discovery. I’m kinda proud of myself.
Now, I do track activity, because optimizing the discovery means knowing what people are doing, but I don’t (and won’t) do anything to tie that activity to a person on my end, no matter how abstracted. Privacy matters.
What’s the difference between “These games are similar to X” and “X is similar to”? I took every game I could scrape…the newest, the most popular, the highest rated…14,000 games…and compared them to each other across several data points. From there, I kept the top matches in each category, then aggregated the similarity scores to come up with the best matching games. Those games are similar to X. If you look at it from the other end of that math, you get all of the games that found X as a comparable game. For some games, the lists are nearly the same. For others, there’s almost no overlap.
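Here's a rough sketch of why the two lists can differ. This is not my production code: the game names and the `similar_to` data are made up, and `reverse_matches` is a hypothetical helper. The idea is that once you keep only each game's top matches, "X picked Y" no longer guarantees "Y picked X", so flipping the mapping around gives you a different list.

```python
from collections import defaultdict

# Hypothetical toy data: each game maps to the top matches it kept.
similar_to = {
    "X": ["A", "B", "C"],
    "A": ["X", "B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["X", "C"],
}

def reverse_matches(forward):
    """Invert the mapping: for each game, list the games that picked it."""
    reverse = defaultdict(list)
    for game, matches in forward.items():
        for match in matches:
            reverse[match].append(game)
    return dict(reverse)

picked_by = reverse_matches(similar_to)

forward_x = set(similar_to["X"])  # "These games are similar to X"
reverse_x = set(picked_by["X"])   # "X is similar to" (games that found X)
overlap = forward_x & reverse_x   # games that appear on both lists
```

In this toy data, X picked three games but only one of them picked X back, which is the "almost no overlap" case.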
Why did I do this? It’s not the money, since Itch doesn’t pay for referral links. I do have a partner account, so I am attaching my code to the link, but a) that doesn’t come with a payout and b) I can’t tell who clicked the link, just how many people clicked through to Itch. I just wanted this to exist.
Now, because I’ve been asked, here’s an ultra-technical dump of what I did. Feel free to stop reading.
I scraped every publicly available data point I could and vectorized the meaningful fields. From there, I built a tf-idf matrix, which let me compute the cosine similarity between every pair of games on those fields. I also built a weighted rating model that prevents a game with a single 5-star rating from outshining a game with hundreds of 4-star ratings. I weighted each of those data points and baked it all into a comparison score that drives discovery. A given game will be compared against up to 100 other games, depending on how similar the different vector models are. I also used a machine-learning summarization algorithm to bring the often-full-page descriptions down to a few sentences.
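If you want to see the shape of those pieces, here's a minimal sketch in plain Python. It is an illustration, not the engine itself: the tokenizer is a bare whitespace split, the tf-idf smoothing is one common variant, the weighted rating is a Bayesian-style average that I'm using as a stand-in for the rating model, and every number in it (`prior_mean`, `prior_weight`, `sim_weight`) is an assumption picked for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def weighted_rating(avg, count, prior_mean=3.5, prior_weight=20):
    """Pull low-count ratings toward an assumed prior mean (hypothetical values)."""
    return (count * avg + prior_weight * prior_mean) / (count + prior_weight)

def comparison_score(similarity, rating, sim_weight=0.8):
    """Blend similarity with a rating scaled to 0..1 (hypothetical weights)."""
    return sim_weight * similarity + (1 - sim_weight) * (rating / 5.0)

# Made-up descriptions, just to exercise the functions.
docs = [
    "tarot game for two players",
    "tarot journaling game solo",
    "pirate ship adventure",
]
vecs = tfidf_vectors(docs)
sim_tarot = cosine(vecs[0], vecs[1])   # shares "tarot" and "game"
sim_pirate = cosine(vecs[0], vecs[2])  # shares no terms

one_five_star = weighted_rating(5.0, 1)
many_four_star = weighted_rating(4.0, 300)
```

With these made-up numbers, the game with 300 four-star ratings ends up ahead of the game with a single five-star rating, which is the whole point of the weighting.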