# What's Up, World!
It's been a minute since I last checked in, and I am happy to announce I am still nerding out on my app and building cool stuff at work. Recently, I've become immersed in the wacky world of recommender systems. This was an area I held off on investigating for a long time due to my own lack of confidence with math. After spending a few months in the weeds on different types of recommenders, some Hugging Face courses, and a few intro courses (Linear Algebra 1 and 2, plus Calc), I feel confident enough to post some dumb ideas on the internet (insert image of the Dunning-Kruger effect).
## Naive Might Be Good for Me
When I first started my search, I quickly found myself poring over research papers detailing different sophisticated approaches to recommending things: content-based, collaborative filtering, two-tower approaches, graph neural networks, Meta's DLRM, etc. They were laden with semi-familiar concepts mixed with some pretty foreign math (linear activation functions, eigenvectors and eigenvalues, matrix transposition and decomposition, Frobenius norms, and Markov chains).
While I am still a ways away from possessing the math skills I want (probably not entering a Netflix contest anytime soon), I am familiar enough to put these ideas into some vibe-coded PyTorch model to try and spit out some interesting results.
Also, I want to thank all the friends who have helped me try to make sense of this exciting field—Kyle, John, Zach, Matt, and Andrew—you all have been extremely encouraging and informative. I really appreciate it. Learning math at 28 was not what I pictured myself doing ten years ago, but I'm stoked to be here doing it.
One idea I have for recommending book clubs based on user preferences centers around our graph database (chat, did I mention we have a graph DB?).
## Making Sense of the Data We Might End Up With
If our product is fun enough to get recurring users (fingers crossed), our app might have some notable data to make use of when thinking in terms of building a recommender.
- Text-based updates and reviews posted inside a book club can be used to gauge sentiment, via some pretrained encoder model like BERT, to determine how a user feels about a book and how they feel about another user's post.
- Likes on comments and posts provide explicit feedback.
- Awards, which come in two different contexts: sarcastic and slightly negative, or affirming.
- [Potentially] we could add a manual way to recommend books on the app via a new subfeature that enables users to send book recommendations via the existing notification system so a user can quickly add them to a bookshelf (making a ticket right now).
- In terms of passive insights gleaned from regular use, we know from a user's participation whether they consistently fall behind the pace of the club or stay ahead of it.
- Number of times a user in a club pressures someone else to catch up—this could be encouraging or viewed as bullying (not sure how people will feel about this feature yet).
- Number of people added to the book club by a specific user.
- Behavior from other parts of our app (namely, bookshelves).
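To make the list above a bit more concrete, here is a minimal sketch of how those signals could be collected into one record per user per club and flattened into numbers a model could eat. Everything here is an assumption for illustration: the `ClubSignals` class, its field names, and the idea that sentiment arrives as a score between -1 and 1 are all made up, not how our app actually stores anything.

```python
from dataclasses import dataclass, asdict


@dataclass
class ClubSignals:
    """Hypothetical per-user, per-club counters (field names are made up)."""
    likes_given: int = 0
    affirming_awards: int = 0
    sarcastic_awards: int = 0
    catch_up_nudges_sent: int = 0
    members_invited: int = 0
    chapters_ahead_of_pace: float = 0.0  # negative means behind the club pace
    mean_post_sentiment: float = 0.0     # assumed range [-1, 1], from some sentiment model


def to_feature_vector(signals: ClubSignals) -> list[float]:
    """Flatten the counters into a fixed-order list of floats."""
    return [float(value) for value in asdict(signals).values()]


# Example: a user who likes a lot of posts but is falling behind the club.
signals = ClubSignals(likes_given=3, chapters_ahead_of_pace=-1.5)
vector = to_feature_vector(signals)
```

The counts would still need scaling before they sit next to a sentiment score in the same vector, which is a problem for future me.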
## One Possible Approach
The idea: cluster users by the similarity of their embedded interaction profiles, then recommend that similar users join book clubs together.
Finding similar users could be pretty doable. One approach stood out from an article by Eugene Yan that had been recommended to me:

> BeeFormer combines a sentence Transformer encoder for item embeddings with an ELSA (Scalable Linear Shallow Autoencoder)-based decoder that captures patterns from user-item interactions. First, item embeddings are generated through a Transformer trained on textual data. These embeddings are then used to compute user recommendations via ELSA’s low-rank approximation of item-to-item weight. The key here is to backpropagate the gradients from the recommendation loss through the Transformer model. As a result, weight updates capture interaction patterns rather than just semantic similarity.
The paper discusses training a custom model, which I'm not sure is a route I want to explore for something simple. The less sophisticated idea I glean from this is that you could pull all user interactions/relationships and run them through some pre-trained encoder (BERT, Sentence Transformer) to create embeddings that represent a high-level overview of how a user interacts in a book club.
This embedding profile would contain different dimensions corresponding to genre affinity, reading pace, engagement style, sentiment, and tone. I'm not sure how you would weight each of these dimensions—but that’s a crucial part. You could attach this profile to a user's node in our graph DB, then, using some clustering techniques, match users based on similar profiles (users who share taste and preferences for similar book genres).
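Here is a rough sketch of the matching step, assuming the profile already exists as a short list of numbers. To keep it simple I swapped real clustering for a plain nearest-neighbor lookup with a weighted cosine similarity. The dimension names, the weights, and the toy users are all placeholders I invented, and picking the weights is exactly the part I don't know how to do yet.

```python
import math

# Hypothetical profile dimensions and weights; both are placeholders to tune.
DIMENSIONS = ["genre_affinity", "reading_pace", "engagement", "sentiment", "tone"]
WEIGHTS = [2.0, 1.0, 1.0, 0.5, 0.5]


def weighted_cosine(a, b, weights=WEIGHTS):
    """Cosine similarity after scaling each dimension by its weight."""
    wa = [w * x for w, x in zip(weights, a)]
    wb = [w * y for w, y in zip(weights, b)]
    dot = sum(x * y for x, y in zip(wa, wb))
    norm = math.sqrt(sum(x * x for x in wa)) * math.sqrt(sum(y * y for y in wb))
    return dot / norm if norm else 0.0


def most_similar(user_id, profiles, top_k=3):
    """Rank every other user by weighted cosine similarity to user_id."""
    target = profiles[user_id]
    scored = [
        (other, weighted_cosine(target, vec))
        for other, vec in profiles.items()
        if other != user_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy profiles, one value per entry in DIMENSIONS.
profiles = {
    "ana": [0.9, 0.2, 0.7, 0.5, 0.1],
    "ben": [0.8, 0.3, 0.6, 0.4, 0.2],
    "cy": [0.1, 0.9, 0.1, -0.6, 0.8],
}
print(most_similar("ana", profiles, top_k=2))
```

With these toy numbers, `ben` ranks above `cy` for `ana`. A real version would probably hand the vectors to an actual clustering algorithm or a vector index instead of comparing every pair of users.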
## Expanding Beyond Genre-Based Matching
What if two users are reading separate books but are actually compatible? How do we infer that, even though the book or genre is different, the two users might have similar tastes?
My thought here is that maybe you can ignore matching via book ID and instead categorize your books better inside your DB, using some type of tagging to glean better results for finding similarities across multiple books. An approach like this would also probably help recommend books to users in addition to users to book clubs.
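A minimal sketch of the tagging idea, assuming each book carries a set of tags. The `BOOK_TAGS` catalog and the tags themselves are invented for the example, and the weighted Jaccard overlap is just one simple way to compare tag counts that I picked for illustration.

```python
from collections import Counter

# Hypothetical tag catalog; in practice the tags would live on book nodes in the DB.
BOOK_TAGS = {
    "dune": {"sci-fi", "politics", "epic"},
    "foundation": {"sci-fi", "politics", "classic"},
    "emma": {"romance", "classic", "satire"},
}


def tag_profile(book_ids):
    """Count how often each tag appears across a user's books."""
    counts = Counter()
    for book_id in book_ids:
        counts.update(BOOK_TAGS.get(book_id, set()))
    return counts


def tag_overlap(books_a, books_b):
    """Weighted Jaccard similarity between two users' tag counts (0.0 to 1.0)."""
    a, b = tag_profile(books_a), tag_profile(books_b)
    shared = sum((a & b).values())  # per-tag minimum
    total = sum((a | b).values())   # per-tag maximum
    return shared / total if total else 0.0
```

Here `tag_overlap(["dune"], ["foundation"])` comes out to 0.5 even though the two readers share no book ID, while `tag_overlap(["dune"], ["emma"])` is 0.0.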
A possible drawback to this is that similarity might not be the north star for finding compatibility between prospective book club members. I don't know yet because I don't have an app with users, but what if similar interactions in a book club turn out to not mean compatible readers for a club? Maybe people actually prefer complements rather than adjacency for reading clubs.
Another potential drawback—but most definitely a success problem—is the cost of doing this at scale. How do I do this cheaply enough to make sense for a free-to-use product? Again, I hope I have this problem.
## The Challenge of Onboarding New Users
If you don't have any user interaction built up—how can you still recommend things? I think this is almost a UX constraint that should dictate the create-user flow. If you are just joining the app, you should probably have to add some books to your 'want to read' shelf. That way, we can try and match based on commonalities found between books. We could look for commonalities between genre, title, and other metadata that exists on the web. Finding a way to enrich book data as much as possible to improve recommendations might be a good move here.
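The onboarding idea could look something like this sketch: refuse to recommend anything until the shelf has a few books, then rank clubs by how many shelf books share a genre with the club's current book. The function name, the threshold of three books, and the shape of the inputs are all assumptions for illustration.

```python
MIN_SHELF_SIZE = 3  # hypothetical onboarding threshold


def rank_clubs_for_new_user(shelf_genres, club_genres, min_shelf_size=MIN_SHELF_SIZE):
    """Rank clubs by how many shelf books share a genre with the club.

    shelf_genres: one set of genres per book on the 'want to read' shelf.
    club_genres: mapping of club name to the genres of its current book.
    Returns None when the shelf is too small to say anything useful.
    """
    if len(shelf_genres) < min_shelf_size:
        return None  # signal the UI to ask for more books
    scores = {
        club: sum(1 for genres in shelf_genres if genres & wanted)
        for club, wanted in club_genres.items()
    }
    # Highest score first; ties broken alphabetically by club name.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [club for club, score in ranked if score > 0]


shelf = [{"sci-fi"}, {"sci-fi", "epic"}, {"romance"}]
clubs = {
    "Space Cadets": {"sci-fi"},
    "Austen Hour": {"romance", "classic"},
    "True Crime Club": {"crime"},
}
print(rank_clubs_for_new_user(shelf, clubs))
```

For this toy shelf the result is `["Space Cadets", "Austen Hour"]`, and a shelf with fewer than three books returns `None` so the create-user flow can ask for more.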
## Final Thoughts
In the short term, getting to useful club recommendations as fast as possible will probably take a lot of clever UX in the setup flow, relying on book selection at the start to immediately learn things about our users. Beyond that, experimenting early with some test data and creating naive workflows that accomplish the bigger-picture goal of recommending users to clubs will be extremely fun.
Happy Sunday.