Roposo was teeming with great content on the many faces of fashion in India. We now wanted to serve only the most relevant posts to our users from the very first time they opened Roposo, and for that we needed to know their interests.
We were looking for the sweet spot in the wide spectrum of a user's interests. What better way than to identify the people whose style she might like? In late 2015, we decided to revamp user on-boarding to collect the seeds from which to derive this crucial information.
So our problem could now be defined like this: when a user joins Roposo, find the users whose style she might like the most.
Our design and product teams decided to help us by providing data in the form of the user's initial likes.
The system that we now had to build reminds me of the dilemmas of the Hogwarts Sorting Hat, quadrupled.
This *sorting* problem is beyond the Sorting Hat
Our team decided to build the Roposo Sorting Hat using collaborative filtering.
Collaborative filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person Parvati has the same opinion as a person Padma on an issue, Parvati is more likely to have Padma’s opinion on a different issue than that of a randomly chosen person.
The algorithm described above works for a bidirectional graph, such as Facebook or LinkedIn, where two people are connected only when both are interested in each other. In unidirectional graphs like Roposo and Twitter, a person Hermione showing interest in a person Voldemort does not imply that Voldemort is interested in Hermione. To find interests in such graphs, we have to go a level deeper.
Let’s look at that with an example. Say Hermione joined Roposo (one can dream!).
Who do you think Hermione should follow?
1. We get all the people she immediately followed while on-boarding, and the people whose posts she liked. Let this set be A. (It contains Harry, obviously!)
2. Get all the users that follow the users in set A. Let this set be B. (It contains Ginny, well…!)
3. Get the users (other than A) that the users in set B follow. Let’s call this set C. (It contains Ron, because family!)
C is the set of users that we think Hermione would most probably like. Since set C is likely to be huge, we rank each user c in C by the number of users in B who follow them.
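The three steps and the final ranking can be sketched in a few lines of Python. This is only an in-memory illustration, not the Neo4j queries we actually ran: the function name `recommend`, the `follows` dictionary, and the choice to exclude the new user herself from the results are all assumptions made for the example.

```python
from collections import Counter

def recommend(new_user, seeds, follows, top_n=10):
    """Rank candidate users for new_user.

    seeds   -- set A: users she followed or whose posts she liked
    follows -- hypothetical adjacency map: user -> set of users they follow
    """
    # Reverse index so that we can look up "who follows X" quickly.
    followers = {}
    for user, followees in follows.items():
        for followee in followees:
            followers.setdefault(followee, set()).add(user)

    # Step 2: set B, everyone who follows someone in A.
    # Assumption: the new user herself is left out of B.
    b = set()
    for a in seeds:
        b |= followers.get(a, set())
    b.discard(new_user)

    # Step 3: set C, everyone followed by B, excluding A and the new user.
    # Each candidate's score is the number of b's who follow them.
    scores = Counter()
    for user in b:
        for candidate in follows.get(user, set()):
            if candidate not in seeds and candidate != new_user:
                scores[candidate] += 1

    # Highest score first.
    return scores.most_common(top_n)

# Toy graph, invented for illustration.
follows = {
    "Ginny": {"Harry", "Ron", "Luna"},
    "Neville": {"Harry", "Luna"},
    "Ron": {"Harry"},
}
print(recommend("Hermione", {"Harry"}, follows))
# [('Luna', 2), ('Ron', 1)]
```

Here B is {Ginny, Neville, Ron}, and Luna outranks Ron because two members of B follow her while only one follows him.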
I hope it is clear by now that this kind of collaborative filtering boils down to traversing a graph, which is difficult to do in a relational database. Fortunately, Roposo’s main database is a huge graph (Neo4j)!
We were so excited to finally put Neo4j's heavy machinery to the use it is best at: complicated queries that are easy to express on a graph.
Let’s look at all the experiments and trials our team ran to get this complex process working on a huge dataset in under 200 ms.