Let us see how we can use Neo4j as the backbone of recommendation systems for different data scenarios. For this purpose, we will use the collaborative filtering approach to process the data at hand and produce relevant results.
To understand how the process works, let us use a simple dataset from a dating site. On this site, you can sign in, view the profiles of people you could potentially date, and follow or like them; they, in turn, can do the same with your profile. In a graph representation of such a dataset, people are nodes, and each like from one person to another is a directed edge between them.
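Here is a minimal sketch of this model that uses a plain Python dictionary in place of Neo4j. Each person maps to the set of people they like, so every (liker, liked) pair is one directed edge. The sample data and the `liked_by` helper are hypothetical. Following edges backwards, as `liked_by` does, is what lets a recommender find other users who liked the same person.

```python
from collections import defaultdict

# Hypothetical sample data: each key likes every person in its set,
# so every (liker, liked) pair is one directed edge in the graph.
likes = {
    "John": {"Donna", "Rose"},
    "Jack": {"Rose"},
    "Donna": {"John"},  # a like in the opposite direction
}

def liked_by(graph):
    """Invert the edges: map each person to the set of people who like them."""
    incoming = defaultdict(set)
    for liker, targets in graph.items():
        for target in targets:
            incoming[target].add(liker)
    return dict(incoming)

# Flatten the adjacency map into an explicit list of directed edges.
edges = [(liker, target) for liker, targets in likes.items() for target in sorted(targets)]

print(len(edges))                       # 4
print(sorted(liked_by(likes)["Rose"]))  # ['Jack', 'John']
```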
As shown in the following diagram, consider a user, John, who has just registered on the website and created a profile for himself. He begins browsing through profiles of women, looking for someone who might interest him. After going through several profiles, he likes three of them: Donna, Rose, and Martha. While John is trying his luck, other users on the site are also actively searching. It turns out that Jack, Rory, Sean, and Harry have also liked some of the profiles that John has liked, so we can infer that their tastes are similar to his. It is therefore quite probable that John would like some of the other people these users have already liked but whose profiles John has yet to come across. This is where collaborative filtering comes into play. We can suggest more profiles for John to view, so that he has to wade through fewer irrelevant ones.
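The "aligned tastes" inference can be made concrete with a small Python sketch that scores the other users by how many of John's likes they share. Only John's three likes come from the text. Which profiles Jack, Rory, Sean, and Harry liked is not stated, so that data is hypothetical, as are the extra user Alex and the helper name `similar_users`.

```python
# Hypothetical likes: only John's three likes come from the text.
likes = {
    "John": {"Donna", "Rose", "Martha"},
    "Jack": {"Rose", "Amy"},
    "Rory": {"Martha", "Amy", "Clara"},
    "Sean": {"Donna", "Rose"},
    "Harry": {"Donna", "Clara"},
    "Alex": {"River"},  # shares no likes with John
}

def similar_users(user, likes):
    """Return (other_user, shared_count) pairs, most overlap first.

    A user counts as similar if they liked at least one profile that
    `user` also liked; ties are broken alphabetically.
    """
    mine = likes[user]
    scores = {
        other: len(mine & theirs)  # size of the set intersection
        for other, theirs in likes.items()
        if other != user and mine & theirs
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(similar_users("John", likes))
# [('Sean', 2), ('Harry', 1), ('Jack', 1), ('Rory', 1)]
```

Sean shares two of John's likes, so he ranks as the most similar user. Alex shares none and is dropped entirely.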
The following diagram illustrates the types of relationships in our dataset that the recommender uses:
A production-level implementation of this type of system would require complex search and set operations. This is where Neo4j comes to the rescue. What we are essentially doing in the technique above is searching for certain patterns in the graph representation of our data and combining the results of each sub-search into a final result set. This brings us to a Neo4j-specific tool we have studied before: Cypher. It is beneficial to use Cypher in recommender systems for the following reasons:
The following code segment illustrates how the scenario described above can be represented using Cypher.
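START Person = node(2)
MATCH Person-[:IS_INTERESTED_BY]->someone<-[:IS_INTERESTED_BY]-otherguy-[:IS_INTERESTED_BY]->recommendations
WHERE not(Person = otherguy)
RETURN count(*) AS PriorityWeight, recommendations.name AS name
ORDER BY PriorityWeight DESC, name DESC;
Let us see what each part of the preceding Cypher query does in the overall scenario:
START Person = node(2)
This sets the starting point of the query: the node with ID 2, which represents the user we are generating recommendations for (John, in our scenario).
MATCH Person-[:IS_INTERESTED_BY]->someone<-[:IS_INTERESTED_BY]-otherguy-[:IS_INTERESTED_BY]->recommendations
This pattern finds the following, in order:
- every person (someone) that our user is interested in
- every other user (otherguy) who is interested in that same person
- every person (recommendations) that otherguy is in turn interested in
The colon in [:IS_INTERESTED_BY] restricts each hop to relationships of that type. Without the colon, [IS_INTERESTED_BY] would only bind a variable to a relationship of any type.
WHERE not(Person = otherguy)
This excludes paths in which otherguy is our user himself, so that only other users' tastes contribute to the recommendations.
RETURN count(*) AS PriorityWeight, recommendations.name AS name
Using the count(*) function, we measure the number of ways each result is reached during query execution. Because recommendations.name is returned alongside the aggregate, the rows are grouped by name. PriorityWeight is therefore the number of matched paths that lead to each recommended person: the more paths, the stronger the recommendation.
ORDER BY PriorityWeight DESC, name DESC;
Finally, the results are sorted with the highest weight first, and ties are broken by name in descending order.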
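To check what the query computes without a running database, here is a minimal Python sketch that mirrors it over an in-memory dictionary. Which profiles Jack, Rory, Sean, and Harry liked is not stated in the text, so the sample data is hypothetical, and `recommend` is an illustrative name rather than a Neo4j API. The sketch also mimics one subtlety of Cypher: within a single MATCH pattern, a relationship is never traversed twice. As a result, the last hop cannot reuse the otherguy-to-someone edge.

```python
from collections import Counter

# Hypothetical sample data: each key IS_INTERESTED_BY every person in its set.
likes = {
    "John": {"Donna", "Rose", "Martha"},
    "Jack": {"Rose", "Amy"},
    "Rory": {"Martha", "Amy", "Clara"},
    "Sean": {"Donna", "Rose"},
    "Harry": {"Donna", "Clara"},
}

def recommend(person, likes):
    """Count matched paths person -> someone <- otherguy -> recommendation."""
    weights = Counter()
    for someone in likes[person]:
        for otherguy, their_likes in likes.items():
            # WHERE not(Person = otherguy); otherguy must also like `someone`
            if otherguy == person or someone not in their_likes:
                continue
            for rec in their_likes:
                # Relationship uniqueness: the otherguy -> someone edge
                # cannot double as the final hop of the same path.
                if rec != someone:
                    weights[rec] += 1  # count(*) grouped by name
    # ORDER BY PriorityWeight DESC, name DESC
    return sorted(weights.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

print(recommend("John", likes))
# [('Clara', 2), ('Amy', 2), ('Rose', 1), ('Donna', 1)]
```

Donna and Rose appear in the output even though John already likes them. This is because the query as written does not exclude existing likes. A production recommender would typically filter those out.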
Let us take another social dataset to understand a more complex pattern for recommending people to date. This dataset (social2.db in the code for this chapter) contains the following information for each person:
- name
- gender
- dating orientation
- attributes or qualities they have
- where they live
- qualities they are looking for in a potential partner
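So, let us build up the recommendation algorithm one step at a time:
START me=node:users_index(name = 'Albert')
First, we look up the user we want to find matches for, Albert, in the users_index node index.
MATCH me-[:lives_in]->city<-[:lives_in]-person
Next, we find the people who live in the same city as Albert.
WHERE me.orientation = person.orientation AND (me.gender <> person.gender AND me.orientation = "straight") AND me-[:wants]->()<-[:has]-person AND me-[:has]->()<-[:wants]-person
We keep only the candidates who meet all of the following conditions:
- they share Albert's orientation
- they are of the opposite gender (this condition handles the straight case)
- they have at least one quality Albert wants, and Albert has at least one quality they want, so the interest can be mutual
WITH DISTINCT city.name AS city_name, person, me
MATCH me-[:wants]->attributes<-[:has]-person-[:wants]->requirements<-[:has]-me
Each candidate is passed on once. We then match every quality Albert wants that the candidate has (attributes) and every quality the candidate wants that Albert has (requirements).
RETURN city_name, person.name AS person_name, COLLECT(DISTINCT attributes.name) AS my_interests, COLLECT(DISTINCT requirements.name) AS their_interests, COUNT(DISTINCT attributes) AS matching_wants, COUNT(DISTINCT requirements) AS matching_has
For each candidate, we return the city, the name, and the matching qualities in both directions, together with how many there are. The second MATCH produces one row for every combination of an attribute and a requirement. DISTINCT is therefore needed so that each quality is listed and counted only once.
ORDER BY matching_wants * matching_has DESC LIMIT 10
Finally, we rank the candidates by the product of the two counts, so that people who match well in both directions come first, and we keep the top 10.
Hence, the overall Cypher query for this recommendation algorithm looks like the following:
START me=node:users_index(name = 'Albert')
MATCH me-[:lives_in]->city<-[:lives_in]-person
WHERE me.orientation = person.orientation AND (me.gender <> person.gender AND me.orientation = "straight") AND me-[:wants]->()<-[:has]-person AND me-[:has]->()<-[:wants]-person
WITH DISTINCT city.name AS city_name, person, me
MATCH me-[:wants]->attributes<-[:has]-person-[:wants]->requirements<-[:has]-me
RETURN city_name, person.name AS person_name, COLLECT(DISTINCT attributes.name) AS my_interests, COLLECT(DISTINCT requirements.name) AS their_interests, COUNT(DISTINCT attributes) AS matching_wants, COUNT(DISTINCT requirements) AS matching_has
ORDER BY matching_wants * matching_has DESC
LIMIT 10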
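To see the filtering and scoring logic without a running database, here is a minimal Python sketch that mirrors the query over in-memory data. The `Person` class, the `matches` function, and the sample people are illustrative assumptions and are not part of social2.db. Like the query, the sketch handles only the "straight" orientation case and ranks candidates by the product of the two match counts.

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    # Hypothetical stand-in for a user node and its
    # lives_in / has / wants relationships.
    name: str
    gender: str
    orientation: str
    city: str
    has: set = field(default_factory=set)
    wants: set = field(default_factory=set)

def matches(me, people, limit=10):
    rows = []
    for person in people:
        if person is me:
            continue
        # MATCH me-[:lives_in]->city<-[:lives_in]-person
        if person.city != me.city:
            continue
        # Same orientation and opposite gender (the "straight" case only)
        if not (me.orientation == person.orientation
                and me.orientation == "straight"
                and me.gender != person.gender):
            continue
        my_interests = me.wants & person.has     # qualities I want that they have
        their_interests = person.wants & me.has  # qualities they want that I have
        # Both pattern predicates in the WHERE clause must match at least once
        if not my_interests or not their_interests:
            continue
        rows.append((me.city, person.name, sorted(my_interests),
                     sorted(their_interests), len(my_interests), len(their_interests)))
    # ORDER BY matching_wants * matching_has DESC LIMIT 10
    rows.sort(key=lambda r: r[4] * r[5], reverse=True)
    return rows[:limit]

albert = Person("Albert", "male", "straight", "London",
                has={"funny", "tall"}, wants={"smart", "kind", "funny"})
people = [
    albert,
    Person("Beth", "female", "straight", "London", has={"smart", "kind"}, wants={"funny", "tall"}),
    Person("Cara", "female", "straight", "London", has={"smart"}, wants={"tall"}),
    Person("Dana", "female", "straight", "Paris", has={"smart", "kind"}, wants={"funny"}),
    Person("Evan", "male", "straight", "London", has={"smart"}, wants={"funny"}),
    Person("Fay", "female", "straight", "London", has={"kind"}, wants={"rich"}),
    Person("Gina", "female", "gay", "London", has={"smart"}, wants={"funny"}),
]

for row in matches(albert, people):
    print(row)
```

In this data, the following candidates are filtered out:
- Dana lives in a different city.
- Evan has the same gender as Albert.
- Gina has a different orientation.
- Fay wants nothing Albert has.

Only Beth and Cara remain, and Beth ranks first.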
If you start Neo4j with the social2.db dataset provided in the code for this chapter, you will find that the preceding query generates the following results view in the web interface:
The results are displayed in tabular format, listing the people whose traits best match those of our candidate. You can also export these results from the interface for use in other parts of your web application.
The preceding algorithm is a simple example of a recommendation system. Much more complex systems can be constructed by combining multiple such clauses. Similar operations can also be performed on map data to recommend places to visit, or on sales data to suggest products that customers are likely to buy. Graph-based technologies such as Neo4j and Cypher make all of this possible with a minimalistic approach.