Building a recommendation system

Let us see how we can use Neo4j as the backbone of recommendation systems for different data scenarios. For this purpose, we will use the collaborative filtering approach to process the data at hand and produce relevant results.

To understand how the process works, let us use a simple dataset from a dating site, where you can sign in, view the profiles of people you could potentially date, and follow or like them (and they can do the same to you). In the graph representation of this dataset, people are nodes, and each like from one person to another is an edge between them.
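To make this data model concrete, here is a minimal Python sketch (plain Python, not Neo4j code) that stores each like as a directed edge and builds outgoing and incoming adjacency maps, which is roughly what a graph database lets you traverse in both directions. The names John, Donna, Rose, Martha, and Jack come from the example below, but the specific edges, the hypothetical profile Amy, and the helper `build_graph` are our own assumptions.

```python
from collections import defaultdict

# Hypothetical sample data: each tuple is one directed "like" edge (liker, liked).
LIKES = [
    ("John", "Donna"), ("John", "Rose"), ("John", "Martha"),
    ("Jack", "Rose"), ("Jack", "Amy"),
]


def build_graph(edges):
    """Return outgoing and incoming adjacency maps for directed like edges."""
    outgoing = defaultdict(set)  # person -> people they like
    incoming = defaultdict(set)  # person -> people who like them
    for liker, liked in edges:
        outgoing[liker].add(liked)
        incoming[liked].add(liker)
    return outgoing, incoming


outgoing, incoming = build_graph(LIKES)
print(sorted(outgoing["John"]))  # ['Donna', 'Martha', 'Rose']
print(sorted(incoming["Rose"]))  # ['Jack', 'John']
```

Keeping both directions matters because the recommendation pattern walks edges forwards (who John likes) and backwards (who else likes the same person).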

As shown in the following diagram, consider a user, John, who has just registered on the website and created a profile for himself. He begins browsing through the profiles of women, searching for someone who might interest him. After going through several profiles, he likes three of them: Donna, Rose, and Martha. While John is trying his luck, other users on the site are also actively searching. It turns out that Jack, Rory, Sean, and Harry have also liked the profiles of some of the people John has liked, so we can infer that their tastes are similar to his. It is therefore quite probable that John might like some of the other people these men have already liked, but whose profiles John has yet to come across. This is where collaborative filtering comes into play: we can suggest more profiles for John to view, so that he has to deal with fewer irrelevant ones.
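The intuition can be sketched with plain Python sets before we turn to Neo4j. This is a minimal illustration under assumed data: John's three likes come from the text, but the other users' likes, including the hypothetical profiles Amy and Clara, are invented, and the helpers `similar_users` and `candidate_profiles` are our own names.

```python
# Hypothetical likes: John's come from the text; the rest (and Amy, Clara) are invented.
LIKES = {
    "John": {"Donna", "Rose", "Martha"},
    "Jack": {"Rose", "Amy"},
    "Rory": {"Donna", "Amy", "Clara"},
    "Sean": {"Martha", "Clara"},
    "Harry": {"Rose"},
}


def similar_users(likes, user):
    """Users other than `user` who like at least one profile that `user` likes."""
    mine = likes[user]
    return {other for other, theirs in likes.items() if other != user and mine & theirs}


def candidate_profiles(likes, user):
    """Profiles liked by similar users that `user` has not liked yet."""
    candidates = set()
    for other in similar_users(likes, user):
        candidates |= likes[other]
    return candidates - likes[user]


print(sorted(similar_users(LIKES, "John")))       # ['Harry', 'Jack', 'Rory', 'Sean']
print(sorted(candidate_profiles(LIKES, "John")))  # ['Amy', 'Clara']
```

This version only says *which* profiles to suggest; it does not rank them. Ranking by how strongly the neighbourhood points at each profile is what the Cypher query later in this section adds.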

The following diagram illustrates the relationships in our dataset that the recommender uses:


A production-level implementation of this type of system would require complex search and set operations. This is where Neo4j comes to the rescue. What the technique above essentially does is search for certain patterns in the graph representation of our data and analyze the results of each sub-search to obtain a final result set. This brings us to a Neo4j-specific tool we have studied before: Cypher. Cypher is well suited to recommender systems for the following reasons:

  • It works on the principle of pattern matching, which makes it a natural fit for implementing collaborative recommendation algorithms.
  • As a declarative query language, Cypher lets you state what to match rather than how to match it. This leads to shorter and simpler code, even for complex recommendation systems.
  • Because Cypher is designed specifically for Neo4j, it generally performs well on relatively large datasets, without requiring you to write and tune native traversal code to generate recommendations.

The following code segment illustrates how the scenario described above can be represented using Cypher.

START Person = node(2)
MATCH (Person)-[:IS_INTERESTED_BY]->(someone)<-[:IS_INTERESTED_BY]-(otherguy)-[:IS_INTERESTED_BY]->(recommendations)
WHERE not(Person = otherguy)
RETURN count(*) AS PriorityWeight, recommendations.name AS name
ORDER BY PriorityWeight DESC, name DESC;

Let us see what different parts of the preceding Cypher segment are doing in the overall scenario:

  1. First, we select the user who will receive the recommendations (node 2 in this example):
    START Person = node(2)
  2. The main pattern we search for finds the women who are liked by people who share at least one like with our recommendation candidate:
    MATCH (Person)-[:IS_INTERESTED_BY]->(someone)<-[:IS_INTERESTED_BY]-(otherguy)-[:IS_INTERESTED_BY]->(recommendations)
  3. We exclude our candidate from the role of the third user (otherguy), since John trivially shares all of his own likes, which would produce redundant results:
    WHERE not(Person = otherguy)
  4. count(*) counts how many matching paths lead to each recommended person; the more people with tastes like John's who also like her, the higher her priority weight:
    RETURN count(*) AS PriorityWeight, recommendations.name AS name
  5. Finally, we return the results in order of relevance, highest weight first, breaking ties by name:
    ORDER BY PriorityWeight DESC, name DESC;
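To see exactly what count(*) measures, the following Python sketch walks the same path pattern, Person -> someone <- otherguy -> recommendation, over an in-memory dictionary and counts one point per path. The data and the function `recommend` are hypothetical. The sketch also skips the case where the recommendation is the same node as `someone`, mimicking Cypher's rule that a single match never traverses the same relationship twice (assuming at most one like between any two people).

```python
from collections import Counter

# Hypothetical likes; only John's three likes come from the text.
LIKES = {
    "John": {"Donna", "Rose", "Martha"},
    "Jack": {"Rose", "Amy", "Donna"},
    "Rory": {"Donna", "Amy", "Clara"},
    "Sean": {"Martha", "Clara"},
    "Harry": {"Rose"},
}


def recommend(likes, person):
    """Count paths person -> someone <- otherguy -> rec, with otherguy != person."""
    weights = Counter()
    for someone in likes[person]:
        for otherguy, theirs in likes.items():
            # WHERE not(Person = otherguy); otherguy must also like `someone`.
            if otherguy == person or someone not in theirs:
                continue
            for rec in theirs:
                # Reusing the otherguy -> someone edge is not a new path.
                if rec != someone:
                    weights[rec] += 1  # one more path: count(*) grows
    # ORDER BY PriorityWeight DESC, name DESC
    return sorted(weights.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


print(recommend(LIKES, "John"))
```

With this data, Amy scores highest because three paths lead to her (via Donna through Jack and Rory, and via Rose through Jack). Rose and Donna also appear, which shows that the query as written does not exclude profiles John already likes; a condition such as `rec not in likes[person]` in the sketch, or a corresponding negated pattern in the Cypher WHERE clause, would remove them.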

Let us take another social dataset as an example, to understand a more complex pattern for recommending people to date. This dataset (social2.db in the code for this chapter) contains people's names along with their genders, orientations, the attributes or qualities each person has, where they live, and the qualities they are looking for in a potential partner.


So, let us build the recommendation algorithm one step at a time:

  1. First, we need to look up the person who is searching for a date. If you know the name, you can use the following statement:
    START me=node:users_index(name = 'Albert')
  2. To provide useful recommendations, we need to consider people who live nearby; this query restricts matches to people in the same town, since someone in Alaska is unlikely to want to date someone in California (you know how long-distance dates turn out!). The following statement performs this filtering:
    MATCH (me)-[:lives_in]->(city)<-[:lives_in]-(person)
  3. From the results of the preceding step, we keep only people whose gender and orientation are compatible with the candidate's. The query in this section handles only the straight case: both people must be straight and of different genders.
  4. We also need at least one quality that the candidate is looking for to be present in the prospective person. We can't be selfish, though!
  5. So we also check that the candidate has at least one quality the prospective person is looking for. Combining steps 3 to 5 gives the following filter, after which we match the individual qualities on both sides:
    WHERE me.orientation = person.orientation AND
          (me.gender <> person.gender AND me.orientation = "straight") AND
          (me)-[:wants]->()<-[:has]-(person) AND
          (me)-[:has]->()<-[:wants]-(person)
    WITH DISTINCT city.name AS city_name, person, me
    MATCH (me)-[:wants]->(attributes)<-[:has]-(person)-[:wants]->(requirements)<-[:has]-(me)
  6. To inspect the results at this stage, collect the matched qualities for each prospective person and count them. Because the final MATCH produces one row per combination of attribute and requirement, use DISTINCT so that each quality is listed and counted only once:
    RETURN city_name, person.name AS person_name,
           COLLECT(DISTINCT attributes.name) AS my_interests,
           COLLECT(DISTINCT requirements.name) AS their_interests,
           COUNT(DISTINCT attributes) AS matching_wants,
           COUNT(DISTINCT requirements) AS matching_has
  7. Depending on the application, you can then sort the results by relevance and show the candidate only the top matches. Here the relevance score is the product of the two match counts:
    ORDER BY matching_wants * matching_has DESC
    LIMIT 10
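Before assembling the full query, it may help to see steps 2 to 7 as ordinary filtering code. The following Python sketch applies them to a small set of profiles; all names, cities, and qualities are hypothetical, the property names only mirror those used in the query, and `recommend_dates` is our own helper. Like the query, it handles only straight, opposite-gender pairs.

```python
# Hypothetical profiles; keys mirror the query's gender, orientation,
# lives_in (city), has, and wants.
PEOPLE = {
    "Albert": {"gender": "male", "orientation": "straight", "city": "Springfield",
               "has": {"funny", "tall"}, "wants": {"kind", "smart", "athletic"}},
    "Beth": {"gender": "female", "orientation": "straight", "city": "Springfield",
             "has": {"kind", "smart"}, "wants": {"funny", "tall"}},
    "Cara": {"gender": "female", "orientation": "straight", "city": "Springfield",
             "has": {"athletic"}, "wants": {"funny"}},
    "Dana": {"gender": "female", "orientation": "straight", "city": "Shelbyville",
             "has": {"kind"}, "wants": {"tall"}},
    "Eve": {"gender": "female", "orientation": "straight", "city": "Springfield",
            "has": {"kind"}, "wants": {"rich"}},
}


def recommend_dates(people, name, limit=10):
    """Apply steps 2-7; return (city, person, my_interests, their_interests) tuples."""
    me = people[name]
    results = []
    for other, person in people.items():
        # Step 2: only people who live in the same city.
        if other == name or person["city"] != me["city"]:
            continue
        # Step 3: only the straight, opposite-gender case, as in the query.
        if not (me["orientation"] == person["orientation"] == "straight"
                and me["gender"] != person["gender"]):
            continue
        # Steps 4-5: qualities must match in both directions.
        attributes = me["wants"] & person["has"]
        requirements = person["wants"] & me["has"]
        if not attributes or not requirements:
            continue
        # Step 6: distinct matched qualities on each side.
        results.append((me["city"], other, sorted(attributes), sorted(requirements)))
    # Step 7: rank by the product of the two match counts, keep the top `limit`.
    results.sort(key=lambda r: len(r[2]) * len(r[3]), reverse=True)
    return results[:limit]


for row in recommend_dates(PEOPLE, "Albert"):
    print(row)
```

Dana is dropped by the city filter, and Eve by the two-way check (Albert has nothing she wants), which is exactly the "we can't be selfish" rule from step 4.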

Hence, the overall Cypher query for this recommendation algorithm will look like the following:

START me=node:users_index(name = 'Albert')
MATCH (me)-[:lives_in]->(city)<-[:lives_in]-(person)
WHERE me.orientation = person.orientation AND
      (me.gender <> person.gender AND me.orientation = "straight") AND
      (me)-[:wants]->()<-[:has]-(person) AND
      (me)-[:has]->()<-[:wants]-(person)
WITH DISTINCT city.name AS city_name, person, me
MATCH (me)-[:wants]->(attributes)<-[:has]-(person)-[:wants]->(requirements)<-[:has]-(me)
RETURN city_name, person.name AS person_name,
       COLLECT(DISTINCT attributes.name) AS my_interests,
       COLLECT(DISTINCT requirements.name) AS their_interests,
       COUNT(DISTINCT attributes) AS matching_wants,
       COUNT(DISTINCT requirements) AS matching_has
ORDER BY matching_wants * matching_has DESC
LIMIT 10
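How many rows does the final MATCH produce for each prospective person? Because it binds `attributes` and `requirements` independently, it yields one row for every combination of the two, and plain COUNT counts rows, whereas COUNT(DISTINCT ...) counts qualities. The short Python sketch below reproduces that row set with `itertools.product`, using hypothetical qualities, and compares the two kinds of aggregate.

```python
from itertools import product

# Hypothetical matches for one prospective person.
attributes = ["kind", "smart"]            # qualities "me" wants and "person" has
requirements = ["funny", "tall", "rich"]  # qualities "person" wants and "me" has

# The final MATCH yields one row per (attribute, requirement) combination.
rows = list(product(attributes, requirements))

collected = [a for a, _ in rows]            # like COLLECT(attributes.name)
plain_wants = len([a for a, _ in rows])     # like COUNT(attributes): counts rows
plain_has = len([r for _, r in rows])       # like COUNT(requirements): counts rows
distinct_wants = len({a for a, _ in rows})  # like COUNT(DISTINCT attributes)
distinct_has = len({r for _, r in rows})    # like COUNT(DISTINCT requirements)

print(collected)
print(plain_wants, plain_has)
print(distinct_wants, distinct_has)
```

Both plain counts equal |A|·|R| (6 here), and every collected name repeats once per row. Multiplying two plain counts gives (|A|·|R|)², which happens to preserve the ranking order, but the matching_wants and matching_has values and the interest lists shown to the user would be inflated.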

If you start Neo4j with the social2.db dataset provided in the code for this chapter, you will find that the preceding query generates the following results view in the web interface:


The results appear as a table listing the people who share the most matching traits with our candidate. You can also export these results from the interface for use in another part of your web application.

The preceding algorithm is a simple recommendation system; much more complex systems can be constructed by combining several such clauses. Similar operations can be performed on map data, to recommend places to visit, or on sales data, to suggest products that customers are likely to buy. In each case, graph-based technologies such as Neo4j and Cypher keep the implementation minimal.
