Chapter 8. Building Recommendation Systems with Spark and Mahout

Machine learning-based recommendation systems have become very popular, and increasingly necessary, in recent years in a variety of applications, such as movies, music, books, news, search queries, and products. They have dramatically changed how people buy products and find information. Recommendation systems usually recommend products to users based on their tastes and preferences, helping users find relevant products and information that they did not know existed or did not know how to ask for.

This chapter is designed to help you understand and build recommendation systems, and covers the following topics:

  • Building recommendation systems
  • Building a recommendation system with MLlib
  • Building a recommendation system with Mahout and Spark

Building recommendation systems

The input to a recommendation system is feedback about users' likes and dislikes, and the output is a set of items recommended on the basis of that feedback. Some examples of recommendation systems are as follows:

  • Netflix/YouTube: Movie/video recommendations
  • Amazon.com: the Customers Who Bought This Item Also Bought section
  • Spotify: Music recommendations
  • Google: News recommendations

Broadly, there are two approaches to building recommendation systems: content-based filtering and collaborative filtering. Let's understand these approaches.

Content-based filtering

Content-based filtering systems build recommenders based on item attributes. Examples of item attributes for movies are the genre, actor, director, producer, and hero. A user's tastes determine the values and weights of these attributes, which are provided as input to the recommender system. This technique is domain-specific: the same algorithm cannot be reused to recommend other types of products.

One simple example of content-based filtering is to recommend movies in the western genre to a user who watches many cowboy movies.
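
The following Python sketch scores movies by comparing a user's attribute weights with each movie's attributes. The genre weights, the movie titles, and the dot-product scoring rule are hypothetical choices made for illustration; a real system would learn the user profile from the user's viewing history.

```python
# Hypothetical user profile: how strongly this user likes each genre.
# In practice these weights would be learned from the user's viewing history.
user_profile = {"western": 0.9, "comedy": 0.2, "drama": 0.4}

# Hypothetical catalog: each movie is described only by its attributes.
movies = {
    "Movie A": {"western": 1.0, "drama": 0.5},
    "Movie B": {"comedy": 1.0},
    "Movie C": {"drama": 1.0},
}

def content_score(profile, attributes):
    """Dot product between the user's attribute weights and an item's attributes."""
    return sum(profile.get(attr, 0.0) * value for attr, value in attributes.items())

def recommend(profile, catalog, top_n=2):
    """Return the top_n item titles with the highest content score."""
    scored = {title: content_score(profile, attrs) for title, attrs in catalog.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

print(recommend(user_profile, movies))  # ['Movie A', 'Movie C']
```

Because the score depends only on movie attributes, the western "Movie A" ranks first for this cowboy-movie fan, and no other user's data is needed.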

Collaborative filtering

Collaborative filtering systems recommend items based on similarity measures between items or users: items preferred by similar users are recommended to the target user. This is similar to how you get recommendations from your friends. The technique is not domain-specific and knows nothing about the items themselves, so the same algorithm can be used for any type of product, such as movies, music, and books.

There are two types of collaborative filtering: user-based and item-based. Let's understand these two types in detail.

User-based collaborative filtering

A user-based recommender system is based on the similarities between users.

The idea behind this algorithm is that similar users share similar preferences. For example, in the following table, User1 and User3 gave similar ratings to Movie1 and Movie4 (User1 rated both movies 4, and User3 rated both 5). This indicates that User1 and User3 have similar tastes. Based on this, we can recommend Movie2 to User1, because User3 rated it 5. Similarly, Movie3 can be recommended to User3, because User1 rated it 4.

            User1   User2   User3
  Movie1      4       4       5
  Movie2      -       4       5
  Movie3      4       2       -
  Movie4      4       -       5

  (A dash means that the user has not rated the movie.)

In other words, we build a user-item matrix from the user data and then predict its missing entries from the preferences of similar users.
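
To make the user-based approach concrete, the following Python sketch fills in missing entries of the user-item table. The text does not prescribe a similarity measure, so this sketch assumes cosine similarity computed over the movies both users rated, and predicts a missing rating as the similarity-weighted average of the other users' ratings for that movie. The names `ratings`, `cosine_similarity`, and `predict` are illustrative, not part of MLlib or Mahout.

```python
from math import sqrt

# User-item matrix from the table; a missing key means "not rated".
ratings = {
    "User1": {"Movie1": 4, "Movie3": 4, "Movie4": 4},
    "User2": {"Movie1": 4, "Movie2": 4, "Movie3": 2},
    "User3": {"Movie1": 5, "Movie2": 5, "Movie4": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users, using only the movies both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of the ratings other users gave to `movie`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        num += sim * their_ratings[movie]
        den += abs(sim)
    return num / den if den else None

print(round(cosine_similarity(ratings["User1"], ratings["User3"]), 3))  # 1.0
print(round(predict("User1", "Movie2"), 2))  # 4.51
```

Under these assumptions, User1's predicted rating for Movie2 is about 4.5, which supports recommending it. On a table this small many users look perfectly similar, so real systems need far more ratings before such similarities become meaningful.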

Item-based collaborative filtering

Item-based recommendation is based on similarities between items. The idea behind this algorithm is that a user will have a similar preference for similar items.

The item-based algorithm works as follows. For every item I that a user U has no preference for yet, compute the similarity between I and every item that U has expressed a preference for. Multiply each of these similarities by U's preference value for the corresponding item to obtain a weighted preference. Adding up the weighted preferences for all items that U has a preference for gives the weighted sum, and dividing it by the sum of the similarities gives the weighted average preference value P. P is the estimated preference of user U for item I, and if it is above a particular threshold, we can recommend the item to U. To build an item-based recommender, we therefore need preference data and a notion of similarity between items.
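
The steps above can be sketched in Python using the ratings from the user-item table. This is a minimal sketch, not Mahout's implementation: it assumes cosine similarity between item columns (computed over the users who rated both items), divides the weighted sum by the total similarity, and uses a hypothetical threshold of 3.5. The helper names are illustrative.

```python
from math import sqrt

# User-item matrix from the table; a missing key means "no preference".
ratings = {
    "User1": {"Movie1": 4, "Movie3": 4, "Movie4": 4},
    "User2": {"Movie1": 4, "Movie2": 4, "Movie3": 2},
    "User3": {"Movie1": 5, "Movie2": 5, "Movie4": 5},
}

def item_vector(item):
    """One column of the matrix: {user: rating} for the given item."""
    return {user: prefs[item] for user, prefs in ratings.items() if item in prefs}

def item_similarity(i, j):
    """Cosine similarity between items i and j over users who rated both."""
    a, b = item_vector(i), item_vector(j)
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def estimate(user, item):
    """Weighted average P: sum of similarity * preference, divided by total similarity."""
    weighted_sum = total_similarity = 0.0
    for other_item, pref in ratings[user].items():
        s = item_similarity(item, other_item)
        weighted_sum += s * pref
        total_similarity += abs(s)
    return weighted_sum / total_similarity if total_similarity else None

def recommend(user, threshold=3.5):  # the threshold value is hypothetical
    """Return {item: P} for unrated items whose estimate P is above the threshold."""
    all_items = {item for prefs in ratings.values() for item in prefs}
    candidates = all_items - set(ratings[user])
    estimates = {item: estimate(user, item) for item in candidates}
    return {item: p for item, p in estimates.items() if p is not None and p > threshold}

print(recommend("User1"))
print(recommend("User2"))  # {}
```

For User1, the only unrated movie is Movie2, whose estimate is 4.0, so it passes the threshold. For User2, Movie4 is estimated at about 3.33 and is therefore not recommended.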

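Item-based collaborative filtering generally scales better than user-based filtering, and item-based recommendation systems are the most widely used. One reason is that most successful companies have many more users than products (items), so there are far fewer item pairs than user pairs to compare, and similarities between items tend to be more stable than similarities between users.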
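
As a rough illustration of why the ratio of users to items matters, the following Python snippet counts how many pairwise similarities each approach would need in the worst case. The catalog sizes are hypothetical, and real systems usually avoid computing every pair, but the arithmetic shows the difference in scale.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n entities: n(n-1)/2."""
    return n * (n - 1) // 2

num_users, num_items = 1_000_000, 10_000  # hypothetical sizes

print(pair_count(num_users))  # 499999500000 user-user similarities
print(pair_count(num_items))  # 49995000 item-item similarities
```

With these sizes, the item-based approach needs roughly ten thousand times fewer similarity computations.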

It is possible to combine content-based filtering and collaborative filtering to achieve better results. For example, you can predict a rating with both the content-based approach and collaborative filtering, and then average the two predictions to create a hybrid prediction.
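
A minimal Python sketch of this averaging step follows. The two input predictions are hypothetical, and the optional `weight` parameter is an assumption that generalizes the plain average; `weight=0.5` reproduces the simple average.

```python
def hybrid_prediction(content_based, collaborative, weight=0.5):
    """Blend two rating predictions; weight=0.5 gives a plain average."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * content_based + (1.0 - weight) * collaborative

print(hybrid_prediction(4.0, 3.0))  # 3.5
print(hybrid_prediction(4.0, 3.0, weight=0.25))  # 3.25
```

A weight other than 0.5 lets one approach dominate, for example favoring content-based predictions for new items that have few ratings.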
