Collaborative filters

Recommendation systems and collaborative filters share a long history. From the early days of primitive recommender engines, which relied on fixed categorizations with hard-coded results, to the sophisticated engines powering today's e-commerce platforms, recommender engines have made use of collaborative filters throughout. They are not only easy to understand but also simple to implement. Let us take this opportunity to learn more about collaborative filters before we dive into implementation details.

Note

Fun Fact

Recommender engines predate any known e-commerce platform! Grundy, a virtual librarian, was developed in 1979. It was a system for recommending books to users. It modelled users against certain pre-defined stereotypes and recommended books from a known list for each such category.

Core concepts and definitions

Collaborative filters (denoted as CF henceforth) and recommender engines in general use certain terms and definitions to formally define and tackle the problem.

The problem domain of recommender engines revolves around the users and the items they are interested in. A user is anybody who interacts with the system and performs certain actions on an item (say, purchases or views it). Similarly, a rating defines a user's preference for the item in consideration. Generally, this trio is represented as a (user, item, rating) tuple. Since ratings quantify a user's preference, they can be defined in different ways depending upon the application. Some applications define ratings on an integer-valued scale, say 0-5, while others may use a real-valued scale. Some applications might use binary scales with values such as Like/Dislike or Purchased/Not-Purchased. Thus, each application makes use of a rating scale that suits its users' preferences.

Now that we know the key players involved, the next step is to represent these core concepts mathematically. A tuple of (user, item, rating) is usually represented in the form of a sparse matrix called a ratings matrix. Each user is represented by a row while the columns denote the items. Each element of this ratings matrix refers to the rating or preference of a user for an item. The ratings matrix is a sparse matrix since not all items will have been rated by every user, and hence such unrated entries contain nulls or blank values. A ratings matrix using a 0-5 scale (unrated/missing ratings are denoted by ?) looks like the following matrix, showing the preference of three users for different laptop models:

[Figure: A sample ratings matrix]
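A sparse ratings matrix of this kind can be sketched in Python as follows. This is a minimal illustration, assuming NumPy: the user rows and rating values below are made up for demonstration and are not the ones from the figure, and np.nan stands in for the ? (unrated) entries.

```python
import numpy as np

# Rows are users, columns are items (laptop models, as in the text).
# Ratings are on a 0-5 scale; np.nan marks unrated entries.
items = ["Laptop A", "Laptop B", "Laptop C"]
R = np.array([
    [5.0, np.nan, 2.0],   # user 1
    [np.nan, 4.0, 3.0],   # user 2
    [1.0, 5.0, np.nan],   # user 3
])

# Sparsity: the fraction of missing (unrated) entries
sparsity = np.isnan(R).mean()
print(f"sparsity = {sparsity:.2f}")  # -> sparsity = 0.33
```

In a real system the matrix would be far sparser (most users rate only a handful of items), which is why dedicated sparse-matrix structures are typically used instead of a dense array.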

A recommender engine is tasked to perform two main operations: predict and recommend. The prediction operation works upon a given user and item to determine the user's likely preference for the item in consideration. For the ratings matrix (like the one shown earlier), prediction amounts to filling in the missing values (represented by ? in the previous example).

The recommendation operation comes after the predictions have been done. Given a user, the recommendation operation generates a list of top N items based on the user's preferences.

Note

Note that the user in consideration for the predict and recommend tasks is termed the active user in the context of recommender engines.

The collaborative filtering algorithm

Collaborative filters are a popular set of algorithms heavily used across applications. As we know, collaborative filters utilize the behaviour of similar users to predict and recommend items for the active user. These algorithms work on a simple assumption that similar users showcase similar behaviours. More formally, the algorithm assumes that the preferences or ratings of the other users in the system can be utilized to provide reasonable predictions for the active user.

Neighbour-based collaborative filtering, also known as user-user collaborative filtering or kNN collaborative filtering, is one of the earliest and most widely used algorithms from the family of collaborative filters. The kNN collaborative filter is based on the core assumption of similar behaviour amongst users with similar preferences. This algorithm makes use of similarity measures (discussed in Chapter 2, Let's Help Machines Learn) to predict and recommend items for the active user. The algorithm follows a two-step approach of first computing the predictions followed by the recommendations. The three main components of this algorithm are discussed next.

Predictions

The first step of kNN CF is to make use of the ratings matrix (usually denoted as R) to calculate predictions. Since we are concerned with user-user CF, the neighbourhood of the active user (the user in consideration), denoted as u, is to be taken into account.

Let U be the set of all available users in the system and let N ⊆ U denote the required neighbourhood of the active user. The algorithm then uses a similarity measure, say s, to compute the neighbours of u. Once N (the neighbourhood of u) has been identified, the ratings of the neighbouring users are aggregated to compute u's preference for the current item. The most common way to aggregate preferences is to take the similarity-weighted average of the ratings of the N neighbouring users.

Mathematically, the active user u's predicted preference for item i, denoted as p_{u,i}, is given as:

p_{u,i} = \bar{r}_u + \frac{\sum_{n \in N} s(u, n)\,(r_{n,i} - \bar{r}_n)}{\sum_{n \in N} |s(u, n)|}

Where:

  • \bar{r}_u is the active user u's mean rating
  • s(u, n) is the similarity measure between the active user u and the neighbouring user n, whose rating for item i is r_{n,i} and whose mean rating is \bar{r}_n

In the preceding equation, we subtract each neighbouring user's mean rating from that neighbour's rating for the item, to remove the rating bias of the users (some users give extremely high or low ratings, and thus they may bias the overall predicted rating). A biased recommender engine might prevent good user-product matches in favour of popular items or against less popular ones. We can further improve the predictions by normalizing the users' ratings with their standard deviations, to control the rating spread around the mean. To keep things simple, we will use the equation as mentioned previously. The following image depicts the nearest neighbours for an active user:

[Figure: Nearest neighbours (K = 3)]
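The prediction equation can be sketched in Python as follows. This is a simplified, illustrative implementation assuming NumPy: the function name and its arguments (a precomputed neighbour list and a similarity dictionary) are our own conveniences, whereas a full system would also compute the neighbourhood itself.

```python
import numpy as np

def predict(R, u, i, neighbours, sim):
    """Mean-centred weighted-average prediction p_{u,i}.

    R          -- ratings matrix with np.nan for missing ratings
    u, i       -- active user row and target item column
    neighbours -- indices of the N nearest neighbours of u
    sim        -- dict mapping neighbour index -> similarity s(u, n)
    """
    r_u_bar = np.nanmean(R[u])            # active user's mean rating
    num, den = 0.0, 0.0
    for n in neighbours:
        if np.isnan(R[n, i]):             # neighbour has not rated item i
            continue
        r_n_bar = np.nanmean(R[n])        # neighbour's mean rating
        num += sim[n] * (R[n, i] - r_n_bar)
        den += abs(sim[n])
    # Fall back to the user's mean when no neighbour has rated item i
    return r_u_bar if den == 0 else r_u_bar + num / den

# Toy example: 0-5 scale, np.nan marks unrated items
R = np.array([[4.0, np.nan],
              [5.0, 3.0],
              [3.0, 1.0]])
print(predict(R, u=0, i=1, neighbours=[1, 2], sim={1: 1.0, 2: 0.5}))  # -> 3.0
```

Note how the mean-centring works in the toy example: both neighbours rated item 1 one point below their own average, so the active user is predicted to rate it one point below theirs (4.0 - 1.0 = 3.0).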

Two questions now arise: why was only a weighted average used to predict the ratings, and what is the optimal number of neighbours (N)? The weighted average is used because it is one of the measures that generates consistent results. Different systems over the years have used various methods, such as multivariate regressions (the BellCore system for video recommendations), unweighted averages (Ringo for music recommendations), and so on, but the weighted average performs well in practice.

Note

For more information, have a look at W. Hill, L. Stead, M. Rosenstein, and G. Furnas, Recommending and evaluating choices in a virtual community of use, in ACM CHI '95, pp. 194–201, ACM Press/Addison-Wesley Publishing Co., 1995.

Coming to the second question, the optimal number of neighbours is very application dependent. We saw in Chapter 2, Let's Help Machines Learn, how the number of neighbours can change the outcome of an algorithm (see K-Nearest Neighbors (KNN)); similarly, the value of N can affect the outcome of a recommender engine. In general, limiting the number of neighbouring users helps reduce noise by removing users with low correlation to the active user. But again, the value of N is application dependent and requires due diligence at the data scientist's end.

Recommendations

Once predictions have been made for the active user, a recommendation list can be generated by ordering the items by predicted rating. This recommendation list may be further fine-tuned by applying certain minimum thresholds and other user-specific characteristics, such as preferences for colour, size, price sensitivity, and so on. Thus, this step generates a list of probable items which the user is more likely to buy based on his/her personal preferences. We will cover more on this in the coming section, Building a recommender engine.
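The recommendation step can be sketched as follows. This is a minimal illustration: predict_fn stands in for the prediction step described earlier (any callable taking a user and item index), and the toy ratings below are made up for demonstration.

```python
import numpy as np

def recommend(R, u, predict_fn, top_n=3):
    """Return the top_n unrated items for user u, ranked by predicted rating."""
    # Only items the user has not yet rated are candidates
    unrated = [i for i in range(R.shape[1]) if np.isnan(R[u, i])]
    scored = [(i, predict_fn(u, i)) for i in unrated]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy matrix: user 0 has not rated items 1 and 3
R = np.array([[5.0, np.nan, 2.0, np.nan]])
# A stand-in predictor; in practice this would be the kNN prediction step
fake_predict = lambda u, i: {1: 3.5, 3: 4.2}[i]
print(recommend(R, 0, fake_predict, top_n=2))  # -> [(3, 4.2), (1, 3.5)]
```

Business-rule fine-tuning (thresholds, colour or price filters) would then be applied to this ranked list before it is shown to the user.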

Similarity

A similarity measure is an important component of our collaborative filtering based recommender engine algorithm. There are various similarity measures available for use. The most common amongst them is the cosine similarity measure. This approach represents each user as an n-dimensional vector of ratings, and similarity is measured as the cosine of the angle between two such user vectors.

Mathematically, cosine similarity is given as:

s(u, v) = \frac{r_u \cdot r_v}{\|r_u\|_2\, \|r_v\|_2}

Where \|r_u\|_2 and \|r_v\|_2 are the L2 or Euclidean norms of each of the rating vectors.
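Cosine similarity can be sketched in Python as follows, assuming NumPy. Note one simplifying assumption of ours: missing ratings are treated as 0 so that the plain dot-product formula applies directly; other treatments (such as computing the similarity only over co-rated items) are also common.

```python
import numpy as np

def cosine_sim(r_u, r_v):
    """Cosine similarity between two rating vectors.

    Missing ratings (np.nan) are treated as 0 here -- a common
    simplification so the dot-product formula applies directly.
    """
    r_u = np.nan_to_num(np.asarray(r_u, dtype=float))
    r_v = np.nan_to_num(np.asarray(r_v, dtype=float))
    denom = np.linalg.norm(r_u) * np.linalg.norm(r_v)  # product of L2 norms
    return 0.0 if denom == 0 else float(np.dot(r_u, r_v) / denom)

print(cosine_sim([5, 3, np.nan], [5, 3, 1]))   # close to 1: similar tastes
print(cosine_sim([5, 0, 0], [0, 5, 0]))        # -> 0.0: no overlap
```

Pearson correlation can be obtained with a similar computation after mean-centring each vector, which is one reason it is often preferred for rating data.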

Pearson correlation and Spearman rank correlation are a couple of statistical similarity measures which are also used widely.

Now that we understand the basics of collaborative filters and general concepts, we are ready to get our hands dirty with implementation details. Let us start with building the recommender system, brick-by-brick!
