User-based collaborative filtering

In UBCF, the algorithm predicts a user's missing ratings by first finding a neighborhood of similar users and then aggregating the ratings of these users to form a prediction (Hahsler, 2011). The neighborhood is determined either by selecting the k-nearest neighbors (KNN), that is, the k users most similar to the user we are making predictions for, or by keeping every user whose similarity meets a minimum threshold. The two similarity measures available in recommenderlab are the Pearson correlation coefficient and cosine similarity. I will skip the formulas for these measures as they are readily available in the package documentation.
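
The two ways of forming a neighborhood can be sketched in a few lines of Python. This is only an illustration, not recommenderlab code: the function names knn_neighborhood and threshold_neighborhood, the user labels, and the similarity scores are all hypothetical and assume the similarities have already been computed.

```python
# Hypothetical, precomputed similarities between the target user and others.
similarities = {"u1": 0.91, "u2": 0.85, "u3": 0.10, "u4": -0.40}


def knn_neighborhood(sims, k):
    """Return the k users with the highest similarity to the target user."""
    ranked = sorted(sims.items(), key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in ranked[:k]]


def threshold_neighborhood(sims, min_sim):
    """Return every user whose similarity is at least min_sim."""
    return [user for user, sim in sims.items() if sim >= min_sim]


top_two = knn_neighborhood(similarities, k=2)
above_half = threshold_neighborhood(similarities, min_sim=0.5)
print(top_two, above_half)
```

With KNN the neighborhood always has a fixed size, while a threshold can return many neighbors, or none at all, depending on how similar the other users happen to be.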

Once the neighborhood method is decided on, the algorithm identifies the neighbors by calculating the similarity measure between the individual of interest and each of the other users, using only those items that were rated by both. A scoring scheme, say, a simple average, then aggregates the neighbors' ratings into a predicted score for the individual and item of interest.
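
To make these two steps concrete, here is a minimal Python sketch, assuming cosine similarity, a KNN neighborhood, and a simple average as the scoring scheme. It is not how recommenderlab is implemented; the function names, the user and item labels, and the ratings are all made up for illustration.

```python
import math

# Hypothetical ratings: user -> {item: rating}. "target" has not rated "i4".
ratings = {
    "target": {"i1": 5, "i2": 1, "i3": 4},
    "a": {"i1": 5, "i2": 1, "i3": 4, "i4": 4},
    "b": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
    "c": {"i1": 4, "i2": 2, "i3": 5, "i4": 5},
}


def cosine_on_corated(x, y):
    """Cosine similarity computed only on the items both users rated."""
    common = [item for item in x if item in y]
    if not common:
        return 0.0
    dot = sum(x[i] * y[i] for i in common)
    norm_x = math.sqrt(sum(x[i] ** 2 for i in common))
    norm_y = math.sqrt(sum(y[i] ** 2 for i in common))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def predict(data, target, item, k):
    """Average the item's ratings from the k users most similar to target."""
    scored = [
        (cosine_on_corated(data[target], user_ratings), user)
        for user, user_ratings in data.items()
        if user != target and item in user_ratings
    ]
    scored.sort(reverse=True)  # most similar users first
    neighbors = scored[:k]
    if not neighbors:
        return None  # nobody else rated this item
    return sum(data[user][item] for _, user in neighbors) / len(neighbors)


print(predict(ratings, "target", "i4", k=1))
print(predict(ratings, "target", "i4", k=2))
```

With k=1 the prediction is simply the closest neighbor's rating; with a larger k it is the mean of several neighbors' ratings, which smooths out any one neighbor's quirks.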

Let's look at a simple example. In the following matrix, there are six individuals with ratings on four movies, with the exception of my rating for Mad Max. Using k=1, the nearest neighbor is Homer, with Bart a close second, even though Flanders hated the Avengers as much as I did. So, using Homer's rating for Mad Max, which is 4, the predicted rating for me would also be a 4.

There are a number of ways to weight the data and/or control for bias. For instance, Flanders is quite likely to give lower ratings than the other users. Normalizing the data, so that each new rating score equals the user's rating for an item minus that user's average rating across all of their items, is therefore likely to improve the rating accuracy.
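
The normalization just described amounts to mean-centering each user's ratings. A small Python sketch, with hypothetical users and ratings and a helper name (center_by_user_mean) chosen only for this example:

```python
# Hypothetical ratings: one harsh rater and one generous rater who
# nevertheless rank the three items in the same order.
raw_ratings = {
    "harsh": {"i1": 1, "i2": 2, "i3": 3},
    "generous": {"i1": 3, "i2": 4, "i3": 5},
}


def center_by_user_mean(data):
    """Subtract each user's own average rating from each of their ratings."""
    centered, means = {}, {}
    for user, items in data.items():
        if not items:
            continue  # a user with no ratings has no mean to subtract
        mean = sum(items.values()) / len(items)
        means[user] = mean
        centered[user] = {item: rating - mean for item, rating in items.items()}
    return centered, means


centered, means = center_by_user_mean(raw_ratings)
print(means)
print(centered)
```

After centering, the two users have identical rating profiles, so the difference in how strictly they rate no longer affects the comparison. When a prediction is made on centered data, the target user's own mean is added back to return to the original rating scale.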

The weakness of UBCF is that, to calculate the similarity measure for all the possible users, the entire database must be kept in memory, which can be quite computationally expensive and time-consuming.
