The memory-based approach to collaborative filtering loads the whole rating matrix into memory to provide recommendations, hence the name. User-based filtering is the most prominent memory-based collaborative filtering model. The R snippet explained in the preceding section illustrates the underlying principle by which memory-based methods work. Because the entire rating matrix must be held in memory and users must be compared against one another, scalability becomes a major issue as the user and product base grows.
In the R snippet example, we used the ratings of user.b to fill in the missing ratings for user.a. In the real world, with thousands of users, the user-based filtering algorithm proceeds as follows.
Let us say we have a set of users {u1, u2, ..., un} and a set of products {p1, p2, ..., pm}, and we are trying to predict the ratings for a single user, ua.
Using a similarity measure such as the Pearson coefficient or cosine similarity, we try to find the K nearest neighbors of ua, that is, the K users whose ratings are most similar to those of ua. For more on similarity measures, refer to http://reference.wolfram.com/language/guide/DistanceAndSimilarityMeasures.html:
- Similarity measure: A quantitative measure used to compare two vectors. It returns large values for similar objects, and zero or smaller values for dissimilar objects. A similarity measure is typically implemented as a real-valued function.
- Pearson coefficient: This measures the linear correlation between two variables. A value of +1 indicates a perfect positive correlation, a value of -1 indicates a perfect negative correlation, and a value of 0 indicates no linear correlation. The Pearson correlation score is the ratio of the covariance of the two variables to the product of their standard deviations.
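- Cosine similarity: Cosine similarity between two vectors is the cosine of the angle between them, that is, their dot product divided by the product of their lengths: cos(theta) = (A . B) / (||A|| ||B||)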
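The two measures in the list above can be sketched in a few lines of R. This is a minimal illustration, not the code from the preceding section: the rating values for user.a and user.b are made up, and cosine.sim is a hypothetical helper name introduced here. The built-in cov, sd, and cor functions are part of base R.

```r
# Hypothetical ratings given by two users to the same five products
user.a <- c(5, 3, 4, 4, 2)
user.b <- c(4, 2, 5, 3, 1)

# Pearson coefficient: covariance divided by the product of the standard deviations
pearson.manual <- cov(user.a, user.b) / (sd(user.a) * sd(user.b))

# The same value using the built-in correlation function
pearson.builtin <- cor(user.a, user.b, method = "pearson")

# Cosine similarity: dot product divided by the product of the vector lengths
# (cosine.sim is a helper defined here for illustration only)
cosine.sim <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}
cosine.ab <- cosine.sim(user.a, user.b)

print(c(pearson = pearson.builtin, cosine = cosine.ab))
```

Note that the Pearson coefficient centers each vector on its mean before comparing, so it is less sensitive than cosine similarity to users who consistently rate high or low.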
Let us say ua does not have ratings for p5, p7, and p9. For each of these products, we take the average of the ratings given by the K neighbors of ua and use it as the predicted rating for ua.
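The whole procedure can be sketched end to end on a toy rating matrix. The following is a minimal sketch under stated assumptions: the matrix values, the user and product labels, and the function name knn.ratings are all hypothetical, similarity is measured with the Pearson coefficient over co-rated products, and the prediction is the plain, unweighted average described above.

```r
# Hypothetical rating matrix: rows are users, columns are products, NA = not rated
ratings <- matrix(
  c(5, 3, NA, 4, NA,
    4, 3, 4, 5, 2,
    5, 2, 5, 4, 1,
    1, 5, 2, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ua", "u1", "u2", "u3"), paste0("p", 1:5))
)

# knn.ratings is an illustrative helper, not a function from any package
knn.ratings <- function(ratings, target, k = 2) {
  others <- setdiff(rownames(ratings), target)

  # Pearson similarity between the target user and every other user,
  # computed only over the products that both users have rated
  sims <- sapply(others, function(u) {
    cor(ratings[target, ], ratings[u, ], use = "pairwise.complete.obs")
  })

  # Keep the K most similar users as neighbors
  neighbors <- names(sort(sims, decreasing = TRUE))[seq_len(min(k, length(sims)))]

  # Fill each missing rating with the average rating of the neighbors
  missing <- names(which(is.na(ratings[target, ])))
  predictions <- colMeans(ratings[neighbors, missing, drop = FALSE], na.rm = TRUE)

  list(neighbors = neighbors, predictions = predictions)
}

result <- knn.ratings(ratings, "ua", k = 2)
print(result)
```

For this toy matrix, the two nearest neighbors of ua are u2 and u1, so the predicted ratings are 4.5 for p3 and 1.5 for p5. Because the function computes the similarity between ua and every other user, its cost grows with the number of users, which is the scalability concern raised earlier.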
For more details, you can refer to the chapter A Comprehensive Survey of Neighborhood-based Recommendation Methods in the book Recommender Systems Handbook, at https://link.springer.com/chapter/10.1007/978-0-387-85820-3_4.