Model-based approach

The model-based approach addresses the scalability problem seen in memory-based approaches. Whereas the user-based approach derives recommendations from the preferences of a user's neighbors, the model-based approach leverages product similarity to make recommendations. The premise is that users will prefer products similar to the ones they have already rated.

The first step in this approach is to calculate the product similarity matrix. Let us say there is a set of products: {p1, p2, ..., pm}. An m x m matrix is constructed to begin with, again using either the Pearson coefficient or cosine similarity as the similarity measure. For efficiency, instead of keeping the whole m x m matrix, a smaller m x k matrix is built, where k << m and each row holds the k items most similar to that product.
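The reduction from an m x m matrix to an m x k matrix can be sketched as follows. This is only an illustration under stated assumptions: the toy ratings and the helper name top.k.neighbors are hypothetical and do not come from the text, and Pearson correlation is used as the similarity measure.

```r
# Toy ratings matrix (hypothetical values): rows are products, columns are users.
ratings <- matrix(c(5, 3, 4, 1,
                    4, 2, 5, 1,
                    1, 5, 2, 4,
                    2, 4, 1, 5),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("p1", "p2", "p3", "p4"),
                                  c("u1", "u2", "u3", "u4")))

# Full m x m product similarity matrix (Pearson correlation between rows).
sim <- cor(t(ratings), method = "pearson")

# Hypothetical helper: for each product, keep the names of its k most
# similar other products, giving an m x k matrix.
top.k.neighbors <- function(sim, k) {
  diag(sim) <- -Inf  # a product should not count as its own neighbor
  nb <- lapply(rownames(sim), function(p) {
    names(sort(sim[p, ], decreasing = TRUE))[seq_len(k)]
  })
  out <- do.call(rbind, nb)
  rownames(out) <- rownames(sim)
  out
}

neighbors <- top.k.neighbors(sim, k = 2)
neighbors
```

Each row of neighbors lists the k nearest products for one product, most similar first. Storing only these k entries per product is what keeps the model small when m is large.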

Let us write a small R snippet to explain this:

> products <- c('A','B','C')
> user.a <- c(2,0,3)
> user.b <- c(5,2,0)
> user.c <- c(3,3,0)
> ratings.matrix <- as.data.frame(list(user.a,user.b, user.c))
> names(ratings.matrix) <- c("user.a","user.b","user.c")
> rownames(ratings.matrix) <- products
> head(ratings.matrix)
  user.a user.b user.c
A      2      5      3
B      0      2      3
C      3      0      0

We have three users, user.a, user.b, and user.c, and three products: A, B, and C. We also have the ratings the users have provided for these products. The ratings are on a scale from 1 to 5; a 0 presumably indicates that the user has not rated the product. You should see ratings.matrix, the final rating matrix summarizing the user product ratings, with products as rows and users as columns.

Let us use this ratings matrix to build a product similarity matrix:

> ratings.mat <- as.matrix(ratings.matrix)
> sim.mat <- cor(t(ratings.mat), method = "pearson")
> sim.mat
           A          B          C
A  1.0000000  0.5000000 -0.7559289
B  0.5000000  1.0000000 -0.9449112
C -0.7559289 -0.9449112  1.0000000

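We have used the cor function from the stats package, which is loaded by default in R, to calculate the similarity matrix. The ratings matrix is transposed first because cor computes correlations between columns, and we want the similarity between products. Let us say we want to predict the rating of product B for user.a.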

The most similar product to product B is A, as it has a higher score compared to C. In a real-world example, we will have multiple similar products. Now the score for B for user.a is calculated as follows:

(2 * 0.5 + 3 * (-0.945)) / (0.5 + (-0.945)) ≈ 4.12

The predicted rating of product B for user.a is a weighted sum: the ratings given by user.a for A and C are each multiplied by that product's similarity to B, that is, the similarity measures for (B, A) and (B, C), and the result is scaled by dividing by the sum of those similarity measures.
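The calculation above can be reproduced in R. The following is a minimal sketch rather than a definitive implementation: it rebuilds the ratings matrix from the text so that it runs on its own, and the helper name item.based.score is hypothetical. It follows the formula in the text and divides by the plain sum of similarities.

```r
# Rebuild the ratings matrix from the text: rows are products, columns are users.
ratings.mat <- matrix(c(2, 5, 3,
                        0, 2, 3,
                        3, 0, 0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("A", "B", "C"),
                                      c("user.a", "user.b", "user.c")))

# Product similarity matrix (Pearson correlation between product rows).
sim.mat <- cor(t(ratings.mat), method = "pearson")

# Hypothetical helper: similarity-weighted sum of the user's ratings for the
# other products, scaled by the sum of the similarities (as in the text).
item.based.score <- function(sim.mat, ratings.mat, product, user) {
  others <- setdiff(rownames(ratings.mat), product)
  sims   <- sim.mat[product, others]    # similarity of each other product to `product`
  rated  <- ratings.mat[others, user]   # the user's ratings for those products
  sum(rated * sims) / sum(sims)
}

score <- item.based.score(sim.mat, ratings.mat, product = "B", user = "user.a")
round(score, 2)
```

With the full-precision similarities this evaluates to about 4.12. Because the similarities here have mixed signs, their plain sum can get close to zero on other data. Some formulations divide by the sum of absolute similarities instead, which is a variant and not what the text uses.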

The product similarity matrix is precomputed, so making a recommendation does not incur the overhead of loading the whole user ratings matrix into memory. Amazon.com uses item-based recommendations successfully.
