Modeling, evaluation, and recommendations

To build and test our recommendation engines, we can use the same function, Recommender(), changing only the method specification for each technique. To see what the package can do and to explore the parameters available for each technique, you can examine the registry. Looking at the IBCF entry in the following output, we can see that the defaults are to keep the 30 most similar items (k = 30), use the cosine similarity method on centered data, and leave missing data uncoded rather than treating it as zero (na_as_zero = FALSE):

    > recommenderRegistry$get_entries(dataType = "realRatingMatrix")

$ALS_realRatingMatrix
Recommender method: ALS for realRatingMatrix
Description: Recommender for explicit ratings based on latent
factors, calculated by alternating least squares algorithm.

Reference: Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, Rong Pan (2008).
Large-Scale Parallel Collaborative Filtering for the Netflix Prize,
4th Int'l Conf. Algorithmic Aspects in Information and Management, LNCS 5034.

Parameters:
normalize lambda n_factors n_iterations min_item_nr seed
1 NULL 0.1 10 10 1 NULL

$ALS_implicit_realRatingMatrix
Recommender method: ALS_implicit for realRatingMatrix
Description: Recommender for implicit data based on latent factors,
calculated by alternating least squares algorithm.

Reference: Yifan Hu, Yehuda Koren, Chris Volinsky (2008). Collaborative
Filtering for Implicit Feedback Datasets, ICDM '08 Proceedings of the 2008
Eighth IEEE International Conference on Data Mining, pages 263-272.

Parameters:
lambda alpha n_factors n_iterations min_item_nr seed
1 0.1 10 10 10 1 NULL

$IBCF_realRatingMatrix
Recommender method: IBCF for realRatingMatrix
Description: Recommender based on item-based collaborative
filtering.

Reference: NA
Parameters:
k method normalize normalize_sim_matrix alpha na_as_zero
1 30 "Cosine" "center" FALSE 0.5 FALSE

$POPULAR_realRatingMatrix
Recommender method: POPULAR for realRatingMatrix
Description: Recommender based on item popularity.
Reference: NA
Parameters:
normalize aggregationRatings aggregationPopularity
1 "center" new("standardGeneric" new("standardGeneric"

$RANDOM_realRatingMatrix
Recommender method: RANDOM for realRatingMatrix
Description: Produce random recommendations (real ratings).
Reference: NA
Parameters: None

$RERECOMMEND_realRatingMatrix
Recommender method: RERECOMMEND for realRatingMatrix
Description: Re-recommends highly rated items (real ratings).
Reference: NA
Parameters:
randomize minRating
1 1 NA

$SVD_realRatingMatrix
Recommender method: SVD for realRatingMatrix
Description: Recommender based on SVD approximation with column-mean
imputation.

Reference: NA
Parameters:
k maxiter normalize
1 10 100 "center"

$SVDF_realRatingMatrix
Recommender method: SVDF for realRatingMatrix
Description: Recommender based on Funk SVD with gradient descend.
Reference: NA
Parameters:
k gamma lambda min_epochs max_epochs min_improvement normalize
1 10 0.015 0.001 50 200 1e-06 "center"
verbose
1 FALSE

$UBCF_realRatingMatrix
Recommender method: UBCF for realRatingMatrix
Description: Recommender based on user-based collaborative
filtering.

Reference: NA
Parameters:
method nn sample normalize
1 "cosine" 25 FALSE "center"

Here is how you can put together the algorithms using the training data. For simplicity, let's use the default algorithm settings. You can adjust the parameter settings by including your changes in the function as a list (an example follows the code below):

    > ubcf <- Recommender(getData(e,"train"), "UBCF")

> ibcf <- Recommender(getData(e,"train"), "IBCF")

> svd <- Recommender(getData(e, "train"), "SVD")

> popular <- Recommender(getData(e, "train"), "POPULAR")

> random <- Recommender(getData(e, "train"), "RANDOM")
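
As noted above, non-default settings are passed as a list via the parameter argument. For instance, a hypothetical user-based model with Pearson similarity and 50 neighbors (not used further in this chapter) would look like this:

    > ubcf.tuned <- Recommender(getData(e, "train"), "UBCF",
      parameter = list(method = "pearson", nn = 50))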

Now, using the predict() and getData() functions, we will generate each algorithm's predicted ratings for the test users, based on the 15 ratings per user that were held out as known, as follows:

    > user_pred <- predict(ubcf, getData(e, "known"), type = "ratings")

> item_pred <- predict(ibcf, getData(e, "known"), type = "ratings")

> svd_pred <- predict(svd, getData(e, "known"), type = "ratings")

> pop_pred <- predict(popular, getData(e, "known"), type = "ratings")

> rand_pred <- predict(random, getData(e, "known"), type = "ratings")

We will examine the error between the predictions and the unknown portion of the test data using the calcPredictionAccuracy() function, which returns RMSE, MSE, and MAE. First, we'll look at UBCF by itself. Then, after creating accuracy objects for all five methods, we can build a comparison table by combining them with rbind() and naming the rows with rownames():

    > P1 <- calcPredictionAccuracy(user_pred, getData(e,
"unknown"))

> P1
 RMSE   MSE   MAE
  4.5  19.9   3.5

> P2 <- calcPredictionAccuracy(item_pred, getData(e, "unknown"))

> P3 <- calcPredictionAccuracy(svd_pred, getData(e, "unknown"))

> P4 <- calcPredictionAccuracy(pop_pred, getData(e, "unknown"))

> P5 <- calcPredictionAccuracy(rand_pred, getData(e, "unknown"))

> error <- rbind(P1, P2, P3, P4, P5)

> rownames(error) <- c("UBCF", "IBCF", "SVD", "Popular", "Random")

> error
         RMSE  MSE  MAE
UBCF      4.5   20  3.5
IBCF      4.6   22  3.5
SVD       4.6   21  3.7
Popular   4.5   20  3.5
Random    6.3   40  4.9

We can see in the output that the user-based and popular algorithms slightly outperform IBCF and SVD, and that all of them outperform random predictions.
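
As an aside, if you want to see how the error varies across individual test users rather than as an overall average, calcPredictionAccuracy() also accepts a byUser argument; the object name below is just for illustration:

    > P1.byuser <- calcPredictionAccuracy(user_pred, getData(e, "unknown"),
      byUser = TRUE)

> head(P1.byuser)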

There is another way to compare methods using the evaluate() function. Making comparisons with evaluate() allows one to examine additional performance metrics as well as performance graphs. As the UBCF and Popular algorithms performed the best, we will look at them along with IBCF.

The first task in this process is to create a list of the algorithms that we want to compare, as follows:

    > algorithms <- list(POPULAR = list(name = "POPULAR"),
UBCF =list(name = "UBCF"), IBCF = list(name = "IBCF"))


> algorithms
$POPULAR
$POPULAR$name
[1] "POPULAR"

$UBCF
$UBCF$name
[1] "UBCF"

$IBCF
$IBCF$name
[1] "IBCF"

For this example, let's compare the top 5, 10, and 15 joke recommendations:

    > evlist <- evaluate(e, algorithms, n = c(5, 10, 15))
POPULAR run
1 [0.07sec/4.7sec]
UBCF run
1 [0.04sec/8.9sec]
IBCF run
1 [0.45sec/0.32sec]

Note that executing the command also prints how long each algorithm took to run. We can now examine the performance using the avg() function:

    > set.seed(1)    

> avg(evlist)
$POPULAR
     TP    FP    FN   TN precision recall    TPR    FPR
5  2.07  2.93  12.9 67.1     0.414  0.182  0.182 0.0398
10 3.92  6.08  11.1 63.9     0.393  0.331  0.331 0.0828
15 5.40  9.60   9.6 60.4     0.360  0.433  0.433 0.1314

$UBCF
     TP    FP    FN   TN precision recall    TPR    FPR
5  2.07  2.93 12.93 67.1     0.414  0.179  0.179 0.0398
10 3.88  6.12 11.11 63.9     0.389  0.326  0.326 0.0835
15 5.41  9.59  9.59 60.4     0.360  0.427  0.427 0.1312

$IBCF
     TP    FP    FN   TN precision recall    TPR    FPR
5  1.02  3.98  14.0 66.0     0.205 0.0674 0.0674 0.0558
10 2.35  7.65  12.6 62.4     0.235 0.1606 0.1606 0.1069
15 3.72 11.28  11.3 58.7     0.248 0.2617 0.2617 0.1575

Note that the performance metrics for POPULAR and UBCF are nearly the same. One could argue that the simpler-to-implement popular-based algorithm is probably the better choice for model selection. We can plot and compare the results as Receiver Operating Characteristic (ROC) curves, which compare TPR and FPR, or as precision/recall curves, as follows:

    > plot(evlist, legend = "topleft", annotate = TRUE)

The following is the output of the preceding command:

To get the precision/recall curve plot, you only need to specify "prec" in the plot() function:

    > plot(evlist, "prec", legend = "bottomright", annotate = TRUE)

The output of the preceding command is as follows:

You can clearly see in the plots that the popular-based and user-based algorithms are almost identical and that both outperform the item-based one. The annotate = TRUE argument places a number next to each point, corresponding to the number of recommendations we requested in our evaluation.

This was simple, but what are the actual recommendations from a model for a specific individual? This is quite easy to code as well. First, let's build a "popular" recommendation engine on the full dataset. Then, we will find the top five recommendations for the first two raters. We will use the Recommender() function and apply it to the whole dataset, as follows:

    > R1 <- Recommender(Jester5k, method = "POPULAR")

> R1
Recommender of type 'POPULAR' for 'realRatingMatrix'
learned using 5000 users.

Now, we just need to get the top five recommendations for the first two raters and produce them as a list:

    > recommend <- predict(R1, Jester5k[1:2], n = 5)

> as(recommend, "list")
$u2841
[1] "j89" "j72" "j76" "j88" "j83"

$u15547
[1] "j89" "j93" "j76" "j88" "j91"

It is also possible to see each rater's predicted rating score for specific jokes by requesting type = "ratings" in the predict() syntax and then coercing the result to a matrix for review. Let's do this for ten individuals (raters 300 through 309) and three jokes (jokes 71 through 73):

    > rating <- predict(R1, Jester5k[300:309], type = "ratings")

> rating
10 x 100 rating matrix of class 'realRatingMatrix' with 322 ratings.

> as(rating, "matrix")[, 71:73]
           j71  j72     j73
u7628   -2.042 1.50 -0.2911
u8714       NA   NA      NA
u24213  -2.935   NA -1.1837
u13301   2.391 5.93  4.1419
u10959      NA   NA      NA
u23430  -0.432 3.11      NA
u11167  -1.718 1.82  0.0333
u4705   -1.199 2.34  0.5519
u24469  -1.583 1.96  0.1686
u13534  -1.545 2.00      NA

The numbers in the matrix are the predicted rating scores for the jokes that the individual has not rated, while the NAs mark the jokes that the user has already rated, for which no prediction is returned.
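
If you would prefer to see the known ratings filled in alongside the predictions, newer versions of recommenderlab also accept type = "ratingMatrix" in predict(), which returns the complete matrix rather than blanking out the rated items; something like the following should work:

    > full <- predict(R1, Jester5k[300:309], type = "ratingMatrix")

> as(full, "matrix")[, 71:73]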

Our final effort with this data will show how to build recommendations for situations where the ratings are binary, that is, good or bad (1 or 0). We will need to turn the ratings into this binary format, coding ratings of 5 or greater as 1 and those less than 5 as 0. This is quite easy to do in recommenderlab with the binarize() function, specifying minRating = 5:

    > Jester.bin <- binarize(Jester5k, minRating = 5)

Now, we need to keep only the users with enough ratings equal to one for the algorithm to train on. For argument's sake, let's require more than 10 such ratings per user. The code to create the subset of the necessary data is shown in the following lines:

    > Jester.bin <- Jester.bin[rowCounts(Jester.bin) > 10]

> Jester.bin
3054 x 100 rating matrix of class 'binaryRatingMatrix' with 84722 ratings.

You will need to create an evaluation scheme with evaluationScheme(). In this instance, we will go with cross-validation. The default number of folds in the function is 10, but we can safely go with k = 5, which will reduce our computation time:

    > set.seed(456)

    > e.bin <- evaluationScheme(Jester.bin, method = "cross-validation",
      k = 5, given = 10)

For comparison purposes, the algorithms under evaluation will include random, popular, and UBCF:

    > algorithms.bin <- list("random" = list(name = "RANDOM", param = 
NULL), "popular" = list(name = "POPULAR", param = NULL), "UBCF" =
list(name = "UBCF"))

It is now time to build our model, as follows:

    > results.bin <- evaluate(e.bin, algorithms.bin, n = c(5, 10, 15))
RANDOM run
1 [0sec/0.41sec]
2 [0.01sec/0.39sec]
3 [0sec/0.39sec]
4 [0sec/0.41sec]
5 [0sec/0.4sec]
POPULAR run
1 [0.01sec/3.79sec]
2 [0sec/3.81sec]
3 [0sec/3.82sec]
4 [0sec/3.92sec]
5 [0.02sec/3.78sec]
UBCF run
1 [0sec/5.94sec]
2 [0sec/5.92sec]
3 [0sec/6.05sec]
4 [0sec/5.86sec]
5 [0sec/6.09sec]

Forgoing the table of performance metrics, let's take a look at the plots:

    > plot(results.bin, legend = "topleft")

The output of the preceding command is as follows:

    > plot(results.bin, "prec", legend = "bottomright")

The output of the preceding command is as follows:

The user-based algorithm slightly outperforms the popular-based one, but you can clearly see that they are both superior to any random recommendation. In our business case, it will come down to the judgment of the decision-making team as to which algorithm to implement.
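
Should the team want the numbers behind these curves, the averaged confusion matrices are available with the same avg() function used earlier, and getConfusionMatrix() returns the fold-by-fold results (output omitted here):

    > avg(results.bin)

> getConfusionMatrix(results.bin[["UBCF"]])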
