Modeling, evaluation, and recommendations

To build and test our recommendation engines, we can use the same function, Recommender(), changing only the method specification for each technique. To see what the package can do and to explore the parameters available for each technique, you can examine the registry. Looking at the IBCF entry in the following output, we can see that the defaults are to keep the 30 most similar items (k = 30), use the cosine similarity method on centered data, and leave missing data uncoded rather than treating it as zero (na_as_zero = FALSE):

    > recommenderRegistry$get_entries(dataType = "realRatingMatrix")

$ALS_realRatingMatrix
Recommender method: ALS for realRatingMatrix
Description: Recommender for explicit ratings based on latent
factors, calculated by alternating least squares algorithm.

Reference: Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, Rong Pan (2008).
Large-Scale Parallel Collaborative Filtering for the Netflix Prize,
4th Int'l Conf. Algorithmic Aspects in Information and Management, LNCS 5034.

Parameters:
normalize lambda n_factors n_iterations min_item_nr seed
1 NULL 0.1 10 10 1 NULL

$ALS_implicit_realRatingMatrix
Recommender method: ALS_implicit for realRatingMatrix
Description: Recommender for implicit data based on latent factors,
calculated by alternating least squares algorithm.

Reference: Yifan Hu, Yehuda Koren, Chris Volinsky (2008). Collaborative
Filtering for Implicit Feedback Datasets, ICDM '08 Proceedings of the 2008
Eighth IEEE International Conference on Data Mining, pages 263-272.

Parameters:
lambda alpha n_factors n_iterations min_item_nr seed
1 0.1 10 10 10 1 NULL

$IBCF_realRatingMatrix
Recommender method: IBCF for realRatingMatrix
Description: Recommender based on item-based collaborative
filtering.

Reference: NA
Parameters:
k method normalize normalize_sim_matrix alpha na_as_zero
1 30 "Cosine" "center" FALSE 0.5 FALSE

$POPULAR_realRatingMatrix
Recommender method: POPULAR for realRatingMatrix
Description: Recommender based on item popularity.
Reference: NA
Parameters:
normalize aggregationRatings aggregationPopularity
1 "center" new("standardGeneric" new("standardGeneric"

$RANDOM_realRatingMatrix
Recommender method: RANDOM for realRatingMatrix
Description: Produce random recommendations (real ratings).
Reference: NA
Parameters: None

$RERECOMMEND_realRatingMatrix
Recommender method: RERECOMMEND for realRatingMatrix
Description: Re-recommends highly rated items (real ratings).
Reference: NA
Parameters:
randomize minRating
1 1 NA

$SVD_realRatingMatrix
Recommender method: SVD for realRatingMatrix
Description: Recommender based on SVD approximation with column-mean
imputation.

Reference: NA
Parameters:
k maxiter normalize
1 10 100 "center"

$SVDF_realRatingMatrix
Recommender method: SVDF for realRatingMatrix
Description: Recommender based on Funk SVD with gradient descend.
Reference: NA
Parameters:
k gamma lambda min_epochs max_epochs min_improvement normalize
1 10 0.015 0.001 50 200 1e-06 "center"
verbose
1 FALSE

$UBCF_realRatingMatrix
Recommender method: UBCF for realRatingMatrix
Description: Recommender based on user-based collaborative
filtering.

Reference: NA
Parameters:
method nn sample normalize
1 "cosine" 25 FALSE "center"

Here is how you can put together the algorithms using the training data. For simplicity, let's use the default algorithm settings. You can adjust the parameter settings by including your changes in the function as a list (an example follows the code below):

    > ubcf <- Recommender(getData(e,"train"), "UBCF")

> ibcf <- Recommender(getData(e,"train"), "IBCF")

> svd <- Recommender(getData(e, "train"), "SVD")

> popular <- Recommender(getData(e, "train"), "POPULAR")

> random <- Recommender(getData(e, "train"), "RANDOM")
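
As noted above, non-default settings are passed as a list via the parameter argument. For instance, a hypothetical user-based model with Pearson similarity and 50 neighbors (not used further in this chapter) would look like this:

    > ubcf.tuned <- Recommender(getData(e, "train"), "UBCF",
      parameter = list(method = "pearson", nn = 50))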

Now, using the predict() and getData() functions, we will generate each algorithm's predicted ratings for the test users, based on the 15 ratings per user that were held out as known, as follows:

    > user_pred <- predict(ubcf, getData(e, "known"), type = "ratings")

> item_pred <- predict(ibcf, getData(e, "known"), type = "ratings")

> svd_pred <- predict(svd, getData(e, "known"), type = "ratings")

> pop_pred <- predict(popular, getData(e, "known"), type = "ratings")

> rand_pred <- predict(random, getData(e, "known"), type = "ratings")

We will examine the error between the predictions and the unknown portion of the test data using the calcPredictionAccuracy() function, which returns RMSE, MSE, and MAE. First, we'll look at UBCF by itself. Then, after creating accuracy objects for all five methods, we can build a comparison table by combining them with rbind() and naming the rows with rownames():

    > P1 <- calcPredictionAccuracy(user_pred, getData(e,
"unknown"))

> P1
 RMSE   MSE   MAE
  4.5  19.9   3.5

> P2 <- calcPredictionAccuracy(item_pred, getData(e, "unknown"))

> P3 <- calcPredictionAccuracy(svd_pred, getData(e, "unknown"))

> P4 <- calcPredictionAccuracy(pop_pred, getData(e, "unknown"))

> P5 <- calcPredictionAccuracy(rand_pred, getData(e, "unknown"))

> error <- rbind(P1, P2, P3, P4, P5)

> rownames(error) <- c("UBCF", "IBCF", "SVD", "Popular", "Random")

> error
         RMSE  MSE  MAE
UBCF      4.5   20  3.5
IBCF      4.6   22  3.5
SVD       4.6   21  3.7
Popular   4.5   20  3.5
Random    6.3   40  4.9

We can see in the output that the user-based and popular algorithms slightly outperform IBCF and SVD, and that all of them outperform random predictions.
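
As an aside, if you want to see how the error varies across individual test users rather than as an overall average, calcPredictionAccuracy() also accepts a byUser argument; the object name below is just for illustration:

    > P1.byuser <- calcPredictionAccuracy(user_pred, getData(e, "unknown"),
      byUser = TRUE)

> head(P1.byuser)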

There is another way to compare methods using the evaluate() function. Making comparisons with evaluate() allows one to examine additional performance metrics as well as performance graphs. As the UBCF and Popular algorithms performed the best, we will look at them along with IBCF.

The first task in this process is to create a list of the algorithms that we want to compare, as follows:

    > algorithms <- list(POPULAR = list(name = "POPULAR"),
UBCF =list(name = "UBCF"), IBCF = list(name = "IBCF"))


> algorithms
$POPULAR
$POPULAR$name
[1] "POPULAR"

$UBCF
$UBCF$name
[1] "UBCF"

$IBCF
$IBCF$name
[1] "IBCF"

For this example, let's compare the top 5, 10, and 15 joke recommendations:

    > evlist <- evaluate(e, algorithms, n = c(5, 10, 15))
POPULAR run
1 [0.07sec/4.7sec]
UBCF run
1 [0.04sec/8.9sec]
IBCF run
1 [0.45sec/0.32sec]

Note that executing the command also prints how long each algorithm took to run. We can now examine the performance using the avg() function:

    > set.seed(1)    

> avg(evlist)
$POPULAR
     TP    FP    FN   TN precision recall    TPR    FPR
5  2.07  2.93  12.9 67.1     0.414  0.182  0.182 0.0398
10 3.92  6.08  11.1 63.9     0.393  0.331  0.331 0.0828
15 5.40  9.60   9.6 60.4     0.360  0.433  0.433 0.1314

$UBCF
     TP    FP    FN   TN precision recall    TPR    FPR
5  2.07  2.93 12.93 67.1     0.414  0.179  0.179 0.0398
10 3.88  6.12 11.11 63.9     0.389  0.326  0.326 0.0835
15 5.41  9.59  9.59 60.4     0.360  0.427  0.427 0.1312

$IBCF
     TP    FP    FN   TN precision recall    TPR    FPR
5  1.02  3.98  14.0 66.0     0.205 0.0674 0.0674 0.0558
10 2.35  7.65  12.6 62.4     0.235 0.1606 0.1606 0.1069
15 3.72 11.28  11.3 58.7     0.248 0.2617 0.2617 0.1575

Note that the performance metrics for POPULAR and UBCF are nearly the same. One could argue that the simpler-to-implement popular-based algorithm is probably the better choice for model selection. We can plot and compare the results as Receiver Operating Characteristic (ROC) curves, which compare TPR and FPR, or as precision/recall curves, as follows:

    > plot(evlist, legend = "topleft", annotate = TRUE)

The following is the output of the preceding command:

To get the precision/recall curve plot, you only need to specify "prec" in the plot() function:

    > plot(evlist, "prec", legend = "bottomright", annotate = TRUE)

The output of the preceding command is as follows:

You can clearly see in the plots that the popular-based and user-based algorithms are almost identical and that both outperform the item-based one. The annotate = TRUE argument places a number next to each point, corresponding to the number of recommendations we requested in our evaluation.

This was simple, but what are the actual recommendations from a model for a specific individual? This is quite easy to code as well. First, let's build a "popular" recommendation engine on the full dataset. Then, we will find the top five recommendations for the first two raters. We will use the Recommender() function and apply it to the whole dataset, as follows:

    > R1 <- Recommender(Jester5k, method = "POPULAR")

> R1
Recommender of type 'POPULAR' for 'realRatingMatrix'
learned using 5000 users.

Now, we just need to get the top five recommendations for the first two raters and produce them as a list:

    > recommend <- predict(R1, Jester5k[1:2], n = 5)

> as(recommend, "list")
$u2841
[1] "j89" "j72" "j76" "j88" "j83"

$u15547
[1] "j89" "j93" "j76" "j88" "j91"

It is also possible to see each rater's predicted rating score for specific jokes by requesting type = "ratings" in the predict() syntax and then coercing the result to a matrix for review. Let's do this for ten individuals (raters 300 through 309) and three jokes (jokes 71 through 73):

    > rating <- predict(R1, Jester5k[300:309], type = "ratings")

> rating
10 x 100 rating matrix of class 'realRatingMatrix' with 322 ratings.

> as(rating, "matrix")[, 71:73]
           j71  j72     j73
u7628   -2.042 1.50 -0.2911
u8714       NA   NA      NA
u24213  -2.935   NA -1.1837
u13301   2.391 5.93  4.1419
u10959      NA   NA      NA
u23430  -0.432 3.11      NA
u11167  -1.718 1.82  0.0333
u4705   -1.199 2.34  0.5519
u24469  -1.583 1.96  0.1686
u13534  -1.545 2.00      NA

The numbers in the matrix are the predicted rating scores for the jokes that the individual has not rated, while the NAs mark the jokes that the user has already rated, for which no prediction is returned.
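
If you would prefer to see the known ratings filled in alongside the predictions, newer versions of recommenderlab also accept type = "ratingMatrix" in predict(), which returns the complete matrix rather than blanking out the rated items; something like the following should work:

    > full <- predict(R1, Jester5k[300:309], type = "ratingMatrix")

> as(full, "matrix")[, 71:73]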

Our final effort with this data will show how to build recommendations for situations where the ratings are binary, that is, good or bad (1 or 0). We will need to turn the ratings into this binary format, coding ratings of 5 or greater as 1 and those less than 5 as 0. This is quite easy to do in recommenderlab with the binarize() function, specifying minRating = 5:

    > Jester.bin <- binarize(Jester5k, minRating = 5)

Now, we need to keep only the users with enough ratings equal to one for the algorithm to train on. For argument's sake, let's require more than 10 such ratings per user. The code to create the subset of the necessary data is shown in the following lines:

    > Jester.bin <- Jester.bin[rowCounts(Jester.bin) > 10]

> Jester.bin
3054 x 100 rating matrix of class 'binaryRatingMatrix' with 84722 ratings.

You will need to create an evaluation scheme with evaluationScheme(). In this instance, we will go with cross-validation. The default number of folds in the function is 10, but we can safely go with k = 5, which will reduce our computation time:

    > set.seed(456)

    > e.bin <- evaluationScheme(Jester.bin, method = "cross-validation",
      k = 5, given = 10)

For comparison purposes, the algorithms under evaluation will include random, popular, and UBCF:

    > algorithms.bin <- list("random" = list(name = "RANDOM", param = 
NULL), "popular" = list(name = "POPULAR", param = NULL), "UBCF" =
list(name = "UBCF"))

It is now time to build our model, as follows:

    > results.bin <- evaluate(e.bin, algorithms.bin, n = c(5, 10, 15))
RANDOM run
1 [0sec/0.41sec]
2 [0.01sec/0.39sec]
3 [0sec/0.39sec]
4 [0sec/0.41sec]
5 [0sec/0.4sec]
POPULAR run
1 [0.01sec/3.79sec]
2 [0sec/3.81sec]
3 [0sec/3.82sec]
4 [0sec/3.92sec]
5 [0.02sec/3.78sec]
UBCF run
1 [0sec/5.94sec]
2 [0sec/5.92sec]
3 [0sec/6.05sec]
4 [0sec/5.86sec]
5 [0sec/6.09sec]

Forgoing the table of performance metrics, let's take a look at the plots:

    > plot(results.bin, legend = "topleft")

The output of the preceding command is as follows:

    > plot(results.bin, "prec", legend = "bottomright")

The output of the preceding command is as follows:

The user-based algorithm slightly outperforms the popular-based one, but you can clearly see that they are both superior to any random recommendation. In our business case, it will come down to the judgment of the decision-making team as to which algorithm to implement.
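
Should the team want the numbers behind these curves, the averaged confusion matrices are available with the same avg() function used earlier, and getConfusionMatrix() returns the fold-by-fold results (output omitted here):

    > avg(results.bin)

> getConfusionMatrix(results.bin[["UBCF"]])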
