Modeling, evaluation, and recommendations

To build and test our recommendation engines, we can use the same function, Recommender(), changing only the method specification for each technique. To see what the package can do and to explore the parameters available for all six techniques, you can examine the registry. Looking at the IBCF entry that follows, we can see that the default is to find 30 neighbors using the cosine method on centered data, with missing data not coded as zero:

> recommenderRegistry$get_entries(dataType = "realRatingMatrix")

$IBCF_realRatingMatrix
Recommender method: IBCF
Description: Recommender based on item-based collaborative filtering (real data).
Parameters:
   k method normalize normalize_sim_matrix alpha na_as_zero minRating
1 30 Cosine    center                FALSE   0.5      FALSE        NA

$PCA_realRatingMatrix
Recommender method: PCA
Description: Recommender based on PCA approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha na_as_zero minRating
1         20 Cosine    center                FALSE   0.5      FALSE        NA

$POPULAR_realRatingMatrix
Recommender method: POPULAR
Description: Recommender based on item popularity (real data).
Parameters: None

$RANDOM_realRatingMatrix
Recommender method: RANDOM
Description: Produce random recommendations (real ratings).
Parameters: None

$SVD_realRatingMatrix
Recommender method: SVD
Description: Recommender based on SVD approximation (real data).
Parameters:
  categories method normalize normalize_sim_matrix alpha treat_na minRating
1         50 Cosine    center                FALSE   0.5   median        NA

$UBCF_realRatingMatrix
Recommender method: UBCF
Description: Recommender based on user-based collaborative filtering (real data).
Parameters:
  method nn sample normalize minRating
1 cosine 25  FALSE    center        NA

Here is how you can fit the algorithms on the training data. For simplicity, let's use the default settings for each. You can adjust the parameter settings by including your changes in the function call as a list. For instance, SVD treats the missing values as the column median; if you wanted the missing values coded as zero, you would need to include param=list(treat_na="0"):

> ubcf = Recommender(getData(e,"train"), "UBCF")

> ibcf = Recommender(getData(e,"train"), "IBCF")

> svd = Recommender(getData(e, "train"), "SVD")

> popular = Recommender(getData(e, "train"), "POPULAR")

> pca = Recommender(getData(e, "train"), "PCA")

> random = Recommender(getData(e, "train"), "RANDOM")
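As a hedged illustration of that parameter syntax (the object name svd_zero is purely illustrative, and this assumes the SVD method in this version of the package accepts the treat_na option shown in the registry), overriding the default would look something like this:

> # hypothetical example: code missing values as zero instead of the column median
> svd_zero = Recommender(getData(e, "train"), "SVD", param = list(treat_na = "0"))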

Now, using the predict() and getData() functions, we will generate the predicted ratings for the test users, based on the 15 items per user held as known data, for each of the algorithms, as follows:

> user_pred = predict(ubcf, getData(e,"known"),type="ratings")

> item_pred = predict(ibcf, getData(e, "known"),type="ratings")

> svd_pred = predict(svd, getData(e, "known"),type="ratings")

> pop_pred = predict(popular, getData(e, "known"),type="ratings")

> pca_pred = predict(pca, getData(e, "known"),type="ratings")

> rand_pred = predict(random, getData(e, "known"), type="ratings")

We will examine the error between the predictions and the unknown portion of the test data using the calcPredictionAccuracy() function. The output consists of RMSE, MSE, and MAE for each method. We'll look at the UBCF output by itself first. After creating the objects for all six methods, we can build a table by combining them with the rbind() function and naming the rows with the rownames() function:

> P1 = calcPredictionAccuracy(user_pred, getData(e,
"unknown"))

> P1
RMSE  MSE  MAE 
4.5 19.9  3.5

> P2 = calcPredictionAccuracy(item_pred, getData(e,"unknown"))

> P3 = calcPredictionAccuracy(svd_pred, getData(e, "unknown"))
> P4 = calcPredictionAccuracy(pop_pred, getData(e,"unknown"))

> P5 = calcPredictionAccuracy(pca_pred, getData(e,"unknown"))

> P6 = calcPredictionAccuracy(rand_pred, getData(e,"unknown"))

> error = rbind(P1,P2,P3,P4,P5,P6)

> rownames(error) = c("UBCF", "IBCF", "SVD", "Popular", "PCA", "Random")

> error
            RMSE      MSE      MAE
UBCF    4.467276 19.95655 3.496973
IBCF    4.651552 21.63693 3.517007
SVD     5.275496 27.83086 4.454406
Popular 5.064004 25.64414 4.233115
PCA     4.711496 22.19819 3.725162
Random  7.830454 61.31601 6.403661

We can see in the output that the user-based algorithm slightly outperforms IBCF and PCA. It is also noteworthy that a simple algorithm such as the popular-based recommendation does fairly well.
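As a quick, hedged sanity check on what calcPredictionAccuracy() is reporting (this is not the package's internal code, just a rough reproduction), the UBCF RMSE can be approximated by hand from the prediction and holdout matrices; recall also that RMSE is simply the square root of MSE:

> # rough manual check: squared error over the cells present in both matrices
> pred_mat = as(user_pred, "matrix")
> true_mat = as(getData(e, "unknown"), "matrix")
> sqrt(mean((pred_mat - true_mat)^2, na.rm = TRUE))

This should come out very close to the 4.47 reported for UBCF in the table above.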

There is another way to compare methods using the evaluate() function. Making comparisons with evaluate() allows one to examine additional performance metrics as well as performance graphs. As the UBCF and IBCF algorithms performed the best, we will look at them along with the popular-based one.

The first task in this process is to create a list of the algorithms that we want to compare, as follows:

> algorithms = list(POPULAR = list(name = "POPULAR"), UBCF = list(name = "UBCF"), IBCF = list(name = "IBCF"))

> algorithms
$POPULAR
$POPULAR$name
[1] "POPULAR"

$UBCF
$UBCF$name
[1] "UBCF"

$IBCF
$IBCF$name
[1] "IBCF"

You can adjust the parameters with param=... inside list(), just as in the sketch shown above. In the next step, you can create the results using evaluate() and also set up a comparison over a specified number of recommendations. For this example, let's compare the top 5, 10, and 15 joke recommendations:

> evlist = evaluate(e, algorithms,n=c(5,10,15))
POPULAR run 
 1  [0.05sec/1.02sec] 
UBCF run 
 1  [0.03sec/68.26sec] 
IBCF run 
 1  [2.03sec/0.86sec]

Note that executing the command produces output showing how long it took to train and test each algorithm. We can now examine the performance using the avg() function:

> avg(evlist)
$POPULAR
      TP    FP     FN     TN precision    recall       TPR        FPR
5  2.092 2.908 14.193 70.807    0.4184 0.1686951 0.1686951 0.03759113
10 3.985 6.015 12.300 67.700    0.3985 0.2996328 0.2996328 0.07769088
15 5.637 9.363 10.648 64.352    0.3758 0.4111718 0.4111718 0.12116708

$UBCF
      TP    FP     FN     TN precision    recall       TPR        FPR
5  2.074 2.926 14.211 70.789    0.4148 0.1604751 0.1604751 0.03762910
10 3.901 6.099 12.384 67.616    0.3901 0.2945067 0.2945067 0.07891524
15 5.472 9.528 10.813 64.187    0.3648 0.3961279 0.3961279 0.12362834

$IBCF
      TP     FP     FN     TN precision     recall        TPR       FPR
5  1.010  3.990 15.275 69.725    0.2020 0.06047142 0.06047142 0.0534247
10 2.287  7.713 13.998 66.002    0.2287 0.15021068 0.15021068 0.1027532
15 3.666 11.334 12.619 62.381    0.2444 0.23966150 0.23966150 0.1504704
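Before interpreting these tables, a brief hedged aside on how precision relates to the counts: precision is TP / (TP + FP), and because TP + FP equals the number of recommendations made, the UBCF precision at n = 15 can be reproduced directly from the table:

> # precision = TP / (TP + FP); at n = 15 there are 15 recommendations per user
> 5.472 / (5.472 + 9.528)
[1] 0.3648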

Note that the performance metrics for POPULAR and UBCF are nearly the same. One could argue that the simpler-to-implement popular-based algorithm is the better choice for model selection. Indeed, what is disappointing about this whole exercise is the anemic TPR; for UBCF at 15 recommendations, for example, only an average of about 5.5 were truly accurate. As mentioned, we can plot and compare the results as Receiver Operating Characteristic (ROC) curves, where you compare TPR and FPR, or as precision/recall curves, as follows:

> plot(evlist, legend="topleft", annotate=TRUE)

The following is the output of the preceding command:

[Figure: ROC curves comparing the POPULAR, UBCF, and IBCF algorithms]

To get the precision/recall curve plot, you only need to specify "prec" in the plot() function:

> plot(evlist, "prec", legend="bottomright", annotate=TRUE)

The output of the preceding command is as follows:

[Figure: Precision/recall curves comparing the POPULAR, UBCF, and IBCF algorithms]

You can clearly see in the plots that the popular-based and user-based algorithms are almost identical and that both outperform the item-based one. The annotate=TRUE parameter places numbers next to the points corresponding to the numbers of recommendations that we called for in our evaluation.

This was simple, but what are the actual recommendations from a model for a specific individual? This is quite easy to code as well. First, let's build a user-based recommendation engine on the full dataset. Then, we will find the top five recommendations for the first two raters. We will use the Recommender() function and apply it to the whole dataset, as follows:

> R1 = Recommender(Jester5k, method="UBCF")

> R1
Recommender of type 'UBCF' for 'realRatingMatrix' 
learned using 5000 users.

Now, we just need to get the top five recommendations—in order—for the first two raters and produce them as a list:

> recommend = predict(R1, Jester5k[1:2], n=5)

> as(recommend, "list")
[[1]]
[1] "j81" "j78" "j83" "j80" "j73"

[[2]]
[1] "j96" "j87" "j89" "j76" "j93"

It is also possible to see a rater's specific rating score for each of the jokes by specifying this in the predict() syntax and then putting it in a matrix for review. Let's do this for ten individuals (raters 300 through 309) and three jokes (71 through 73):

> rating = predict(R1, Jester5k[300:309], type="ratings")

> rating
10 x 100 rating matrix of class 'realRatingMatrix' with 322 ratings.

> as(rating, "matrix")[,71:73]
             j71         j72        j73
 [1,] -0.8055227 -0.05159179 -0.3244485
 [2,]         NA          NA         NA
 [3,] -1.2472200          NA -1.5193913
 [4,]  4.0659217  4.45316186  4.0651614
 [5,]         NA          NA         NA
 [6,]  1.1233854  1.37527380         NA
 [7,]  0.4938482  0.18357168 -0.1378054
 [8,]  0.2004399  0.58525761  0.2910901
 [9,] -0.5184774  0.03067017  0.2209107
[10,]  0.1480202  0.35858842         NA

The numbers in the matrix are the predicted ratings for the jokes that each individual did not rate, while NA marks the jokes that the user has already rated; with type="ratings", predictions are returned only for the unrated items.
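As a hedged way to see the pattern for yourself (using only the as() coercion already shown above), you can pull the same block of the raw ratings and compare which cells hold observed ratings there against which cells came back as NA in the prediction:

> # the raw ratings for the same raters and jokes, for comparison with the predictions
> as(Jester5k[300:309], "matrix")[,71:73]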

Our final effort with this data will show how to build recommendations for situations where the ratings are binary, that is, good or bad (1 or 0). We will need to turn the ratings into this binary format, with ratings of 5 or greater coded as 1 and ratings of less than 5 coded as 0. This is quite easy to do in recommenderlab using the binarize() function and specifying minRating=5:

> Jester.bin = binarize(Jester5k, minRating=5)

Now, we will need to keep only the users who have more than ten positive (one) ratings, so that each user has enough ratings to supply to the algorithm during training. For argument's sake, let's go with given=10. The code to create the necessary subset is shown in the following lines:

> Jester.bin = Jester.bin[rowCounts(Jester.bin)>10]

> Jester.bin
3054 x 100 rating matrix of class 'binaryRatingMatrix' with 84722 ratings.
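As a quick, hedged check (using the same rowCounts() helper as above) that every remaining user clears the given=10 value we will use in the evaluation scheme below:

> # minimum number of positive (one) ratings per remaining user
> min(rowCounts(Jester.bin))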

You will also need to create an evaluation scheme with evaluationScheme(). In this instance, we will go with cross-validation. The default number of folds in the function is 10, but we can safely go with k=5, which will reduce our computation time:

> set.seed(456)

> e.bin = evaluationScheme(Jester.bin, method="cross-validation", k=5, given=10)

For comparison purposes, the algorithms under evaluation will include random, popular and UBCF:

> algorithms.bin = list("random" = list(name="RANDOM", param=NULL),"popular" = list(name="POPULAR", param=NULL),"UBCF" = list(name="UBCF"))

It is now time to build our model, as follows:

> results.bin = evaluate(e.bin, algorithms.bin, n=c(5,10,15))
RANDOM run 
 1  [0sec/0.41sec] 
 2  [0.01sec/0.39sec] 
 3  [0sec/0.39sec] 
 4  [0sec/0.41sec] 
 5  [0sec/0.4sec] 
POPULAR run 
 1  [0.01sec/3.79sec] 
 2  [0sec/3.81sec] 
 3  [0sec/3.82sec] 
 4  [0sec/3.92sec] 
 5  [0.02sec/3.78sec] 
UBCF run 
 1  [0sec/5.94sec] 
 2  [0sec/5.92sec] 
 3  [0sec/6.05sec] 
 4  [0sec/5.86sec] 
 5  [0sec/6.09sec]

Forgoing the table of performance metrics, let's take a look at the plots:

> plot(results.bin, legend="topleft")

The output of the preceding command is as follows:

[Figure: ROC curves for the binary data, comparing the RANDOM, POPULAR, and UBCF algorithms]

> plot(results.bin, "prec", legend="bottomright")

The output of the preceding command is as follows:

[Figure: Precision/recall curves for the binary data, comparing the RANDOM, POPULAR, and UBCF algorithms]

The user-based algorithm slightly outperforms the popular-based one, but you can clearly see that they are both superior to any random recommendation. In our business case, it will come down to the judgment of the decision-making team as to which algorithm to implement.
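If the team settled on the user-based approach, a hedged sketch of putting it to work on the binary data might look like the following (the object name R.bin and the choice of user 10 are purely illustrative):

> # hypothetical: fit UBCF on the full binary matrix and pull the top five jokes for one user
> R.bin = Recommender(Jester.bin, method = "UBCF")
> as(predict(R.bin, Jester.bin[10], n = 5), "list")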
