Step 8 - Evaluating the model

To verify the quality of the model, we use the Root Mean Squared Error (RMSE), which measures the difference between the values predicted by a model and the values actually observed. The smaller the error, the better the model. The model is evaluated on the test data that was split off in step 4.
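For reference, RMSE over n (prediction, observation) pairs is the square root of the mean of the squared differences. The plain-Scala sketch below needs no Spark. It uses a hypothetical `rmse` helper and made-up ratings to show the arithmetic:

```scala
// Hypothetical helper: RMSE over (predicted, actual) pairs, no Spark required
def rmse(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "RMSE is undefined for an empty set of pairs")
  // Mean of the squared differences between predicted and actual values
  val meanSquaredError = pairs.map { case (p, a) => (p - a) * (p - a) }.sum / pairs.size
  math.sqrt(meanSquaredError)
}

// Made-up (predicted, actual) ratings for illustration only
val example = Seq((4.0, 5.0), (3.0, 3.0), (2.0, 4.0))
println(rmse(example)) // errors 1, 0, 2 -> squares 1, 0, 4 -> mean 5/3 -> sqrt ≈ 1.291
```

Squaring the errors before averaging makes large mistakes count more heavily than small ones. Taking the square root returns the result to the rating scale.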

According to many machine learning practitioners, RMSE is a good measure of accuracy, but only for comparing the prediction errors of different models for a particular variable. It is not suitable for comparisons between variables because it is scale dependent: it is expressed in the same units as the variable being predicted. The following lines of code calculate the RMSE on the test set for the model that was trained using the training set:

val rmseTest = computeRmse(model, testRDD, true) 
println("Test RMSE: = " + rmseTest) //Less is better

With this setup, we get the following output:

Test RMSE: = 0.9019872589764073

The computeRmse() method computes the RMSE of the model's predictions against the observed ratings. The lower the RMSE, the better the model's predictive capability. Note that computeRmse() is a user-defined helper function rather than part of the MLlib API. It is defined as follows:

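import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD
// Note: implicitPrefs only toggles printing a sample of (prediction, rating) pairs
def computeRmse(model: MatrixFactorizationModel, data: RDD[Rating], implicitPrefs: Boolean): Double = {
  val predictions: RDD[Rating] = model.predict(data.map(x => (x.user, x.product)))
  // Key predictions and observed ratings by (user, product) and join them
  val predictionsAndRatings = predictions.map { x => ((x.user, x.product), x.rating) }
    .join(data.map(x => ((x.user, x.product), x.rating))).values
  if (implicitPrefs) {
    println("(Prediction, Rating)")
    println(predictionsAndRatings.take(5).mkString("\n"))
  }
  math.sqrt(predictionsAndRatings.map(x => (x._1 - x._2) * (x._1 - x._2)).mean()) // sqrt of mean squared error
}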
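The key step in computeRmse() is the join on (user, product). Only pairs that have both a prediction and an observed rating contribute to the error. For example, model.predict() produces no prediction for a user or product that did not appear in the training data. The following plain-Scala sketch mirrors that join-then-average logic on local collections so it can be run without Spark. The `LocalRating` case class and `localRmse` function are hypothetical stand-ins, and the data is made up:

```scala
// Hypothetical local stand-in for MLlib's Rating(user, product, rating)
case class LocalRating(user: Int, product: Int, rating: Double)

// Mirrors computeRmse's join-then-average logic on plain Scala collections.
// Assumes at most one prediction per (user, product) pair.
def localRmse(predicted: Seq[LocalRating], observed: Seq[LocalRating]): Double = {
  val predByKey = predicted.map(r => (r.user, r.product) -> r.rating).toMap
  // Inner join on (user, product): only keys present on both sides survive
  val pairs = observed.flatMap { r =>
    predByKey.get((r.user, r.product)).map(p => (p, r.rating))
  }
  math.sqrt(pairs.map { case (p, a) => (p - a) * (p - a) }.sum / pairs.size)
}

// Made-up data: (1, 20) has no observed rating and (3, 30) has no prediction,
// so only (1, 10) and (2, 10) contribute to the error
val predicted = Seq(LocalRating(1, 10, 4.0), LocalRating(1, 20, 3.5), LocalRating(2, 10, 2.0))
val observed = Seq(LocalRating(1, 10, 5.0), LocalRating(2, 10, 2.0), LocalRating(3, 30, 1.0))
println(localRmse(predicted, observed)) // sqrt((1 + 0) / 2) ≈ 0.7071
```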

Finally, let's provide some movie recommendations for a specific user. Let's get the top six movie predictions for user 668:

println("Recommendations: (MovieId => Rating)") 
println("----------------------------------") 
val recommendationsUser = model.recommendProducts(668, 6) 
recommendationsUser.map(rating => (rating.product, rating.rating)).foreach(println)
println("----------------------------------")
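Conceptually, recommendProducts() scores every product for the given user by taking the dot product of the user's latent factor vector with each product's factor vector. It then returns the highest-scoring products. The plain-Scala sketch below illustrates that idea without Spark. The factor values are made up, and `userFactors`, `productFactors`, `dot`, and `topN` are hypothetical names. A trained MatrixFactorizationModel holds the real factors in its userFeatures and productFeatures RDDs.

```scala
// Hypothetical latent factors (rank 2), made up for illustration only
val userFactors: Map[Int, Array[Double]] = Map(668 -> Array(0.9, 0.3))
val productFactors: Map[Int, Array[Double]] = Map(
  1 -> Array(4.0, 1.0),
  2 -> Array(1.0, 5.0),
  3 -> Array(3.0, 3.0),
  4 -> Array(0.5, 0.5)
)

// Dot product of two equal-length factor vectors
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

// Score every product for the user and keep the n highest-scoring ones
def topN(user: Int, n: Int): Seq[(Int, Double)] = {
  val u = userFactors(user)
  productFactors.toSeq
    .map { case (product, p) => (product, dot(u, p)) }
    .sortBy { case (_, score) => -score }
    .take(n)
}

topN(668, 2).foreach(println)
```

With these toy factors, products 1 and 3 score highest for user 668. The real model does the same ranking over all products at its trained rank.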

We believe the performance of the preceding model could be improved further. However, to our knowledge, the MLlib-based ALS algorithm does not yet provide a built-in model-tuning facility.
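Without such a facility, one common workaround is a manual grid search. You train one model per combination of hyperparameters, score each model on held-out validation data, and keep the combination with the lowest RMSE. For MLlib's ALS.train, the hyperparameters are the rank, the number of iterations, and the regularization parameter lambda. The skeleton below is a plain-Scala sketch of that loop, not a tuning API. The grid values are illustrative. `trainAndScore` is a hypothetical callback that would, in practice, train an ALS model with the given settings and return computeRmse() on validation data. The toy scoring function in the example is made up.

```scala
// Hypothetical hyperparameter grid; the values are illustrative only
val ranks = Seq(8, 12)
val lambdas = Seq(0.01, 0.1, 1.0)
val numIterations = 10

// Returns the (rank, lambda) pair with the lowest validation RMSE.
// trainAndScore(rank, lambda, iterations) is a hypothetical callback that would
// train an ALS model and return its RMSE on a validation set.
def gridSearch(trainAndScore: (Int, Double, Int) => Double): ((Int, Double), Double) = {
  val results = for {
    rank <- ranks
    lambda <- lambdas
  } yield ((rank, lambda), trainAndScore(rank, lambda, numIterations))
  results.minBy { case (_, rmse) => rmse }
}

// Toy scoring function standing in for real training, for demonstration only
val (bestParams, bestRmse) = gridSearch((rank, lambda, _) => math.abs(lambda - 0.1) + 1.0 / rank)
println(s"Best (rank, lambda) = $bestParams with validation RMSE $bestRmse")
```

The search should use a validation split separate from the test set. Otherwise the final test RMSE no longer gives an unbiased estimate of performance.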

Interested readers should refer to this URL for more on tuning ML-based ALS models: https://spark.apache.org/docs/preview/ml-collaborative-filtering.html.
