Recommender system

A recommender system, often described as a killer application of machine learning, is a subclass of information filtering system that seeks to predict the rating or preference that a user would give to an item. Recommender systems have become very common in recent years and have been applied in many different domains. The most popular application areas are probably products (for example, movies, music, books, and research articles), news, search queries, and social tags. Recommender systems can be classified into four categories, as discussed in Chapter 2, Machine Learning Best Practices. These are shown in Figure 10:

  • Collaborative filtering systems: These accumulate consumers' preferences and make recommendations to other users based on similarity in behavioral patterns
  • Content-based systems: Here, supervised machine learning is used to induce a classifier that distinguishes between interesting and uninteresting items for each user
  • Hybrid recommender systems: This is a more recent research direction that combines collaborative filtering and content-based filtering
  • Knowledge-based systems: Here, knowledge about users and products is used to reason about what fulfils a user's requirements, using perception trees, decision support systems, and case-based reasoning:
Figure 10: Hierarchy of the recommendation systems
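To make the collaborative filtering idea more concrete, here is a minimal user-based sketch in Python. The ratings dictionary and the function names are hypothetical and for illustration only; they do not come from any library. The sketch measures how alike two users' behavior is using cosine similarity over the items both have rated, and it predicts a missing rating as a similarity-weighted average of other users' ratings for that item.

```python
import math

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"matrix": 5.0, "titanic": 1.0, "alien": 4.0},
    "bob":   {"matrix": 4.0, "titanic": 2.0, "alien": 5.0, "up": 2.0},
    "carol": {"matrix": 1.0, "titanic": 5.0, "up": 4.0},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict(user, item, data):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num, den = 0.0, 0.0
    for other, their in data.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(data[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("alice", "up", ratings), 2))
```

Alice's predicted rating for "up" lies closer to Bob's rating than to Carol's, because Bob's past ratings resemble Alice's more closely. Pearson correlation is a common alternative to cosine similarity here.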

From the technical viewpoint, we can further categorize them as follows:

  • Item hierarchy: This is the weakest approach; it naively assumes that one item is correlated with another. For example, if you buy a printer, you are more likely to buy ink. BestBuy previously used this approach
  • Attribute-based recommendation: This assumes that if you like action movies starring Sylvester Stallone, you might also like the Rambo series. Netflix used to use this approach
  • Collaborative filtering (user-user similarity): This assumes, for example, that people like you who bought baby milk also bought diapers. Target uses this approach
  • Collaborative filtering (item-item similarity): This assumes, for example, that people who like The Godfather series also like Scarface. Netflix currently uses this approach
  • Social, interest, and graph-based approach: This assumes, for example, that if your friends like Michael Jackson, you will also like "Beat It". Tech giants such as LinkedIn and Facebook use this approach
  • Model-based approach: This uses advanced algorithms such as SVM, LDA, and SVD based on implicit features
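The item-item similarity idea from the list above can be sketched in a few lines of Python. The (user, item, rating) triples and the helper names are hypothetical. Each item is represented by the vector of ratings it received from users, with missing ratings treated as zero, and items are compared by cosine similarity. The item that scores highest against The Godfather is what an item-item recommender would suggest to its fans.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (user, item, rating) triples
triples = [
    ("u1", "godfather", 5), ("u1", "scarface", 5), ("u1", "toy_story", 1),
    ("u2", "godfather", 4), ("u2", "scarface", 5),
    ("u3", "toy_story", 5), ("u3", "godfather", 2),
    ("u4", "toy_story", 4), ("u4", "scarface", 1),
]

def item_vectors(rows):
    """Pivot (user, item, rating) rows into item -> {user: rating}."""
    vectors = defaultdict(dict)
    for user, item, rating in rows:
        vectors[item][user] = rating
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(item, vectors, k=1):
    """Return the k (item, score) pairs most similar to `item`, best first."""
    scores = [(other, cosine(vectors[item], vec))
              for other, vec in vectors.items() if other != item]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

vecs = item_vectors(triples)
print(most_similar("godfather", vecs, k=2))
```

Because item-item similarities change more slowly than user behavior, they can be precomputed offline. This is one reason the item-item variant scales well for large catalogs.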

As shown in Figure 11, the model-based recommender system, which makes wide use of advanced algorithms such as SVM, LDA, or SVD, is the most robust approach among recommender systems:

Figure 11: The recommender system from the technical point of view

Collaborative filtering in Spark

As already mentioned, collaborative filtering techniques are commonly used for recommender systems. However, Spark MLlib currently supports only model-based collaborative filtering. In this approach, users and products are described by a small set of latent factors, which are then used to predict the missing entries. According to the Spark API reference for collaborative filtering (http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html), the Alternating Least Squares (ALS) algorithm is used to learn these latent factors. Fitting the factors is a non-linear least-squares problem (see more at https://en.wikipedia.org/wiki/Non-linear_least_squares). ALS solves it by alternately fixing one set of factors and solving an ordinary least-squares problem for the other. The algorithm takes the following parameters:

  • numBlocks is the number of blocks used to parallelize the computation (using native LAPACK)
  • rank is the number of latent factors used when building the model
  • iterations is the number of ALS iterations to run; ALS typically converges to a reasonable solution in 20 iterations or fewer
  • lambda is the regularization parameter for the ALS algorithm
  • implicitPrefs specifies whether to use the explicit-feedback ALS variant or one adapted for implicit feedback data
  • alpha is a parameter of the implicit-feedback ALS variant that governs the baseline confidence in preference observations
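The implicitPrefs and alpha parameters are easier to understand through the implicit-feedback formulation of Hu, Koren, and Volinsky ("Collaborative Filtering for Implicit Feedback Datasets"), on which the Spark documentation bases its implicit variant. The following Python sketch uses hypothetical click counts and a hypothetical helper name. It shows how raw interaction strengths r become a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * |r|. ALS then fits p, weighting each squared error by c.

```python
def implicit_feedback(counts, alpha=1.0):
    """Map raw interaction strengths (e.g. click counts) to
    (preference, confidence) pairs: p = 1 if r > 0 else 0, c = 1 + alpha * |r|.
    The default alpha here is only an illustrative value."""
    result = {}
    for key, r in counts.items():
        preference = 1.0 if r > 0 else 0.0
        confidence = 1.0 + alpha * abs(r)
        result[key] = (preference, confidence)
    return result

# Hypothetical click counts per (user, item)
clicks = {("u1", "song_a"): 3, ("u1", "song_b"): 0, ("u2", "song_a"): 1}
print(implicit_feedback(clicks, alpha=1.0))
```

A larger alpha makes repeated interactions count for much more than a single one. An unobserved pair still keeps a small baseline confidence of 1, so it is treated as a weak negative rather than ignored.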

ALS is an iterative algorithm that models the rating matrix as the product of low-rank user and product factor matrices. These factors are learned by minimizing the reconstruction error on the observed ratings.
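The alternating procedure can be sketched in pure Python on a tiny dataset. This is a single-machine teaching sketch under stated assumptions, not Spark's distributed implementation: it has no numBlocks or implicitPrefs, and all function names and data are hypothetical. Each half-step holds one side's factors fixed and solves a small regularized least-squares system for every user (or item). As the Spark documentation describes for its own solver ("ALS-WR"), the regularization term here is scaled by the number of ratings of each user or item. The parameter lam plays the role of lambda, which is a reserved word in Python.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def update(fixed, observed, rank, lam):
    """Half-step: for each key, solve (sum y y^T + lam*n*I) x = sum r*y
    with the other side's factors `fixed` held constant."""
    new = {}
    for key, pairs in observed.items():  # pairs: list of (other_key, rating)
        a = [[0.0] * rank for _ in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            y = fixed[other]
            for p in range(rank):
                b[p] += r * y[p]
                for q in range(rank):
                    a[p][q] += y[p] * y[q]
        for p in range(rank):
            a[p][p] += lam * len(pairs)  # regularization scaled by #ratings
        new[key] = solve(a, b)
    return new

def als(ratings, rank=2, iterations=10, lam=0.01, seed=0):
    """Factorize sparse (user, item, rating) triples into user and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # fix items, solve users
        items = update(users, by_item, rank, lam)  # fix users, solve items
    return users, items

def rmse(ratings, users, items):
    """Root-mean-square reconstruction error on the observed ratings."""
    err = sum((r - sum(a * b for a, b in zip(users[u], items[i]))) ** 2
              for u, i, r in ratings)
    return (err / len(ratings)) ** 0.5

# Hypothetical toy ratings: 4 users x 4 movies, with some entries missing
toy_ratings = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 1),
    ("u2", "m1", 4), ("u2", "m3", 1), ("u2", "m4", 2),
    ("u3", "m2", 1), ("u3", "m3", 5), ("u3", "m4", 4),
    ("u4", "m1", 1), ("u4", "m3", 4), ("u4", "m4", 5),
]
users, items = als(toy_ratings, rank=2, iterations=20, lam=0.01)
print(round(rmse(toy_ratings, users, items), 3))
```

Each half-step is an exact minimization with respect to one block of factors, so the regularized objective never increases from one half-step to the next. The individual least-squares solves are also independent, which is what makes the method easy to parallelize.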

The unknown ratings can then be calculated by multiplying these factors together. The collaborative filtering approach in Spark MLlib, whether used for movie recommendation or any other recommendation task, has proven to deliver high prediction accuracy and to scale to billions of ratings on commodity clusters, such as those used by companies like Netflix. In this way, a company such as Netflix can recommend movies to its subscribers based on the predicted ratings. The ultimate goal is to increase sales and, of course, customer satisfaction.
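Once the factors are learned, predicting an unknown rating reduces to a dot product between a user's factor vector and an item's factor vector. Ranking the unseen items by that score yields recommendations. The Python sketch below uses hypothetical rank-2 factors written out by hand. In Spark MLlib, the corresponding operations are provided by `MatrixFactorizationModel` (for example, `predict` and `recommendProducts`).

```python
# Hypothetical latent factors (rank = 2), e.g. as learned by ALS
user_factors = {"alice": [1.2, 0.1], "bob": [0.2, 1.5]}
item_factors = {
    "godfather": [3.5, 0.4],
    "scarface":  [3.0, 0.9],
    "toy_story": [0.3, 3.2],
    "up":        [0.5, 2.8],
}

def predict(user, item):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(u * v for u, v in zip(user_factors[user], item_factors[item]))

def recommend(user, seen, n=2):
    """Top-n items the user has not seen, ranked by predicted rating."""
    candidates = [i for i in item_factors if i not in seen]
    return sorted(candidates, key=lambda i: predict(user, i), reverse=True)[:n]

print(round(predict("alice", "godfather"), 2))
print(recommend("bob", seen={"up"}, n=1))
```

Excluding already-seen items is a design choice. Without it, the top of the list would mostly repeat items the user has already rated highly.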

For brevity and because of page limits, we will not show movie recommendations using the collaborative filtering approach in this chapter. However, a step-by-step example using Spark will be shown in Chapter 9, Advanced Machine Learning with Streaming and Graph Data.

Tip

For the time being, interested readers are advised to visit the Spark website for the latest API and code at http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html, where an example shows sample movie recommendations using the ALS algorithm.
