Predicting recommendations for movies and jokes

In this chapter, we will focus on building recommender systems using two different datasets. To do this, we shall use the recommenderlab package, which provides not only the algorithms that produce the recommendations, but also the data structures needed to store sparse rating matrices efficiently. The first dataset we will use contains anonymous user ratings for jokes from the Jester Online Joke Recommender System.

The joke ratings fall on a continuous scale (-10 to +10). A number of datasets collected from the Jester system can be found at http://eigentaste.berkeley.edu/dataset/. We will use the dataset labeled on the website as Dataset 2+. This dataset contains ratings made by 50,692 users on 150 jokes. As is typical of a real-world application, the rating matrix is very sparse, in that each user rated only a fraction of all the jokes; the minimum number of ratings made by a user is 8. We will refer to this dataset as the jester dataset.
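To make the idea of a sparse rating matrix concrete, the following is a minimal sketch that uses a small, made-up matrix rather than the actual Jester data. The user names, joke names, and rating values are invented for illustration; missing ratings are coded as NA before the matrix is converted to the realRatingMatrix class from recommenderlab.

```r
library(recommenderlab)

# Toy ratings on the Jester scale (-10 to +10); NA marks a joke the user did not rate.
# All names and values here are hypothetical.
m <- matrix(c( 5.2,  NA, -3.1, NA,
                NA,  NA,  8.0, NA,
              -9.5, 1.4,   NA, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(user = paste0("u", 1:3),
                            item = paste0("j", 1:4)))

# Convert to the sparse rating class; only the observed ratings are stored
r <- as(m, "realRatingMatrix")

# Number of observed ratings and the share of the matrix that is filled in
n_observed <- nratings(r)
fill_rate <- n_observed / (nrow(r) * ncol(r))

# Ratings per user, and the minimum across users
ratings_per_user <- rowCounts(r)
min_ratings <- min(ratings_per_user)

print(fill_rate)
print(ratings_per_user)
```

The same calls (nratings, rowCounts) can be applied to the full dataset once it has been loaded into a realRatingMatrix, which is one way to check figures such as the minimum number of ratings per user.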

The second dataset can be found at http://grouplens.org/datasets/movielens/. This website contains data on user ratings for movies that were made on the MovieLens website at http://movielens.org. Again, there is more than one dataset on the website; we will use the one labeled MovieLens1M. This contains ratings on a five-point scale (1-5) made by 6,040 users on 3,706 movies. The minimum number of movie ratings per user is 20. We will refer to this dataset as the movie dataset.

Tip

These two datasets are very well known and freely available, to the point that the recommenderlab package itself ships with smaller versions of them. Readers who would like to skip the process of loading and preprocessing the data, or who would like to run the examples that follow on smaller datasets due to computational constraints, are encouraged to try them out using data(Jester5k) or data(MovieLense).
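As a rough sketch of what this looks like in practice, the bundled datasets can be loaded and inspected as follows. This assumes that recommenderlab is already installed; note that these are the smaller versions shipped with the package, so their dimensions will not match the figures quoted above for the full datasets.

```r
library(recommenderlab)

# Load the smaller versions of the two datasets bundled with the package
data(Jester5k)
data(MovieLense)

# Both objects use the package's sparse rating class
print(class(Jester5k))
print(class(MovieLense))

# Dimensions are reported as users x items
print(dim(Jester5k))
print(dim(MovieLense))

# Distribution of the number of ratings made by each user
print(summary(rowCounts(Jester5k)))
print(summary(rowCounts(MovieLense)))

# Observed range of the rating values in each dataset
jester_range <- range(getRatings(Jester5k))
movie_range <- range(getRatings(MovieLense))
print(jester_range)
print(movie_range)
```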
