Chapter 8. Classifications, Recommendations, and Finding Relationships
In this chapter, we will cover:
Content-based recommendations
Hierarchical clustering
Clustering an Amazon sales dataset
Collaborative filtering-based recommendations
Classification using the Naive Bayes classifier
Assigning advertisements to keywords using the Adwords balance algorithm
Introduction
This chapter discusses how we can use Hadoop for more complex use cases such as classifying a dataset, making recommendations, or finding relationships between items.
A few instances of such scenarios are as follows:
Recommending products to users based either on the similarities between the products (for example, if a user liked a book about history, they might like another book on the same subject) or on user behaviour patterns (for example, if two users are similar, each might like books that the other has read).
Clustering a dataset to identify similar entities, for example, identifying users with similar interests.
Classifying data into several groups by learning from historical data.
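To make the behaviour-based scenario concrete, the following is a minimal single-machine sketch in plain Java, not a MapReduce job. It assumes that "similar users" means users whose sets of purchased books overlap, measured with Jaccard similarity; that measure, the class name UserSimilaritySketch, the threshold, and the sample book IDs are all illustrative assumptions, not the method used by the recipes in this chapter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not taken from the chapter's recipes.
public class UserSimilaritySketch {

    // Jaccard similarity: size of the intersection divided by size of the union.
    public static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // treat two empty histories as not similar
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // If the two users are similar enough, suggest the books the neighbour
    // has read that the target user has not. Otherwise suggest nothing.
    public static Set<String> recommend(Set<String> target,
                                        Set<String> neighbour,
                                        double threshold) {
        Set<String> result = new TreeSet<>(); // sorted for stable output
        if (jaccard(target, neighbour) >= threshold) {
            result.addAll(neighbour);
            result.removeAll(target);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up purchase histories for two users.
        Set<String> alice = new HashSet<>(Arrays.asList("book-1", "book-2", "book-3"));
        Set<String> bob = new HashSet<>(Arrays.asList("book-2", "book-3", "book-4"));

        System.out.println(jaccard(alice, bob));        // prints 0.5
        System.out.println(recommend(alice, bob, 0.4)); // prints [book-4]
    }
}
```

The two users share two of four distinct books, so their similarity is 0.5, which clears the assumed 0.4 threshold and yields book-4 as a suggestion for the first user. A real dataset would need this comparison distributed across many user pairs, which is where MapReduce comes in.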
In this chapter, we will apply these and other techniques using MapReduce. For the recipes in this chapter, we will use the Amazon product co-purchasing network metadata dataset available from http://snap.stanford.edu/data/amazon-meta.html.