Summary

In this chapter, we performed affinity analysis to recommend movies, using the ratings of a large set of reviewers. We did this in two stages. First, we found frequent itemsets in the data using the Apriori algorithm. Then, we created association rules from those itemsets.
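As a reminder of how the two stages fit together, here is a minimal sketch in plain Python. It is not the chapter's exact code: the function names `frequent_itemsets` and `association_rules`, the tuple layout of a rule, and the choice to count support as a raw number of transactions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Stage 1 (Apriori): return {itemset: support count} for frequent itemsets.

    `transactions` is an iterable of item collections; `min_support` is the
    minimum number of transactions an itemset must appear in (an assumption:
    support is a raw count here, not a fraction).
    """
    transactions = [frozenset(t) for t in transactions]

    # Count individual items first.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Build size-k candidates from frequent (k-1)-itemsets, keeping only
        # those whose every (k-1)-subset is itself frequent.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count how many transactions contain each candidate.
        counts = defaultdict(int)
        for t in transactions:
            for candidate in candidates:
                if candidate <= t:
                    counts[candidate] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Stage 2: derive (premise, conclusion, confidence) rules from itemsets."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        # Each item takes a turn as the conclusion; the rest form the premise.
        for conclusion in itemset:
            premise = itemset - {conclusion}
            confidence = support / frequent[premise]
            if confidence >= min_confidence:
                rules.append((premise, conclusion, confidence))
    return rules
```

Calling `frequent_itemsets` on a list of sets of movie IDs (one set per reviewer) and passing the result to `association_rules` gives a list of candidate recommendations. The pruning step inside the loop is what spares Apriori from counting every possible combination.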

The Apriori algorithm was necessary because of the size of the dataset. In Chapter 1, Getting Started With Data Mining, we used a brute-force approach, but the time needed to compute rules that way grows exponentially with the number of items, so a smarter approach was required. This is a common pattern in data mining: we can solve many problems in a brute-force manner, but smarter algorithms allow us to apply the same concepts to larger datasets.
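To make the exponential growth concrete, the short sketch below counts how many non-empty itemsets a brute-force search would have to consider for a given number of distinct items. The function name `count_possible_itemsets` is hypothetical, and the item counts in the loop are arbitrary illustrative values rather than figures from the dataset.

```python
from math import comb


def count_possible_itemsets(n_items):
    """Number of non-empty itemsets over n_items distinct items (2**n - 1)."""
    return sum(comb(n_items, k) for k in range(1, n_items + 1))


# Each additional item doubles the search space (plus one).
for n in (5, 10, 20, 40):
    print(n, count_possible_itemsets(n))
```

With only a handful of items every combination can be checked, but the count quickly becomes impractical, which is why pruning infrequent itemsets early matters.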

We trained on a subset of our data to find the association rules, and then tested those rules on the rest of the data, the testing set. Building on what we discussed in the previous chapters, we could extend this to use cross-fold validation, which would give a more robust evaluation of the quality of each rule.
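The idea of testing a rule on held-out data can be sketched as follows. This is an illustrative helper, not the chapter's code: the name `rule_confidence` and the representation of a rule as a premise set plus a single conclusion item are assumptions.

```python
def rule_confidence(premise, conclusion, transactions):
    """Confidence of the rule premise -> conclusion on the given transactions.

    Returns None when the premise never occurs, because the rule cannot be
    evaluated on that data.
    """
    premise = frozenset(premise)
    # Transactions in which the rule applies (the whole premise is present).
    applicable = [t for t in transactions if premise <= set(t)]
    if not applicable:
        return None
    # Of those, how many also contain the conclusion?
    correct = sum(1 for t in applicable if conclusion in t)
    return correct / len(applicable)
```

Computing this value once on the training transactions and once on the testing transactions shows whether a rule's confidence holds up on data it was not derived from. Repeating the split several times and averaging the test confidence would be one way to move toward the cross-fold evaluation mentioned above.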

So far, all of our datasets have been described in terms of features. However, not all datasets are "pre-defined" in this way. In the next chapter, we will look at scikit-learn's transformers (they were introduced in Chapter 3, Predicting Sports Winners with Decision Trees) as a way to extract features from data. We will discuss how to implement our own transformers, how to extend existing ones, and which concepts we can implement using them.
