"It's much easier to double your business by doubling your conversion rate than by doubling your traffic."
--Jeff Eisenberg, CEO of BuyerLegends.com
"I don't see smiles on the faces of people at Whole Foods."
--Warren Buffett
One would have to live on the dark side of the moon not to observe, each and every day, the results of the techniques that we are about to discuss in this chapter. If you visit www.amazon.in, watch movies on www.netflix.com, or visit any retail website, you will be exposed to phrases such as "related products", "because you watched…", "customers who bought x also bought y", or "recommended for you" at every twist and turn. With large volumes of historical, real-time, or near real-time information, retailers use the algorithms discussed here to try to increase both the number and the value of your purchases.
The techniques to do this can be broken down into two categories: association rules and recommendation engines. Association rule analysis is commonly referred to as market basket analysis as one is trying to understand what items are purchased together. With recommendation engines, the goal is to provide a customer with other items that they will enjoy based on how they have rated previously viewed or purchased items.
In the examples that follow, we will explore how R can be used to develop such algorithms. We will not cover their implementation as that is outside the scope of this book. We will begin with a market basket analysis of purchasing habits at a grocery store and then dig into building a recommendation engine on website reviews.
Market basket analysis is a data mining technique that aims to find the optimal combination of products or services. Marketers can exploit this knowledge to provide recommendations, optimize product placement, or develop marketing programs that take advantage of cross-selling. In short, the idea is to identify which items go well together, and profit from this.
You can think of the results of the analysis as an IF-THEN statement. IF a customer buys an airplane ticket, THEN there is a 46 percent probability that they will buy a hotel room, and IF they go on to buy a hotel room, THEN there is a 33 percent probability that they will rent a car. (With all the travelling in my business, this is a never-ending annoyance for me.)
However, it is not just for sales and marketing. It is also being used in fraud detection and healthcare; for example, if a patient undergoes treatment A, then there is a 26 percent probability that they might exhibit symptom X. Before going into the details, we should have a look at some terminology as it will be used in the example:
The package in R that you can use to perform a market basket analysis is arules: Mining Association Rules and Frequent Itemsets. The package offers two different methods of finding rules. Why would one have different methods? Quite simply, if you have massive datasets, it can become computationally expensive to examine all the possible combinations of the products. The algorithms that the package supports are apriori and ECLAT. There are other algorithms to conduct a market basket analysis, but apriori is used most frequently and so we will focus on it.
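As a rough sketch of what calling the two algorithms looks like, the following assumes that the arules package is installed and uses Groceries, an example transactions dataset that ships with it. The support, confidence, and length thresholds are arbitrary values chosen for illustration, not recommendations.

```r
# Assumes arules is installed: install.packages("arules")
library(arules)

# Groceries is an example transactions dataset bundled with arules
data(Groceries)

# Mine association rules with apriori; thresholds are illustrative assumptions
rules <- apriori(Groceries,
                 parameter = list(support = 0.001,
                                  confidence = 0.5,
                                  maxlen = 4))

# Mine frequent itemsets with ECLAT, the other algorithm the package supports
itemsets <- eclat(Groceries,
                  parameter = list(support = 0.05, maxlen = 3))

# Show the five rules with the highest lift
inspect(head(sort(rules, by = "lift"), 5))
```

Note that `apriori()` returns rules by default, whereas `eclat()` returns frequent itemsets only.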
With apriori, the principle is that if an itemset is frequent, then all of its subsets must also be frequent. A minimum frequency (support) is determined by the analyst prior to executing the algorithm and once established, the algorithm will run as follows:
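The level-wise search that this principle makes possible can be sketched in base R. This is a toy illustration only: the transactions, the 0.3 minimum support, and the helper names `support` and `key` are all made up for the example, and the implementation in arules is far more efficient.

```r
# Toy transactions (hypothetical data for illustration only)
transactions <- list(
  c("beer", "chips", "salsa"),
  c("beer", "chips"),
  c("beer", "diapers"),
  c("chips", "salsa"),
  c("beer", "chips", "diapers")
)
min_support <- 0.3  # analyst-chosen minimum support (assumed value)

# Share of transactions that contain every item in the itemset
support <- function(itemset) {
  mean(vapply(transactions, function(tx) all(itemset %in% tx), logical(1)))
}

# Canonical string key so that itemsets can be compared
key <- function(s) paste(sort(s), collapse = ",")

# Level 1: keep the single items that meet the minimum support
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support(s) >= min_support, as.list(items))
all_frequent <- frequent

# Level k: build candidates from surviving items, drop any candidate that
# has an infrequent (k-1)-subset, then count support for the remainder
k <- 2
while (length(frequent) > 0) {
  survivors <- sort(unique(unlist(frequent)))
  if (length(survivors) < k) break
  frequent_keys <- vapply(frequent, key, character(1))
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, key, character(1)) %in% frequent_keys)
  }, combn(survivors, k, simplify = FALSE))
  frequent <- Filter(function(s) support(s) >= min_support, candidates)
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

# Print each frequent itemset with its support
for (s in all_frequent) {
  cat(sprintf("{%s}: %.1f\n", paste(s, collapse = ", "), support(s)))
}
```

In this toy data, every three-item candidate is discarded by the subset check before its support is ever counted, which is where the computational saving comes from.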
Once you have an ordered summary of the most frequent itemsets, you can continue the analysis process by examining the confidence and lift in order to identify the associations of interest.
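To make these measures concrete, here is a minimal base R sketch using their standard definitions: support is the share of transactions containing an itemset, the confidence of a rule X => Y is support(X and Y) divided by support(X), and lift is that confidence divided by support(Y). The transactions and the helper names `support` and `rule_metrics` are hypothetical and exist only for this illustration.

```r
# Toy transactions (hypothetical data for illustration only)
transactions <- list(
  c("flight", "hotel", "car"),
  c("flight", "hotel"),
  c("flight"),
  c("hotel"),
  c("flight", "hotel", "car"),
  c("car")
)

# Share of transactions that contain every item in the itemset
support <- function(itemset) {
  mean(vapply(transactions, function(tx) all(itemset %in% tx), logical(1)))
}

# Support, confidence, and lift for the rule lhs => rhs
rule_metrics <- function(lhs, rhs) {
  supp <- support(c(lhs, rhs))   # how often lhs and rhs occur together
  conf <- supp / support(lhs)    # estimate of P(rhs | lhs)
  lift <- conf / support(rhs)    # confidence relative to rhs's baseline rate
  c(support = supp, confidence = conf, lift = lift)
}

# IF a customer buys a flight, THEN how likely is a hotel purchase?
metrics <- rule_metrics("flight", "hotel")
print(metrics)
```

A lift above 1 suggests that the two sides of the rule occur together more often than they would if they were independent, while a lift near 1 suggests little association.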