Chapter 10. Market Basket Analysis and Recommendation Engines

 

"It's much easier to double your business by doubling your conversion rate than by doubling your traffic."

 
 --Jeff Eisenberg, CEO of BuyerLegends.com
 

"I don't see smiles on the faces of people at Whole Foods."

 
 --Warren Buffett

One would have to live on the dark side of the moon to avoid seeing, each and every day, the results of the techniques that we are about to discuss in this chapter. If you visit www.amazon.in, watch movies on www.netflix.com, or visit any retail website, you will be exposed to phrases such as "related products", "because you watched…", "customers who bought x also bought y", or "recommended for you" at every twist and turn. With large volumes of historical, real-time, or near real-time information, retailers use the algorithms discussed here to try to increase both the number and the value of your purchases.

The techniques to do this can be broken down into two categories: association rules and recommendation engines. Association rule analysis is commonly referred to as market basket analysis as one is trying to understand what items are purchased together. With recommendation engines, the goal is to provide a customer with other items that they will enjoy based on how they have rated previously viewed or purchased items.

In the examples coming up, we will endeavor to explore how R can be used to develop such algorithms. We will not cover their implementation as that is outside the scope of this book. We will begin with a market basket analysis of purchasing habits at a grocery store and then dig into building a recommendation engine on website reviews.

An overview of a market basket analysis

Market basket analysis is a data mining technique for finding the combinations of products or services that tend to be purchased together. Marketers can exploit this knowledge to provide recommendations, optimize product placement, or develop marketing programs that take advantage of cross-selling. In short, the idea is to identify what items go well together, and profit from this.

You can think of the results of the analysis as an IF-THEN statement. IF a customer buys an airplane ticket, THEN there is a 46 percent probability that they will buy a hotel room, and IF they go on to buy a hotel room, THEN there is a 33 percent probability that they will rent a car. (With all the travelling in my business, this is a never-ending annoyance for me.)

However, it is not just for sales and marketing. It is also being used in fraud detection and healthcare; for example, if a patient undergoes treatment A, then there is a 26 percent probability that they might exhibit symptom X. Before going into the details, we should have a look at some terminology as it will be used in the example:

  • Itemset: This is a collection of one or more items in the dataset.
  • Support: This is the proportion of the transactions in the data that contain an itemset of interest.
  • Confidence: This is the conditional probability that if a person purchases or does x, they will purchase or do y; x is referred to as the antecedent or Left-Hand Side (LHS) and y is the consequent or Right-Hand Side (RHS).
  • Lift: This is the ratio of the support of x occurring together with y to the support expected if x and y were independent, that is, the probability of x times the probability of y. Equivalently, it is the Confidence divided by the probability of y. For example, if the probability of x and y occurring together is 10 percent, the probability of x is 20 percent, and the probability of y is 30 percent, then the Lift is 10 percent divided by (20 percent times 30 percent), or 1.67. A Lift greater than 1 indicates that x and y occur together more often than independence would predict.
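
To make these definitions concrete, here is a minimal sketch in base R. The ten transactions and the item names x, y, and z are made up for illustration, and are constructed so that x appears in 20 percent of transactions, y in 30 percent, and both together in 10 percent. The helper function support() is hypothetical and not part of any package.

```r
# Hypothetical toy data: 10 transactions over the made-up items x, y, and z.
transactions <- list(
  c("x", "y"), c("x", "z"), c("y"), c("y", "z"), c("z"),
  c("z"), c("z"), c("z"), c("z"), c("z")
)

# Support: share of transactions that contain every item in the itemset.
support <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

supp_x  <- support("x", transactions)          # P(x)
supp_y  <- support("y", transactions)          # P(y)
supp_xy <- support(c("x", "y"), transactions)  # P(x and y)

# Confidence of the rule x => y: P(x and y) / P(x).
confidence <- supp_xy / supp_x

# Lift of the rule x => y: P(x and y) / (P(x) * P(y)).
lift <- supp_xy / (supp_x * supp_y)

print(c(support = supp_xy, confidence = confidence, lift = lift))
```

With these numbers the lift is 0.10 / (0.20 * 0.30), or roughly 1.67, and the confidence of x => y is 0.10 / 0.20 = 0.5.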

The package in R that you can use to perform a market basket analysis is arules: Mining Association Rules and Frequent Itemsets. The package offers two different methods of finding rules. Why would one have different methods? Quite simply, if you have massive datasets, it can become computationally expensive to examine all the possible combinations of the products. The algorithms that the package supports are apriori and ECLAT. There are other algorithms to conduct a market basket analysis, but apriori is used most frequently and so we will focus on it.
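
As a rough sketch of what calling the package can look like, the following assumes that arules is installed and uses the Groceries transactions that ship with it. The choice of dataset and the support, confidence, and minlen thresholds are assumptions made for illustration, not values taken from the text.

```r
# Guarded so that the script simply does nothing if arules is not installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)

  # Example grocery transactions bundled with the package.
  data("Groceries")

  # Mine association rules with apriori; thresholds are illustrative only.
  rules <- apriori(
    Groceries,
    parameter = list(support = 0.01, confidence = 0.3, minlen = 2),
    control = list(verbose = FALSE)
  )

  # Mine frequent itemsets with ECLAT at the same (assumed) minimum support.
  itemsets <- eclat(
    Groceries,
    parameter = list(support = 0.01),
    control = list(verbose = FALSE)
  )

  # Show the five rules with the highest lift.
  inspect(head(sort(rules, by = "lift"), 5))
}
```

Lowering the support threshold generally yields more rules at a higher computational cost, which is the trade-off the choice of algorithm is meant to manage.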

With apriori, the principle is that if an itemset is frequent, then all of its subsets must also be frequent. A minimum frequency (support) is determined by the analyst prior to executing the algorithm and once established, the algorithm will run as follows:

  • Let k=1 (the number of items in an itemset)
  • Generate the itemsets of length k whose support is equal to or greater than the specified minimum
  • Increase k by 1 and repeat, building candidate itemsets from the frequent itemsets of the previous pass and pruning those that are infrequent (support below the minimum)
  • Stop the iteration when no new frequent itemsets are identified
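
The steps above can be sketched in base R as follows. This is a simplified illustration, not the implementation used by any package: the transactions, the item names, and the 0.5 minimum support are made up, the helper support() is hypothetical, and candidates are built by taking unions of pairs of frequent itemsets rather than by the more efficient prefix join found in production implementations.

```r
# Hypothetical toy transactions; item names are made up for illustration.
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)
min_support <- 0.5  # assumed threshold, chosen by the analyst up front

# Support: share of transactions that contain every item in the itemset.
support <- function(itemset, trans) {
  mean(vapply(trans, function(t) all(itemset %in% t), logical(1)))
}

# k = 1: every distinct item is a candidate itemset.
candidates <- as.list(sort(unique(unlist(transactions))))
frequent <- list()

while (length(candidates) > 0) {
  # Keep only the candidates that meet the minimum support.
  supp <- vapply(candidates, support, numeric(1), trans = transactions)
  keep <- candidates[supp >= min_support]
  if (length(keep) == 0) break  # no new frequent itemsets: stop
  frequent <- c(frequent, keep)

  # Join: unions of two frequent k-itemsets that have exactly k + 1 items.
  k <- length(keep[[1]])
  joined <- list()
  if (length(keep) > 1) {
    pairs <- combn(length(keep), 2)
    for (j in seq_len(ncol(pairs))) {
      u <- sort(union(keep[[pairs[1, j]]], keep[[pairs[2, j]]]))
      if (length(u) == k + 1) joined <- c(joined, list(u))
    }
  }
  joined <- unique(joined)

  # Prune: a candidate survives only if all of its k-subsets are frequent.
  keep_keys <- vapply(keep, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subs <- combn(cand, k, simplify = FALSE)
    all(vapply(subs, paste, character(1), collapse = ",") %in% keep_keys)
  }, joined)
}

# Print each frequent itemset with its support.
for (s in frequent) {
  cat(sprintf("{%s} support = %.1f\n",
              paste(s, collapse = ", "), support(s, transactions)))
}
```

The prune step is where the apriori principle pays off: a candidate with any infrequent subset is discarded without ever counting its support against the transactions.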

Once you have an ordered summary of the most frequent itemsets, you can continue the analysis process by examining the confidence and lift in order to identify the associations of interest.
