Association rule mining

We will now implement the final technique in market basket analysis: finding association rules between itemsets to detect and predict product purchase patterns, which can then be used for product recommendations and suggestions. We will use the Apriori algorithm from the arules package, which first generates frequent itemsets, as we discussed earlier. Once it has the frequent itemsets, the algorithm generates rules based on metrics such as support, confidence, and lift. We will also show how you can visualize and interact with these rules using the arulesViz package. The code for this implementation is in the ch3_association rule mining.R file, which you can load directly and follow along with the book.

Loading dependencies and data

We will first load the necessary package and data dependencies. Do note that we will be using the Groceries dataset which we discussed earlier in the section dealing with advanced contingency matrices.

> ## loading package dependencies
> library(arules) # apriori algorithm
> library(arulesViz)  # visualize association rules
> 
> ## loading dataset
> data(Groceries)

Exploratory analysis

We will do some basic exploratory analysis on our dataset here, to see what kind of data we are dealing with and what products are the most popular among the customers.

> ## exploring the data
> inspect(Groceries[1:3])
  items                                                   
1 {citrus fruit,semi-finished bread,margarine,ready soups}
2 {tropical fruit,yogurt,coffee}                          
3 {whole milk}        
> # viewing the top ten purchased products                                    
> sort(itemFrequency(Groceries, type="absolute"), 
+                    decreasing = TRUE)[1:10]

[Output: absolute frequencies of the top ten purchased products]

> # visualizing the top ten purchased products
> itemFrequencyPlot(Groceries,topN=10,type="absolute")

The preceding code snippet renders a bar plot of the ten most purchased products, which gives us a preliminary idea of what customers buy most when they shop for groceries. It looks like people usually buy essential items such as milk and vegetables the most!

[Figure: bar plot of the top ten purchased products by absolute frequency]
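Beyond raw counts, it can help to look at the size of the dataset and at relative frequencies, since relative frequency is the same quantity as the support of a single item. The following is a minimal sketch along those lines; the variable names n.transactions, n.items, and top.ten are chosen here for illustration and are not part of the book's code file.

```r
library(arules)
data(Groceries)

# Dataset size: number of transactions and number of distinct items
n.transactions <- length(Groceries)
n.items <- length(itemLabels(Groceries))
c(transactions = n.transactions, items = n.items)

# Share of all transactions that contain each of the top ten items
top.ten <- sort(itemFrequency(Groceries, type = "relative"),
                decreasing = TRUE)[1:10]
round(top.ten, 3)

# Distribution of basket sizes (number of items per transaction)
summary(size(Groceries))
```

Relative frequencies are useful when choosing a minimum support threshold, because they are expressed on the same scale as the supp parameter used later.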

Detecting and predicting shopping trends

We will now generate association rules using the Apriori algorithm, which we talked about earlier, to detect shopping trends so that we can predict what customers might buy in the future and even recommend it to them. We will start with a normal workflow for generating association rules:

> # normal workflow
> metric.params <- list(supp=0.001, conf=0.5)
> rules <- apriori(Groceries, parameter = metric.params)
> inspect(rules[1:5])

[Output: the first five association rules and their metrics]

To interpret these rules, look at the items on the LHS and the items on the RHS: if a customer has the LHS item(s) in their shopping cart, there is a chance that they will also buy the RHS item(s). This chance is quantified by the metrics in the remaining columns, whose significance we discussed in the concepts of market basket analysis. From the previous rules, we can say with 73.3% confidence that a customer who buys honey will also buy whole milk. We also see a trend: items such as honey, cocoa, pudding, and cooking chocolate are often used with milk, which might explain why people tend to buy milk together with these products, and we can recommend it to those customers. Feel free to tune the support and confidence thresholds, and to filter the results by lift, to extract more rules and patterns from the dataset!
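To make the metrics concrete, here is a small base R sketch that computes support, confidence, and lift by hand for a single rule. The six baskets are made up for illustration and are not taken from the Groceries data; the helper support() and the variable names are likewise hypothetical.

```r
# Toy transactions, invented for this example
baskets <- list(
  c("honey", "whole milk"),
  c("honey", "whole milk", "bread"),
  c("honey", "bread"),
  c("bread"),
  c("whole milk"),
  c("bread", "butter")
)

# Fraction of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs.items <- "honey"
rhs.items <- "whole milk"

# support: how often LHS and RHS occur together
supp.rule <- support(c(lhs.items, rhs.items), baskets)
# confidence: how often the RHS occurs among baskets containing the LHS
conf.rule <- supp.rule / support(lhs.items, baskets)
# lift: confidence relative to how often the RHS occurs overall
lift.rule <- conf.rule / support(rhs.items, baskets)

round(c(support = supp.rule, confidence = conf.rule, lift = lift.rule), 3)
```

In this toy data the rule { honey } => { whole milk } has a lift above 1, meaning whole milk shows up more often in baskets with honey than in baskets overall.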

Often the rules generated by the Apriori algorithm include duplicate (redundant) rules, which need to be removed before we examine the set of rules. The following utility function drops any rule whose items contain all the items of a rule that appears earlier in the list, so the order of the rules matters:

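# pruning duplicate rules
prune.dup.rules <- function(rules){
  # as.matrix() yields a dense logical matrix even when is.subset() returns a sparse one
  rule.subset.matrix <- as.matrix(is.subset(rules, rules))
  # keep only the upper triangle so each rule is compared with the rules before it
  rule.subset.matrix[lower.tri(rule.subset.matrix, diag=TRUE)] <- NA
  # flag a rule when any earlier rule is a subset of it
  dup.rules <- colSums(rule.subset.matrix, na.rm=TRUE) >= 1
  pruned.rules <- rules[!dup.rules]
  return(pruned.rules)
}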
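As an alternative to a hand-written helper, reasonably recent releases of arules provide is.redundant(). This is a sketch that assumes such a release is installed. Note that its notion of redundancy is not the same as the utility function above: it flags a rule when a more general rule with the same RHS has at least the same confidence, so the two approaches can keep different rules. The names redundant and rules.kept are chosen here for illustration.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Logical vector: TRUE for rules that a more general rule makes redundant
redundant <- is.redundant(rules)
rules.kept <- rules[!redundant]

# Compare the number of rules before and after pruning
c(before = length(rules), after = length(rules.kept))
```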

There are also ways to sort rules by specific metrics to see the rules with the best quality. We will look at the rules generated with the previous parameter values, sorted by confidence:

# sorting rules based on metrics
rules <- sort(rules, by="confidence", decreasing=TRUE)
rules <- prune.dup.rules(rules)
inspect(rules[1:5])

[Output: the top five rules sorted by confidence]

We see itemsets in the previous rules, such as { rice, sugar }, which have a strong tendency to be purchased along with { whole milk }. The confidence values are as high as 100% (as expected, since we sorted by confidence), and the lift is greater than 1, indicating a positive association between the itemsets. Do note that in large datasets the support values may not be very high, and that is perfectly normal: we are searching for specific patterns that, given the variety of transactions, may not cover even 1% of the total. However, it is extremely important to detect these patterns so that we can make informed decisions about which products might sell together and recommend them to customers. We will next look at another example, showing the best-quality rules sorted by lift:

> rules<-sort(rules, by="lift", decreasing=TRUE)
> rules <- prune.dup.rules(rules)
> inspect(rules[1:5])

[Output: the top five rules sorted by lift]

We see that these rules have really high lift and good confidence too, which makes their items the ones customers tend to buy together the most!
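Sorting is one way to surface strong rules; filtering is another. The sketch below uses the subset() method that arules provides for rules to keep only rules that predict a chosen item and exceed a lift cutoff. Both the item ("whole milk") and the cutoff (3) are arbitrary choices for illustration, and milk.rules is a name introduced here.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Keep only rules with whole milk on the RHS and a lift above 3
milk.rules <- subset(rules, subset = rhs %in% "whole milk" & lift > 3)
milk.rules <- sort(milk.rules, by = "lift", decreasing = TRUE)

inspect(head(milk.rules, 5))
```

Filtering this way keeps the rule set small enough to read, which is often more practical than scanning thousands of sorted rules.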

We will now look at detecting the specific shopping patterns we discussed earlier. One way to do this is to target specific items and generate association rules that contain those items explicitly. The first approach is to predict which items customers are likely to have in their shopping cart when they buy a given item on the RHS of the association rules. We do this by specifying the item explicitly, as shown next, and analyzing the transactional dataset:

> # finding itemsets which lead to buying of an item on RHS
> metric.params <- list(supp=0.001,conf=0.5, minlen=2)
> rules<-apriori(data=Groceries, parameter=metric.params, 
+                appearance = list(default="lhs",rhs="soda"),
+                control = list(verbose=F))
> rules <- prune.dup.rules(rules)
> rules<-sort(rules, decreasing=TRUE, by="confidence")
> inspect(rules[1:5])

[Output: the top five rules with soda on the RHS, sorted by confidence]

It is interesting to note from the previous rules that people tend to buy beverages together: coffee, water, beer, and other miscellaneous beverages appear along with soda. Thus, you can see that it is quite easy to predict when customers might buy soda using these rules and take action accordingly.

We can also predict what items the users are going to buy if they have already put some specific items in their shopping cart, by explicitly setting specific itemset values on the LHS of the association rules using the following technique:

# finding items which are bought when we have an itemset on LHS
metric.params <- list(supp=0.001, conf = 0.3, minlen=2)
rules<-apriori(data=Groceries, parameter=metric.params, 
               appearance = list(default="rhs",
                                 lhs=c("yogurt", "sugar")),
               control=list(verbose=F))
#rules <- prune.dup.rules(rules)
rules<-sort(rules, decreasing=TRUE,by="confidence")
inspect(rules[1:5])

[Output: the top five rules with yogurt and/or sugar on the LHS, sorted by confidence]

You can clearly see from the previous rules that people tend to buy milk if they have yogurt and sugar in their shopping cart, together or individually. Thus, by targeting specific itemsets, you can offer specific product-based recommendations to customers.
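If the rules are to feed a recommendation step, a plain data frame is often easier to work with than the rules object. The following sketch rebuilds the rules above and flattens them into a table; the name recommendations and the column layout are choices made here for illustration, not something the book prescribes.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.3, minlen = 2),
                 appearance = list(default = "rhs",
                                   lhs = c("yogurt", "sugar")),
                 control = list(verbose = FALSE))
rules <- sort(rules, by = "confidence", decreasing = TRUE)

# One row per rule: readable LHS and RHS labels plus the quality metrics
recommendations <- data.frame(
  lhs = labels(lhs(rules)),
  rhs = labels(rhs(rules)),
  quality(rules),
  stringsAsFactors = FALSE
)

head(recommendations, 5)
```

From here, ordinary data frame operations can pick the RHS items to suggest for a given cart.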

Visualizing association rules

There is an excellent package, arulesViz, which provides an interactive way to visualize association rules and interact with them. The following is a sample visualization for the preceding association rules:

> ## visualizing rules
> plot(rules, method="graph", interactive=TRUE, shading=TRUE)

The preceding code snippet generates the following visualization, which helps us understand the association rules even better. We have kept the LHS itemsets on the left side of the visualization, indicated by the vertices yogurt and sugar. We can see the RHS items that are likely to be bought if we buy either of the LHS items or both together. For example, people tend to buy whole milk if they have yogurt, sugar, or both in their shopping cart.

[Figure: graph visualization of the association rules with yogurt and sugar on the LHS]

This visualization generated by arulesViz is completely interactive and you can play around with the vertices and edges, and place the itemsets according to your desire to find more and more trends and patterns from various rules.
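Newer releases of arulesViz select the rendering backend through an engine argument. The sketch below assumes such a release, along with the packages it needs for HTML widgets, and draws a browser-based graph of the ten highest-confidence rules. The names top.rules and widget are introduced here for illustration, and the limit of ten rules is an arbitrary choice to keep the graph readable.

```r
library(arules)
library(arulesViz)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.3, minlen = 2),
                 appearance = list(default = "rhs",
                                   lhs = c("yogurt", "sugar")),
                 control = list(verbose = FALSE))

# Plot only the ten highest-confidence rules to keep the graph readable
top.rules <- head(sort(rules, by = "confidence", decreasing = TRUE), 10)

# engine = "htmlwidget" returns a draggable graph rendered in the browser or viewer
widget <- plot(top.rules, method = "graph", engine = "htmlwidget")
widget
```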

This concludes our discussion of the main techniques used in market basket analysis to detect and predict trends from shopping transaction logs and to act on the derived insights.
