Data preparation

For this analysis, we only need to install two packages, arules and arulesViz, and load the Groceries dataset that ships with arules:

> install.packages("arules")

> install.packages("arulesViz")

> library(arules)

> data(Groceries)

> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables

This dataset is structured as a sparse matrix object, known as the transactions class, which we created previously.
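As a quick check on that structure, here is a minimal sketch using a few accessors that arules provides for transactions objects. It assumes arules is installed and that the bundled Groceries data is used as-is; the choice of the first three baskets is arbitrary and only for illustration.

```r
library(arules)
data(Groceries)

# Rows are transactions and columns are items
# (the internal ngCMatrix in the data slot stores items as rows,
# which is why str() reports Dim in the opposite order)
dim(Groceries)

# Print the first three baskets as readable item lists
inspect(Groceries[1:3])

# Distribution of basket sizes (number of items per transaction)
summary(size(Groceries))
```

Calling summary(Groceries) directly gives a similar overview in one step, including the most frequent items.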

Because the data is an object of the transactions class, our standard exploration techniques won't work, but the arules package offers other methods to explore it. A good starting point is an item frequency plot, produced with the itemFrequencyPlot() function in the arules package. You need to specify the transaction dataset, the number of most frequent items to plot (topN), and whether you want the relative or absolute frequency of the items (type). Let's first look at the absolute frequency of the top 10 items only:

> arules::itemFrequencyPlot(Groceries, topN = 10, type = "absolute")

The output of the preceding command is as follows:

The top item purchased was whole milk, which appears in roughly 2,500 of the 9,835 transactions. For a relative distribution of the top 15 items, let's run the following code:

> arules::itemFrequencyPlot(Groceries, topN = 15)

The following is the output of the preceding command:

Alas, here we see that bottled beer and canned beer show up as the 13th and 15th most purchased items at this store. Each of them appears in just under 10% of the transactions.
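The values read off the two plots can also be tabulated numerically. The following is a minimal sketch using itemFrequency() from arules, the function that computes the per-item frequencies; the variable names counts and beer are illustrative only.

```r
library(arules)
data(Groceries)

# Absolute counts per item, sorted from most to least frequent
counts <- sort(itemFrequency(Groceries, type = "absolute"), decreasing = TRUE)
head(counts, 15)

# Relative frequency (the default type) for the two beer items
beer <- itemFrequency(Groceries)[c("bottled beer", "canned beer")]
beer
```

Sorting the named vector keeps the item labels attached, so the ranking of any item can be read directly from its position.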

For this exercise, this is all we need to do; therefore, we can move right on to the modeling and evaluation.
