One of the main goals of data mining and clustering is to learn the implicit relationships in the data. The Apriori algorithm helps to do this by teasing out such relationships into an explicit set of association rules. A common example of this type of analysis is what grocery stores do. They analyze receipts to see which items are commonly bought together, and then they can modify the store layout and marketing to suggest the second item once you've decided to buy the first.
In this recipe, we'll use this algorithm to extract the relationships from the mushroom dataset that we've already seen several times in this chapter.
First, we'll use the same dependencies that we did in the Loading CSV and ARFF files into Weka recipe.
We'll use only one import in our script or REPL:
(import [weka.associations Apriori])
We'll also use the mushroom dataset that we introduced in the Classifying data with decision trees recipe. We'll set the class attribute to the column indicating whether the mushroom is edible or poisonous:
(def shrooms (doto (load-arff "data/UCI/mushroom.arff") (.setClassIndex 22)))
Finally, we'll use the defanalysis macro from the Discovering groups of data using K-Means clustering recipe.
We'll train an instance of the Apriori class, extract the association rules, and print them:
First, we'll define the ranking metrics for the Apriori class:
(def rank-metrics {:confidence 0 :lift 1 :leverage 2 :conviction 3})
Next, we'll define a wrapper function for the Apriori class:
(defanalysis apriori Apriori buildAssociations
  [["-N" rules 10]
   ["-T" rank-metric :confidence rank-metrics]
   ["-C" min-metric 0.9]
   ["-D" min-support-delta 0.05]
   [["-M" "-U"] min-support-bounds [0.1 1.0] :seq]
   ["-S" significance nil :not-nil]
   ["-I" output-itemsets false :flag-true]
   ["-R" remove-missing-value-columns false :flag-true]
   ["-V" progress false :flag-true]
   ["-A" mine-class-rules false :flag-true]
   ["-c" class-index nil :not-nil]])
(def a (apriori shrooms))
user=> (doseq [r (.. a getAssociationRules getRules)]
         (println
           (format "%s => %s %s = %.4f"
                   (mapv str (.getPremise r))
                   (mapv str (.getConsequence r))
                   (.getPrimaryMetricName r)
                   (.getPrimaryMetricValue r))))
["veil-color=w"] => ["veil-type=p"] Confidence = 1.0000
["gill-attachment=f"] => ["veil-type=p"] Confidence = 1.0000
…
The Apriori algorithm looks for items that are often associated together within a transaction. This can be used for things such as analyzing shopping patterns. In this case, we're viewing the constellation of attributes related to each mushroom as a transaction, and we're using the Apriori algorithm to see which traits are associated with which other traits.
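To make the idea of items that are often associated together more concrete, here is a minimal sketch of a level-wise frequent itemset search in plain Clojure. It is not Weka's implementation: the toy transactions and the names support, next-candidates, and frequent-itemsets are hypothetical and exist only for illustration, and the candidate step is simplified (it joins surviving itemsets and relies on the support check to discard the rest).

```clojure
(require '[clojure.set :as set])

;; Hypothetical toy data: each transaction is a set of items.
(def transactions
  [#{:bread :milk}
   #{:bread :butter :milk}
   #{:bread :butter}
   #{:milk :eggs}])

(defn support
  "Fraction of transactions that contain every item in itemset."
  [transactions itemset]
  (/ (count (filter #(set/subset? itemset %) transactions))
     (count transactions)))

(defn next-candidates
  "Joins frequent k-itemsets into candidate (k+1)-itemsets."
  [frequent]
  (set (for [a frequent, b frequent
             :let [c (set/union a b)]
             :when (= (count c) (inc (count a)))]
         c)))

(defn frequent-itemsets
  "Level-wise search: keep the itemsets whose support is at least
  min-support, then grow new candidates only from the survivors."
  [transactions min-support]
  (loop [candidates (set (map hash-set (apply set/union transactions)))
         result []]
    (let [frequent (filter #(>= (support transactions %) min-support)
                           candidates)]
      (if (empty? frequent)
        result
        (recur (next-candidates frequent) (into result frequent))))))
```

Calling (frequent-itemsets transactions 1/2) on this toy data keeps the single items that appear in at least half of the transactions and the pairs built from them that also clear the threshold. Growing candidates only from frequent itemsets is what keeps the search manageable, since any itemset containing an infrequent subset cannot itself be frequent.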
The algorithm attempts to find premises that imply a set of consequences. For instance, a white veil color (the premise) implies a partial veil type with a confidence of 1.0, so whenever the premise is found, the consequence is also found. A white veil color also implies a free gill attachment, but with a confidence of about 99.8 percent (0.9977), so we know that these two aren't always found together.
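The confidence of a rule is the fraction of rows matching the premise that also match the consequence. The following sketch computes it, along with lift (one of the other ranking metrics), over a handful of made-up rows. The rows are hypothetical and are not taken from the mushroom dataset, and matches, confidence, and lift are illustrative names rather than part of Weka.

```clojure
(require '[clojure.set :as set])

;; Hypothetical rows, loosely modeled on three mushroom attributes.
(def rows
  [#{"veil-color=w" "veil-type=p" "gill-attachment=f"}
   #{"veil-color=w" "veil-type=p" "gill-attachment=f"}
   #{"veil-color=w" "veil-type=p" "gill-attachment=a"}
   #{"veil-color=n" "veil-type=p" "gill-attachment=a"}])

(defn matches
  "Number of rows that contain every item in itemset."
  [rows itemset]
  (count (filter #(set/subset? itemset %) rows)))

(defn confidence
  "Of the rows matching the premise, the fraction that also match
  the consequence. Assumes the premise occurs at least once."
  [rows premise consequence]
  (/ (matches rows (set/union premise consequence))
     (matches rows premise)))

(defn lift
  "Confidence divided by the overall frequency of the consequence.
  Values above 1 mean the premise makes the consequence more likely."
  [rows premise consequence]
  (/ (confidence rows premise consequence)
     (/ (matches rows consequence) (count rows))))
```

In these toy rows, every white veil is also partial, so (confidence rows #{"veil-color=w"} #{"veil-type=p"}) is 1, while only two of the three white-veiled rows have a free gill attachment, so that rule's confidence is 2/3.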
The abbreviated data dump of the preceding traits isn't particularly legible, so here's the same information as a table:
| Premise | Consequence | Confidence |
|---|---|---|
| veil-color=w | veil-type=p | 1.0000 |
| gill-attachment=f | veil-type=p | 1.0000 |
| gill-attachment=f, veil-color=w | veil-type=p | 1.0000 |
| gill-attachment=f | veil-color=w | 0.9990 |
| gill-attachment=f, veil-type=p | veil-color=w | 0.9990 |
| gill-attachment=f | veil-type=p, veil-color=w | 0.9990 |
| veil-color=w | gill-attachment=f | 0.9977 |
| veil-type=p, veil-color=w | gill-attachment=f | 0.9977 |
| veil-color=w | gill-attachment=f, veil-type=p | 0.9977 |
| veil-type=p | veil-color=w | 0.9754 |
From this, we can see that a white veil is associated with a partial veil type, a free gill attachment is associated with a partial, white veil, and so on. If we want more information, we can request more rules using the rules parameter.
The Weka documentation at http://weka.sourceforge.net/doc.dev/weka/associations/Apriori.html has more information about the Apriori class and its options.
For more about the algorithm itself, see Wikipedia's page on the Apriori algorithm at http://en.wikipedia.org/wiki/Apriori_algorithm.