Association Analysis Platform Overview
The Association Analysis platform identifies connections among groups of items in an independent event or transaction. In association analysis, an item is the basic object of interest. For example, an item could be a product, a web page, or a service. An item set is a list of one or more items.
The relationship between two item sets is defined by an association rule. An association rule consists of a condition item set and a consequent item set. Antecedents are the individual items in the condition item set. Association analysis identifies association rules, which predict that a consequent item set will be in a transaction, given that the condition item set is already in the transaction. Some association rules are stronger, and therefore more useful, than others. The following three performance measures describe the strength of an association rule:
Support is the proportion of transactions in which an item set appears. A high value for support indicates that the item set occurs frequently.
Confidence is the proportion of transactions that contain the consequent item set, given that the condition item set is in the transaction. Confidence measures the strength of implication, or the predictive power, of an association rule.
Lift is the ratio of an association rule’s confidence to its expected confidence, assuming that the condition and consequent item sets appear in transactions independently. Lift measures how much the consequent item set depends on the presence of the condition item set. The minimum value for lift is 0.
A lift ratio less than 1 indicates that the condition and consequent repel each other, because they occur together less frequently than one would expect by chance alone.
A lift ratio close to 1 indicates that the consequent occurs at the same rate in transactions that contain the condition as one would expect from chance alone.
A lift ratio greater than 1 indicates that the consequent item set has an affinity for the condition item set. The consequent item set occurs more often with the condition item set than one would expect by chance alone.
For more information about these performance measures, see “Association Analysis Performance Measures”.
The Association Analysis platform also enables you to perform singular value decomposition. Singular value decomposition (SVD) groups similar transactions and also groups similar items using a matrix reducing methodology that is different from what is used in association analysis. Use the SVD methodology to gain insights that complement what you learn from association analysis.
For more information about association analysis, see Hastie et al. (2009) and Shmueli et al. (2010). For more information about singular value decomposition, see Jolliffe (2002).
Example of the Association Analysis Platform
This example uses the Grocery Purchases.jmp sample data table, which contains transactional data reported by a grocery store. The data table lists the items purchased by 1001 customers, each assigned a unique customer ID. You want to explore the associations among items in order to identify patterns in consumer behavior.
1. Select Help > Sample Data Library and open Grocery Purchases.jmp.
2. Select Analyze > Screening > Association Analysis.
3. Select Product and click Item.
4. Select Customer ID and click ID.
5. Click OK.
Figure 20.2 Association Analysis Report
The fourth entry in the Rules report table indicates that 58% of customers who bought an avocado also bought an artichoke. The value of Lift is 1.908, indicating that there is a likely dependency. You want to verify that avocados and artichokes occur in a significant portion of transactions.
6. Click the disclosure icon next to Frequent Item Sets.
Figure 20.3 Frequent Item Sets Report
The Frequent Item Sets report shows that 36% of customers purchased avocados. The Rules report in Figure 20.2 shows that 58% of these customers also bought artichokes. Because of the large proportion of customers who follow this behavior, the grocery store management might use this information to strategically locate avocados and artichokes.
You also decide to look at the association rules with the highest lift.
7. Right-click in the Rules report table and select Sort By Column.
The Select Columns window appears.
8. Select Lift and click OK.
The Rules table is sorted by decreasing values of lift. Notice that the second association rule has a lift of 6.912 and 97% confidence. You want to verify that both the condition set, {Coke, Heineken, sardines}, and the consequent item set, {chicken, ice cream}, have adequate support.
9. Right-click in the Frequent Item Sets report and select Sort By Column.
The Select Columns window appears.
10. Select Item Set and then check the ascending order option.
11. Click OK.
The Frequent Item Sets table is sorted alphabetically by item set. Scroll through the list to see that the condition item set, {Coke, Heineken, sardines}, has 12% support and that the consequent item set, {chicken, ice cream}, has 14% support. This association rule has high lift, but represents fewer transactions than the first association rule that you examined.
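The comparison between the two rules can be checked roughly by hand. The following arithmetic is a sketch that uses only the rounded percentages quoted in this example, together with the relationships confidence = joint support / condition support and lift = confidence / consequent support. Because the inputs are rounded, the results are approximate.

```latex
\begin{align*}
\text{Support}(\{\text{avocado}\} \cup \{\text{artichoke}\})
  &\approx 0.36 \times 0.58 \approx 0.21 \\
\text{Support}(\{\text{Coke, Heineken, sardines}\} \cup \{\text{chicken, ice cream}\})
  &\approx 0.12 \times 0.97 \approx 0.12 \\
\text{Lift}(\{\text{Coke, Heineken, sardines}\} \Rightarrow \{\text{chicken, ice cream}\})
  &\approx \frac{0.97}{0.14} \approx 6.9
\end{align*}
```

The second rule therefore covers roughly 12% of transactions compared with roughly 21% for the avocado and artichoke rule, and the hand-computed lift of about 6.9 agrees with the reported value of 6.912 up to rounding.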
Launch the Association Analysis Platform
Launch the Association Analysis platform by selecting Analyze > Screening > Association Analysis.
Figure 20.4 Association Analysis Launch Window
Item
The categorical column that contains the item data to be analyzed.
ID
The column that identifies the transaction that an item belongs to.
By
Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.
Minimum Support
Specifies a minimum value for the proportion of occurrences of an item set. This value must be between 0 and 1. Only item sets with support equal to or exceeding this value are considered in the analysis.
Minimum Confidence
Specifies a minimum value for the proportion of transactions containing the condition item set that also contain the consequent item set. This value must be between 0 and 1. Only association rules with confidence equal to or exceeding this value appear in the report.
Minimum Lift
Specifies a minimum dependency ratio. Lift values must be 0 or greater. Only association rules with lift equal to or exceeding this value appear in the report.
Maximum Antecedents
Specifies the maximum number of items in the condition item set. Association rules with more than this number of items in the condition set are not considered in the analysis.
Maximum Rule Size
Specifies the maximum number of items that appear in the union of the condition and consequent item sets. Association rules with more than this combined number of items are not considered in the analysis.
Note: You can use the minimum support, maximum antecedents, and maximum rule size options in the launch window to reduce computational time for large data sets. For more information about these measures, see “Statistical Details for the Association Analysis Platform”.
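The way the launch window thresholds combine can be illustrated outside of JMP. The following Python sketch is not the platform's implementation. The Rule class, the keep_rule function, the threshold values, and the rule values are all hypothetical, and the sketch assumes that the support threshold is applied to the union of the condition and consequent item sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical record for one association rule (not a JMP object)."""
    condition: frozenset
    consequent: frozenset
    support: float      # assumed: support of condition and consequent together
    confidence: float
    lift: float


def keep_rule(rule, min_support, min_confidence, min_lift,
              max_antecedents, max_rule_size):
    """Return True if the rule passes every launch-window style threshold."""
    return (
        rule.support >= min_support
        and rule.confidence >= min_confidence
        and rule.lift >= min_lift
        and len(rule.condition) <= max_antecedents
        and len(rule.condition | rule.consequent) <= max_rule_size
    )


# Illustrative values only; they do not come from any sample data table.
rules = [
    Rule(frozenset({"a"}), frozenset({"b"}), 0.20, 0.60, 1.9),
    Rule(frozenset({"a", "b", "c", "d"}), frozenset({"e"}), 0.15, 0.90, 3.0),
    Rule(frozenset({"c"}), frozenset({"d"}), 0.30, 0.55, 0.8),
]

kept = [
    r for r in rules
    if keep_rule(r, min_support=0.1, min_confidence=0.5, min_lift=1.0,
                 max_antecedents=3, max_rule_size=5)
]
```

In this sketch, the second rule is dropped because it has four antecedents, and the third rule is dropped because its lift is below the minimum.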
The Association Analysis Report
By default, the Association Analysis report contains the Frequent Item Sets and Rules reports.
Tip: To order the contents of a table in a report by any of its columns, right-click in the table and select Sort By Column.
Frequent Item Sets
The Frequent Item Sets report lists item sets in decreasing order of support. The listed item sets meet the Minimum Support value that you specified in the launch window. Each item set is considered both as a condition item set and as a consequent item set to form association rules. The table contains the following columns:
Item Set
The item sets that are considered as condition or consequent item sets for the association rules.
Support
The proportion of transactions in which all of the items in the Item Set occur.
N Items
The number of items in the Item Set.
Rules
The Rules report shows a table of association rules that are sorted in increasing order of number of items in the condition item set. The rules are further sorted alphabetically by the items contained in the union of the condition and consequent item sets. Only association rules that meet the Minimum Support, Minimum Confidence, Minimum Lift, Maximum Antecedents, and Maximum Rule Size requirements that you specified in the launch window appear in this report.
The Rules report table contains the following columns:
Rule
The association rules formed by combining Condition and Consequent item sets.
Condition
The item set that is thought to influence the presence of a Consequent item set within transactions.
Consequent
The item set whose presence is thought to be influenced by the presence of a Condition item set.
Confidence
The proportion of transactions that contain the Consequent item set, given that the condition item set is in the transaction. Confidence measures the strength of implication, or the predictive power, of an association rule.
Lift
The ratio of an association rule’s confidence to its expected confidence, assuming that the condition and consequent item sets appear in transactions independently. Lift measures how much the Consequent item set depends on the presence of the Condition item set. The minimum value for lift is 0.
A lift ratio less than 1 indicates that the Condition and Consequent item sets repel each other, because they occur together less frequently than one would expect by chance alone.
A lift ratio close to 1 indicates that the Consequent item set occurs at the same rate in transactions that contain the Condition item set as one would expect from chance alone.
A lift ratio greater than 1 indicates that the Consequent item set has an affinity for the Condition item set. The Consequent item set occurs more often with the Condition item set than one would expect by chance alone.
Association Analysis Platform Options
The Association Analysis red triangle menu contains the following options:
Transaction Listing
Shows or hides a table listing each Transaction ID value and the items included in that transaction. The table is sorted by the Transaction ID column.
Frequent Item Sets
Shows or hides a list of item sets whose support meets or exceeds the Minimum Support value specified in the launch window. See “Frequent Item Sets” for more information.
Rules
Shows or hides a table of association rules that meet the Minimum Support, Minimum Confidence, Minimum Lift, Maximum Antecedents, and Maximum Rule Size requirements specified in the launch window. See “Rules” for more information.
SVD
Shows or hides scatterplots of the first two singular vectors for transactions and for items, calculated by singular value decomposition on the incidence matrix for the items. The report also contains a table of singular values sorted in descending order. The Percent and Cum Percent columns show the additional and cumulative variability in the data explained by the corresponding singular value. The bar chart shows the Percent variation explained by each singular value. For more information, see “SVD”.
Rotated SVD
(Available only if SVD is selected.) Shows or hides the Topic Items and Topic Scores reports. This option performs a varimax rotated singular value decomposition of the transaction item matrix to produce groups of similar transactions called topics. See “Rotated SVD”.
Save Transaction SVD
Creates a data table that contains a number of singular vectors that you specify for each transaction. These are the left singular vectors of the transaction item matrix. See “Singular Value Decomposition”.
Save Item SVD
Creates a data table that contains a number of singular vectors that you specify for each item. These are the right singular vectors of the transaction item matrix. See “Singular Value Decomposition”.
See the JMP Reports chapter in the Using JMP book for more information about the following options:
Local Data Filter
Shows or hides the local data filter that enables you to filter the data used in a specific report.
Redo
Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.
Save Script
Contains options that enable you to save a script that reproduces the report to several destinations.
Save By-Group Script
Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.
SVD
Singular value decomposition (SVD) complements association analysis by providing another method to identify items that have an affinity for each other. Singular value decomposition of the transaction item matrix reduces the matrix to a manageable number of dimensions, thereby enabling you to group similar transactions and similar items.
Transaction Item Matrix
The transaction item matrix is a matrix in which each row corresponds to a transaction and each column corresponds to an item. The entries of the matrix are zeros and ones. If an item occurs in a transaction, the corresponding row and column entry is one. Otherwise, the row and column entry is zero. Because the transaction item matrix usually contains more values of zero than one, it is called a sparse matrix.
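As an illustration of this structure, the following Python sketch builds a small transaction item matrix from (transaction ID, item) pairs. It is only a sketch: the function name and the data are hypothetical, and a dense list of lists is used for readability rather than a sparse storage format.

```python
def transaction_item_matrix(rows):
    """Build a 0/1 matrix from a list of (transaction_id, item) pairs.

    Returns the sorted transaction IDs (row labels), the sorted items
    (column labels), and the matrix as a list of rows.
    """
    ids = sorted({tid for tid, _ in rows})
    items = sorted({item for _, item in rows})
    row_of = {tid: i for i, tid in enumerate(ids)}
    col_of = {item: j for j, item in enumerate(items)}

    matrix = [[0] * len(items) for _ in ids]
    for tid, item in rows:
        matrix[row_of[tid]][col_of[item]] = 1
    return ids, items, matrix


# Hypothetical long-format data: one row per purchased item.
rows = [
    (1, "avocado"), (1, "artichoke"),
    (2, "Coke"), (2, "ice cream"),
    (3, "avocado"), (3, "Coke"),
]
ids, items, matrix = transaction_item_matrix(rows)
```

Here each row of matrix is one transaction and each column is one item, in the order given by ids and items.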
Singular Value Decomposition
The singular value decomposition approximates the transaction item matrix using three matrices: U, S, and V′. The relationship between these matrices is defined as follows:
Transaction Item Matrix ≈ U * S * V′
Define nTransactions as the number of transactions (rows) in the transaction item matrix, nItems as the number of items (columns) in the transaction item matrix, and nVec as the specified number of singular vectors. Note that nVec must be less than or equal to min(nTransactions, nItems). It follows that U is an nTransactions by nVec matrix. S is a diagonal matrix of dimension nVec. The diagonal entries in S are the singular values in the SVD. V′ is an nVec by nItems matrix. The rows in V′ are the singular vectors.
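The dimensions of the three matrices can be checked with a small numerical sketch. The following Python code uses NumPy's numpy.linalg.svd on a made-up 5 by 4 matrix and keeps the first two singular vectors. It illustrates a truncated SVD in general and is not a description of how the platform computes it.

```python
import numpy as np

# Hypothetical transaction item matrix: 5 transactions (rows), 4 items (columns).
M = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

n_vec = 2  # must not exceed min(nTransactions, nItems)

# Thin SVD; singular values are returned in descending order.
U_full, s_full, Vt_full = np.linalg.svd(M, full_matrices=False)

U = U_full[:, :n_vec]        # nTransactions by nVec
S = np.diag(s_full[:n_vec])  # nVec by nVec, singular values on the diagonal
Vt = Vt_full[:n_vec, :]      # nVec by nItems

approx = U @ S @ Vt          # same shape as M, but only approximately equal to it
```

The approximation error, measured by the Frobenius norm of M - approx, equals the square root of the sum of the squared singular values that were dropped.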
The singular vectors capture connections among different items with similar functions or topic areas. If three items tend to appear in the same transactions, the SVD is likely to produce a singular vector in V′ with large values for those three items. The U singular vectors represent the transactions projected into this new item space.
The SVD also captures indirect connections. If two items never appear together in the same transaction, but they generally appear in transactions with another third item, the SVD is able to capture some of that connection. If two transactions have no items in common but contain items that are connected in the dimension-reduced space, they map to similar vectors in the SVD plots.
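A tiny numerical sketch can make the idea of an indirect connection concrete. In the made-up data below, items A and B never occur in the same transaction, but each always occurs with item C. The code uses NumPy and is an illustration only; the item names and data are hypothetical.

```python
import numpy as np

items = ["A", "B", "C"]
# Transactions alternate between {A, C} and {B, C}; A and B never co-occur.
M = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Loadings of the three items on the first singular vector.
v1 = Vt[0]

# Rank-1 reconstruction of the matrix from the first singular triplet.
rank1 = s[0] * np.outer(U[:, 0], Vt[0])
```

In this sketch, A and B receive identical loadings on the first singular vector, and the rank-1 reconstruction assigns a value of 0.5 to A in transactions that contain only B and C. The link between A and B comes entirely from their shared association with C.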
The SVD transforms transaction data into a fixed-dimensional vector space, making it amenable to clustering, classification, and regression techniques. The Save options enable you to export this vector space to be analyzed in other JMP platforms.
SVD Report
SVD Plots
The SVD Plots report shows scatterplots of the first two singular vectors for both the transaction and the item data.
Tip: To see the transaction or item that a point represents, place your cursor over the point. To add the label to the plot, select the point, right-click in the plot, and select Row Label.
The Transaction SVD plot contains a point for each transaction. For a given transaction, the point that is plotted is defined by the transaction’s values on the first two singular vectors in U. In the Transaction SVD plot, points that are visibly grouped together indicate transactions with a similar composition.
The Item SVD plot contains a point for each item. For a given item, the point that is plotted is defined by the item’s values on the first two singular vectors in V. In the Item SVD plot, items that are visibly grouped together indicate items that have similar functions or topic areas.
Caution: The first two singular vectors might not adequately capture the structure of your data. The “Singular Values” report shows how much variability is explained by the singular vectors.
Singular Values
The kth row in the Singular Values table shows the additional and cumulative percent of variability explained by using the kth singular value or singular vector column.
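The following Python sketch shows one way that percent and cumulative percent columns of this kind can be computed. It assumes that variability is measured by the squared singular values, which is a common convention. This section does not state the exact definition that the platform uses, so treat that choice, the function name, and the input values as assumptions.

```python
import numpy as np


def explained_variability(singular_values):
    """Return (percent, cumulative percent) based on squared singular values.

    Assumption: variability is proportional to the squared singular values.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    percent = 100.0 * s2 / s2.sum()
    return percent, np.cumsum(percent)


# Hypothetical singular values, already sorted in descending order.
percent, cum_percent = explained_variability([3.0, 2.0, 1.0, 1.0, 1.0])
```

Under this assumption, the first two singular values in the example account for 81.25% of the variability.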
Rotated SVD
(Available only when SVD is selected from the red triangle menu next to Association Analysis.) The Rotated SVD option performs a varimax rotation on the singular value decomposition (SVD) of the transaction item matrix. See “Transaction Item Matrix”. You must specify a number of rotated singular vectors, which corresponds to the number of topics that are created by the platform.
Topics are groups of transactions that are grouped based on a primary item indicator, as well as secondary item indicators. For each topic, every item has a weight that influences a transaction’s membership in the topic. The cumulative sum of the item weights for all of the items that are present in a transaction is called the topic score. Topic scores reflect the strength of a transaction’s membership for a topic.
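The definition of a topic score as a sum of item weights can be sketched in a few lines of Python. The function name and the weights below are hypothetical, and the sketch assumes that items without a weight for the topic contribute zero.

```python
def topic_score(transaction_items, item_weights):
    """Sum the topic weights of the items that are present in a transaction."""
    return sum(item_weights.get(item, 0.0) for item in transaction_items)


# Hypothetical item weights for a single topic.
weights = {"avocado": 0.5, "artichoke": 0.25, "olives": -0.25}

score_a = topic_score({"avocado", "artichoke"}, weights)
score_b = topic_score({"avocado", "olives"}, weights)
```

In this sketch, the first transaction scores 0.75 on the topic, and the second scores 0.25 because the negative weight for olives offsets part of the weight for avocado.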
The varimax rotation rotates the singular vectors to more closely align them with the coordinate axes. This rotation helps facilitate interpretation by resulting in high loadings on a few axes and small loadings on the others. The loadings are given in the Rotated V Matrix and Rotated U Matrix reports.
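To illustrate what a varimax rotation does, the following Python sketch implements the classical pairwise (planar rotation) form of raw varimax with NumPy. This is one textbook procedure and is an assumption for illustration. The source does not describe the platform's actual rotation algorithm, and the loadings in the example are made up.

```python
import numpy as np


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax by pairwise planar rotations (illustrative sketch).

    Returns the rotated loadings and the orthogonal rotation matrix R,
    so that rotated = loadings @ R.
    """
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = L[:, i], L[:, j]
                u = x ** 2 - y ** 2
                v = 2.0 * x * y
                A, B = u.sum(), v.sum()
                C = (u ** 2 - v ** 2).sum()
                D = 2.0 * (u * v).sum()
                # Closed-form angle that maximizes the criterion for this pair.
                num = D - 2.0 * A * B / p
                den = C - (A ** 2 - B ** 2) / p
                phi = 0.25 * np.arctan2(num, den)
                largest_angle = max(largest_angle, abs(phi))
                c, s = np.cos(phi), np.sin(phi)
                G = np.array([[c, -s], [s, c]])
                L[:, [i, j]] = L[:, [i, j]] @ G
                R[:, [i, j]] = R[:, [i, j]] @ G
        if largest_angle < tol:
            break
    return L, R


# Made-up loadings: a simple structure that has been rotated by 30 degrees.
simple = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
theta = np.pi / 6
rot = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
loadings = simple @ rot

rotated, R = varimax(loadings)
```

In this example, the rotation recovers the simple structure: each row of rotated has one loading near 1 in absolute value and one loading near 0, which is the pattern that makes rotated vectors easier to interpret.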
Topic Items
(Available only when Rotated SVD is selected from the red triangle menu next to Association Analysis.) The Topic Items report shows a number of transaction groups, called topics. The resulting report shows the strongest indicators for each topic sorted in descending order by the absolute value of the score. The items with the largest absolute scores represent the thematic composition of a topic. The topic items can be used to score the membership of each transaction for each topic. See “Topic Scores”. The report also gives the following information about the varimax rotation:
Transform
The rotation matrix for the varimax rotation.
Rotated V Matrix
The matrix of item scores for each topic. Each column corresponds to an item. The rotated V matrix results from a varimax rotation of the V matrix in the SVD analysis. Large values indicate an affinity between the item and the topic.
Rotated U Matrix
The matrix of transaction scores for each topic. Each column corresponds to a transaction. Transactions with higher scores in a topic are more likely to be associated with that topic. Large values indicate an affinity between the transaction and the topic.
Topic Portion
Shows the topic portion values for each topic.
Topic Scores
(Available only when Rotated SVD is selected from the red triangle menu next to Association Analysis.) The Topic Scores report shows the topic scores for all transactions in one-dimensional scatterplots. Negative values indicate transactions that are negatively associated with a topic. Use these plots to explore the distribution of transactions within each topic. See “Additional Example: SVD Analysis”.
Tip: Select points in a topic score plot to select both the corresponding rows in the data table and the corresponding transactions in the other topic score plots.
Additional Example: SVD Analysis
In this example, you use singular value decomposition of the transaction item matrix to gain further insight into the Grocery Purchases.jmp sample data.
1. Select Help > Sample Data Library and open Grocery Purchases.jmp.
2. Select Analyze > Screening > Association Analysis.
3. Select Product and click Item.
4. Select Customer ID and click ID.
5. Click OK.
6. Click the red triangle next to Association Analysis and select SVD.
Figure 20.5 SVD Plots
The transaction SVD plot suggests that there might be two or three groups of transactions. In the upper right corner of the item SVD plot, notice that the points that represent Coke and ice cream overlap. The proximity of these two items indicates that there is a strong affinity between them.
7. Click the red triangle next to Association Analysis and select Rotated SVD.
8. Enter 3 next to Number of Topics (rotated singular vectors) and click OK.
The Topic Items and Topic Scores reports appear.
Figure 20.6 Topic Items Report
Three groups, or topics, are created and shown in the Topic Items report. The first items listed in the Topic Item tables represent the primary items for that group. For example, Topic 1 is a group that is identified primarily by transactions that contain avocados, but do not contain olives.
Figure 20.7 Topic Scores
The topic scores that are assigned to each of the 1001 transactions are plotted in the Topic Scores report. Select groups of points for a topic to see how those transactions relate to other topics. For example, transactions with very high values on Topic 1 tend to have low values on Topics 2 and 3.
9. Open the Singular Values report.
Figure 20.8 Singular Values Table
As seen in Figure 20.8, the first two singular values explain only about 30% of the variability in the grocery store data. Additional dimensions might be required to explain a sufficient amount of variability.
Statistical Details for the Association Analysis Platform
This section contains statistical details for the Association Analysis platform.
Frequent Item Set Generation
The Association Analysis platform uses the Apriori algorithm to reduce computational time when generating frequent item sets. The Apriori algorithm leverages the fact that an item set’s support is never larger than the support of its subsets. The platform generates larger item sets from combinations of smaller item sets that meet the minimum support level. In addition, the platform does not generate item sets that exceed either the specified maximum number of antecedents or the maximum rule size. These options are useful when working with large data sets, because the total possible number of rules increases exponentially with the number of items. For more information about the Apriori algorithm, see Agrawal and Srikant (1994).
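The level-wise idea behind the Apriori algorithm can be sketched in Python as follows. This is a minimal illustration of the general technique, not the platform's code. The function name, the max_size parameter (a stand-in for a size limit such as the maximum rule size), and the data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support, max_size):
    """Return {item set: support} for all frequent item sets up to max_size."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(item_set):
        return sum(item_set <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # candidate sets of size 1
    frequent = {}
    size = 1
    while level and size <= max_size:
        # Keep only the candidates that meet the minimum support.
        kept = {}
        for candidate in level:
            sup = support(candidate)
            if sup >= min_support:
                kept[candidate] = sup
        frequent.update(kept)

        # Build larger candidates from the surviving sets, then prune any
        # candidate that has an infrequent subset, because its support can
        # never exceed the support of that subset.
        size += 1
        joined = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [
            c for c in joined
            if all(frozenset(sub) in kept for sub in combinations(c, size - 1))
        ]
    return frequent


# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5, max_size=3)
```

In this sketch, the three-item candidate {bread, milk, butter} is pruned without counting its support, because its subset {milk, butter} is not frequent.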
Association Analysis Performance Measures
This section defines the performance measures used in Association Analysis. Denote the condition item set by X and the consequent item set by Y. Denote an association rule with condition set X and consequent set Y by $X \Rightarrow Y$.
Support
Support is the proportion of transactions in which an item set occurs.
$$\mathrm{Support}(X) = \frac{\text{number of transactions that contain } X}{\text{total number of transactions}}$$
Confidence
Confidence is the proportion of transactions that contain the consequent item set, given that the transaction contains the condition item set.
$$\mathrm{Confidence}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}$$
An association rule with a confidence of 0% has a consequent item set that does not appear in any transaction with the condition item set. A confidence of 100% indicates that every transaction that contains the condition item set also contains the consequent item set.
Lift
Lift measures dependency between X and Y.
$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}$$
The numerator for lift is the proportion of transactions where X and Y occur jointly. The denominator is an estimate of the expected joint occurrence of X and Y, assuming that they occur independently.
A lift value of 1 indicates that X and Y jointly occur in transactions with the frequency that would be expected by chance alone. Increasing lift values suggest that Y occurs more often than expected when X is present.
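The three performance measures can be computed directly from a list of transactions. The following Python sketch follows the definitions in this section. The function names and the five transactions are hypothetical, and the sketch assumes that the condition item set occurs in at least one transaction so that the divisions are defined.

```python
def support(item_set, transactions):
    """Proportion of transactions that contain every item in item_set."""
    item_set = frozenset(item_set)
    return sum(item_set <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Proportion of transactions containing x that also contain y."""
    joint = support(frozenset(x) | frozenset(y), transactions)
    return joint / support(x, transactions)


def lift(x, y, transactions):
    """Joint support divided by the joint support expected under independence."""
    joint = support(frozenset(x) | frozenset(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


# Hypothetical transactions.
transactions = [frozenset(t) for t in (
    {"avocado", "artichoke"},
    {"avocado", "artichoke", "Coke"},
    {"avocado"},
    {"Coke"},
    {"artichoke"},
)]

x, y = {"avocado"}, {"artichoke"}
rule_support = support(x | y, transactions)   # 2 of 5 transactions
rule_confidence = confidence(x, y, transactions)
rule_lift = lift(x, y, transactions)
```

For this made-up data, the rule from avocado to artichoke has a joint support of 0.4, a confidence of about 0.67, and a lift of about 1.11, which suggests a slight affinity between the two items.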
 