Association Analysis Platform Overview

The Association Analysis platform identifies connections among groups of items in an independent event or transaction. In association analysis, an item is the basic object of interest. For example, an item could be a product, a web page, or a service. An item set is a list of one or more items.

The relationship between two item sets is defined by an association rule. An association rule consists of a condition item set and a consequent item set. Antecedents are the individual items in the condition item set. Association analysis identifies association rules, which predict that a consequent item set will be in a transaction, given that the condition item set is already in the transaction. Some association rules are stronger, and therefore more useful, than others. The following three performance measures describe the strength of an association rule:

• Support is the proportion of transactions in which an item set appears. A high value for support indicates that the item set occurs frequently.

• Confidence is the proportion of transactions that contain the consequent item set, given that the condition item set is in the transaction. Confidence measures the strength of implication, or the predictive power, of an association rule.

• Lift is the ratio of an association rule’s confidence to its expected confidence, assuming that the condition and consequent item sets appear in transactions independently. Lift measures how much the consequent item set depends on the presence of the condition item set. The minimum value for lift is 0.

‒ A lift ratio less than 1 indicates that the condition and consequent repel each other, because they occur together less frequently than one would expect by chance alone.

‒ A lift ratio close to 1 indicates that the consequent occurs at the same rate in transactions that contain the condition as one would expect from chance alone.

‒ A lift ratio greater than 1 indicates that the consequent item set has an affinity for the condition item set. The consequent item set occurs more often with the condition item set than one would expect by chance alone.
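The three measures above can be sketched in a few lines of Python. This is only an illustration on made-up transactions, not the platform's implementation; the function names and the toy data are assumptions introduced here.

```python
# Toy transactions (hypothetical data, not the Grocery Purchases table).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(item_set, transactions):
    """Proportion of transactions that contain every item in item_set."""
    item_set = set(item_set)
    return sum(item_set <= t for t in transactions) / len(transactions)

def confidence(condition, consequent, transactions):
    """Proportion of transactions with the condition that also hold the consequent."""
    both = set(condition) | set(consequent)
    return support(both, transactions) / support(condition, transactions)

def lift(condition, consequent, transactions):
    """Confidence divided by the confidence expected under independence."""
    return confidence(condition, consequent, transactions) / support(consequent, transactions)

# Rule {bread} => {milk}: support({bread}) is 0.8, confidence is about 0.75,
# and lift is about 0.94, so bread and milk co-occur slightly less often
# than chance alone would suggest in this toy data.
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

Under independence the expected confidence of a rule is simply the support of its consequent, which is why lift reduces to confidence divided by that support.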

For more information about these performance measures, see “Association Analysis Performance Measures”.

The Association Analysis platform also enables you to perform singular value decomposition. Singular value decomposition (SVD) groups similar transactions and also groups similar items using a matrix reducing methodology that is different from what is used in association analysis. Use the SVD methodology to gain insights that complement what you learn from association analysis.

For more information about association analysis, see Hastie et al. (2009) and Shmueli et al. (2010). For more information about singular value decomposition, see Jolliffe (2002).

Example of the Association Analysis Platform

This example uses the Grocery Purchases.jmp sample data table, which contains transactional data reported by a grocery store. The data table lists the items purchased by 1001 customers, each assigned a unique customer ID. You want to explore the associations among items in order to identify patterns in consumer behavior.

1. Select Help > Sample Data Library and open Grocery Purchases.jmp.

2. Select Analyze > Screening > Association Analysis.

3. Select Product and click Item.

4. Select Customer ID and click ID.

5. Click OK.

Figure 20.2 Association Analysis Report

The fourth entry in the Rules report table indicates that 58% of customers who bought an avocado also bought an artichoke. The value of Lift is 1.908, indicating that there is a likely dependency. You want to verify that avocados and artichokes occur in a significant portion of transactions.

6. Click the disclosure icon next to Frequent Item Sets.

Figure 20.3 Frequent Item Sets Report

The Frequent Item Sets report shows that 36% of customers purchased avocados. The Rules report in Figure 20.2 shows that 58% of these customers also bought artichokes. Because of the large proportion of customers who follow this behavior, the grocery store management might use this information to strategically locate avocados and artichokes.
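As a rough check on these figures, the rounded values quoted above can be combined using the definitions of confidence and lift. The sketch below assumes only the three reported numbers (36%, 58%, and 1.908); the variable names are introduced here for illustration, and the results inherit the rounding of the inputs.

```python
support_avocado = 0.36   # Frequent Item Sets report (rounded)
confidence_rule = 0.58   # Rules report, avocado => artichoke (rounded)
lift_rule = 1.908        # Rules report

# Share of all customers who bought both items:
# support(condition) * confidence, roughly 0.21.
joint_support = support_avocado * confidence_rule

# Support of the consequent implied by lift = confidence / support(consequent),
# roughly 0.30.
implied_artichoke_support = confidence_rule / lift_rule
```

So roughly one customer in five bought both items, which is what makes this rule practically useful in addition to having a lift well above 1.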

You also decide to look at the association rules with the highest lift.

7. Right-click in the Rules report table and select Sort By Column.

The Select Columns window appears.

8. Select Lift and click OK.

The Rules table is sorted by decreasing values of lift. Notice that the second association rule has a lift of 6.912 and 97% confidence. You want to verify that both the condition set, {Coke, Heineken, sardines}, and the consequent item set, {chicken, ice cream}, have adequate support.

9. Right-click in the Frequent Item Sets report and select Sort By Column.

The Select Columns window appears.

10. Select Item Set and then check the ascending order option.

11. Click OK.

The Frequent Item Sets table is sorted alphabetically by item set. Scroll through the list to see that the condition item set, {Coke, Heineken, sardines}, has 12% support and that the consequent item set, {chicken, ice cream}, has 14% support. This association rule has high lift, but represents fewer transactions than the first association rule that you examined.
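The comparison between the two rules can be made concrete with the rounded values reported in this example. This is a back-of-the-envelope sketch, not output from the platform; the variable names are assumptions introduced here.

```python
# Rule 1: avocado => artichoke (values quoted earlier in this example)
rule1_condition_support = 0.36
rule1_confidence = 0.58

# Rule 2: {Coke, Heineken, sardines} => {chicken, ice cream}
rule2_condition_support = 0.12
rule2_consequent_support = 0.14
rule2_confidence = 0.97
rule2_reported_lift = 6.912

# Share of all transactions that satisfy each rule:
# support(condition) * confidence.
rule1_joint = rule1_condition_support * rule1_confidence   # about 0.21
rule2_joint = rule2_condition_support * rule2_confidence   # about 0.12

# Lift recomputed from rounded inputs (about 6.93); it differs slightly
# from the reported 6.912 because the inputs are rounded percentages.
rule2_lift_from_rounded = rule2_confidence / rule2_consequent_support
```

The second rule is far stronger in terms of lift, but it applies to a little more than half as many transactions as the first.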

Launch the Association Analysis Platform

Launch the Association Analysis platform by selecting Analyze > Screening > Association Analysis.

Figure 20.4 Association Analysis Launch Window

Item

The categorical column that contains the item data to be analyzed.

ID

The column that identifies the transaction that an item belongs to.

By

Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

Minimum Support

Specifies a minimum value for the proportion of occurrences of an item set. This value must be between 0 and 1. Only item sets with support equal to or exceeding this value are considered in the analysis.

Minimum Confidence

Specifies a minimum value for confidence, that is, the proportion of transactions containing the condition item set that also contain the consequent item set. This value must be between 0 and 1. Only association rules with confidence equal to or exceeding this value appear in the report.

Minimum Lift

Specifies a minimum dependency ratio. Lift values must be 0 or greater. Only association rules with lift equal to or exceeding this value appear in the report.

Maximum Antecedents

Specifies the maximum number of items in the condition item set. Association rules with more than this number of items in the condition set are not considered in the analysis.

Maximum Rule Size

Specifies the maximum number of items that appear in the union of the condition and consequent item sets. Association rules with more than this combined number of items are not considered in the analysis.

Note: You can use the Minimum Support, Maximum Antecedents, and Maximum Rule Size options in the launch window to reduce computational time for large data sets. For more information about these measures, see “Statistical Details for the Association Analysis Platform”.
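The way these launch options interact can be sketched as a single filter over candidate rules. This is a hypothetical illustration: the Rule class, the function name, and the threshold values are assumptions introduced here and are not the platform's defaults or internals. The example rule uses the rounded values from the grocery example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    condition: frozenset
    consequent: frozenset
    condition_support: float
    consequent_support: float
    confidence: float
    lift: float

def passes_launch_filters(rule, min_support, min_confidence, min_lift,
                          max_antecedents, max_rule_size):
    """Return True if the rule meets every launch-window requirement."""
    return (
        rule.condition_support >= min_support
        and rule.consequent_support >= min_support
        and rule.confidence >= min_confidence
        and rule.lift >= min_lift
        and len(rule.condition) <= max_antecedents
        and len(rule.condition | rule.consequent) <= max_rule_size
    )

rule = Rule(
    condition=frozenset({"Coke", "Heineken", "sardines"}),
    consequent=frozenset({"chicken", "ice cream"}),
    condition_support=0.12,
    consequent_support=0.14,
    confidence=0.97,
    lift=6.912,
)

# Hypothetical thresholds: with at most 2 antecedents the rule is excluded,
# because its condition item set has 3 items.
excluded = passes_launch_filters(rule, 0.10, 0.50, 1.0, 2, 5)
# Allowing 3 antecedents (and a rule size of 5) keeps it.
kept = passes_launch_filters(rule, 0.10, 0.50, 1.0, 3, 5)
```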

The Association Analysis Report

By default, the Association Analysis report contains the following reports:

• “Frequent Item Sets”

• “Rules”

Tip: To order the contents of a table in a report by any of its columns, right-click in the table and select Sort By Column.

Frequent Item Sets

The Frequent Item Sets report lists item sets in decreasing order of support. The listed item sets meet the Minimum Support value that you specified in the launch window. Each item set is considered both as a condition item set and as a consequent item set to form association rules. The table contains the following columns:

Item Set

The item sets that are considered as condition or consequent item sets for the association rules.

Support

The proportion of transactions in which all of the items in the Item Set occur.

N Items

The number of items in the Item Set.

Rules

The Rules report shows a table of association rules that are sorted in increasing order of number of items in the condition item set. The rules are further sorted alphabetically by the items contained in the union of the condition and consequent item sets. Only association rules that meet the Minimum Support, Minimum Confidence, Minimum Lift, Maximum Antecedents, and Maximum Rule Size requirements that you specified in the launch window appear in this report.

The Rules report table contains the following columns:

Rule

The association rules formed by combining Condition and Consequent item sets.

Condition

The item set that is thought to influence the presence of a Consequent item set within transactions.

Consequent

The item set whose presence is thought to be influenced by the presence of a Condition item set.

Confidence

The proportion of transactions that contain the Consequent item set, given that the condition item set is in the transaction. Confidence measures the strength of implication, or the predictive power, of an association rule.

Lift

The ratio of an association rule’s confidence to its expected confidence, assuming that the condition and consequent item sets appear in transactions independently. Lift measures how much the Consequent item set depends on the presence of the Condition item set. The minimum value for lift is 0.

‒ A lift ratio less than 1 indicates that the Condition and Consequent item sets repel each other, because they occur together less frequently than one would expect by chance alone.

‒ A lift ratio close to 1 indicates that the Consequent item set occurs at the same rate in transactions that contain the Condition item set as one would expect from chance alone.

‒ A lift ratio greater than 1 indicates that the Consequent item set has an affinity for the Condition item set. The Consequent item set occurs more often with the Condition item set than one would expect by chance alone.

Association Analysis Platform Options

The Association Analysis red triangle menu contains the following options:

Transaction Listing

Shows or hides a table listing each Transaction ID value and the items included in that transaction. The table is sorted by the Transaction ID column.

Frequent Item Sets

Shows or hides a list of item sets whose support meets or exceeds the Minimum Support value specified in the launch window. See “Frequent Item Sets” for more information.

Rules

Shows or hides a table of association rules that meet the Minimum Support, Minimum Confidence, Minimum Lift, Maximum Antecedents, and Maximum Rule Size requirements specified in the launch window. See “Rules” for more information.

SVD

Shows or hides scatterplots of the first two singular vectors for transactions and for items, calculated by singular value decomposition on the incidence matrix for the items. The report also contains a table of singular values sorted in descending order. The Percent and Cum Percent columns show the additional and cumulative variability in the data explained by the corresponding singular value. The bar chart shows the Percent variation explained by each singular value. For more information, see “SVD”.

Rotated SVD

(Available only if SVD is selected.) Shows or hides the Topic Items and Topic Scores reports. This option performs a varimax rotated singular value decomposition of the transaction item matrix to produce groups of similar transactions called topics. See “Rotated SVD”.

Save Transaction SVD

Creates a data table that contains a number of singular vectors that you specify for each transaction. These are the left singular vectors of the transaction item matrix. See “Singular Value Decomposition”.

Save Item SVD

Creates a data table that contains a number of singular vectors that you specify for each item. These are the right singular vectors of the transaction item matrix. See “Singular Value Decomposition”.

See the JMP Reports chapter in the Using JMP book for more information about the following options:

Local Data Filter

Shows or hides the local data filter that enables you to filter the data used in a specific report.

Redo

Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.

Save Script

Contains options that enable you to save a script that reproduces the report to several destinations.

Save By-Group Script

Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.

SVD

Singular value decomposition (SVD) complements association analysis by providing another method to identify items that have an affinity for each other. Singular value decomposition of the transaction item matrix reduces the matrix to a manageable number of dimensions, thereby enabling you to group similar transactions and similar items.

Transaction Item Matrix

The transaction item matrix is a matrix in which each row corresponds to a transaction and each column corresponds to an item. The entries of the matrix are zeros and ones. If an item occurs in a transaction, the corresponding row and column entry is one. Otherwise, the row and column entry is zero. Because the transaction item matrix usually contains many more zeros than ones, it is called a sparse matrix.
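A minimal sketch of how such a matrix can be assembled from (transaction ID, item) pairs follows. The data and variable names are hypothetical, and a dense list of lists is used only for readability; a real implementation would typically use a sparse representation.

```python
# Hypothetical (transaction ID, item) rows, one row per purchased item.
rows = [
    (1, "avocado"), (1, "artichoke"),
    (2, "avocado"),
    (3, "olives"), (3, "artichoke"),
]

# Fix a row order for transactions and a column order for items.
ids = sorted({tid for tid, _ in rows})
items = sorted({item for _, item in rows})
row_index = {tid: i for i, tid in enumerate(ids)}
col_index = {item: j for j, item in enumerate(items)}

# Start from all zeros and set a one wherever an item occurs in a transaction.
matrix = [[0] * len(items) for _ in ids]
for tid, item in rows:
    matrix[row_index[tid]][col_index[item]] = 1

# Columns are ordered artichoke, avocado, olives, so the matrix is
# [[1, 1, 0],
#  [0, 1, 0],
#  [1, 0, 1]]
```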

Singular Value Decomposition

The singular value decomposition approximates the transaction item matrix using three matrices: U, S, and V′. The relationship between these matrices is defined as follows:

Transaction Item Matrix ≈ U * S * V′

Define nTransactions as the number of transactions (rows) in the transaction item matrix, nItems as the number of items (columns) in the transaction item matrix, and nVec as the specified number of singular vectors. Note that nVec must be less than or equal to min(nTransactions, nItems). It follows that U is an nTransactions by nVec matrix. S is a diagonal matrix of dimension nVec. The diagonal entries in S are the singular values in the SVD. V′ is an nVec by nItems matrix. The rows in V′ are the singular vectors.

The singular vectors capture connections among different items with similar functions or topic areas. If three items tend to appear in the same transactions, the SVD is likely to produce a singular vector in V′ with large values for those three items. The U singular vectors represent the transactions projected into this new item space.

The SVD also captures indirect connections. If two items never appear together in the same transaction, but they generally appear in transactions with another third item, the SVD is able to capture some of that connection. If two transactions have no items in common but contain items that are connected in the dimension-reduced space, they map to similar vectors in the SVD plots.

The SVD transforms transaction data into a fixed-dimensional vector space, making it amenable to clustering, classification, and regression techniques. The Save options enable you to export this vector space to be analyzed in other JMP platforms.
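The dimensions described in this section can be checked with a small numerical sketch. It assumes NumPy and uses a dense SVD on a made-up 5 by 4 matrix; the platform's own solver for sparse data is not described here, so this is only an illustration of the shapes involved.

```python
import numpy as np

# Hypothetical transaction item matrix: 5 transactions (rows), 4 items (columns).
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

n_vec = 2  # number of singular vectors to keep; must be <= min(5, 4)

# Full (thin) SVD, then truncate to the first n_vec components.
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :n_vec]        # nTransactions by nVec
S = np.diag(s_full[:n_vec])  # nVec by nVec, singular values on the diagonal
Vt = Vt_full[:n_vec, :]      # nVec by nItems

# Rank-n_vec approximation of the transaction item matrix.
approx = U @ S @ Vt
```

Each row of U gives one transaction's coordinates in the reduced space, and each column of Vt gives one item's coordinates, which is the kind of fixed-dimensional representation that the Save options export.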

SVD Report

SVD Plots

The SVD Plots report shows scatterplots of the first two singular vectors for both the transaction and the item data.

Tip: To see the transaction or item that a point represents, place your cursor over the point. To add the label to the plot, select the point, right-click in the plot, and select Row Label.

The Transaction SVD plot contains a point for each transaction. For a given transaction, the point that is plotted is defined by the transaction’s values on the first two singular vectors in U. In the Transaction SVD plot, points that are visibly grouped together indicate transactions with a similar composition.

The Item SVD plot contains a point for each item. For a given item, the point that is plotted is defined by the item’s values on the first two singular vectors in V. In the Item SVD plot, items that are visibly grouped together indicate items that have similar functions or topic areas.

See “Additional Example: SVD Analysis”.

Caution: The first two singular vectors might not adequately capture the structure of your data. The “Singular Values” report shows how much variability is explained by the singular vectors.

Singular Values

The kth row in the Singular Values table shows the additional and cumulative percent of variability explained by using the kth singular value or singular vector column.
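The Percent and Cum Percent columns can be illustrated with a short calculation. The sketch below assumes that variability is measured by squared singular values, which is a common convention; the exact computation used by the platform is not stated in this section, and the singular values shown are made up.

```python
# Hypothetical singular values, already sorted in descending order.
singular_values = [4.0, 2.0, 1.0, 1.0]

# ASSUMPTION: explained variability is proportional to the squared singular value.
squares = [s ** 2 for s in singular_values]
total = sum(squares)

percent = [100.0 * sq / total for sq in squares]

# Running total of the Percent column.
cum_percent = []
running = 0.0
for p in percent:
    running += p
    cum_percent.append(running)
```

If the cumulative percent for the first two rows is low, the two-dimensional SVD plots show only a small part of the structure in the data.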

Rotated SVD

(Available only when SVD is selected from the red triangle menu next to Association Analysis.) The Rotated SVD option performs a varimax rotation on the singular value decomposition (SVD) of the transaction item matrix. See “Transaction Item Matrix”. You must specify a number of rotated singular vectors, which corresponds to the number of topics that are created by the platform.

Topics are groups of transactions that are grouped based on a primary item indicator, as well as secondary item indicators. For each topic, every item has a weight that influences a transaction’s membership in the topic. The cumulative sum of the item weights for all of the items that are present in a transaction is called the topic score. Topic scores reflect the strength of a transaction’s membership for a topic.
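The topic score described above is a sum of item weights over the items in a transaction. A minimal sketch, using hypothetical weights for a single topic (the names and values are assumptions for illustration, not values from the sample data):

```python
# Hypothetical item weights for one topic.
topic_weights = {
    "avocado": 0.62,
    "artichoke": 0.35,
    "olives": -0.48,
    "bread": 0.05,
}

def topic_score(transaction_items, weights):
    """Sum the topic's weights over the items present in the transaction."""
    return sum(weights.get(item, 0.0) for item in transaction_items)

# A transaction with the topic's strongest positive items scores high ...
score_a = topic_score({"avocado", "artichoke"}, topic_weights)   # about 0.97
# ... while one containing a negatively weighted item scores low.
score_b = topic_score({"olives", "bread"}, topic_weights)        # about -0.43
```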

The varimax rotation rotates the singular vectors to more closely align them with the coordinate axes. This rotation helps facilitate interpretation by resulting in high loadings on a few axes and small loadings on the others. The loadings are given in the Rotated V Matrix and Rotated U Matrix reports.
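For readers who want to see the mechanics, the following is a sketch of a commonly used iterative varimax algorithm, assuming NumPy. It is not taken from the platform, whose implementation details (normalization, convergence rule) are not given here; the function name, parameters, and example loadings are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation) for an orthogonal varimax rotation."""
    n_rows, n_cols = loadings.shape
    rotation = np.eye(n_cols)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the gradient of the varimax criterion.
        col_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated @ np.diag(col_ss) / n_rows
        # The closest orthogonal matrix to loadings' @ target updates the rotation.
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_objective = s.sum()
        if new_objective <= objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation

# Hypothetical loadings: 6 items on 2 unrotated singular vectors.
loadings = np.array([
    [0.7, 0.5], [0.6, 0.6], [0.8, 0.4],
    [0.3, -0.6], [0.4, -0.7], [0.2, -0.5],
])
rotated, rotation = varimax(loadings)
```

Because the rotation matrix is orthogonal, distances between rows are unchanged; only the axes move, which is why the rotation changes interpretability without changing the fit.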

See “Additional Example: SVD Analysis”.

Topic Items

(Available only when Rotated SVD is selected from the red triangle menu next to Association Analysis.) The Topic Items report shows a number of transaction groups, called topics. The resulting report shows the strongest indicators for each topic sorted in descending order by the absolute value of the score. The items with the largest absolute scores represent the thematic composition of a topic. The topic items can be used to score the membership of each transaction for each topic. See “Topic Scores”. The report also gives the following information about the varimax rotation:

Transform

The rotation matrix for the varimax rotation.

Rotated V Matrix

The matrix of item scores for each topic. Each column corresponds to an item. The rotated V matrix results from a varimax rotation of the V matrix in the SVD analysis. Large values indicate an affinity between the item and the topic.

Rotated U Matrix

The matrix of transaction scores for each topic. Each column corresponds to a transaction. Transactions with higher scores in a topic are more likely to be associated with that topic. Large values indicate an affinity between the transaction and the topic.

Topic Portion

Shows the topic portion values for each topic.

See “Additional Example: SVD Analysis”.

Topic Scores

(Available only when Rotated SVD is selected from the red triangle menu next to Association Analysis.) The Topic Scores report shows the topic scores for all transactions in one-dimensional scatterplots. Negative values indicate transactions that are negatively associated with a topic. Use these plots to explore the distribution of transactions within each topic. See “Additional Example: SVD Analysis”.

Tip: Select points in a topic score plot to select both the corresponding rows in the data table and the corresponding transactions in the other topic score plots.

Additional Example: SVD Analysis

In this example, you use singular value decomposition of the transaction item matrix to gain further insight into the Grocery Purchases.jmp sample data.

1. Select Help > Sample Data Library and open Grocery Purchases.jmp.

2. Select Analyze > Screening > Association Analysis.

3. Select Product and click Item.

4. Select Customer ID and click ID.

5. Click OK.

6. Click the red triangle next to Association Analysis and select SVD.

Figure 20.5 SVD Plots

The transaction SVD plot suggests that there might be two or three groups of transactions. In the upper right corner of the item SVD plot, notice that the points that represent Coke and ice cream overlap. The proximity of these two items indicates that there is a strong affinity between them.

7. Click the red triangle next to Association Analysis and select Rotated SVD.

8. Enter 3 next to Number of Topics (rotated singular vectors) and click OK.

The Topic Items and Topic Scores reports appear.

Figure 20.6 Topic Items Report

Three groups, or topics, are created and shown in the Topic Items report. The first items listed in the Topic Item tables represent the primary items for that group. For example, Topic 1 is a group that is identified primarily by transactions that contain avocados, but do not contain olives.

Figure 20.7 Topic Scores

The topic scores that are assigned to each of the 1001 transactions are plotted in the Topic Scores report. Select groups of points for a topic to see how those transactions relate to other topics. For example, transactions with very high values on Topic 1 tend to have low values on Topics 2 and 3.

9. Open the Singular Values report.

Figure 20.8 Singular Values Table

As seen in Figure 20.8, the first two singular values explain only about 30% of the variability in the grocery store data. Additional dimensions might be required to explain a sufficient amount of variability.

Statistical Details for the Association Analysis Platform

This section contains statistical details for the Association Analysis platform.

Frequent Item Set Generation

The Association Analysis platform uses the Apriori algorithm to reduce computational time when generating frequent item sets. The Apriori algorithm leverages the fact that an item set’s support is never larger than the support of its subsets. The platform generates larger item sets from combinations of smaller item sets that meet the minimum support level. In addition, the platform does not generate item sets that exceed either the specified maximum number of antecedents or the maximum rule size. These options are useful when working with large data sets, because the total possible number of rules increases exponentially with the number of items. For more information about the Apriori algorithm, see Agrawal and Srikant (1994).
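The pruning idea can be sketched as follows. This is a simplified, textbook-style version of Apriori on made-up transactions, not the platform's code; the function name and the max_size parameter (standing in for the size limits above) are assumptions.

```python
from itertools import combinations

def frequent_item_sets(transactions, min_support, max_size):
    """Return {item set: support} for item sets meeting min_support, up to max_size items."""
    n = len(transactions)

    def support(item_set):
        return sum(item_set <= t for t in transactions) / n

    # Level 1: single items that meet the minimum support.
    singles = {frozenset([i]) for t in transactions for i in t}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Join step: combine frequent sets from the previous level.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: a set can be frequent only if all of its subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
    return frequent

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
result = frequent_item_sets(transactions, min_support=0.4, max_size=3)
```

In this toy data {milk, butter} appears in only one of five transactions, so it is dropped at level 2, and the three-item set {bread, milk, butter} is pruned without its support ever being counted.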

Association Analysis Performance Measures

This section defines the performance measures used in Association Analysis. Denote the condition item set by X and the consequent item set by Y. Denote an association rule with condition set X and consequent set Y by X ⇒ Y.

Support

Support is the proportion of transactions in which an item set occurs.

Confidence

Confidence is the proportion of transactions that contain the consequent item set, given that the transaction contains the condition item set.

An association rule with a confidence of 0% has a consequent item set that does not appear in any transaction with the condition item set. A confidence of 100% indicates that every transaction that contains the condition item set also contains the consequent item set.

Lift

Lift measures dependency between X and Y.

The numerator for lift is the proportion of transactions where X and Y occur jointly. The denominator is an estimate of the expected joint occurrence of X and Y, assuming that they occur independently.

A lift value of 1 indicates that X and Y jointly occur in transactions with the frequency that would be expected by chance alone. Increasing lift values suggest that Y occurs more often than expected when X is present.
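The verbal definitions in this section correspond to the following standard formulas, written here as a sketch for a rule with condition set X and consequent set Y, denoted X ⇒ Y. The notation Support(X ∪ Y), meaning the proportion of transactions that contain every item in both X and Y, is an assumption introduced for this summary.

```latex
\begin{align*}
\mathrm{Support}(X) &= \frac{\text{number of transactions that contain } X}{\text{total number of transactions}} \\[4pt]
\mathrm{Confidence}(X \Rightarrow Y) &= \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)} \\[4pt]
\mathrm{Lift}(X \Rightarrow Y) &= \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}
  = \frac{\mathrm{Confidence}(X \Rightarrow Y)}{\mathrm{Support}(Y)}
\end{align*}
```

The last equality shows that the two descriptions of lift agree: the expected confidence under independence is Support(Y), so the ratio of confidence to expected confidence equals the ratio of joint support to the product of the individual supports.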
