Latent Class Analysis Platform Overview

The Latent Class Analysis platform fits a latent class model to categorical response variables and determines the most likely cluster or latent class for each observation. A latent variable is an unobservable grouping variable. Each level of the latent variable is called a latent class. For example, latent classes could be clusters of survey respondents that are grouped by their appetite for risk.

The model takes the form of a multinomial mixture model. There are two sets of parameters in the model: the γ parameters and the ρ parameters. The γ parameters represent the overall probabilities of cluster membership. The ρ parameters represent the probabilities of observing a given response conditional on cluster membership. A latent class is characterized by a pattern of these conditional probabilities.
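The two parameter sets can be read as a recipe for generating data: draw a cluster using the γ probabilities, then draw each response using that cluster's ρ probabilities. The following Python sketch illustrates this reading. It is not JMP code, and the parameter values, the names `gamma`, `rho`, and `simulate_respondent`, and the two-item setup are hypothetical.

```python
import random

# Hypothetical parameters for a 2-cluster model with two binary (No/Yes) items.
gamma = [0.7, 0.3]  # overall cluster membership probabilities (sum to 1)
rho = [
    # rho[c][j][k] = Pr(item j takes level k | cluster c); each inner list sums to 1
    [[0.9, 0.1], [0.8, 0.2]],  # cluster 0: mostly "No"
    [[0.2, 0.8], [0.3, 0.7]],  # cluster 1: mostly "Yes"
]
levels = ["No", "Yes"]


def simulate_respondent(rng):
    """Draw a latent cluster, then draw each response given that cluster."""
    c = rng.choices(range(len(gamma)), weights=gamma)[0]
    responses = [
        rng.choices(levels, weights=rho[c][j])[0] for j in range(len(rho[c]))
    ]
    return c, responses


rng = random.Random(1)
sample = [simulate_respondent(rng) for _ in range(5000)]

# The share of simulated respondents in cluster 0 should be close to gamma[0].
share_cluster_0 = sum(1 for c, _ in sample if c == 0) / len(sample)
print(round(share_cluster_0, 2))
```

In a real analysis the cluster label is never observed. Only the responses are available, and the platform estimates γ and ρ from them.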

In order for the analysis results to be meaningful, a subject matter expert must interpret the clusters that the platform generates. This subject matter expert examines characteristics of the latent classes and constructs a definition for each class based on those characteristics.

Note: Rows with missing values in any of the response columns are excluded from the analysis.

For more information about latent class models, see Collins and Lanza (2010) and Goodman (1974).

Example of Latent Class Analysis

This example uses the Latent Class Analysis platform to analyze responses to a 2005 survey of US high school students. The survey asked students a variety of multiple choice questions regarding health-risk behaviors.

In this example, you fit a latent class model to identify clusters of students based on their responses to a subset of 12 of these questions. The columns that you analyze were obtained from multiple choice survey questions by binning the responses into two classes (Yes/No).

1. Select Help > Sample Data Library and open Health Risk Survey.jmp.

2. In the Health Risk Survey data table, click the green triangle next to the Launch LCA Platform script.

The script selects the 12 columns of interest, opens the Latent Class Analysis launch window, and enters the 12 columns of interest as Y.

3. Type 5 in the box next to Up to.

Because Number of Clusters is 3, this setting fits latent class models with three, four, and five clusters.

4. Click OK.

The Fit Group outline contains three Latent Class Analysis reports. The reports fit models for three, four, and five clusters.

5. Click the red triangle next to Fit Group and select Order by Goodness of Fit.

Because it has the smallest BIC value of the three models, the model with five clusters now appears first in the Fit Group report.

6. In the Latent Class Analysis for 5 Clusters report, examine the bar charts under Parameter Estimates. Note the following:

‒ Cluster 1 has mostly No answers to all of the risk behaviors.

‒ Cluster 2 has high numbers of Yes answers for the four risk behaviors before the age of 13.

‒ Cluster 3 has high numbers of Yes answers for driving when drinking and five or more drinks in the past 30 days.

‒ Cluster 4 has high numbers of Yes answers for most of the risk behaviors except for the ones before the age of 13.

‒ Cluster 5 has the highest number of Yes answers for most of the risk behaviors.

Use this information to give the clusters meaningful names.

7. Click the red triangle next to Latent Class Analysis for 5 Clusters and select Rename Clusters:

‒ Enter Low Risk for Cluster 1.

‒ Enter Early Risk-Takers for Cluster 2.

‒ Enter Drinkers for Cluster 3.

‒ Enter Late High Risk for Cluster 4.

‒ Enter High Risk for Cluster 5.

8. Click OK.

9. Click OK in the JMP Alert that appears.

Note: The new cluster names are not saved to scripts.

Figure 10.2 Partial Parameter Estimates Report

Figure 10.2 shows parameter estimates for the first 8 variables in the analysis. The new cluster names appear in the report window.

Next, compare cluster membership to the demographic question “In what grade are you”.

10. Click the red triangle next to Latent Class Analysis for 5 Clusters and select Save Mixture and Cluster Formulas.

11. Select Graph > Graph Builder.

12. Enter In what grade are you as X.

13. Enter Most Likely Cluster Formula as Y.

14. Select the Mosaic element.

15. Click Done.

Figure 10.3 Mosaic Plot of Cluster Membership versus Grade Level

Observe that most of the respondents fall into the Low Risk cluster. The class labeled Drinkers includes more respondents as the grade level increases.

Launch the Latent Class Analysis Platform

Launch the Latent Class Analysis platform by selecting Analyze > Clustering > Latent Class Analysis.

Figure 10.4 Latent Class Analysis Launch Window

The Latent Class Analysis platform launch window contains the following options:

Y

One or more categorical response columns that you want to analyze.

Weight

A column whose numeric values assign a weight to each row in the analysis.

Freq

A column whose numeric values assign a frequency to each row in the analysis.

ID

A column used to identify separate respondents. This identification is used in some output tables.

By

A column that creates a report consisting of separate analyses for each level of the variable. If more than one By variable is assigned, a separate analysis is produced for each possible combination of the levels of the By variables.

Number of Clusters

The number of clusters to be computed in the analysis.

Up to

Specifies a maximum number of clusters. If this number exceeds the value specified for Number of Clusters, a model report is produced with a number of clusters equal to each integer value in the range between Number of Clusters and Up to. These reports appear as part of a Fit Group outline.

After you click OK on the launch window, the Latent Class Analysis report appears.

The Latent Class Analysis Report

By default, the Latent Class Analysis report contains a Fit Group report that contains Latent Class Analysis reports for each specified number of clusters. Options in the Fit Group report enable you to arrange the Latent Class Analysis reports in rows and to order the reports by goodness of fit.

Each Latent Class Analysis report also contains the following results and outlines:

• “BIC”

• “Parameter Estimates”

• “Transposed Parameter Estimates”

• “Effect Sizes”

• “MDS Plot”

• “Mixture Probabilities”

BIC

The BIC value for the model with the specified number of clusters appears at the top of each Latent Class Analysis report. The BIC is a goodness of fit measure. Lower values of BIC indicate better fits.
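As an illustration of how BIC trades fit against model size, the following Python sketch uses the standard definition BIC = -2 log L + p ln(n), with the usual parameter count for a latent class model. This section does not state the exact computation used by the platform, so treat the formula as an assumption. The log-likelihood values, the row count, and the function names are hypothetical.

```python
import math


def lca_free_parameters(n_clusters, levels_per_item):
    """Free parameters: (C - 1) prevalences plus C * sum(R_j - 1) conditional probabilities."""
    return (n_clusters - 1) + n_clusters * sum(r - 1 for r in levels_per_item)


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC (assumed definition): -2 * logL + p * ln(n). Smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical log-likelihoods for 3-, 4-, and 5-cluster fits to 12 binary items.
hypothetical_fits = {3: -61500.0, 4: -61200.0, 5: -61050.0}
n_obs = 10000  # hypothetical number of rows used in the analysis

scores = {
    c: bic(ll, lca_free_parameters(c, [2] * 12), n_obs)
    for c, ll in hypothetical_fits.items()
}

# Ordering by increasing BIC puts the preferred model first.
ordered = sorted(scores, key=scores.get)
print(ordered)
```

With 12 binary items, a five-cluster model has 4 + 5 × 12 = 64 free parameters, so its improvement in log-likelihood must outweigh a larger penalty than that of the three-cluster model.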

Parameter Estimates

The Parameter Estimates report contains two tables. Each table contains rows corresponding to the model clusters. The first table gives the numerical results. The second table graphs the results with share charts.

The Overall column in both tables shows the probability of an observation belonging to each cluster. (These are the γ parameters. See “Statistical Details for the Latent Class Analysis Platform”.)

The remaining columns in the first table are grouped with vertical dividers according to the Y columns specified in the Latent Class Analysis launch window. Each group of columns has a column for each level of the corresponding Y column. In each group, the value in a given row and column is the conditional probability of the response indicated by the column, given that the observation belongs to the cluster identified by the row. (These are the ρ parameters.)

The graph in the second table shows the conditional probability values as share charts. For each cluster and each Y, the conditional probabilities given cluster membership are plotted as a horizontal stacked bar chart. The stacking of bars follows the order of appearance of the variables in the table of values.

Tip: You can select one or more rows in either table in the Parameter Estimates report to select the observations assigned to the corresponding clusters.

Transposed Parameter Estimates

The Transposed Parameter Estimates report contains a table that is the transpose of the first table shown in the Parameter Estimates report. Here the clusters are shown as columns. The conditional probabilities for each cluster are shown for each response category of each Y column in the analysis.

Note: The estimates from the Overall column are not included in the transposed table.

Effect Sizes

The Effect Sizes table compares the Y columns across clusters. The statistics in each row of this table are obtained from a contingency table analysis of expected counts for cluster membership by levels of a Y column. The expected counts are obtained by multiplying the number of observations in each cluster by the conditional probabilities for each level of the Y column.

For each response, the Pearson chi-square statistic, X², is calculated for the contingency table of expected counts for levels by clusters. Let n represent the number of observations. The value in the Effect Size column is defined as follows:

$$\text{Effect Size} = \sqrt{X^2 / n}$$

Each value in the LR Logworth column shows -log10(pLR), where pLR is the likelihood ratio test p-value for the contingency table of expected counts. A Logworth value above 2 corresponds to significance at the 0.01 level.
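The calculation described above can be sketched in Python for a single Y column. The sketch builds the table of expected counts, computes the Pearson chi-square statistic, and takes the effect size to be the square root of X² divided by n, which is an assumption made here. The cluster sizes, the conditional probabilities, and the function names are hypothetical, and the likelihood ratio p-value itself is not computed.

```python
import math


def effect_size_for_item(cluster_sizes, rho_item):
    """Pearson chi-square and effect size for one Y column.

    cluster_sizes[c]: number of observations in cluster c.
    rho_item[c][k]:   Pr(level k | cluster c) for this column.
    Assumes every level has a positive total expected count.
    """
    # Expected counts: cluster size times conditional probability of each level.
    table = [[n_c * p for p in probs] for n_c, probs in zip(cluster_sizes, rho_item)]
    n = sum(cluster_sizes)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    chi_square = 0.0
    for i, row in enumerate(table):
        for k, count in enumerate(row):
            # Count implied by independence of cluster and response level.
            independent = row_totals[i] * col_totals[k] / n
            chi_square += (count - independent) ** 2 / independent
    return chi_square, math.sqrt(chi_square / n)


def logworth(p_value):
    """Logworth of a p-value: -log10(p)."""
    return -math.log10(p_value)


chi_square, effect = effect_size_for_item([600, 400], [[0.9, 0.1], [0.2, 0.8]])
print(round(effect, 3))
```

A column whose conditional probabilities are identical in every cluster gives a chi-square of 0 and therefore an effect size of 0, which matches the idea that such a column does not help separate the clusters.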

Tip: You can select one or more rows in the Effect Sizes table to select the corresponding columns in the data table.

MDS Plot

The MDS Plot contains one point for each cluster. It is a two-dimensional representation of cluster proximity. Clusters that are closer together are more similar. The plot is created from a dissimilarity matrix of the ρ parameters. For more information about MDS plots, see the Multidimensional Scaling chapter in the Consumer Research book.

Mixture Probabilities

The Mixture Probabilities table displays probabilities of cluster membership for each row. The Most Likely Cluster column indicates the cluster with the highest probability of membership for each row.

Note: Rows that contain a missing value for one or more of the Y columns are excluded from the analysis and do not appear in the Mixture Probabilities table.

Latent Class Analysis Platform Options

Fit Group Options

The Fit Group red triangle menu contains the following options:

Arrange in Rows

Arranges the Latent Class Analysis reports horizontally in the Fit Group report.

Order by Goodness of Fit

Rearranges the Latent Class Analysis reports in order of increasing BIC values. The report for the model with the smallest BIC value appears first in the Fit Group report.

Latent Class Analysis Options

The Latent Class Analysis red triangle menu contains the following options:

New Number of Clusters

Enables you to run another analysis using a different number of clusters. The new analysis report is appended to the current report.

Color by Cluster

Colors each row in the data table according to its most likely cluster. For an example, see “Additional Example: Plot Probabilities of Cluster Membership”.

Save Mixture and Cluster Formulas

Saves a formula column to the data table for each cluster as well as a formula column for the most likely cluster.

Save Cluster Formula Only

Saves a column to the data table with a formula that determines the most likely cluster.

Publish Probability Formulas

Creates probability formulas and saves them as formula column scripts in the Formula Depot platform. If a Formula Depot report is not open, this option creates a Formula Depot report. See the Formula Depot chapter in the Predictive and Specialized Modeling book.

Save Mixture Probabilities

Saves the values in the Mixture Probabilities table to the corresponding rows in the data table.

Save Cluster Only

Saves a new column to the data table that contains the most likely cluster for each row. This column does not contain a formula.

Rename Clusters

Enables you to give meaningful names to the clusters in the report.

Note: The new cluster names are not saved to a script unless you have specified a random seed for the report. Setting a random seed is available only when you launch the report via a script.

See the JMP Reports chapter in the Using JMP book for more information about the following options:

Local Data Filter

Shows or hides the local data filter that enables you to filter the data used in a specific report.

Redo

Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.

Save Script

Contains options that enable you to save a script that reproduces the report to several destinations.

Save By-Group Script

Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.

Additional Example: Plot Probabilities of Cluster Membership

This example uses the Car Poll.jmp sample data table, which contains survey data for car owners and car makes. You are interested in classifying the car owners into three clusters and producing a plot to visualize the probabilities of cluster membership. A ternary plot provides a good visualization when you have three clusters.

1. Select Help > Sample Data Library and open Car Poll.jmp.

2. Select Analyze > Clustering > Latent Class Analysis.

3. Select all of the columns except age and click Y.

4. Click OK.

5. Click the red triangle next to Latent Class Analysis for 3 Clusters and select Color by Cluster.

6. Click the red triangle next to Latent Class Analysis for 3 Clusters and select Save Mixture Probabilities.

7. In the Car Poll data table window, select the LCA Cluster Probabilities column group from the column list.

8. Select Graph > Ternary Plot.

9. Click X, Plotting.

10. Click OK.

Figure 10.5 Ternary Plot of Cluster Membership Probabilities

Figure 10.5 shows the ternary plot of cluster probabilities for each observation. Most of the cluster membership probabilities fall near the vertices, which indicates that they have high values for one cluster and lower values for the other two. However, there are some points in the middle of the plot, indicating that these observations do not have high probabilities of cluster membership for any of the clusters. These observations might warrant closer inspection or they might indicate that more clusters are needed to better represent the data.
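Points near the middle of the plot can also be found numerically from the saved probabilities. The following Python sketch flags rows whose largest membership probability falls below a cutoff. The cutoff of 0.6, the function name, and the probability values are hypothetical and chosen only for illustration.

```python
def flag_ambiguous(probabilities, threshold=0.6):
    """Return indices of rows whose largest membership probability is below threshold."""
    return [i for i, row in enumerate(probabilities) if max(row) < threshold]


# Hypothetical cluster membership probabilities for four rows and three clusters.
rows = [
    [0.95, 0.03, 0.02],  # near a vertex
    [0.40, 0.35, 0.25],  # near the middle
    [0.10, 0.85, 0.05],  # near a vertex
    [0.34, 0.33, 0.33],  # near the middle
]

ambiguous = flag_ambiguous(rows)
print(ambiguous)  # [1, 3]
```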

Note: Your results might be different because no random seed was specified.

Statistical Details for the Latent Class Analysis Platform

This section describes the latent class model that is fit in the Latent Class Analysis platform. For more information about latent class models, see Collins and Lanza (2010) and Agresti (2002).

Note: The LCA algorithm that is used in the Text Explorer platform takes advantage of the specific structure of the document term matrix. For this reason, the LCA results in the Text Explorer platform do not exactly match the results in the Latent Class Analysis platform.

Let j = 1, ..., J represent the observed columns of responses. These are the Y columns in the Latent Class Analysis platform launch window. Denote the number of levels for column j by Rj.

A multidimensional contingency table of the J variables contains W = R1 × ... × RJ cells. Each of these cells is defined by its response pattern for the J variables. Therefore, each response pattern is a J-length vector of the form y = (y1, ..., yJ). Define Y to be the W by J array of all the response patterns considered as row vectors. Each row yw in Y has a probability Pr(yw). These probabilities sum to 1:

$$\sum_{w=1}^{W} \Pr(\mathbf{y}_w) = 1$$
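To make the array of response patterns concrete, the following Python sketch enumerates every pattern for three hypothetical response columns and confirms that the number of patterns equals the product of the numbers of levels. The column names and levels are invented for illustration.

```python
from itertools import product
from math import prod

# Hypothetical response columns and their levels (R_j = 2, 2, 3).
levels = {
    "item_a": ["No", "Yes"],
    "item_b": ["No", "Yes"],
    "item_c": ["Low", "Medium", "High"],
}

# Each element is one response pattern, a J-length tuple.
patterns = list(product(*levels.values()))

# W is the product of the numbers of levels.
W = prod(len(v) for v in levels.values())
print(W, len(patterns))  # 12 12
```

By the same count, 12 binary columns produce 2^12 = 4096 possible response patterns.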

Consider the following notation:

• C is the number of clusters in the latent class model.

• γc is the probability of membership in cluster c. (The γc are the latent class prevalences.) These parameters sum to 1.

• rj,k is the kth level of the jth response.

• ρj,k|c is the probability of observing response rj,k in column j conditional on membership in class c. (The ρj,k|c are the item-response probabilities.) For a given cluster and response variable j, the sum of the ρj,k|c is 1.

• I( yj = rj,k ) is an indicator function that equals 1 when the yj response is the kth level of the jth response, and 0 otherwise.

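The probability of observing a specific vector of responses yw = (y1, ..., yJ) is the sum, over the C latent classes, of the probability of membership in each class multiplied by the conditional probability of observing that vector of responses given membership in that class:

$$\Pr(\mathbf{y}_w) = \sum_{c=1}^{C} \gamma_c \prod_{j=1}^{J} \prod_{k=1}^{R_j} \rho_{j,k|c}^{\,I(y_j = r_{j,k})}$$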

This equation is the denominator in the Prob Formula Cluster formulas that you can save to the data table by selecting the Save Mixture and Cluster Formulas option from the Latent Class Analysis red triangle menu. The formula in the Prob Formula Cluster column gives Pr(Cluster = c | yw), which equals Pr(yw, Cluster = c) / Pr(yw).
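The relationship between the joint probability, the pattern probability, and the cluster membership probability can be sketched in Python as follows. This is an illustration of the formulas in this section, not the saved JMP formula columns. The parameter values and function names are hypothetical, and responses are coded as level indices, so indexing into `rho` plays the role of the indicator function.

```python
from itertools import product

# Hypothetical parameters: 2 clusters, 2 binary items (levels coded 0 and 1).
gamma = [0.7, 0.3]
rho = [
    [[0.9, 0.1], [0.8, 0.2]],  # rho[0][j][k] = Pr(item j = k | cluster 0)
    [[0.2, 0.8], [0.3, 0.7]],  # rho[1][j][k] = Pr(item j = k | cluster 1)
]


def joint_probability(pattern, c):
    """Pr(y, Cluster = c): gamma_c times the product of rho[c][j][y_j] over items."""
    p = gamma[c]
    for j, level in enumerate(pattern):
        p *= rho[c][j][level]
    return p


def pattern_probability(pattern):
    """Pr(y): the joint probability summed over all clusters."""
    return sum(joint_probability(pattern, c) for c in range(len(gamma)))


def posterior(pattern):
    """Pr(Cluster = c | y) for every cluster c."""
    total = pattern_probability(pattern)
    return [joint_probability(pattern, c) / total for c in range(len(gamma))]


post = posterior((1, 1))
most_likely = max(range(len(post)), key=post.__getitem__)

# The pattern probabilities sum to 1 over all W = 4 response patterns.
total_over_patterns = sum(pattern_probability(y) for y in product(range(2), repeat=2))
print([round(p, 3) for p in post], most_likely)
```

For the pattern (1, 1), the joint probabilities are 0.7 × 0.1 × 0.2 = 0.014 for the first cluster and 0.3 × 0.8 × 0.7 = 0.168 for the second, so the second cluster is the most likely one.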

The γ and ρ parameters for latent class models are estimated using the iterative Expectation-Maximization (EM) algorithm.
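A minimal EM iteration for this model can be sketched in Python as follows. It is a textbook-style illustration, not the platform's implementation: the starting values, the fixed number of iterations, the absence of a convergence test, and the names `fit_lca` and `history` are all assumptions, and the data are simulated from hypothetical probabilities.

```python
import math
import random


def fit_lca(data, n_levels, n_clusters, n_iter=100, seed=0):
    """Estimate gamma and rho by EM.

    data: list of tuples of level indices, one tuple per row, no missing values.
    n_levels: number of levels R_j for each response column.
    Returns gamma, rho, and the log-likelihood computed in each E-step.
    """
    rng = random.Random(seed)
    n, n_items = len(data), len(n_levels)
    gamma = [1.0 / n_clusters] * n_clusters
    # Random starting values for rho, normalized so each item's levels sum to 1.
    rho = []
    for _ in range(n_clusters):
        items = []
        for r in n_levels:
            raw = [rng.uniform(0.5, 1.5) for _ in range(r)]
            items.append([v / sum(raw) for v in raw])
        rho.append(items)

    history = []
    for _ in range(n_iter):
        # E-step: cluster membership probabilities for every row.
        post, log_lik = [], 0.0
        for y in data:
            joint = [
                gamma[c] * math.prod(rho[c][j][y[j]] for j in range(n_items))
                for c in range(n_clusters)
            ]
            total = sum(joint)
            log_lik += math.log(total)
            post.append([v / total for v in joint])
        history.append(log_lik)

        # M-step: re-estimate gamma and rho from the membership probabilities.
        for c in range(n_clusters):
            weight = sum(p[c] for p in post)
            gamma[c] = weight / n
            for j in range(n_items):
                for k in range(n_levels[j]):
                    num = sum(p[c] for p, y in zip(post, data) if y[j] == k)
                    rho[c][j][k] = num / weight
    return gamma, rho, history


# Hypothetical data: 4 binary items answered by two kinds of respondents.
rng = random.Random(42)
data = []
for _ in range(300):
    p_yes = 0.8 if rng.random() < 0.4 else 0.15
    data.append(tuple(int(rng.random() < p_yes) for _ in range(4)))

gamma, rho, history = fit_lca(data, n_levels=[2, 2, 2, 2], n_clusters=2)
print([round(g, 2) for g in gamma])
```

Each iteration cannot decrease the log-likelihood, but EM can stop at a local maximum that depends on the starting values. This is consistent with the notes in this chapter that results can differ when no random seed is specified.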
