Response Screening Platform Overview
Response Screening automates the process of conducting tests across a large number of responses. It tests each response that you specify against each factor that you specify. Response screening addresses two main issues connected with large-scale data: the need to conduct many tests, and the need to deal effectively with outliers and missing values.
Response screening is available as a platform and as a Fit Model personality. In both cases, it performs tests analogous to those found in the Fit Y by X platform, as shown in Table 17.1. As a personality, it performs tests of the response against the individual model effects.
To facilitate and support the multiple inferences that are required, Response Screening provides these features:
Data Tables
Results are shown in data tables, as well as in a report, to enable you to explore, sort, search, and plot your results. Statistics that facilitate plot interpretation are provided, such as the LogWorth of p-values.
False Discovery Rates
Because you are conducting a large number of tests, you need to control the overall rate of declaring tests significant by chance alone. Response screening controls the false discovery rate. The False Discovery Rate (FDR) is the expected proportion of significant tests that are incorrectly declared significant (Benjamini and Hochberg, 1995; Westfall et al., 2011).
Tests of Practical Significance
When you have many observations, even small effects that are of no practical consequence can result in statistical significance. To address this issue, you can define an effect size that you consider to be of practical significance. You then conduct tests of practical significance, thereby only detecting effects large enough to be of pragmatic interest.
Equivalence Tests
When you are studying many factors, you are often interested in those that have essentially equivalent effects on the response. In this case, you can specify an effect size that defines practical equivalence and then conduct equivalence tests.
To address issues that arise when dealing with messy data, Response Screening provides features to deal with outliers and missing data. These features enable you to analyze your data directly, without expending effort to address data quality issues:
Robust Estimation
Outliers in your data increase estimates of standard error, causing tests to be insensitive to real effects. Select the Robust option to conduct Huber M-estimation. Outliers remain in the data, but the sensitivity of tests to these outliers is reduced.
Missing Value Options
The platform contains an option to treat missing values on categorical predictors in an informative fashion.
 
Table 17.1 Analyses Performed by Response Screening 
Response      Factor        Fit Y by X Analysis   Description
Continuous    Categorical   Oneway                Analysis of Variance
Continuous    Continuous    Bivariate             Simple Linear Regression
Categorical   Categorical   Contingency           Chi-Square
Categorical   Continuous    Logistic              Simple Logistic Regression
The Response Screening platform generates a report and a data table: the Response Screening report and the PValues table. The Response Screening personality generates a report and two data tables: the Fit Response Screening report, the PValues table, and the Y Fits table.
The JSL command Summarize Y by X performs the same function as the Response Screening platform but without creating a platform window. See Summarize YByX in the JSL Syntax Reference book for details.
Example of Response Screening
The Probe.jmp sample data table contains 387 characteristics (the Responses column group) measured on 5800 wafers. The Lot ID and Wafer Number columns uniquely identify the wafer. You are interested in which of the characteristics show different values across a process change (Process).
1. Select Help > Sample Data Library and open Probe.jmp.
2. Select Analyze > Screening > Response Screening.
The Response Screening launch window appears.
3. Select the Responses column group and click Y, Response.
4. Select Process and click X.
5. Enter 100 in the MaxLogWorth box.
A log worth of 100 or larger corresponds to an extremely small p-value. Setting a value for the MaxLogWorth helps control the scale of plots.
6. Click OK.
The Response Screening report appears, along with a data table of supporting information. The report (Figure 17.2) shows the FDR PValue Plot, but also contains two other plot reports. The table contains a row for each of the 387 columns that you entered as Y, Response.
The FDR PValue Plot shows two types of p-values, FDR PValue and PValue, for each of the 387 tests, plotted against Rank Fraction. PValue is the usual p-value for the test of a Y against Process. The FDR PValue is a p-value adjusted to control a given false discovery rate (FDR), here 0.05. The FDR PValues are plotted in blue and the PValues are plotted in red. The Rank Fraction ranks the FDR p-values from smallest to largest, in order of decreasing significance.
Both the horizontal blue line and the sloped red line on the plot are thresholds for FDR significance. Tests with FDR p-values that fall below the blue line are significant at the 0.05 level when adjusted for the false discovery rate. Tests with ordinary p-values that fall below the red line are significant at the 0.05 level when adjusted for the false discovery rate. In this way, the plot enables you to read FDR significance from either set of p-values.
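To make the relationship between these quantities concrete, the following Python sketch computes FDR-adjusted p-values and the Rank Fraction for a small, hypothetical set of p-values using the statsmodels package. It illustrates the statistics behind the plot; it is not JMP code, and the example p-values and the 0.05 level are assumptions.

```python
# Illustrative sketch (not JMP code): FDR-adjusted p-values, Rank Fraction,
# and the sloped threshold used for unadjusted p-values.
import numpy as np
from statsmodels.stats.multitest import multipletests

pvalues = np.array([0.0001, 0.003, 0.02, 0.04, 0.20, 0.55])  # hypothetical tests
alpha = 0.05
m = len(pvalues)

# Benjamini-Hochberg FDR adjustment (the blue points in the plot)
reject, fdr_pvalues, _, _ = multipletests(pvalues, alpha=alpha, method="fdr_bh")

# Rank Fraction: rank of each p-value (most significant first) divided by m
order = np.argsort(pvalues)
rank_fraction = np.empty(m)
rank_fraction[order] = np.arange(1, m + 1) / m

# Sloped threshold for the unadjusted p-values (the red line in the plot)
red_line = rank_fraction * alpha

print(np.column_stack([pvalues, fdr_pvalues, rank_fraction, red_line]))
print(reject)  # which tests are FDR-significant at the 0.05 level
```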
Figure 17.2 Response Screening Report for 387 Tests against Process
The FDR PValue Plot shows that more than 60% of the tests are significant. A handful of tests are significant using the usual p-value, but not significant using the FDR p-value. These tests correspond to the red points that are above the red line, but below the blue line.
To identify the characteristics that are significantly different across Process, you can drag a rectangle around the appropriate points in the plot. This selects the rows corresponding to these points in the PValues table, where the names of the characteristics are given in the first column. Alternatively, you can select the corresponding rows in the PValues table.
The PValues data table (Figure 17.3) contains 387 rows, one for each response measure in the Responses group. The response is given in the first column, called Y. Each response is tested against the effect in the X column, namely, Process.
Figure 17.3 PValues Data Table, Partial View
The remaining columns give information about the test of Y against X. Here the test is a Oneway Analysis of Variance. In addition to other information, the table gives the test’s p-value, LogWorth, FDR (False Discovery Rate) p-value, and FDR LogWorth. Use this table to sort by the various statistics, select rows, or plot quantities of interest.
Notice that LogWorth and FDR LogWorth values that correspond to p-values of 1e-100 or less are reported as 100, because you set MaxLogWorth to 100 in the launch window. Also, cells corresponding to FDR LogWorth values greater than two are colored with an intensity gradient.
See “The Response Screening Report” for details about the report and PValues table.
Launch the Response Screening Platform
Launch the Response Screening platform by selecting Analyze > Screening > Response Screening.
Figure 17.4 Response Screening Launch Window
Launch Window Roles
Y, Response
Identifies the response columns containing the measurements to be analyzed.
X
Identifies the columns against which you want to test the responses.
Grouping
For each level of the specified column, analyzes the corresponding rows separately, but presents the results in a single table and report.
Weight
Identifies a column whose values assign a weight to each row. These values are used as weights in the analysis. For details, see the Weight section in the Model Specification chapter in the Fitting Linear Models book.
Freq
Identifies a column whose values assign a frequency to each row. These values enable you to account for pre-summarized data. For details, see the Frequency section in the Model Specification chapter in the Fitting Linear Models book.
By
For each level of the specified column, analyzes the corresponding Ys and Xs and presents the results in separate tables and reports.
Launch Window Options
Robust
For continuous responses, uses robust (Huber) estimation to downweight outliers. If there are no outliers, these estimates are close to the least squares estimates. Note that this option increases processing time.
Cauchy
Assumes that the errors have a Cauchy distribution. A Cauchy distribution has fatter tails than the normal distribution, resulting in a reduced emphasis on outliers. This option can be useful if you have a large proportion of outliers in your data. However, if your data are close to normal with only a few outliers, this option can lead to incorrect inferences. The Cauchy option estimates parameters using maximum likelihood and a Cauchy link function.
Poisson Y
Fits each Y response as a count having a Poisson distribution. The test is only performed for categorical X. This option is appropriate when your responses are counts.
Kappa
Adds a new column called Kappa to the data table. If Y and X are both categorical and have the same levels, kappa is provided. This is a measure of agreement between Y and X.
Corr
Computes the Pearson product-moment correlation in terms of the indices defined by the value ordering.
The calculation of the Pearson product-moment correlation gives Spearman’s rho in the following instances:
X and Y are both ordinal
X and Y are nominal where their value ordering corresponds to the order relation
If X and Y are both binary, the Pearson calculation gives Kendall's Tau-b. Otherwise, a value of Corr that is large in magnitude indicates an association; a Corr value that is small in magnitude does not preclude an association.
Same Y Scale
Aligns all the Y responses to the same scale when you run individual analyses using the report’s Fit Selected Items options.
Missing is category
For any categorical X variable, treats missing values on X as a category.
Force X Categorical
Ignores the modeling type and treats all X columns as categorical.
Force X Continuous
Ignores the modeling type and treats all X columns as continuous.
Force Y Categorical
Ignores the modeling type and treats all Y columns as categorical.
Force Y Continuous
Ignores the modeling type and treats all Y columns as continuous.
Paired X and Y
Performs tests only for Y columns paired with X columns according to their order in the Y, Response and X lists. The first Y is paired with the first X, the second Y with the second X, and so on.
Unthreaded
Suppresses multithreading.
Practical Difference Portion
The fraction of the specification range, or of an estimated six standard deviation range, that represents a difference that you consider pragmatically meaningful. If Spec Limits is not set as a column property, a range of six standard deviations is estimated for the response, where the standard deviation estimate is computed from the interquartile range (IQR) as IQR/1.349.
If no Practical Difference Portion is specified, its value defaults to 0.10. Tests of practical significance and equivalence tests use this value to determine the practical difference; a sketch of the calculation appears at the end of this list of options. See “Compare Means Data Table”.
MaxLogWorth
Use to control the scale of plots involving LogWorth values (-log10 of p-values). LogWorth values that exceed MaxLogWorth are plotted as MaxLogWorth to prevent extreme scales in LogWorth plots. See “Example of the MaxLogWorth Option” for an example.
OK
Conducts the analysis and displays the results.
Cancel
Closes the launch window.
Remove
Removes the selected variable from the assigned role.
Recall
Populates the launch window with the previous model specification that you ran.
Help
Opens the Help topics for the Response Screening launch window.
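The following Python sketch illustrates one way the practical difference can be derived from the Practical Difference Portion, following the description above: a fraction of the specification range when Spec Limits are available, and otherwise a fraction of a six standard deviation range estimated from the IQR. It is an illustration under stated assumptions, not JMP's internal code; the function name, the spec-limit tuple, the fallback to the sample standard deviation when the IQR is zero, and the example data are hypothetical.

```python
# Hedged sketch: deriving a practical difference from the Practical Difference
# Portion. Not JMP's implementation.
import numpy as np

def practical_difference(y, portion=0.10, spec_limits=None):
    """Return the smallest difference in means treated as practically meaningful."""
    if spec_limits is not None:                    # hypothetical (LSL, USL) tuple
        return portion * (spec_limits[1] - spec_limits[0])
    q75, q25 = np.percentile(y, [75, 25])
    sigma = (q75 - q25) / 1.349                    # robust sigma estimate from the IQR
    if sigma == 0:                                 # assumed fallback when the IQR is zero
        sigma = np.std(y, ddof=1)
    return portion * 6 * sigma                     # portion of a 6-sigma range

y = np.random.default_rng(1).normal(10, 2, size=500)   # hypothetical response values
print(practical_difference(y, portion=0.15))
print(practical_difference(y, portion=0.15, spec_limits=(4, 16)))
```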
The Response Screening Report
The Response Screening report consists of several Graph Builder plots. These plots focus on False Discovery Rate (FDR) statistics. For details, see “The False Discovery Rate”.
The default plots are the FDR PValue Plot, the FDR LogWorth by Effect Size, and the FDR LogWorth by RSquare. If you select the Robust option on the launch window, Robust versions of each of these reports are also presented. In addition, a Robust LogWorth by LogWorth plot is presented to help assess the impact of using the robust fit. The standard Graph Builder red triangle options for each plot are available. For details, see the Graph Builder chapter in the Essential Graphing book.
FDR PValue Plot
The FDR PValue Plot report shows a plot of FDR PValues and PValues against the Rank Fraction. The Rank Fraction ranks the PValues in order of decreasing significance. FDR PValues are plotted in blue and PValues in red.
A blue horizontal line shows the 0.05 significance level. Note that you can change this level by double-clicking the y-axis, removing the current reference line, and adding a new reference line.
A red increasing line provides an FDR threshold for unadjusted p-values. A p-value falls below the red line precisely when the FDR-adjusted p-value falls below the blue line. This enables you to read significance relative to the FDR from either the adjusted or unadjusted p-values.
Figure 17.5 shows the FDR PValue Plot for the Probe.jmp sample data table. Note that some tests are significant according to the usual p-value but not according to the FDR p-value.
Figure 17.5 FDR PValue Plot
FDR LogWorth by Effect Size
When you have large effects, the associated p-values are often very small. Visualizing these small values graphically can be challenging. When transformed to the LogWorth (-log10(p-value)) scale, highly significant p-values have large LogWorths and nonsignificant p-values have low LogWorths. A LogWorth of zero corresponds to a nonsignificant p-value of 1. Any LogWorth above 2 corresponds to a p-value below 0.01.
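The transform itself is easy to compute. The short sketch below shows the calculation in Python, with an optional cap that mimics the MaxLogWorth launch option; the function name and example p-values are hypothetical.

```python
# Minimal sketch of the LogWorth transform, with an optional MaxLogWorth-style cap.
import numpy as np

def logworth(pvalues, max_logworth=None):
    """Return -log10(p), optionally capped for plotting."""
    lw = -np.log10(np.asarray(pvalues, dtype=float))
    if max_logworth is not None:
        lw = np.minimum(lw, max_logworth)
    return lw

print(logworth([1.0, 0.05, 0.01, 1e-120], max_logworth=100))
# approximately [0, 1.3, 2, 100]
```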
In the FDR LogWorth by Effect Size plot, the vertical axis is the FDR LogWorth and the horizontal axis is the Effect Size. Generally, larger effects lead to more significant p-values and larger LogWorths. However, this relationship is not necessarily strong because significance also depends on the error variance. In fact, large LogWorths can be associated with small effects, and small LogWorths can be associated with large effects, because of the size of the error variance. The FDR LogWorth by Effect Size plot enables you to explore this relationship.
Figure 17.6 shows the FDR LogWorth by Effect Size plot for the Probe.jmp sample data table with MaxLogWorth set to 100. Most FDR LogWorth values exceed 2, which indicates that most effects are significant at the 0.01 level. The FDR LogWorth values of 100 correspond to extremely small p-values.
Figure 17.6 FDR LogWorth by Effect Size
FDR LogWorth by RSquare
The FDR LogWorth by RSquare plot shows the FDR LogWorth on the vertical axis and RSquare values on the horizontal axis. Larger LogWorth values tend to be associated with larger RSquare values, but this relationship also depends on the number of observations.
The PValues Data Table
The PValues data table contains a row for each pair of Y and X variables. If you specified a column for Group, the PValues data table contains a first column called Group. A row appears for each level of the Group column and for each pair of Y and X variables. The PValues data table also contains a table variable called Original Data that gives the name of the data table that was used for the analysis. If you specified a By variable, JMP creates a PValues table for each level of the By variable, and the Original Data variable gives the By variable and its level.
Figure 17.7 shows the PValues data table created in “Example of Response Screening”.
Figure 17.7 PValues Data Table, Partial View
PValues Data Table Columns
The PValues data table displays columns containing measures and statistics that are appropriate for the selected fit and combination of Y and X modeling types. The columns in the data table include:
Y
The specified response columns.
X
The specified factor columns.
Count
The number of rows used for testing, or the corresponding sum of the Freq or Weight variable.
PValue
The p-value for the significance test corresponding to the pair of Y and X variables. See the Basic Analysis book for additional details about Fit Y by X statistics.
LogWorth
The quantity -log10(p-value). This transformation adjusts p-values to provide an appropriate scale for graphing. A value that exceeds 2 is significant at the 0.01 level (because -log10(0.01) = 2).
FDR PValue
The False Discovery Rate p-value calculated using the Benjamini-Hochberg technique. This technique adjusts the p-values to control the false discovery rate for multiple tests. If there is no Group variable, the set of multiple tests includes all tests displayed in the table. If there is a Group variable, the set of multiple tests consists of all tests conducted for each level of the Group variable. For details about the FDR correction, see Benjamini and Hochberg, 1995. For details about the false discovery rate, see “The False Discovery Rate”.
FDR LogWorth
The quantity -log10(FDR PValue). This is the best statistic for plotting and assessing significance. Note that small p-values result in high FDR LogWorth values. Cells corresponding to FDR LogWorth values greater than two (p-values less than 0.01) are colored with an intensity gradient.
Effect Size
Indicates the extent to which response values differ across the levels or values of X. Effect sizes are scale invariant.
When Y is continuous, the effect size is the square root of the average sum of squares for the hypothesis divided by a robust estimate of the response standard deviation. If the interquartile range (IQR) is nonzero, the standard deviation estimate is IQR/1.349. If the IQR is zero, the sample standard deviation is used.
When Y is categorical and X is continuous, the effect size is the square root of the average ChiSquare value for the whole model test.
When Y and X are both categorical, the effect size is the square root of the average Pearson ChiSquare.
Rank Fraction
The rank of the FDR LogWorth expressed as a fraction of the number of tests. If the number of tests is m, the largest FDR LogWorth value has Rank Fraction 1/m, and the smallest has Rank Fraction 1. Equivalently, the Rank Fraction ranks the p-values in increasing order, as a fraction of the number of tests. The Rank Fraction is used in plotting the PValues and FDR PValues in rank order of decreasing significance.
YMean
The mean of Y.
SSE
Appears when Y is continuous. The sum of squares for error.
DFE
Appears when Y is continuous. The degrees of freedom for error.
MSE
Appears when Y is continuous. The mean squared error.
F Ratio
Appears when Y is continuous. The F Ratio for the analysis of variance or regression test.
RSquare
Appears when Y is continuous. The coefficient of determination, which measures the proportion of total variation explained by the model.
DF
Appears when Y and X are both categorical. The degrees of freedom for the ChiSquare test.
LR Chisq
Appears when Y and X are both categorical. The value of the Likelihood Ratio ChiSquare statistic.
Columns Added for Robust Option
If you suspect that your data contains outliers, select the Robust option on the launch window to reduce the sensitivity of tests for continuous responses to outliers. With this option, Huber M-estimates (Huber and Ronchetti, 2009) are used in fitting regression and ANOVA models. Huber M-estimates are fairly close to least squares estimates when there are no outliers, but use outlier-downweighting when there are outliers.
The following columns are added to the PValues data table when the Robust option is selected in the launch window. The Robust option only applies when Y is continuous, so Robust column cells are empty when Y is categorical. See the Bivariate chapter in the Basic Analysis book for additional details about Huber M-estimation. For an example, see “Example of Robust Fit”.
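As an illustration of the idea rather than JMP's implementation, the following Python sketch fits a continuous response against a two-level categorical factor by both ordinary least squares and Huber M-estimation using statsmodels. The data, column names, and injected outliers are hypothetical; the point is that the robust fit down-weights the outliers, so its estimates are less affected by them.

```python
# Hedged sketch: ordinary least squares vs. Huber M-estimation (robust fit).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "Process": np.repeat(["New", "Old"], 50),
    "Y": np.concatenate([rng.normal(10.0, 1, 50), rng.normal(10.4, 1, 50)]),
})
df.loc[:2, "Y"] += 15                    # inject a few gross outliers

ols_fit = smf.ols("Y ~ C(Process)", data=df).fit()
rlm_fit = smf.rlm("Y ~ C(Process)", data=df, M=sm.robust.norms.HuberT()).fit()

print(ols_fit.params)                    # estimates pulled toward the outliers
print(rlm_fit.params)                    # Huber estimates down-weight the outliers
```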
Robust PValue
The p-value for the significance test corresponding to the pair of Y and X variables using a robust fit.
Robust LogWorth
The quantity -log10(Robust PValue).
Robust FDR PValue
The False Discovery Rate calculated for the Robust PValues using the Benjamini-Hochberg technique. If there is no Group variable, the multiple test adjustment applies to all tests displayed in the table. If there is a Group variable, the multiple test adjustment applies to all tests conducted for each level of the Group variable.
Robust FDR LogWorth
The quantity -log10(Robust FDR PValue).
Robust Rank Fraction
The rank of the Robust FDR LogWorth expressed as a fraction of the number of tests.
Robust Chisq
The chi-square value associated with the robust test.
Robust Sigma
The robust estimate of the error standard deviation.
Robust Outlier Portion
The portion of the values whose distance from the robust mean exceeds three times the Robust Sigma.
Robust CpuTime
Time in seconds required to create the Robust report.
PValues Data Table Scripts
Relevant scripts are saved to the PValues data table. All but one of these reproduce plots provided in the report. When you select rows in the PValues table, the Fit Selected script produces the appropriate Fit Y by X analyses.
Response Screening Platform Options
The Response Screening red triangle menu contains options to customize the display and to compute and save calculated data.
Fit Selected Items
For selected relationships, adds the appropriate Fit Y by X reports to the Response Screening report. You can select relationships by selecting rows in the PValue data table or points in the plots.
Select Columns
Selects the columns in the original data table that correspond to rows that you select in the PValues table or to points that you select in plots in the Response Screening report window. Select the rows or points first, then select Select Columns. The corresponding columns in the data table are selected. You can select columns corresponding to additional rows in the PValues table or points in plots by first selecting them and then selecting Select Columns again. To select columns corresponding to different rows or points, first clear the current column selection in the original data table.
Save Means
For continuous Ys and categorical Xs, creates a data table with the counts, means, and standard deviations for each level of the categorical variable. If the Robust option is selected, the robust mean is included.
Save Compare Means
For continuous Ys and categorical Xs, tests all pairwise comparisons across the levels of the categorical variable. For each comparison, the data table gives the usual t-test, a test of practical significance, an equivalence test, and a column that uses color coding to summarize the results. The data table also contains a script that plots Practical LogWorth by Relative Practical Difference. See “Compare Means Data Table”. For an example, see “Example of Tests of Practical Significance and Equivalence”.
Save Std Residuals
Saves a new group of columns to the original data table and places these in a column group called Residual Group. For each continuous Y and categorical X, a column is constructed containing the residuals divided by their estimated standard deviation. In other words, the column contains standardized residuals. The column is defined by a formula.
If the Robust option is selected, standardized residual columns are constructed using robust fits and robust estimates.
Save Outlier Indicator
Saves a new group of columns to the original data table and places these in a column group called Outlier Group. Save Outlier Indicator is most effective when you have selected the Robust option.
For each continuous Y and categorical X, a column that indicates outliers is constructed. An outlier is a point whose distance to the predicted value exceeds three times an estimate of sigma. In other words, an outlier is a point whose standardized residual exceeds three in absolute value. The column is defined by a formula. A sketch of this rule appears after these platform options.
If the Robust option is selected, robust fits and robust estimates are used. An outlier is a point whose distance to the predicted value exceeds three times the robust estimate of sigma.
The Cluster Outliers script is added to the original data table. The script shows outliers on a hierarchical cluster plot of the data.
See the JMP Reports chapter in the Using JMP book for more information about the following options:
Local Data Filter
Shows or hides the local data filter that enables you to filter the data used in a specific report.
Redo
Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.
Save Script
Contains options that enable you to save a script that reproduces the report to several destinations.
Save By-Group Script
Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.
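The following Python sketch illustrates the outlier rule behind Save Outlier Indicator: flag rows whose standardized residual from a oneway fit exceeds three in absolute value. It is a simplified illustration, not JMP's code; the data, the column names, and the pooled sigma estimate are assumptions, and the robust variant would substitute robust means and a robust sigma.

```python
# Hedged sketch of the Save Outlier Indicator rule (|standardized residual| > 3).
import numpy as np
import pandas as pd

def outlier_indicator(df, y="Y", x="Process", cutoff=3.0):
    predicted = df.groupby(x)[y].transform("mean")             # oneway predicted values
    residuals = df[y] - predicted
    k = df[x].nunique()
    sigma = np.sqrt((residuals ** 2).sum() / (len(df) - k))    # root mean squared error
    return (residuals.abs() > cutoff * sigma).astype(int)

df = pd.DataFrame({
    "Process": ["New"] * 10 + ["Old"] * 10,
    "Y": [10] * 9 + [30] + [12] * 10,                          # one gross outlier
})
df["Y Outlier"] = outlier_indicator(df)
print(df[df["Y Outlier"] == 1])
```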
Means Data Table
The Means data table contains a row for each combination of response and X level. For the Probe.jmp sample data table, there are 387 response variables, each tested against Process at two levels. The Means table contains 387 × 2 = 774 rows (Figure 17.8).
Figure 17.8 Means Data Table
The Means data table includes the following columns:
Y
The continuous response variables.
X
The categorical variables.
Level
The level of the categorical X variable.
Count
The count of values in the corresponding Level.
Mean
The mean of the Y variable for the specified Level.
StdDev
The standard deviation of the Y variable for the specified Level.
Robust Mean
The robust M-estimate of the mean. Appears when you select the Robust option on the launch window.
Compare Means Data Table
When your data table consists of a large number of rows (large n), the standard error used in testing can be very small. As a result, tests might be statistically significant, when in fact, the observed difference is too small to be of practical consequence. Tests of practical significance enable you to specify the size of the difference that you consider worth detecting. This difference is called the practical difference. Instead of testing that the difference is zero, you test whether the difference exceeds the practical difference. As a result, the tests are more meaningful, and fewer tests need to be scrutinized.
Equivalence tests enable you to determine whether two levels have essentially the same effect, from a practical perspective, on the response. In other words, an equivalence test tests whether the difference is smaller than the practical difference.
The Compare Means data table provides results for both tests of practical difference and tests of practical equivalence. Each row compares a response across two levels of a categorical factor. Results of the pairwise comparisons are color-coded to facilitate interpretation. See “Practical Difference” for a description of how the practical difference is specified. See “Example of Tests of Practical Significance and Equivalence” for an example.
Figure 17.9 Compare Means Data Table
The Compare Means data table contains a script that plots Practical LogWorth by Relative Practical Difference. Relative Practical Difference is defined as the actual difference divided by the practical difference.
Y
The continuous response variables.
X
The categorical variables.
Leveli
The level of the categorical X variable.
Levelj
The level of the categorical X variable being compared to Leveli.
Difference
The estimated difference in means across the two levels. If the Robust option is selected, robust estimates of the means are used.
Std Err Diff
The standard error of the difference in means. This is a robust estimate if the Robust option is selected.
Plain Dif PValue
The p-value for the usual Student's t-test for a pairwise comparison. This is the robust version of the t-test when the Robust option is selected. Tests that are significant at the 0.05 level are highlighted.
Practical Difference
The difference in means that is considered to be of practical interest. If you assign a Spec Limits property to the Y variable, the practical difference is computed as the difference between the specification limits multiplied by the Practical Difference Portion. If no Practical Difference Portion has been specified, the Practical Difference is the difference between the specification limits multiplied by 0.10.
If you do not assign a Spec Limits property to the Y variable, an estimate of its standard deviation is computed from its interquartile range (IQR) as IQR/1.349. The Practical Difference is then computed as six times this estimate (6 × IQR/1.349) multiplied by the Practical Difference Portion. If no Practical Difference Portion has been specified, the Practical Difference is computed as 6 × IQR/1.349 multiplied by 0.10.
Practical Dif PValue
The p-value for a test of whether the absolute value of the mean difference in Y between Leveli and Levelj is less than or equal to the Practical Difference. A small p-value indicates that the absolute difference exceeds the Practical Difference. This indicates that Leveli and Levelj account for a difference that is of practical consequence.
Practical Equiv PValue
Uses the Two One-Sided Tests (TOST) method to test for practical equivalence of the means (Schuirmann, 1987). The Practical Difference specifies a threshold; differences smaller than this threshold are considered practically equivalent. One-sided t tests are constructed for two null hypotheses: the true difference exceeds the Practical Difference; the true difference is less than the negative of the Practical Difference. If both tests reject, this indicates that the absolute difference in the means falls within the Practical Difference. Therefore, the groups are considered practically equivalent.
The Practical Equiv PValue is the larger of the p-values obtained from the two one-sided t tests. A small Practical Equiv PValue indicates that the mean response for Leveli is equivalent, in a practical sense, to the mean for Levelj. A sketch of this test, together with the test of practical difference, appears after this list of columns.
Practical Result
A description of the results of the tests for practical difference and equivalence. Values are color-coded to help identify significant results.
Different (Pink): Indicates that the absolute difference is significantly greater than the practical difference.
Equivalent (Green): Indicates that the absolute difference is significantly within the practical difference.
Inconclusive (Gray): Indicates that neither the test for practical difference nor the test for practical equivalence is significant.
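The sketch below shows one common construction of the test of practical difference and the TOST equivalence test for a single pairwise comparison, given the estimated difference, its standard error, the degrees of freedom, and the Practical Difference. It is a hedged illustration of the logic described above; JMP's exact computations may differ, and the numbers in the example are hypothetical.

```python
# Hedged sketch: practical-difference test and TOST equivalence test for one comparison.
from scipy import stats

def practical_tests(diff, se, df, delta):
    """diff: estimated mean difference, se: its standard error,
    df: degrees of freedom, delta: the Practical Difference."""
    # Practical difference: H0 is |true difference| <= delta.
    # A small p-value indicates the difference exceeds the practical difference.
    p_practical_dif = stats.t.sf((abs(diff) - delta) / se, df)

    # TOST equivalence: reject both H0: diff <= -delta and H0: diff >= delta.
    p_lower = stats.t.sf((diff + delta) / se, df)
    p_upper = stats.t.sf((delta - diff) / se, df)
    p_practical_equiv = max(p_lower, p_upper)
    return p_practical_dif, p_practical_equiv

# Hypothetical comparison: small observed difference relative to delta = 0.5
print(practical_tests(diff=0.12, se=0.05, df=98, delta=0.5))
# practical-difference p-value near 1, equivalence p-value near 0 -> "Equivalent"
```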
The Response Screening Personality in Fit Model
If you are interested in univariate tests against linear model effects, you can fit the Response Screening personality in Fit Model. The report and tables produced test all responses against all model effects.
Launch Response Screening in Fit Model
Select Analyze > Fit Model. Enter your Ys and model effects. Select Response Screening from the Personality list (Figure 17.10).
Figure 17.10 Response Screening from the Fit Model Window
Note that a Robust Fit check box is available. Selecting this option enables robust estimation for tests involving continuous responses. These tests use robust (Huber) estimation to downweight outliers. If there are no outliers, these estimates are close to the least squares estimates. Selecting this option increases processing time.
The Informative Missing option provides a coding system for missing values (Figure 17.11). The Informative Missing coding allows estimation of a predictive model despite the presence of missing values. It is useful in situations where missing data are informative. Select this option from the Model Specification red triangle menu.
Figure 17.11 Informative Missing Option
For details about the Fit Model window, see the Model Specification chapter in the Fitting Linear Models book.
The Fit Response Screening Report
The Fit Response Screening report shows two plots:
The FDR PValue Plot
The FDR LogWorth by Rank Fraction Plot
The FDR PValue Plot is interpreted in the same way as for the platform itself. See “The Response Screening Report”.
The FDR LogWorth by Rank Fraction plot shows FDR LogWorth values plotted against the ranks of the p-values. The plotted points decrease or remain constant as rank fraction increases. The plot gives an indication of what proportion of tests are significant. An example using the Response Screening personality is given in “Response Screening Personality”.
Model Dialog
Opens a window containing the model dialog that you have run to obtain the given report.
Save Estimates
Opens a data table in which each row corresponds to a response and the columns correspond to the model terms. The entries are the parameter estimates obtained by fitting the specified model. This data table also contains a table variable called Original Data that gives the name of the data table that was used for the analysis. If you specified a By variable, JMP creates an estimates table for each level of the By variable, and the Original Data variable gives the By variable and its level.
Save Prediction Formula
Adds columns to the original data table containing prediction equations for all responses.
Save Least Squares Means
Opens a data table where each row corresponds to a response and a combination of effect settings. The row contains the least squares mean and standard error for that combination of settings.
See the JMP Reports chapter in the Using JMP book for more information about the following options:
Local Data Filter
Shows or hides the local data filter that enables you to filter the data used in a specific report.
Redo
Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.
Save Script
Contains options that enable you to save a script that reproduces the report to several destinations.
Save By-Group Script
Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.
PValues Data Table
The PValues data table contains a row for each pair consisting of a Y variable and a model Effect. The columns in the table include the following. If you select the Robust Fit option on the launch window, the models are fit using Huber M-estimation.
Y
The specified response columns.
Effect
The specified model effects.
FRatio
The test statistic for a test of the Effect. This is the value found in the Effect Tests report in Least Squares Fit.
PValue
The p-value for the significance test corresponding to the FRatio. See the Standard Least Squares chapter in the Fitting Linear Models book for additional details about Effect Tests.
LogWorth
The quantity -log10(p-value). This transformation adjusts p-values to provide an appropriate scale for graphing. A value that exceeds 2 is significant at the 0.01 level (because -log10(0.01) = 2).
FDR PValue
The False Discovery Rate p-value calculated using the Benjamini-Hochberg technique. This technique adjusts the p-values to control the false discovery rate for multiple tests. For details about the FDR correction, see Benjamini and Hochberg, 1995. For details about the false discovery rate, see “The False Discovery Rate” or Westfall et al. (2011).
FDR LogWorth
The quantity -log10(FDR PValue). This is the best statistic for plotting and assessing significance. Note that small p-values result in high FDR LogWorth values.
Rank Fraction
The rank of the FDR LogWorth expressed as a fraction of the number of tests. If the number of tests is m, the largest FDR LogWorth value has Rank Fraction 1/m, and the smallest has Rank Fraction 1. Equivalently, the Rank Fraction ranks the p-values in increasing order, as a fraction of the number of tests. The Rank Fraction is used in plotting the PValues and FDR PValues in rank order of decreasing significance.
Test DF
The degrees of freedom for the effect test.
The PValues data table also contains a table variable called Original Data that gives the name of the data table that was used for the analysis. If you specified a By variable, JMP creates a PValues table for each level of the By variable, and the Original Data variable gives the By variable and its level.
Y Fits Data Table
The Y Fits data table contains a row for each Y variable. For each Y, the columns in the table summarize information about the model fit. If you select the Robust Fit option on the launch window, the models are fit using Huber M-estimation.
Y
The specified response columns.
RSquare
The RSquare for the model fit, which measures the proportion of variation in the response explained by the model.
RMSE
The Root Mean Square Error.
Count
The number of observations (or sum of the Weight variable).
Overall FRatio
The test statistic for model fit from the Analysis of Variance report in Least Squares Fit.
Overall PValue
The p-value for the overall test of model significance.
Overall LogWorth
The LogWorth of the p-value for the overall test of model significance.
Overall FDR PValue
The overall p-value adjusted for the false discovery rate. (See “The Response Screening Report”.)
Overall FDR LogWorth
The LogWorth of the Overall FDR PValue.
Overall Rank Fraction
The rank of the Overall FDR LogWorth expressed as a fraction of the number of tests. If the number of tests is m, the largest Overall FDR LogWorth value has Rank Fraction 1/m, and the smallest has Rank Fraction 1.
<Effect> PValue
These columns contain p-values for tests of each model effect. These columns are arranged in a group called PValue in the columns panel.
<Effect> LogWorth
These columns contain LogWorths for the p-values for tests of each model effect. These columns are arranged in a group called LogWorth in the columns panel.
<Effect> FDR LogWorth
These columns contain FDR LogWorths for tests of each model effect. These columns are arranged in a group called FDR LogWorth in the columns panel.
The Y Fits data table also contains a table variable called Original Data that gives the name of the data table that was used for the analysis. If you specified a By variable, JMP creates a Y Fits table for each level of the By variable, and the Original Data variable gives the By variable and its level.
Additional Examples of Response Screening
The following examples illustrate various aspects of Response Screening.
Example of Tests of Practical Significance and Equivalence
This example tests for practical differences using the Probe.jmp sample data table.
1. Select Help > Sample Data Library and open Probe.jmp.
2. Select Analyze > Screening > Response Screening.
The Response Screening Launch window appears.
3. Select the Responses column group and click Y, Response.
4. Select Process and click X.
5. Type 0.15 in the Practical Difference Portion box.
6. Click OK.
7. From the Response Screening report’s red triangle menu, select Save Compare Means.
Figure 17.12 shows a portion of the data table. For each response in Y, the corresponding row gives information about tests of the New and the Old levels of Process.
Figure 17.12 Compare Means Table, Partial View
Because specification limits are not saved as column properties in Probe.jmp, JMP calculates a value of the practical difference for each response. The Practical Difference Portion of 0.15 that you specified is multiplied by an estimate of the 6σ range of the response to obtain the practical difference. This value is used in testing for practical difference and equivalence, and it is shown in the Practical Difference column.
The Plain Dif PValue column shows which responses have p-values that indicate statistical significance. The Practical Dif PValue and Practical Equiv PValue columns give the p-values for the tests of practical difference and practical equivalence. Note that many responses show statistically significant differences but do not show practically significant differences.
8. Display the Compare Means data table and select Analyze > Distribution.
9. Select Practical Result and click Y, Columns.
10. Click OK.
Figure 17.13 shows the distribution of results for practical significance. Only 37 tests are different, as determined by testing against the specified practical difference. For 5 of the responses, the tests are inconclusive: you cannot tell whether these responses show a practical difference across Process.
Figure 17.13 Distribution of Practical Significance Results
The 37 responses can be selected for further study by clicking on the corresponding bar in the plot.
Example of the MaxLogWorth Option
When data sets have a large number of observations, p-values can be very small. LogWorth values provide a useful way to study p-values graphically in these cases. But sometimes p-values are so small that the LogWorth scale is distorted by huge values.
1. Select Help > Sample Data Library and open Probe.jmp.
2. Select Analyze > Screening > Response Screening.
3. In the Response Screening Launch window, select the Responses column group and click Y, Response.
4. Select Process and click X.
5. Select the Robust check box.
6. Click OK.
The analysis is numerically intensive and may take some time to complete.
7. In the Response Screening report, open the Robust FDR LogWorth by Effect Size report.
The detail in the plot is hard to see, because of the huge Robust FDR LogWorth value of about 58,000 (Figure 17.14). To ensure that your graphs show sufficient detail, you can set a maximum value of the LogWorth.
Figure 17.14 Robust FDR LogWorth vs. Effect Size, MaxLogWorth Not Set
8. Repeat step 1 through step 5.
9. Type 1000 in the MaxLogWorth box at the bottom of the launch window.
10. Click OK.
The analysis may take some time to complete.
11. In the Response Screening report, open the Robust FDR LogWorth by Effect Size report.
Now the detail in the plot is apparent (Figure 17.15).
Figure 17.15 Robust FDR LogWorth vs. Effect Size, MaxLogWorth = 1000
Example of Robust Fit
1. Open the Drosophila Aging.jmp table.
2. Select Analyze > Screening > Response Screening.
3. Select all of the continuous columns and click Y, Response.
4. Select line and click X.
5. Check Robust.
6. Click OK.
The Robust FDR PValue Plot is shown in Figure 17.16. Note that a number of tests are significant using the unadjusted robust p-values, as indicated by the red points that are less than 0.05. However, only two tests are significant according to the robust FDR p-values.
Figure 17.16 Robust FDR PValue Plot for Drosophila Data
These two points are more easily identified in a plot that shows FDR LogWorths.
7. Click the Robust FDR LogWorth by Effect Size disclosure icon.
8. Drag a rectangle around the two points with Robust FDR LogWorth values that exceed 1.5.
9. In the PValues data table, select Rows > Label/Unlabel.
The plot shown in Figure 17.17 appears. Points above the red line at 2 have significance levels below 0.01. A horizontal line at about 1.3 corresponds to a 0.05 significance level.
Figure 17.17 Robust LogWorth by Effect Size for Drosophila Data
10. Click the Robust LogWorth by LogWorth disclosure icon.
The plot shown in Figure 17.18 appears. If the robust test for a response were identical to the usual test, its corresponding point would fall on the diagonal line in Figure 17.18. The circled point in the plot does not fall near the line, because it has a Robust LogWorth value that exceeds its LogWorth value.
Figure 17.18 Robust LogWorth by LogWorth for Drosophila Data
11. Drag a rectangle around this point in the plot.
12. Find the row for this point in the PValues data table.
Note that the response log2in_CG8237 has a PValue of 0.9568 and a Robust PValue of 0.0176.
In the Response Screening report, select Fit Selected Items from the red triangle menu.
A Fit Selected Items report is displayed containing a Oneway Analysis for the response log2in_CG8237. The plot shows two outliers for the ORE line (Figure 17.19). These outliers indicate why the robust test and the usual test give disparate results. The outliers inflate the error variance for the non-robust test, which makes it more difficult to see a significant effect. In contrast, the robust fit down-weights these outliers, thereby reducing their contribution to the error variance.
Figure 17.19 Oneway Analysis for log2in_CG8237
Response Screening Personality
The Response Screening personality in Fit Model allows you to study tests of multiple responses against linear model effects. This example analyzes a model that includes several main effects and their interactions.
1. Open the Drosophila Aging.jmp table.
2. Select Analyze > Fit Model.
3. Select all the continuous columns and click Y.
4. Select channel and click Add.
5. Select sex, line, and age and select Macros > Full Factorial.
6. Select Response Screening from the Personality list.
7. Click Run.
The Fit Response Screening report appears. Two data tables are also presented: Y Fits summarizes the overall model tests, and PValues tests the individual effects in the model for each Y.
To get a general idea of which effects are important, do the following:
8. Run the FDR LogWorth by Rank Fraction script in the PValues data table.
9. Select Rows > Data Filter.
10. In the Data Filter window, select Effect and click Add.
11. In the Data Filter, click through the list of the model effects, while you view the selected points in the FDR LogWorth by Rank Fraction plot.
Keep in mind that values of LogWorth that exceed 2 are significant at the 0.01 level. The Data Filter helps you see that, with the exception of sex and channel, the model effects are rarely significant at the 0.01 level. Figure 17.20 shows a reference line at 2. The points for tests of the line*age interaction effect are selected. None of these are significant at the 0.01 level.
Figure 17.20 FDR LogWorth vs Rank Fraction Plot with line*age Tests Selected
Statistical Details
The False Discovery Rate
All of the Response Screening plots involve p-values for tests conducted using the FDR technique described in Benjamini and Hochberg, 1995. See also Westfall et al. (2011). This method assumes that the p-values are independent and uniformly distributed.
JMP uses the following procedure to control the false discovery rate at level α:
1. Conduct the m hypothesis tests of interest to obtain p-values p1, p2, ..., pm.
2. Rank the p-values from smallest to largest. Denote the ordered p-values by p(1) ≤ p(2) ≤ ... ≤ p(m).
3. Find the largest p-value for which p(i) ≤ (i/m)α. Suppose that this p-value is the kth smallest, p(k).
4. Reject the k hypotheses associated with the p-values that are less than or equal to p(k).
This procedure ensures that the expected false discovery rate does not exceed α.
The p-values adjusted for the false discovery rate, denoted p̃(i) for the ith smallest p-value, are computed as follows:
p̃(i) = min over j ≥ i of min( (m/j)·p(j), 1 )
If a hypothesis has an FDR-adjusted p-value that falls below α, then it is rejected by the procedure.
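A minimal Python sketch of this procedure follows. It implements the steps above directly and also computes the FDR-adjusted p-values; the example p-values are hypothetical.

```python
# Sketch of the Benjamini-Hochberg procedure and FDR-adjusted p-values.
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)                       # rank p-values smallest to largest
    ranked = p[order]

    # Largest k with p_(k) <= (k/m) * alpha; reject the k smallest p-values
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    k = int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True

    # Adjusted p-values: p~_(i) = min over j >= i of min((m/j) * p_(j), 1)
    suffix_min = np.minimum.accumulate((m / np.arange(m, 0, -1)) * ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(suffix_min, 1.0)
    return reject, adjusted

reject, adjusted = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.60])
print(reject)     # which hypotheses the procedure rejects at alpha = 0.05
print(adjusted)   # FDR-adjusted p-values
```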