Overview of the Fit Definitive Screening Platform
The Fit Definitive Screening platform analyzes definitive screening designs (DSDs) using a methodology that takes advantage of their special structure. The methodology is called Effective Model Selection for DSDs. If you created your design in JMP, the design table contains a script called Fit Definitive Screening that automatically runs an analysis using the Effective Model Selection for DSDs methodology.
Identification of Active Effects in DSDs
DSDs are three-level designs that are valuable for identifying main effects and second-order effects in a single experiment. A minimum run-size DSD is capable of correctly identifying active terms with high probability if the number of active effects is less than about half the number of runs and if the effect sizes exceed twice the standard deviation.
However, by augmenting a minimum run-size DSD with four or more properly selected runs, you can identify substantially more effects with high probability. These runs are called Extra Runs, and correspond to fictitious inactive factors, called fake factors. For information about Extra Runs, see “Structure of Definitive Screening Designs” in the “Definitive Screening Designs” chapter.
Extra Runs substantially increase the design’s ability to detect second-order effects. For this reason, Jones and Nachtsheim (2016) strongly encourage the inclusion of at least four Extra Runs.
Effective Model Selection for DSDs
When standard model selection methods are applied to DSDs, they can fail to identify active effects. See Errore et al. (2016). Also, standard selection methods do not leverage the structure of DSDs. The Fit Definitive Screening platform uses the Effective Model Selection for DSDs approach, which takes full advantage of the structure of the DSD.
Jones and Nachtsheim (2016) report on simulation studies using Effective Model Selection for DSDs as well as standard approaches. Denote by c the sum of the number of factors and the number of fake factors in a DSD. In many situations, if the number of active main effects exceeds three, then up to c/2 active second-order effects can be reliably identified. Assuming strong effect heredity, if there are three or fewer active main effects, then all active second-order effects can be reliably identified. Reliable identification means that the ratio of the absolute value of the coefficient to the error standard deviation exceeds three and that the power to detect the effect exceeds 0.80.
The Fit Definitive Screening platform assumes strong effect heredity. Strong effect heredity means that the A*B interaction can only be considered for inclusion in the model if both A and B have been included. Strong effect heredity requires that all lower-order components of a model effect be included in the model. In identifying active second-order effects, the algorithm uses strong effect heredity and the results cited earlier about how many active second-order effects can be reliably identified.
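As an illustration of strong effect heredity, the following Python sketch lists the second-order terms that are eligible for consideration once a set of active main effects is known. The function name candidate_second_order_terms is hypothetical and is not part of JMP. The factor names are taken from the example later in this chapter.

```python
from itertools import combinations


def candidate_second_order_terms(active_main_effects):
    """List second-order terms allowed under strong effect heredity.

    A two-factor interaction A*B is eligible only if both A and B are
    active; a quadratic term A*A is eligible only if A is active.
    """
    interactions = [f"{a}*{b}" for a, b in combinations(active_main_effects, 2)]
    quadratics = [f"{a}*{a}" for a in active_main_effects]
    return interactions + quadratics


# Three active main effects yield three interactions and three quadratics.
terms = candidate_second_order_terms(["Methanol", "Ethanol", "Time"])
print(terms)
```

With three active main effects, the sketch produces six candidate terms. Terms involving a factor that is not active, such as Propanol*Propanol, are never generated.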
In a DSD, main effects and second-order effects are orthogonal to each other. The Effective Model Selection for DSDs approach takes advantage of this fact. The linear space of the response is separated into the subspace spanned by the main effects and the orthogonal complement of this subspace. Miller and Sitter (2005) refer to the linear subspace spanned by the main effects as the odd space, because it contains all the information about odd effects: main effects, 3-factor effects, 5-factor effects, and so on. They refer to its orthogonal complement as the even space, because it contains all the information about even effects: the intercept, 2-factor effects, 4-factor effects, and so on.
Fit Definitive Screening follows this thinking. The subspace spanned by the main effects is the odd space. Its orthogonal complement, the even space, contains the second-order effects and the block variable, if one exists. For details of the algorithm, see “The Effective Model Selection for DSDs Algorithm” and Jones and Nachtsheim (2016).
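The orthogonality between the odd space and the even space can be checked numerically. The following Python sketch assumes a DSD built by stacking a conference matrix, its fold-over, and a single center run. The particular 6 x 6 conference matrix and the helper names are illustrative assumptions and are not taken from JMP.

```python
from itertools import combinations_with_replacement

# An order-6 conference matrix (zero diagonal, mutually orthogonal columns).
C = [
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
]


def build_dsd(conference):
    """Stack the conference matrix, its fold-over, and one center run."""
    foldover = [[-v for v in row] for row in conference]
    center = [[0] * len(conference[0])]
    return conference + foldover + center


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


design = build_dsd(C)
m = len(design[0])
main = [[row[j] for row in design] for j in range(m)]

# All two-factor interaction and quadratic columns.
second = [
    [a * b for a, b in zip(main[j], main[k])]
    for j, k in combinations_with_replacement(range(m), 2)
]

# Largest inner product between any main effect and any second-order column.
max_cross = max(abs(dot(x, s)) for x in main for s in second)
# Largest inner product between two distinct main effect columns.
max_main = max(abs(dot(main[j], main[k]))
               for j in range(m) for k in range(j + 1, m))
print(len(design), max_cross, max_main)
```

Both maxima are zero for this 13-run design. The zero cross products follow from the fold-over structure alone: each run and its mirror image contribute equal and opposite amounts to the inner product of a main effect column with a second-order column.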
Example of the Fit Definitive Screening Platform
The design in the data table Extraction 3 Data.jmp is a definitive screening design for six factors in two blocks. The Add Blocks with Center Runs to Estimate Quadratic Effects option was selected, and 4 Extra Runs were added. The resulting design has 18 runs.
Fit the Model
1. Select Help > Sample Data Library and open Design Experiment/Extraction 3 Data.jmp.
2. Select DOE > Definitive Screening > Fit Definitive Screening.
3. Select Yield and click Y.
4. Select Lot through Time and click X.
5. Click OK.
The fit performs a two-stage analysis. For details about the algorithm, see “Technical Details”.
Examine Results
Stage 1: Main Effect Estimates
Stage 1 determines which main effects are likely to be active.
Figure 8.2 Stage 1 Report for Main Effects
Note: The fake factors do not appear in the design or as factors in the analysis.
A two-degree-of-freedom error sum of squares is computed from the four runs corresponding to the two fake factors. Because the fake factors are, by construction, inactive, this estimate of error variance is unbiased. For each main effect, the main effects response YME is tested against this estimate. In this example, three factors, Methanol, Ethanol, and Time, have p-values smaller than the threshold value and are retained as active. For details about the threshold values, see “Stage 1 Methodology”.
The variability from the three inactive factors, Propanol, Butanol, and pH, is pooled with the fake factor sum of squares to produce the five-degree-of-freedom RMSE statistic shown in Figure 8.2.
Stage 2: Even Order Effect Estimates
Stage 2 uses guided subset selection to arrive at a list of second-order effects that are likely to be active.
Figure 8.3 Stage 2 Report for Even-Order Effects
Because three main effects are identified as active in Stage 1, the guided subset selection procedure for active second-order effects can continue until all second-order effects are included. Because all six second-order effects are reported in Stage 2, it follows that the Stage 2 RMSE remained larger than the Stage 1 RMSE. See “Stage 2 Methodology”.
The two-degree-of-freedom RMSE given in the Stage 2 report is the error estimate obtained from the final subset of all six second-order effects.
Combined Results
The effects selected for the model are listed in the Combined Model Parameter Estimates report.
Figure 8.4 Combined Model Parameter Estimates Report
The RMSE and degrees of freedom given at the bottom of the report are the usual standard least squares quantities. Use these effects as potential factors for your final model.
Reduce the Model
The Make Model button enters the model for the listed terms in a Fit Model specification window. To run the model directly using standard least squares, click the Run Model button.
1. Click Run Model.
The Actual by Predicted Plot shows no lack of fit. The Effect Summary report suggests that you can reduce the model further.
Figure 8.5 Actual by Predicted Plot and Effect Summary Report
2. Select Methanol*Ethanol in the Effect Summary report and click Remove.
Methanol*Time has p-value 0.33750. Remove it next.
3. Select Methanol*Time in the Effect Summary report and click Remove.
Ethanol*Ethanol has p-value 0.15885. Remove it next.
4. Select Ethanol*Ethanol in the Effect Summary report and click Remove.
Figure 8.6 Effect Summary Report Showing Effects in Final Model
The remaining effects are significant. You conclude that these are the active effects.
Launch the Fit Definitive Screening Platform
To launch the Fit Definitive Screening platform, select DOE > Definitive Screening > Fit Definitive Screening. The launch window in Figure 8.7 uses Extraction 3 Data.jmp.
Note: If you created your design in JMP, the design table contains a script called Fit Definitive Screening. Run this script to run the analysis directly.
Figure 8.7 Fit Definitive Screening Launch Window
Y
One or more numeric response variables.
X
Continuous or two-level categorical factors. Because the platform uses the unique features of a DSD in performing the analysis, these factors must define a DSD or a fold-over design.
By
A column whose levels define separate analyses. For each level of the specified column, the corresponding rows are analyzed. The results appear in separate reports. If more than one By variable is assigned, a separate analysis is produced for each possible combination of the levels of the By variables.
Fit Definitive Screening Report
The Fit Definitive Screening report provides these outlines:
Stage 1 - Main Effect Estimates
The Main Effect Estimates report lists main effects that are identified as active. Main effects with p-values less than the threshold p-value are considered active. For additional details, see “Stage 1 - Main Effect Estimates”.
If fake factors or center point replicates are available, an estimator of error variance that is independent of the model is constructed. The main effects are tested against this estimate.
If no fake factors or center point replicates are available, subsets of main effects are tested sequentially against an estimate of error variance constructed from the inactive main effects. For this procedure to be viable, at least one of the main effects must be inactive.
In either case, variability from the inactive main effects is pooled into the error variance used to test the main effects.
Figure 8.8 Stage 1 Report
Term
Main effects identified as active. These effects have p-values less than the threshold when tested as described in “Stage 1 Methodology”.
Estimate
Parameter estimate for a regression fit of Y on the main effects.
Std Error
The standard error of the estimate, computed using the Stage 1 RMSE.
t Ratio
The Estimate divided by its Std Error.
Prob>|t|
The p-value computed using the t Ratio and the degrees of freedom for error (DF).
RMSE
The square root of the mean square error that results from the Stage 1 analysis.
If fake factors or centerpoint replicates are available, the mean square error is the estimate of variance from fake factors and centerpoints pooled with the variance estimate constructed from the main effects that are not identified as active.
If no fake factors or centerpoint replicates are available, the mean square error is the estimate of variance constructed from the main effects that are not identified as active.
DF
The degrees of freedom associated with the error estimate used to construct RMSE.
If fake factors or centerpoint replicates are available, DF is the sum of the number of fake factors, centerpoint replicates, and main effects not identified as active.
If no fake factors or centerpoint replicates are available, DF is the number of main effects that are not identified as active.
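The pooling described for RMSE and DF can be written as a short calculation. In the following Python sketch, pooling means adding the sums of squares and adding the degrees of freedom before taking the ratio. The function name pooled_rmse and the sums of squares are hypothetical values chosen for illustration. Only the degrees of freedom (two from fake factors, three from inactive main effects) match the example earlier in this chapter.

```python
import math


def pooled_rmse(sums_of_squares, degrees_of_freedom):
    """Pool several error sources into one RMSE and its degrees of freedom."""
    total_ss = sum(sums_of_squares)
    total_df = sum(degrees_of_freedom)
    if total_df <= 0:
        raise ValueError("At least one error degree of freedom is required.")
    return math.sqrt(total_ss / total_df), total_df


# Hypothetical sums of squares: 8.0 from two fake factors,
# 12.0 from three main effects not identified as active.
rmse, df = pooled_rmse([8.0, 12.0], [2, 3])
print(rmse, df)
```

The pooled estimate has five degrees of freedom, which corresponds to the five-degree-of-freedom RMSE in the Stage 1 report of the example.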
Stage 2 - Even Order Effect Estimates
The Even Order Effect Estimates report lists second-order effects that are identified as active. Active second-order effects are identified using the guided variable selection procedure described in “Stage 2 Methodology”. The block effect (if one is included) is also listed, whether it is significant or not.
Figure 8.9 Stage 2 Report
Term
The block factor and second-order effects identified as active.
Estimate
Parameter estimates for a regression fit of Y on the Stage 2 second-order effects defined by Y2nd. See “Decomposition of Response”.
Std Error
The standard error of the estimate, computed using the Stage 2 RMSE.
t Ratio
The Estimate divided by its Std Error.
Prob>|t|
The p-value computed using the t Ratio and the degrees of freedom for error (DF).
RMSE
The square root of the mean square error that results from the Stage 2 analysis. RMSE is estimated as the residual variance from Y2nd after fitting the second order effects identified as active. See “Decomposition of Response”.
DF
The degrees of freedom associated with the error estimate used to construct RMSE.
Combined Model Parameter Estimates
The Combined Model Parameter Estimates report lists the terms in the final model and their usual standard least squares estimates, standard errors, t ratios, p-values, RMSE, and model degrees of freedom.
Below the report are buttons that construct or run the combined model.
Figure 8.10 Combined Model Parameter Estimates Report
Make Model
Creates a model for the Fit Model window containing the model terms in the Combined Model Parameter Estimates report and the response specified for the Fit Definitive Screening analysis. The Standard Least Squares personality is specified.
Run Model
Runs a standard least squares fit for the model terms in the Combined Model Parameter Estimates report and the response specified for the Fit Definitive Screening analysis.
Main Effects Plot
Shows a plot of the response against each of the factors entered as X in the Fit Definitive Screening launch window. Notice that the block factor is not shown.
Prediction Profiler
Shows a prediction profiler for the main effects identified as active in the Stage 1 analysis. You can view a prediction profiler for the combined model terms in the report that you obtain by clicking the Run Model button. For details about the prediction profiler, see the Profiler chapter in the Profilers book.
Fit Definitive Screening Platform Options
See the JMP Reports chapter in the Using JMP book for more information about the following options:
Local Data Filter
Shows or hides the local data filter that enables you to filter the data used in a specific report.
Redo
Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.
Save Script
Contains options that enable you to save a script that reproduces the report to several destinations.
Save By-Group Script
Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.
Technical Details
The Effective Model Selection for DSDs Algorithm
This section provides a summary of the algorithm used in the Fit Definitive Screening platform. For further details, see Jones and Nachtsheim (2016).
Decomposition of Response
The Effective Model Selection algorithm expresses the response, Y, in terms of two responses YME and Y2nd, so that Y = YME + Y2nd.
YME is the predicted value obtained from a regression of Y on the main effects and fake factors.
There is no need to include the block factor in YME because of the fold-over structure of the design. The block factor is included in Y2nd.
Y2nd is given by Y2nd = Y - YME.
Note: In a DSD, the columns YME and Y2nd are orthogonal.
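The decomposition can be sketched in Python for a small unblocked DSD. The sketch assumes a 13-run design built from an order-6 conference matrix, treats the last two columns as fake factors, and uses a noise-free response with made-up coefficients. None of these values come from the Extraction 3 example. Because the design columns are mutually orthogonal, the regression of Y on the main effects and fake factors reduces to a separate projection onto each column.

```python
# Order-6 conference matrix; columns 0-3 play the role of real factors and
# columns 4-5 the role of fake factors (an assumption for illustration).
C = [
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
]
design = C + [[-v for v in row] for row in C] + [[0] * 6]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


columns = [[row[j] for row in design] for j in range(6)]

# Hypothetical noise-free response: two main effects, one interaction,
# and one quadratic effect.
y = [50 + 3 * r[0] - 2 * r[1] + 4 * r[0] * r[1] + 1.5 * r[0] ** 2
     for r in design]

# Orthogonal columns: each coefficient is a one-column projection.
coef = [dot(x, y) / dot(x, x) for x in columns]

# YME is the fitted value; Y2nd is what remains.
y_me = [sum(c * x[i] for c, x in zip(coef, columns)) for i in range(len(y))]
y_2nd = [yi - mi for yi, mi in zip(y, y_me)]

print([round(c, 6) for c in coef])
print(round(dot(y_me, y_2nd), 6))
```

The main effect coefficients are recovered without contamination from the interaction or the quadratic term, and the inner product of YME and Y2nd is zero up to rounding error. The intercept and all second-order variation remain in Y2nd.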
The analysis proceeds in two stages:
Stage 1: The response Y is used to identify main effects. Stage 1 identifies the main effects that are considered active.
Stage 2: The response Y2nd is used to identify second-order effects. Stage 2 considers all second-order terms in the active main effects from Stage 1 and determines a subset of these containing effects considered to be active.
Note: If there is a blocking factor, it is included in the Stage 2 list of effects even if it is not significant.
Stage 1 Methodology
The Stage 1 methodology depends on whether the design contains fake factors or centerpoint replicates.
Case 1: Fake Factors or Centerpoint Replicates Available
1. Using the fake factors or center point replicates, an estimator of error variance that is independent of the model is constructed. Assuming that there are no active third or higher odd order effects, this estimate is unbiased.
2. Using YME, main effects are tested against this estimate. Main effects with p-values less than a threshold p-value are considered active. The threshold values are the following:
For one error degree of freedom, the threshold value is 0.20.
For two error degrees of freedom, the threshold value is 0.10.
For more than two error degrees of freedom, the threshold value is 0.05.
3. If no main effect has a p-value less than the threshold value, conclude that there are no active main effects and no active two-factor effects. The procedure terminates.
4. If active main effects are found, then variability from the inactive main effects is pooled into the error variance constructed in (1).
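The threshold rule in step 2 can be expressed as a small lookup. The following Python sketch uses a hypothetical function name, threshold_p_value. The three threshold values are the ones listed above.

```python
def threshold_p_value(error_df):
    """Return the threshold p-value for a given number of error degrees of freedom."""
    if error_df < 1:
        raise ValueError("At least one error degree of freedom is required.")
    if error_df == 1:
        return 0.20
    if error_df == 2:
        return 0.10
    return 0.05


def is_active(p_value, error_df):
    """A main effect is considered active if its p-value is below the threshold."""
    return p_value < threshold_p_value(error_df)


print(threshold_p_value(2), is_active(0.04, 2), is_active(0.15, 2))
```

With two error degrees of freedom, as in the example with two fake factors, an effect with p-value 0.04 is retained and an effect with p-value 0.15 is not.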
Case 2: No Fake Factors or Centerpoint Replicates Available
In this case, there is no model-independent estimator of error variance available. Subsets of main effects are tested sequentially against an estimate of error variance constructed from the inactive main effects. Suppose that there are m main effects.
1. The absolute values of the estimated effects, using YME as the response, are ordered from largest to smallest.
2. For each 1 ≤ i < m, the effect with the ith largest absolute value is tested against the adjusted residual sum of squares for the model containing that effect and all effects with larger absolute values.
3. The effects in the model with the smallest p-value are considered to be the active effects.
4. If active main effects are found, then variability from the inactive main effects is used to construct an estimate of error variance, using YME as the response.
Note: For the Fit Definitive Screening procedure to work properly in Case 2, at least one of the main effects must be active and at least one must be inactive. If no main effects are active, or if all main effects are active, the procedure will identify a set of main effects, but the procedure for arriving at that subset is compromised.
Stage 2 Methodology
In Stage 2, only second-order effects involving the factors whose main effects are active are considered. Stage 2 uses a guided subset selection procedure. The goal is to continue to add second-order effects to the model as long as the RMSE from Stage 2 exceeds the RMSE from Stage 1. When the Stage 2 RMSE is less than or equal to the Stage 1 RMSE, this indicates that there are no additional second-order effects to add to the model.
The same threshold values are used as in Stage 1:
For one error degree of freedom, the threshold value is 0.20.
For two error degrees of freedom, the threshold value is 0.10.
For more than two error degrees of freedom, the threshold value is 0.05.
1. The variability for Y2nd is tested against the error estimate from Stage 1 to determine if there is additional variability due to second-order effects.
If the p-value for this test exceeds the threshold value, the procedure terminates and no active second-order effects are identified.
2. If the p-value for this test is less than or equal to the threshold value, then subsets of size k, k = 1,2,3,... are successively tested, starting with k = 1.
3. For each k, the residual sum of squares for each subset of that size is tested against the error estimate from Stage 1. The subset with the smallest RMSE is identified.
4. The procedure continues until a k is found whose RMSE is smaller than the Stage 1 RMSE.
5. The effects in the subset preceding the one that corresponds to the terminal value of k are considered to be the active two-factor effects.
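To give a sense of the search space in steps 2 and 3, the following Python sketch enumerates the subsets of candidate second-order effects in order of increasing size k. It covers only the enumeration. It does not fit any models or reproduce the tests against the Stage 1 error estimate. The function name subsets_by_size is hypothetical, and the six candidate terms correspond to three active main effects, as in the example earlier in this chapter.

```python
from itertools import combinations


def subsets_by_size(candidates):
    """Yield (k, subsets) for k = 1, 2, ..., len(candidates)."""
    for k in range(1, len(candidates) + 1):
        yield k, list(combinations(candidates, k))


candidates = [
    "Methanol*Ethanol", "Methanol*Time", "Ethanol*Time",
    "Methanol*Methanol", "Ethanol*Ethanol", "Time*Time",
]

# Number of subsets that would be examined at each size k.
counts = {k: len(subsets) for k, subsets in subsets_by_size(candidates)}
print(counts)
```

For six candidate effects there are at most 63 subsets in total, so an exhaustive search at each size is inexpensive. Restricting the candidates to second-order terms in the active main effects keeps this number small.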