Index

  1. a priori algorithm
    1. association rules
      1. minimum confidence
      2. Modeler results
      3. one antecedent
      4. two antecedents
      5. two-step process
    2. frequent itemsets
  2. ADABoost algorithm
    1. final boosted classifier
    2. initial base classifier
    3. original dataset
    4. second base classifier
    5. third base classifier
  3. adjusted cost matrix
    1. bank loan
    2. equivalent cost
    3. false negative cost
    4. false positive cost
    5. retailer cost
  4. analysis of variance (ANOVA)
    1. Minitab results
    2. MSTR
    3. multiple regression model
    4. R code
    5. sample mean age
    6. sum of squares
  5. artificial neuron model
  6. association rules
    1. a priori property (see a priori algorithm)
    2. affinity analysis
    3. antecedent and consequent
    4. business and research
    5. categorical data
    6. confidence and support
    7. frequent itemsets
    8. J-measure
    9. lift ratio
    10. market basket analysis
    11. patterns and models
    12. R code
    13. strong rules
    14. supervised/unsupervised learning
    15. worst case scenario
  7. attribute-relation file format (ARFF) file
  8. back-propagation algorithm
    1. cross validation termination
    2. downstream node
    3. error propagation
    4. learning rate
    5. momentum term
    6. squared prediction error
    7. upstream node
  9. bagging model
    1. algorithm for
    2. bootstrap samples
    3. vs. CART model
    4. prediction method
    5. R code
    6. stable/unstable classification
  10. balanced iterative reducing and clustering using hierarchies (BIRCH) clustering
    1. bank loans data set
      1. cost matrix
      2. data sorting
      3. No Interest model
      4. With Interest model
    2. CF/CF tree
      1. Additivity Theorem
      2. algorithm
      3. building process
      4. clustering sub-clusters
      5. definition
      6. one-dimensional toy data set
      7. radius
      8. tree structure
    3. Modeler's two-step algorithm
    4. optimal number of clusters
    5. pseudo-F statistic method
    6. R code
    7. two-step clustering
  11. baseline model
    1. Captain Kirk's situation
    2. regression model
  12. Bayesian approach see also Naïve Bayes classifier
    1. balancing data set
    2. drawbacks
    3. frequentist/classical approach
    4. likelihood function
    5. MAP method (see maximum a posteriori (MAP))
    6. marginal distribution
    7. MCMC methods
    8. posterior distribution
    9. posterior odds ratio
    10. prior distribution
    11. R code
  13. Bayesian belief networks (BBNs)
    1. clothing purchase
    2. conditional probability
    3. directed acyclic graph
    4. joint probability distribution
    5. prior probabilities
    6. WEKA
      1. Explorer Panel
      2. positive and negative classification
      3. prior probabilities
      4. test set predictions
  14. bias–variance trade-off
  15. boosting model
    1. ADABoost algorithm
      1. final boosted classifier
      2. initial base classifier
      3. original dataset
      4. second base classifier
      5. third base classifier
    2. vs. CART model
    3. R code
  16. C4.5 algorithm
    1. adult data set
    2. candidate splits
    3. capital gains
    4. categorical variables
    5. decision node A
    6. entropy reduction
    7. initial split
    8. marital status
    9. numerical variables
    10. savings split
    11. threshold partition
    12. training data set
  17. churn data set
    1. account length
    2. adult data set
    3. age predictor
    4. area code field
    5. balanced data set
    6. categorical variables
      1. clustered bar chart
      2. comparative pie chart
      3. directed web graph
      4. International Plan
      5. marginal distribution
      6. non-churners
      7. row percentages
      8. software packages
      9. two-way interaction
      10. voice mail plan
    7. clustering analysis
      1. CART decision trees
      2. churn proportion
      3. contingency tables
      4. international plan people
      5. no-plan majority
      6. voice mail plan people
    8. conditional independence
    9. continuous predictor (see continuous predictor)
    10. correlation coefficient
      1. account length
      2. matrix plot
      3. Minitab regression tool
      4. optimal solution
      5. p-values
      6. thresholds
    11. customer service calls
    12. data preparation
      1. contingency table
      2. HighDayEveMins_Flag variable
      3. voice mail messages
      4. z-score standardization
    13. day minutes
    14. dichotomous predictor (see dichotomous predictor)
    15. education-num variable
    16. field values
    17. flag variables
    18. hours-per-week
    19. income overlay
    20. International Plan
    21. maximum a posteriori
      1. complement probabilities
      2. conditional probability
      3. International Plan
      4. joint conditional probabilities
      5. marginal and conditional probabilities
      6. posterior probabilities
      7. Voice Mail Plan
    22. multivariate graphics
    23. numerical predictors
      1. binning methods
      2. churn proportion
      3. churners vs. non-churners
      4. customer service call
      5. International Calls
      6. normalized and non-normalized histogram
      7. t-test
    24. numerical variables
    25. polychotomous predictor (see polychotomous predictor)
    26. posterior odds ratio
    27. vs. variables
    28. visualization
    29. voice mail plan
    30. VoiceMail Plan adopters
  18. classification and regression trees (CART)
    1. adult data set
    2. bank loans
    3. candidate splits
    4. capital gains
    5. categorical variables
    6. classification error
    7. components
    8. contingency table
    9. cost matrix
    10. data-driven misclassification costs
    11. decision node A
    12. decision node B
    13. decision tree output
    14. estimated revenue increase
    15. evaluation measures
    16. initial split
    17. lift chart
    18. marital status
    19. maximum value
    20. numerical variables
    21. optimal split
    22. scaled cost matrix
    23. training data set
  19. cluster feature (CF)
    1. Additivity Theorem
    2. building process
    3. clustering sub-clusters
    4. definition
    5. one-dimensional toy data set
    6. radius
    7. tree structure
  20. cluster validation
    1. cross-validation
      1. loans data sets
      2. methodology
      3. prediction strength
      4. R code
    2. loans data sets
    3. methodology
    4. prediction strength
    5. pseudo-F statistic method
      1. clustering model
      2. distribution
      3. Iris data set
      4. R code
      5. SSB and SSE
    6. R code
    7. silhouette method
      1. cohesion/separation
      2. Iris data set
      3. mean silhouette
      4. positive/negative values
      5. R code
  21. clustering analysis
    1. CART decision trees
    2. churn proportion
    3. contingency tables
    4. definition
    5. hierarchical clustering
      1. agglomerative clustering
      2. complete-linkage clustering
      3. divisive clustering methods
      4. single-linkage clustering
    6. international plan people
    7. k-means clustering algorithm
      1. data points
      2. definition
      3. MSE
      4. processing steps
      5. pseudo-F statistic method
      6. SAS Enterprise Miner (see churn data set)
      7. statistics behavior
    8. no-plan majority
    9. R code
    10. voice mail plan people
  22. confidence interval
    1. customer service call
    2. lower bound
    3. margin of error
    4. population proportion
    5. subgroup analyses
    6. t-interval
    7. upper bound
  23. continuous predictor
    1. categorical predictor
    2. confidence intervals
    3. day minute usage
    4. deviance
    5. p-value
    6. test statistics
    7. unit-increase interpretation
  24. Cook's distance
  25. correlation coefficient
    1. account length
    2. matrix plot
    3. Minitab regression tool
    4. optimal solution
    5. p-values
    6. PCA
    7. thresholds
  26. cost-benefit analysis
    1. CART model
      1. contingency table
      2. cost matrix
      3. estimated revenue increase
      4. evaluation measures
      5. scaled cost matrix
    2. cost matrix
    3. decision invariance
      1. binary classifier
      2. scaling
    4. direct cost
    5. k-nary classification
      1. accuracy
      2. contingency table
      3. Loans data sets
      4. overall error rate
      5. predicted/actual categories
      6. sensitivity
    6. Loans data set
      1. adjusted cost matrix
      2. assumptions
      3. CART model
      4. direct cost matrix
      5. simplified cost matrix
      6. strategies
    7. opportunity cost
    8. positive classification
      1. adjusted cost matrix
      2. C5.0 models
    9. R code
    10. rebalancing cost
      1. CART model
      2. confidence and positive confidence
      3. definition
      4. network models
    11. trinary classification
      1. accuracy
      2. assumptions
      3. contingency table
      4. cost calculation
      5. cost matrix
      6. false negative
      7. false positive
      8. number of customers
      9. number of records
      10. overall error rate
      11. predicted/actual categories
      12. principal and interest
      13. true negative
      14. true positive
  27. cross-industry standard process for data mining (CRISP-DM)
    1. adaptive process
    2. business understanding phase
    3. business/research phase
    4. clustering analysis
      1. BIRCH clustering algorithm
      2. cluster profiles
      3. cross-validation
      4. k-means clustering
    5. data phase
    6. data preparation phase
      1. deriving flag variable
      2. negative amounts
      3. product uniformity
      4. standardization
    7. data understanding phase
      1. absolute pairwise correlation
      2. continuous predictors
      3. dataset, fields
      4. de-transformation
      5. lifestyle cluster types
      6. missing values
      7. predictors and response
      8. zip code fields
    8. deployment phase
    9. evaluation phase
    10. modeling and evaluation strategy
      1. baseline model
      2. cost-benefit analysis
      3. high performance model
      4. input variables
      5. misclassification cost
      6. model voting
      7. processing steps
      8. profitable classification model
      9. propensity averaging
      10. rebalanced data set
    11. modeling phase
    12. principal components analysis
      1. data set partitioning
      2. input variables
      3. low communality predictors
      4. principal component profiles
      5. rotated component matrix
  28. cross-validation
  29. customer service calls (CSC) see polychotomous predictor
  30. data balancing
  31. data cleaning
    1. age field
    2. American zip code
    3. data set
    4. income field
    5. marital status field
    6. measures of center
      1. customer service calls
      2. measures of location
      3. measures of spread
      4. price/earnings ratio
      5. standard deviation
    7. missing data
      1. data imputation method
      2. field values
      3. frequency distribution
      4. random values
      5. replacement values
      6. variable brand
    8. outliers
    9. poverty
    10. R code
    11. transaction amount field
  32. data imputation method
  33. data preparation
    1. contingency table
    2. HighDayEveMins_Flag variable
    3. voice mail messages
    4. z-score standardization
  34. data summarization
    1. bivariate relationship
    2. boxplot
    3. discrete variable
    4. levels of measurement
    5. measures of center
    6. measures of position
    7. measures of variability
    8. qualitative/quantitative variable
  35. data transformation
    1. binning methods
    2. categorical variables
      1. reclassification
      2. region_num variable
      3. survey_response variable
    3. correlated variables
    4. decimal scaling
    5. donation_dollar field
    6. duplicate records
    7. flag variables
    8. ID fields
    9. index field
    10. min–max normalization
    11. R code
    12. unary variables
    13. Z-score standardization
      1. inverse_sqrt (weight) transformation
      2. natural log transformation
      3. negative standardization
      4. normal probability plot
      5. normal Z distribution
      6. outliers
      7. positive standardization
      8. skewness
      9. square root transformation
      10. weighted data
  36. data visualization
    1. bar chart
    2. bivariate relationship
    3. cumulative frequency distribution
    4. dotplot
    5. frequency distribution
    6. histogram
    7. pie chart
    8. skewness
    9. stem-and-leaf display
  37. data-driven misclassification costs see cost-benefit analysis
  38. decision tree
    1. C4.5 algorithm, information-gain
      1. adult data set
      2. candidate splits
      3. capital gains
      4. categorical variables
      5. decision node A
      6. entropy reduction
      7. initial split
      8. marital status
      9. numerical variables
      10. savings split
      11. threshold partition
      12. training data set
    2. CART (see Classification and regression trees (CART))
    3. credit risk
    4. decision rules
    5. diverse attributes
    6. R code
    7. requirements
  39. dichotomous predictor
    1. reference cell coding
    2. voice mail plan
  40. dimension-reduction method
    1. applications
    2. factor analysis (see factor analysis)
    3. houses data set
      1. median income
      2. predictor variables
    4. multicollinearity
    5. PCA (see principal components analysis (PCA))
    6. R code
    7. user-defined composites
      1. definition
      2. houses data set
      3. measurement error
      4. summated scales
  41. direct cost matrix
  42. distance function
    1. age variable
    2. Euclidean distance
    3. min–max normalization
    4. properties
    5. Z-score standardization
  43. EDA see exploratory data analysis (EDA)
  44. ensemble methods
    1. bagging model
      1. algorithm for
      2. bootstrap samples
      3. vs. CART model
      4. prediction method
      5. R code
      6. stable/unstable classification
    2. bias-variance trade-off
    3. boosting model
      1. adaptive boosting (see ADABoost algorithm)
      2. algorithm for
      3. vs. CART model
      4. R code
    4. model voting
      1. alternative models
      2. contingency tables
      3. evaluative measures
      4. majority classification
      5. processing steps
      6. R code
      7. working test data set
    5. prediction error
    6. propensity averaging
      1. evaluative measures
      2. histogram model
      3. m base classifiers
      4. processing steps
  45. exploratory data analysis (EDA)
    1. churn data set (see churn data set)
    2. data understanding phase
      1. absolute pairwise correlation
      2. de-transformation
      3. predictors and response
    3. vs. hypothesis testing
    4. R code
    5. segmentation modeling
      1. capital gains/losses
      2. contingency tables
      3. overall error rate
  46. factor analysis model
    1. adult data set
      1. Bartlett's test
      2. correlation matrix
      3. factor loadings
      4. KMO statistics
      5. principal axis
    2. factor rotation
      1. oblique rotation method
      2. orthogonal rotation
      3. percentage of variance
      4. rotated vectors
      5. unrotated vectors
      6. varimax rotation
  47. flag variables
  48. GAs see genetic algorithms (GAs)
  49. gas mileage prediction
    1. backward elimination
    2. best subsets method
    3. forward selection method
    4. Mallows' Cp statistics
      1. predictors
      2. regression assumptions
    5. stepwise selection regression
    6. target variable MPG
  50. generalized rule induction (GRI) method
  51. genetic algorithms (GAs)
    1. crossover operator
      1. definition
      2. multi-point crossover
      3. real-valued data
      4. uniform crossover
    2. framework
    3. mutation operator
    4. neural networks
      1. backpropagation
      2. feed-forward nature
      3. learning method
      4. modified discrete crossover
      5. random shock mutation
      6. sum of squared errors
      7. topology and operation
    5. R code
    6. selection operator
      1. Boltzmann selection
      2. crowding phenomenon
      3. definition
      4. elitism
      5. fitness sharing
      6. rank selection
      7. sigma scaling
      8. tournament ranking
    7. terminologies
    8. WEKA
      1. AttributeSelectedClassifier
      2. class distribution
      3. initial population characteristics
      4. Preprocess tab
      5. WrapperSubsetEval evaluation method
  52. gradient-descent method
  53. graphical evaluation
    1. gains charts
    2. lift chart
    3. profits charts
    4. R code
    5. response charts
    6. return-on-investment charts
  54. hierarchical clustering
    1. agglomerative clustering
    2. complete-linkage clustering
    3. divisive clustering methods
    4. single-linkage clustering
  55. hypothesis testing
    1. confidence interval
    2. criminal trial, outcomes
    3. null hypothesis
    4. p-value
    5. population proportion
    6. standard error
    7. treatment
  56. indicator variable
    1. cereals, y-intercepts
    2. estimated nutritional rating
    3. p-values
    4. parallel planes
    5. reference category
    6. regression coefficient values
    7. relative estimation error
    8. shelf effect
  57. instance-based learning
    1. issues
    2. sodium/potassium ratio
    3. training data points
    4. voting
  58. k-means clustering algorithm
    1. data points
    2. definition
    3. MSE
    4. processing steps
    5. pseudo-F statistic method
    6. SAS Enterprise Miner (see churn data set)
    7. statistics behavior
  59. k-nary classification
    1. accuracy
    2. contingency table
    3. Loans data sets
    4. overall error rate
    5. predicted/actual categories
    6. sensitivity
  60. k-nearest neighbor (KNN) algorithm
    1. classification
      1. data set
      2. income bracket
    2. ClassifyRisk data set
    3. combination function
      1. simple unweighted voting
      2. weighted voting
    4. cross-validation approach
    5. database
    6. distance function
      1. age variable
      2. Euclidean distance
      3. min–max normalization
      4. properties
      5. Z-score standardization
    7. instance-based learning
      1. issues
      2. sodium/potassium ratio
      3. training data points
      4. voting
    8. locally weighted averaging
    9. modeler's results
    10. outliers/unusual observations
    11. R code
  61. Kaiser–Meyer–Olkin (KMO) statistics
  62. Kohonen networks
    1. age and income data set
    2. algorithm
    3. CART decision tree model
    4. cluster profiles
    5. flag variables
    6. International Plan adopters
    7. mean analysis
    8. numerical variables
    9. R code
    10. SOM
      1. architecture
      2. characteristic processes
      3. goal
      4. networks connection
    11. topology
    12. validation
    13. variables distribution
    14. VoiceMail Plan adoption
  63. logistic regression model
    1. conditional mean
    2. disease vs. age
    3. linear regression model
    4. logit transformation
    5. maximum-likelihood estimation
      1. confidence interval
      2. interpretation
      3. likelihood ratio test
      4. log-likelihood estimators
      5. mean square regression
      6. negative response
      7. parameters
      8. positive response
      9. saturated model
      10. Wald test, parameters
    6. odds ratio (see odds ratio (OR))
    7. R code
    8. sigmoidal curve
    9. training data set
      1. education variable
      2. marital status
    10. WEKA
      1. explorer panel
      2. RATING field
      3. regression coefficients
      4. test set prediction
      5. training file
  64. market basket analysis
  65. Markov chain Monte Carlo (MCMC) methods
  66. maximum a posteriori (MAP), churn data set
    1. complement probabilities
    2. conditional probability
    3. International Plan
    4. joint conditional probabilities
    5. marginal and conditional probabilities
    6. posterior probabilities
    7. Voice Mail Plan
  67. McKinsey Global Institute (MGI) report
    1. association task
    2. classification
      1. income bracket
      2. sodium/potassium ratio
    3. clustering
    4. continuous quality monitoring
    5. CRISP-DM
      1. adaptive process
      2. business/research phase
      3. data phase
      4. deployment phase
      5. evaluation phase
      6. modeling phase
    6. estimation model
    7. factors
    8. Forbes magazine
    9. HMO
    10. patterns and trends
    11. prediction
    12. problem solving, human process
    13. profitable results
    14. R code
    15. software packages
    16. tools
  68. mean absolute error (MAE)
  69. mean square error (MSE)
  70. mean square treatment (MSTR)
  71. missing data imputation
    1. CART model
    2. data weighting
    3. flag variable
    4. multiple regression model
    5. R code
    6. SEI formula
  72. model evaluation techniques
    1. classification task
      1. accuracy
      2. building and data model
      3. C5.0 model
      4. contingency table
      5. cost/benefit analysis
      6. error rate
      7. false negative
      8. false-negative rate
      9. false-positive
      10. false-positive rate
      11. financial lending firm
      12. gains chart
      13. income classification
      14. lift charts
      15. misclassification cost adjustment
      16. true negative
      17. true positive
    2. description task
    3. estimation and prediction tasks
      1. MAE
      2. MSE
      3. standard error of the estimate
    4. R code
  73. model voting process
    1. alternative models
    2. contingency tables
    3. evaluative measures
    4. majority classification
    5. processing steps
    6. R code
    7. working test data set
  74. multicollinearity
    1. correlation coefficients
    2. fiber variable
    3. matrix plot
    4. potassium variable
    5. stability coefficient
    6. user-defined composite
    7. variable coefficients
    8. variance inflation factor
  75. multinomial data
    1. chi-square test
    2. expected frequency
    3. observed frequency
    4. R code
    5. test statistics
  76. multiple regression model
    1. ANOVA table
    2. coefficient of determination, R²
    3. confidence interval
      1. mean value, y
      2. particular coefficient, βi
    4. estimation error
    5. indicator variable
      1. cereals, y-intercepts
      2. estimated nutritional rating
      3. p-values
      4. parallel planes
      5. reference category
      6. regression coefficient values
      7. relative estimation error
      8. shelf effect
    6. inference
      1. F-test
      2. t-test
    7. multicollinearity
      1. correlation coefficients
      2. fiber variable
      3. matrix plot
      4. potassium variable
      5. stability coefficient
      6. user-defined composite
      7. variable coefficients
      8. variance inflation factor
    8. nutritional rating vs. sugars
    9. population
    10. prediction interval
    11. predictor variables
    12. principal components
      1. Box–Cox transformation
      2. component values
      3. unrotated and rotated component weights
      4. varimax-rotated solution
    13. R code
    14. regression plane/hyperplane
    15. slope coefficients
    16. Spoon Size Shredded Wheat
    17. SSR
    18. three-dimensional scatter plot
    19. variable selection method (see variable selection method)
  77. Naïve Bayes classifier see also Bayesian approach
    1. conditional independence
    2. posterior odds ratio
    3. predictor variables
    4. WEKA
      1. ARFF
      2. conditional probabilities
      3. Explorer Panel
      4. load training file
      5. test set predictions
    5. zero-frequency cells
  78. neural network model
    1. adult data set
    2. artificial neuron model
    3. back-propagation algorithm
      1. cross validation termination
      2. downstream node
      3. error propagation
      4. learning rate
      5. momentum term
      6. squared prediction error
      7. upstream node
    4. combination function
    5. data preprocessing
    6. estimation and prediction
    7. gradient-descent method
    8. hidden layer
    9. input and output encoding
      1. categorical variables
      2. dichotomous classification
      3. drawback
      4. min–max normalization
      5. thresholds
    10. input layer
    11. output layer
    12. prediction accuracy
    13. R code
    14. real neuron
    15. sensitivity analysis
    16. sigmoid function
  79. neural networks
    1. backpropagation
    2. feed-forward nature
    3. learning method
    4. modified discrete crossover
    5. random shock mutation
    6. sum of squared errors
    7. topology and operation
  80. odds ratio (OR)
    1. assumptions
      1. capnet variable
      2. churn overlay
      3. customer service calls
    2. continuous predictor (see continuous predictor)
    3. dichotomous predictor (see dichotomous predictor)
    4. estrogen replacement therapy
    5. interpretation
    6. polychotomous predictor (see polychotomous predictor)
    7. relative risk
    8. response variable
    9. zero-count cell
  81. overfitting
    1. complexity model
    2. provisional model
  82. partitioning variable
  83. PCA see Principal components analysis (PCA)
  84. polychotomous predictor
    1. confidence interval
    2. estimated probability
    3. medium customer service call
    4. reference cell encoding
    5. standard error
    6. Wald test
  85. principal components analysis (PCA)
    1. communality
    2. component matrix
    3. component size
    4. component weights
    5. coordinate system
    6. correlation coefficient
    7. correlation matrix
    8. covariance matrix
    9. data set partitioning
    10. eigenvalues
    11. eigenvectors
    12. geographical component
    13. housing median age
    14. input variables
    15. linear combination
    16. low communality predictors
    17. matrix plot
    18. median income
    19. multiple regression analysis
    20. orthogonal vectors
    21. principal component profiles
    22. rotated component matrix
    23. scree plot
    24. standard deviation matrix
    25. validation
    26. variance proportion
  86. profits charts
  87. propensity averaging process
    1. evaluative measures
    2. histogram model
    3. m base classifiers
    4. processing steps
  88. pseudo-F statistic method
    1. clustering model
    2. distribution
    3. Iris data set
    4. R code
    5. SSB and SSE
  89. regression modeling
    1. ANOVA table
    2. baseline model
    3. Box–Cox transformation
    4. cereals data set
    5. coefficient of determination, r²
      1. data points
      2. distance and time estimation
      3. estimation error
      4. maximum value
      5. minimum value
      6. predicted score column
      7. prediction error
      8. predictor and response variables
      9. predictor information
      10. residual error
      11. sample variance
      12. standard deviation
      13. sum of squares regression
      14. sum of squares total
    6. Cook's distance
    7. correlation coefficient, r
      1. confidence interval
      2. linear correlation
      3. negative correlation
      4. positive correlation
      5. quantitative variables
    8. dangers of extrapolation
      1. chocolate frosted sugar bombs
      2. observed and unobserved points
      3. policy recommendations
      4. prediction error
      5. predictor variable
    9. end-user
      1. confidence interval
      2. prediction interval
    10. field values
    11. high leverage point
      1. characteristics
      2. distance vs. time
      3. hard-core orienteer
      4. mild outlier
      5. observation
      6. regression results
      7. standard error
    12. inference
    13. least-squares estimation
      1. error term ε
      2. estimated nutritional rating
      3. nutritional rating vs. sugar content
      4. prediction error
      5. statistics
      6. sum of squared errors
      7. y-intercept b0
    14. linearity transformation
      1. bulging rule
      2. log transformation
      3. point value vs. letter frequency
      4. response variable
      5. Scrabble®
      6. square root transformation
      7. standardized residual
    15. normal probability plot
      1. Anderson–Darling (AD) statistics
      2. assumptions
      3. chi-square distribution
      4. distance vs. time
      5. horizontal zero line
      6. normal distribution
      7. p-value
      8. Rorschach effect
      9. uniform distribution
    16. outliers
      1. Minitab
      2. nutritional rating vs. sugars
      3. positive and negative values
      4. standardized residuals
    17. population regression equation
      1. assumptions
      2. bivariate observation
      3. constant variance
      4. true regression line
    18. R code
    19. regression equation
    20. standard error
      1. mean square error
      2. standard deviation, response variable
      3. sum of squares regression
      4. sum of squares total
      5. time and distance calculation
    21. t-test
      1. assumptions
      2. confidence interval
      3. null hypothesis
      4. nutritional rating vs. sugar content
      5. p-value method
      6. sampling distribution
  90. response charts
  91. return-on-investment (ROI) charts
  92. scatter plot
  93. segmentation modeling
    1. clustering analysis
      1. CART decision trees
      2. churn proportion
      3. contingency tables
      4. international plan people
      5. no-plan majority
      6. voice mail plan people
    2. exploratory analysis
      1. capital gains/losses
      2. contingency tables
      3. overall error rate
    3. performance enhancement
    4. processing steps
    5. R code
  94. SEI see standard error of the imputation (SEI)
  95. self-organizing map (SOM)
    1. architecture
    2. characteristic processes
    3. goal
    4. networks connection
  96. sigmoid function
  97. silhouette method
    1. cohesion/separation
    2. Iris data set
    3. mean silhouette
    4. positive/negative values
    5. R code
  98. simplified cost matrix
  99. squashing function
  100. standard error of the imputation (SEI)
  101. statistical inference
    1. confidence interval
      1. customer service call
      2. lower bound
      3. margin of error
      4. population proportion
      5. subgroup analyses
      6. t-interval
      7. upper bound
    2. crystal ball gazers
    3. definition
    4. hypothesis testing (see hypothesis testing)
    5. point estimation
    6. population parameters
    7. R code
    8. sample proportion
    9. sampling error
  102. statistical methods
  103. stem-and-leaf display
  104. sum of squares between (SSB)
  105. sum of squares error (SSE)
  106. sum of squares regression (SSR), multiple regression model
  107. supervised methods
  108. target variable
  109. unsupervised methods
  110. user-defined composites
    1. definition
    2. houses data set
    3. measurement error
    4. summated scales
  111. variable selection method
    1. all-possible-regression
    2. backward elimination
    3. best subsets method
    4. forward selection
    5. gas mileage data set (see gas mileage prediction)
    6. partial F-test
    7. stepwise regression
  112. Waikato Environment for Knowledge Analysis (WEKA)
    1. Bayesian belief networks
      1. Explorer Panel
      2. positive and negative classification
      3. prior probabilities
      4. test set predictions
    2. explorer panel
    3. genetic search algorithm
      1. AttributeSelectedClassifier
      2. class distribution
      3. initial population characteristics
      4. Preprocess tab
      5. WrapperSubsetEval
    4. Naïve Bayes
      1. ARFF
      2. conditional probabilities
      3. Explorer Panel
      4. load training file
      5. test set predictions
    5. RATING field
    6. regression coefficients
    7. test set prediction
    8. training file