Index
A
- A/B testing, A/B Testing-For Further Reading
- accuracy, Evaluating Classification Models
- Adaboost, Boosting
- adjusted R-squared, Assessing the Model
- adjustment of p-values, Multiple Testing
- agglomerative algorithm, The Agglomerative Algorithm
- AIC (Akaike's Information Criteria), Model Selection and Stepwise Regression, Selecting the Number of Clusters
- Akaike, Hirotugu, Model Selection and Stepwise Regression
- all subset regression, Model Selection and Stepwise Regression
- alpha, Statistical Significance and P-Values, Alpha
- alternative hypothesis, Hypothesis Tests, Alternative Hypothesis
- American Statistical Association (ASA), statement on p-values, Value of the p-value
- anomaly detection, Outliers, Regression and Prediction
- ANOVA (analysis of variance), ANOVA-Further Reading
- arms (multi-arm bandits), Multi-Arm Bandit Algorithm
- AUC (area under the ROC curve), AUC
- average linkage, Measures of Dissimilarity
B
- backward elimination, Model Selection and Stepwise Regression
- backward selection, Model Selection and Stepwise Regression
- bagging, The Bootstrap, Resampling, Statistical Machine Learning, Bagging
- bandit algorithms, Multi-Arm Bandit Algorithm
  - (see also multi-arm bandits)
- bar charts, Exploring Binary and Categorical Data
- Bayesian classification, Naive Bayes
- Bayesian information criteria (BIC), Model Selection and Stepwise Regression, Selecting the Number of Clusters
- beta distribution, Multi-Arm Bandit Algorithm
- bias, Bias
- bias-variance tradeoff, Choosing K
- biased estimates, Standard Deviation and Related Estimates
- BIC (Bayesian information criteria), Model Selection and Stepwise Regression, Selecting the Number of Clusters
- bidirectional alternative hypothesis, One-Way, Two-Way Hypothesis Test
- big data
- binary data, Elements of Structured Data
- binomial, Binomial Distribution
- binomial distribution, Binomial Distribution-Further Reading
- binomial trials, Binomial Distribution
- bins
- bivariate analysis, Exploring Two or More Variables
- black swan theory, Long-Tailed Distributions
- blind studies, Why Have a Control Group?
- boosting, Statistical Machine Learning, Tree Models, Boosting-Summary
- bootstrap, The Bootstrap-Further Reading, Resampling
- bootstrap sample, The Bootstrap
- boxplots, Exploring the Data Distribution
- Breiman, Leo, Statistical Machine Learning
- bubble plots, Influential Values
C
- categorical data, Elements of Structured Data
- categorical variables, Factor Variables in Regression
  - (see also factor variables)
- causation, regression and, Prediction versus Explanation (Profiling)
- central limit theorem, Sampling Distribution of a Statistic, Central Limit Theorem, Student’s t-Distribution
- chi-square distribution, Chi-Square Test: Statistical Theory
- chi-square statistic, Chi-Square Test
- chi-square test, Chi-Square Test-Further Reading
- class purity, Measuring Homogeneity or Impurity
- classification, Classification-Summary
  - discriminant analysis, Discriminant Analysis-Further Reading
  - evaluating models, Evaluating Classification Models-Further Reading
    - AUC metric, AUC
    - confusion matrix, Confusion Matrix
    - lift, Lift
    - precision, recall, and specificity, Precision, Recall, and Specificity
    - rare class problem, The Rare Class Problem
    - ROC curve, ROC Curve
  - K-Nearest Neighbors, K-Nearest Neighbors
  - logistic regression, Logistic Regression-Further Reading
  - more than two possible outcomes, Classification
  - naive Bayes algorithm, Naive Bayes-Further Reading
  - strategies for imbalanced data, Strategies for Imbalanced Data-Further Reading
  - unsupervised learning as building block, Unsupervised Learning
- cluster mean, K-Means Clustering, A Simple Example, Interpreting the Clusters
- clustering, Unsupervised Learning
  - application to cold-start problems, Unsupervised Learning
  - cluster analysis vs. PCA, Interpreting the Clusters
  - hierarchical, Hierarchical Clustering-Measures of Dissimilarity, Categorical Data and Gower’s Distance
  - K-means, K-Means Clustering-Selecting the Number of Clusters, Scaling the Variables
  - model-based, Model-Based Clustering-Further Reading
  - problems with mixed data, Problems with Clustering Mixed Data
  - standardizing data, Standardization (Normalization, Z-Scores)
- clusters, K-Means Clustering
- coefficient of determination, Assessing the Model
- coefficients
- complete linkage, The Agglomerative Algorithm
- complexity parameter (cp), Stopping the Tree from Growing
- conditional probabilities, Naive Bayes
- conditioning variables, Visualizing Multiple Variables
- confidence intervals, Confidence Intervals-Further Reading, Confidence and Prediction Intervals
- confidence level, Confidence Intervals
- confounding variables, Interpreting the Regression Equation, Confounding Variables
- confusion matrix, Evaluating Classification Models-Confusion Matrix
- contingency tables, Exploring Two or More Variables
- continuous data, Elements of Structured Data
- contour plots, Exploring Two or More Variables
- contrast coding systems, Dummy Variables Representation
- control group, A/B Testing
- Cook's distance, Influential Values
- correlated variables, Interpreting the Regression Equation
- correlation, Correlation-Further Reading
- correlation coefficient, Correlation
- correlation matrix, Correlation
  - example, correlation between telecommunication stock returns, Correlation
- cost-based classification, Cost-Based Classification
- count data
- covariance, Discriminant Analysis, Covariance Matrix, Computing the Principal Components
- covariance matrix
- cross-validation, Cross-Validation, Choosing K
- cumulative gains charts, Lift
D
- d.f. (degrees of freedom), Degrees of Freedom, Chi-Square Test
  - (see also degrees of freedom)
- data analysis, Exploratory Data Analysis
  - (see also exploratory data analysis)
- data distribution, Exploring the Data Distribution-Further Reading, Sampling Distribution of a Statistic
- data frames, Rectangular Data
- data generation, Strategies for Imbalanced Data, Data Generation
- data snooping, Selection Bias
- data types
- database normalization, Standardization (Normalization, Z-Scores)
- decile gains charts, Lift
- decision trees, The Bootstrap, Statistical Machine Learning
- decomposition of variance, ANOVA, F-Statistic
- degrees of freedom, Standard Deviation and Related Estimates, Student’s t-Distribution, Degrees of Freedom-Further Reading
- dendrograms, Hierarchical Clustering
- density plots, Exploring the Data Distribution, Density Estimates
- dependent variable, The Regression Equation
- deviation coding, Factor Variables in Regression, Dummy Variables Representation
- deviations, Estimates of Variability
- directional alternative hypothesis, One-Way, Two-Way Hypothesis Test
- discrete data, Elements of Structured Data
- discriminant analysis, Discriminant Analysis-Further Reading
- discriminant function, Discriminant Analysis
- discriminant weights, Discriminant Analysis
- dispersion, Estimates of Variability
  - (see also variability, estimates of)
- dissimilarity, Hierarchical Clustering
- distance metrics, K-Nearest Neighbors, Hierarchical Clustering
- Donoho, David, Exploratory Data Analysis
- double blind studies, Why Have a Control Group?
- dummy variables, Factor Variables in Regression
- Durbin-Watson statistic, Heteroskedasticity, Non-Normality and Correlated Errors
E
- EDA (see exploratory data analysis)
- effect size, Power and Sample Size, Sample Size
- elbow method, Selecting the Number of Clusters
- ensemble learning, Statistical Machine Learning
- ensemble models, Boosting
- entropy, Measuring Homogeneity or Impurity
- epsilon-greedy algorithm, Multi-Arm Bandit Algorithm
- errors, Normal Distribution
- estimates, Estimates of Location
- Euclidean distance, Distance Metrics
- exact tests, Exhaustive and Bootstrap Permutation Test
- Excel, pivot tables, Two Categorical Variables
- exhaustive permutation tests, Exhaustive and Bootstrap Permutation Test
- expectation or expected, Chi-Square Test
- expected value, Exploring Binary and Categorical Data, Expected Value
- explanation vs. prediction (in regression), Prediction versus Explanation (Profiling)
- exploratory data analysis, Exploratory Data Analysis-Summary
- Exploratory Data Analysis (Tukey), Exploratory Data Analysis
- exponential distribution, Poisson and Related Distributions
- extrapolation
F
- F-statistic, ANOVA, F-Statistic, Assessing the Model
- facets, Visualizing Multiple Variables
- factor variables, Factor Variables in Regression-Ordered Factor Variables
- factors, conversion of text columns to, Elements of Structured Data
- failure rate, estimating, Estimating the Failure Rate
- false discovery rate, Multiple Testing
- false positive rate, AUC
- feature selection
- features, Rectangular Data
- field view (spatial data), Nonrectangular Data Structures
- Fisher's exact test, Fisher’s Exact Test
- Fisher's linear discriminant, Fisher’s Linear Discriminant
- Fisher's scoring, Fitting the model
- Fisher, R.A., Fisher’s Exact Test, Discriminant Analysis
- fitted values, Simple Linear Regression, Fitted Values and Residuals
- folds, Cross-Validation, Hyperparameters and Cross-Validation
- forward selection and backward selection, Model Selection and Stepwise Regression
- frequency tables, Exploring the Data Distribution
- Friedman, Jerome H. (Jerry), Statistical Machine Learning
G
- gains, Lift
- Gallup Poll, Random Sampling and Sample Bias
- Gallup, George, Random Sampling and Sample Bias, Random Selection
- Galton, Francis, Regression to the Mean
- GAM (see generalized additive models)
- Gaussian distribution, Normal Distribution
  - (see also normal distribution)
- generalized additive models, Polynomial and Spline Regression, Generalized Additive Models, Exploring the Predictions
- generalized linear model (GLM), Logistic Regression and the GLM
- Gini coefficient, Measuring Homogeneity or Impurity
- Gini impurity, Measuring Homogeneity or Impurity
- GLM (see generalized linear model)
- Gosset, W.S., Student’s t-Distribution
- Gower's distance, Scaling and Categorical Variables
- gradient boosted trees, Interactions and Main Effects
- gradient boosting, The Boosting Algorithm
- graphs, Nonrectangular Data Structures
- greedy algorithms, Multi-Arm Bandit Algorithm
H
- hat notation, Fitted Values and Residuals
- hat-value, Testing the Assumptions: Regression Diagnostics, Influential Values
- heat maps, Hexagonal Binning and Contours (Plotting Numeric versus Numeric Data)
- heteroskedastic errors, Heteroskedasticity, Non-Normality and Correlated Errors
- heteroskedasticity, Testing the Assumptions: Regression Diagnostics, Heteroskedasticity, Non-Normality and Correlated Errors
- hexagonal binning, Exploring Two or More Variables
- hierarchical clustering, Hierarchical Clustering-Measures of Dissimilarity, Categorical Data and Gower’s Distance
- histograms, Exploring the Data Distribution
- homogeneity, measuring, Measuring Homogeneity or Impurity
- hyperparameters
- hypothesis tests, Hypothesis Tests-Further Reading
I
- impurity, Tree Models
- in-sample methods to assess and tune models, Model Selection and Stepwise Regression
- independent variables, Simple Linear Regression, The Regression Equation
- indexes, data frames and, Data Frames and Indexes
- indicator variables, Factor Variables in Regression
- inference, Exploratory Data Analysis, Statistical Experiments and Significance Testing
- influence plots, Influential Values
- influential values, Testing the Assumptions: Regression Diagnostics, Influential Values
- information, Measuring Homogeneity or Impurity
- interactions, Interpreting the Regression Equation
- intercepts, Simple Linear Regression
- Internet of Things (IoT), Elements of Structured Data
- interquartile range (IQR), Estimates of Variability, Estimates Based on Percentiles
- interval endpoints, Confidence Intervals
K
- K (in K-Nearest Neighbors), K-Nearest Neighbors
- k-fold cross-validation, Cross-Validation
- K-means clustering, K-Means Clustering-Selecting the Number of Clusters
- K-Nearest Neighbors, Predicted Values from Logistic Regression, K-Nearest Neighbors-KNN as a Feature Engine
- kernel density estimates, Density Estimates
- KernSmooth package, Density Estimates
- KNN (see K-Nearest Neighbors)
- knots, Polynomial and Spline Regression, Splines
- kurtosis, Frequency Table and Histograms
L
- lambda, in Poisson and related distributions, Poisson and Related Distributions
- Lasso regression, Model Selection and Stepwise Regression, Regularization: Avoiding Overfitting
- Latent Dirichlet Allocation (LDA), Discriminant Analysis
- leaf, Tree Models
- least squares, Simple Linear Regression, Least Squares
- leverage, Testing the Assumptions: Regression Diagnostics
- lift, Evaluating Classification Models, Lift
- lift curve, Lift
- linear discriminant analysis (LDA), Discriminant Analysis, Exploring the Predictions
- linear regression, Simple Linear Regression-Weighted Regression
- Literary Digest poll of 1936, Random Sampling and Sample Bias, Random Selection
- loadings, Principal Components Analysis, A Simple Example
- log odds, Logistic Regression
- log-odds function (see logit function)
- log-odds ratio, Interpreting the Coefficients and Odds Ratios
- logistic regression, Logistic Regression-Further Reading, Exploring the Predictions
- logit function, Logistic Regression, Logistic Response Function and Logit
- long-tail distributions, Long-Tailed Distributions-Further Reading
- loss, Tree Models
- loss function, Oversampling and Up/Down Weighting
M
- machine learning, Statistical Machine Learning
  - (see also statistical machine learning)
- Mahalanobis distance, Covariance Matrix, Distance Metrics
- main effects, Interpreting the Regression Equation
- Mallows Cp, Model Selection and Stepwise Regression
- Manhattan distance, Distance Metrics, Regularization: Avoiding Overfitting, Categorical Data and Gower’s Distance
- maximum likelihood estimation (MLE), Fitting the model
- mean, Estimates of Location
- mean absolute deviation, Estimates of Variability, A/B Testing
- mean absolute deviation from the median (MAD), Standard Deviation and Related Estimates
- median, Estimates of Location
- median absolute deviation, Estimates of Variability
- metrics, Estimates of Location
- minimum variance, Measures of Dissimilarity
- MLE (see maximum likelihood estimation)
- mode, Exploring Binary and Categorical Data
  - examples in categorical data, Mode
- model-based clustering, Model-Based Clustering-Further Reading
- moments, Frequency Table and Histograms
- multi-arm bandits, Why Just A/B? Why Not C, D…?, Multi-Arm Bandit Algorithm-Further Reading
- multicollinearity, Interpreting the Regression Equation, Multicollinearity
- multicollinearity errors, Degrees of Freedom, Dummy Variables Representation
- multiple linear regression (see linear regression)
- multiple testing, Multiple Testing-Further Reading
- multivariate analysis, Exploring Two or More Variables
- multivariate normal distribution, Multivariate Normal Distribution
N
- n (sample size), Student’s t-Distribution
- n or sample size, Degrees of Freedom
- naive Bayes algorithm, Naive Bayes-Further Reading
- neighbors, K-Nearest Neighbors
- network data structures, Nonrectangular Data Structures
- nodes, Tree Models
- non-normal residuals, Testing the Assumptions: Regression Diagnostics
- nonlinear regression, Polynomial and Spline Regression-Further Reading
- nonrectangular data structures, Nonrectangular Data Structures
- normal distribution, Normal Distribution-Standard Normal and QQ-Plots
- normalization, Standard Normal and QQ-Plots, Standardization (Normalization, Z-Scores), K-Means Clustering
- null hypothesis, Hypothesis Tests, The Null Hypothesis
- numeric variables
- numerical data as categorical data, Exploring Binary and Categorical Data
O
- object representation (spatial data), Nonrectangular Data Structures
- Occam's razor, Model Selection and Stepwise Regression
- odds, Logistic Regression, Logistic Response Function and Logit
- odds ratios, Interpreting the Coefficients and Odds Ratios
- omnibus tests, ANOVA
- one hot encoder, Factor Variables in Regression, One Hot Encoder
- one hot encoding, Dummy Variables Representation
- one-way tests, Hypothesis Tests, One-Way, Two-Way Hypothesis Test
- order statistics, Estimates of Variability, Estimates Based on Percentiles
- ordered factor variables, Ordered Factor Variables
- ordinal data, Elements of Structured Data
- ordinary least squares (OLS), Least Squares, Heteroskedasticity, Non-Normality and Correlated Errors
- out-of-bag (OOB) estimate of error, Random Forest
- outcome, Rectangular Data
- outliers, Estimates of Location, Outliers, Testing the Assumptions: Regression Diagnostics
- overfitting, Multiple Testing
- oversampling, Strategies for Imbalanced Data, Oversampling and Up/Down Weighting
P
- p-values, Statistical Significance and P-Values, P-Value
- pairwise comparisons, ANOVA
- partial residual plots, Testing the Assumptions: Regression Diagnostics, Partial Residual Plots and Nonlinearity
- PCA (see principal components analysis)
- Pearson residuals, Chi-Square Test: A Resampling Approach
- Pearson's chi-square test, Chi-Square Test: Statistical Theory
- Pearson's correlation coefficient, Correlation
- Pearson, Karl, Chi-Square Test, Principal Components Analysis
- penalized regression, Model Selection and Stepwise Regression
- percentiles, Estimates of Variability
- permission, obtaining for human subject testing, Why Just A/B? Why Not C, D…?
- permutation tests, Resampling
- pertinent records (in searches), Size versus Quality: When Does Size Matter?
- physical networks, Nonrectangular Data Structures
- pie charts, Exploring Binary and Categorical Data
- pivot tables (Excel), Two Categorical Variables
- point estimates, Confidence Intervals
- Poisson distributions, Poisson and Related Distributions, Generalized Linear Models
- polynomial coding, Dummy Variables Representation
- polynomial regression, Polynomial and Spline Regression, Polynomial
- population, Random Sampling and Sample Bias
- posterior probability, Naive Bayes, The Naive Solution
- power and sample size, Power and Sample Size-Further Reading
- precision, Evaluating Classification Models
- predicted values, Fitted Values and Residuals
- prediction
- prediction intervals, Prediction Using Regression
- predictor variables, Data Frames and Indexes, The Regression Equation
- principal components, Principal Components Analysis
- principal components analysis, Principal Components Analysis-Further Reading
- probability theory, Exploratory Data Analysis
- profiling vs. explanation, Prediction versus Explanation (Profiling)
- propensity score, Classification
- proxy variables, Example: Web Stickiness
- pruning, Tree Models, Stopping the Tree from Growing
- pseudo-residuals, The Boosting Algorithm
R
- R-squared, Multiple Linear Regression, Assessing the Model
- random forests, Interactions and Main Effects, Tree Models, Random Forest-Hyperparameters
- random sampling, Random Sampling and Sample Bias-Further Reading
- random subset of variables, Random Forest
- randomization, A/B Testing
- randomization tests, Resampling
  - (see also permutation tests)
- randomness, misinterpreting, Hypothesis Tests
- range, Estimates of Variability, Estimates Based on Percentiles
- rare class problem, The Rare Class Problem
- recall, Evaluating Classification Models, Precision, Recall, and Specificity
- receiver operating characteristics (see ROC curve)
- records, Rectangular Data, Simple Linear Regression
- rectangular data, Rectangular Data-Estimates of Location
- recursive partitioning, Tree Models, The Recursive Partitioning Algorithm, Random Forest
- reference coding, Factor Variables in Regression-Dummy Variables Representation, Interactions and Main Effects, Logistic Regression and the GLM
- regression, Regression and Prediction-Summary
  - causation and, Prediction versus Explanation (Profiling)
  - diagnostics, Testing the Assumptions: Regression Diagnostics-Polynomial and Spline Regression
  - different meanings of the term, Least Squares
  - factor variables in, Factor Variables in Regression-Ordered Factor Variables
  - interpreting the regression equation, Interpreting the Regression Equation-Interactions and Main Effects
  - KNN (K-Nearest Neighbors), KNN as a Feature Engine
  - logistic regression, Logistic Regression-Further Reading
  - multiple linear regression, Multiple Linear Regression-Weighted Regression
  - polynomial and spline regression, Polynomial and Spline Regression-Summary
  - prediction with, Prediction Using Regression-Factor Variables in Regression
  - ridge regression, Regularization: Avoiding Overfitting
  - simple linear regression, Simple Linear Regression-Further Reading
  - unsupervised learning as building block, Unsupervised Learning
  - with a tree, Predicting a Continuous Value
- regression coefficient, Simple Linear Regression
- regression to the mean, Regression to the Mean
- regularization, Boosting
- replacement (in sampling), Random Sampling and Sample Bias
- representativeness, Random Sampling and Sample Bias
- resampling, The Bootstrap, Resampling-For Further Reading
- residual standard error, Multiple Linear Regression, Assessing the Model
- residual sum of squares, Least Squares
- residuals, Simple Linear Regression, Fitted Values and Residuals
- response, Simple Linear Regression, The Regression Equation
- ridge regression, Model Selection and Stepwise Regression, Regularization: Avoiding Overfitting
- robust, Estimates of Location
- robust estimates of location
- ROC curve, ROC Curve
- root mean squared error (RMSE), Multiple Linear Regression, Assessing the Model, Predicting a Continuous Value
- RSE (see residual standard error)
- RSS (residual sum of squares), Least Squares
S
- sample bias, Random Sampling and Sample Bias
- sample statistic, Sampling Distribution of a Statistic
- samples
- sampling, Data and Sampling Distributions-Summary
  - binomial distribution, Binomial Distribution-Further Reading
  - bootstrap, The Bootstrap-Further Reading
  - confidence intervals, Confidence Intervals-Further Reading
  - long-tail distributions, Long-Tailed Distributions-Further Reading
  - normal distribution, Normal Distribution-Standard Normal and QQ-Plots
  - oversampling imbalanced data, Oversampling and Up/Down Weighting
  - Poisson and related distributions, Poisson and Related Distributions-Summary
  - population versus sample, Data and Sampling Distributions
  - random sampling and sample bias, Random Sampling and Sample Bias-Further Reading
  - sampling distribution of a statistic, Sampling Distribution of a Statistic-Further Reading
  - selection bias, Selection Bias-Further Reading
  - Student's t-distribution, Student’s t-Distribution-Further Reading
  - Thompson sampling, Multi-Arm Bandit Algorithm
  - undersampling imbalanced data, Undersampling
  - with and without replacement, Random Sampling and Sample Bias, The Bootstrap, Resampling
- sampling distribution, Sampling Distribution of a Statistic-Further Reading
- scale parameter (Weibull distribution), Weibull Distribution
- scaling and categorical variables, Scaling and Categorical Variables-Summary
- scatterplot smoothers, Heteroskedasticity, Non-Normality and Correlated Errors
- scatterplots, Correlation
- scientific fraud, detecting, Fisher’s Exact Test
- screeplots, Principal Components Analysis, Interpreting Principal Components
- searches
- selection bias, Selection Bias-Further Reading
- self-selection sampling bias, Random Sampling and Sample Bias
- sensitivity, Evaluating Classification Models, Precision, Recall, and Specificity
- shape parameter (Weibull distribution), Weibull Distribution
- signal-to-noise ratio, Choosing K
- significance level, Power and Sample Size, Sample Size
- significance tests, Hypothesis Tests, Data Science and P-Values
  - (see also hypothesis tests)
- simple random sample, Random Sampling and Sample Bias
- single linkage, Measures of Dissimilarity
- skew, Long-Tailed Distributions
- skewness, Frequency Table and Histograms
- slope, Simple Linear Regression
- SMOTE algorithm, Data Generation
- spatial data structures, Nonrectangular Data Structures
- specificity, Evaluating Classification Models, Precision, Recall, and Specificity
- spline regression, Polynomial and Spline Regression, Splines
- splines, Splines
- split value, Tree Models
- square-root of n rule, Standard Error
- SS (sum of squares), ANOVA
- standard deviation, Estimates of Variability
- standard error, Sampling Distribution of a Statistic
- standard normal distribution, Normal Distribution, Standard Normal and QQ-Plots
- standardization, Standard Normal and QQ-Plots, K-Nearest Neighbors, K-Means Clustering
- standardized residuals, Testing the Assumptions: Regression Diagnostics
- statistical experiments and significance testing, Statistical Experiments and Significance Testing-Summary
  - A/B testing, A/B Testing-For Further Reading
  - chi-square test, Chi-Square Test-Further Reading
  - degrees of freedom, Degrees of Freedom-Further Reading
  - hypothesis tests, Hypothesis Tests-Further Reading
  - multi-arm bandit algorithm, Multi-Arm Bandit Algorithm-Further Reading
  - multiple tests, Multiple Testing-Further Reading
  - power and sample size, Power and Sample Size-Further Reading
  - resampling, Resampling-Statistical Significance and P-Values
  - statistical significance and p-values, Statistical Significance and P-Values-Further Reading
  - t-tests, t-Tests-Further Reading
- statistical inference, classical inference pipeline, Statistical Experiments and Significance Testing
- statistical machine learning, Statistical Machine Learning-Summary
- statistical moments, Frequency Table and Histograms
- statistical significance, Permutation Test
- statistics vs. machine learning, Statistical Machine Learning
- stepwise regression, Model Selection and Stepwise Regression
- stochastic gradient boosting, The Boosting Algorithm
- stratified sampling, Random Sampling and Sample Bias, Random Selection
- structured data, Elements of Structured Data-Further Reading
- Student's t-distribution, Student’s t-Distribution-Further Reading
- subjects, A/B Testing
- success, Binomial Distribution
- sum contrasts, Dummy Variables Representation
T
- t-distributions, Student’s t-Distribution-Further Reading, t-Tests
- t-statistic, t-Tests, Multiple Linear Regression, Assessing the Model
- t-tests, t-Tests-Further Reading
- tail, Long-Tailed Distributions
- target shuffling, Selection Bias
- test sample, Evaluating Classification Models
- test statistic, A/B Testing, t-Tests
- Thompson sampling, Multi-Arm Bandit Algorithm
- time series data, Nonrectangular Data Structures
- time-to-failure analysis, Weibull Distribution
- treatment, A/B Testing
- treatment group, A/B Testing
- tree models, Interactions and Main Effects, Exploring the Predictions, Tree Models
- Trellis graphics, Visualizing Multiple Variables
- trials, Binomial Distribution
- trimmed mean, Estimates of Location
- Tukey, John Wilder, Exploratory Data Analysis
- two-way tests, Hypothesis Tests, One-Way, Two-Way Hypothesis Test
- type 1 errors, Statistical Significance and P-Values, Type 1 and Type 2 Errors, Multiple Testing
- type 2 errors, Statistical Significance and P-Values, Type 1 and Type 2 Errors
W
- Ward's method, Measures of Dissimilarity
- web stickiness example (permutation test), Example: Web Stickiness
- web testing
- Weibull distribution, Poisson and Related Distributions
- weighted mean, Estimates of Location
- weighted median, Estimates of Location, Median and Robust Estimates
- formula for calculating, Mean
- weighted regression, Multiple Linear Regression, Weighted Regression
- weights, Simple Linear Regression
- whiskers (in boxplots), Percentiles and Boxplots
- wins, Multi-Arm Bandit Algorithm
- within cluster sum of squares (SS), K-Means Clustering