Penalized Regression
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Missing Values
- Outliers
- Standardization
- Parameter tuning
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Modeling linear or linearly separable phenomena
- Manually specifying nonlinear and explicit interaction terms
- Well-suited for N << p
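
A minimal sketch, assuming scikit-learn; the elastic net penalty, the synthetic N << p data, and all parameter values are illustrative choices, not a prescribed recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 50 rows, 500 columns: far more predictors than observations (N << p).
X, y = make_regression(n_samples=50, n_features=500, n_informative=10,
                       noise=5.0, random_state=0)

# Standardize inputs (a common concern above), then tune the penalty by
# cross-validation rather than fixing it by hand.
model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0))
model.fit(X, y)

coef = model.named_steps["elasticnetcv"].coef_
print("nonzero coefficients:", np.sum(coef != 0))  # penalty induces sparsity
```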
|
Naïve Bayes
|
Common Usage
- Supervised classification
|
Common Concerns
- Strong conditional independence assumption
- Infrequent categorical levels
|
Suggested Scale
- Small to extremely large data sets
|
Interpretability
|
Suggested Usage
- Modeling linearly separable phenomena in large data sets
- Well-suited for extremely large data sets where complex methods are intractable
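
A minimal sketch, assuming scikit-learn's Gaussian variant; MultinomialNB or CategoricalNB would be the analogous choices for count or categorical inputs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each feature is assumed conditionally independent given the class, so
# training is a single pass over the data; this is why the method scales
# to extremely large data sets.
clf = GaussianNB().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```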
|
Decision Trees
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Instability with small training data sets
- Gradient boosting can be unstable with noise or outliers
- Overfitting
- Parameter tuning
|
Suggested Scale
- Medium to large data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinear and nonlinearly separable phenomena in large, dirty data
- Interactions considered automatically, but implicitly
- Missing values and outliers in input variables handled automatically in many implementations
- Decision tree ensembles, e.g., random forests and gradient boosting, can increase prediction accuracy and decrease overfitting, but also decrease scalability and interpretability
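
A minimal sketch, assuming scikit-learn, contrasting a single tree with a random forest ensemble; depths, estimator counts, and data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, depth-limited tree stays interpretable but may underfit.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# An ensemble usually improves accuracy at the cost of interpretability.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("tree:  ", tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```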
|
k-Nearest Neighbors (kNN)
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Curse of dimensionality
|
Suggested Scale
- Small to medium data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinearly separable phenomena
- Can be used to match the accuracy of more sophisticated techniques, but with fewer tuning parameters
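
A minimal sketch, assuming scikit-learn; k is essentially the only tuning parameter, and standardization matters because distances mix the scales of all inputs.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=1_000, noise=0.3, random_state=0)  # nonlinearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small k overfits, large k oversmooths; tune by cross-validation.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(pipe, {"kneighborsclassifier__n_neighbors": [1, 5, 15, 51]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```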
|
Support Vector Machines (SVM)
|
Common Usage
- Supervised regression
- Supervised classification
- Anomaly detection
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Parameter tuning
- Accuracy relative to deep neural networks depends on the choice of nonlinear kernel; Gaussian and polynomial kernels are often less accurate
|
Suggested Scale
- Small to large data sets for linear kernels
- Small to medium data sets for nonlinear kernels
|
Interpretability
|
Suggested Usage
- Modeling linear or linearly separable phenomena by using linear kernels
- Modeling nonlinear or nonlinearly separable phenomena by using nonlinear kernels
- Anomaly detection with one-class SVM (OSVM)
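
A minimal sketch, assuming scikit-learn: a linear and an RBF kernel on nonlinearly separable data, plus a one-class SVM for anomaly detection. Kernel and parameter choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM, SVC

X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

# A linear kernel struggles on concentric rings; an RBF kernel does not.
linear = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")).fit(X, y)
print("linear:", linear.score(X, y), " rbf:", rbf.score(X, y))

# A one-class SVM flags roughly a `nu` fraction of points as anomalies (-1).
osvm = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05)).fit(X)
print("flagged anomalies:", np.sum(osvm.predict(X) == -1))
```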
|
Artificial Neural Networks (ANN)
|
Common Usage
- Supervised regression
- Supervised classification
- Unsupervised clustering
- Unsupervised feature extraction
- Anomaly detection
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Parameter tuning
- Computationally intensive training; vanishing gradients in deep architectures
|
Suggested Scale
- Medium to extremely large data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinear and nonlinearly separable phenomena
- Autoencoders can be used for unsupervised feature extraction and for anomaly detection via reconstruction error
- Self-organizing maps can be used for unsupervised clustering
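
A minimal sketch of a feed-forward network, assuming scikit-learn's MLPClassifier; dedicated deep learning frameworks would be the usual choice at larger scale. Layer sizes and other settings are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=2_000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers; early stopping guards against overfitting, and
# standardization keeps gradient-based training well behaved.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                  max_iter=1000, random_state=0),
)
net.fit(X_train, y_train)
print("accuracy:", net.score(X_test, y_test))
```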
|
Association Rules
|
Common Usage
- Supervised rule building
- Unsupervised rule building
|
Common Concerns
- Instability with small training data
- Overfitting
- Parameter tuning
|
Suggested Scale
- Medium to large transactional data sets
|
Interpretability
|
Suggested Usage
- Building sets of complex rules by using the co-occurrence of items or events in transactional data sets
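
A minimal sketch, assuming the third-party mlxtend package and a tiny one-hot encoded transaction table; the items, thresholds, and data are illustrative.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rows are transactions; columns mark whether an item appears in each one.
transactions = pd.DataFrame(
    [
        {"bread": 1, "butter": 1, "milk": 1, "beer": 0},
        {"bread": 1, "butter": 1, "milk": 0, "beer": 0},
        {"bread": 0, "butter": 0, "milk": 1, "beer": 1},
        {"bread": 1, "butter": 1, "milk": 1, "beer": 0},
        {"bread": 0, "butter": 0, "milk": 0, "beer": 1},
    ]
).astype(bool)

# Minimum support and confidence are the main tuning parameters.
itemsets = apriori(transactions, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```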
|
k-Means
|
Common Usage
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of clusters is often unknown
- Highly sensitive to initialization
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating a known a priori number of spherical, disjoint, equally sized clusters
- k-modes method can be used for categorical data
- k-prototypes method can be used for mixed data
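
A minimal sketch, assuming scikit-learn; k-modes and k-prototypes live in the separate kmodes package and are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=1_000, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)  # distances are scale sensitive

# n_init restarts mitigate sensitivity to initialization; k itself must be
# chosen up front, e.g., by inspecting inertia over a range of candidate k.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("inertia:", km.inertia_)
print("cluster sizes:", np.bincount(km.labels_))
```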
|
Hierarchical Clustering
|
Common Usage
|
Common Concerns
- Missing values
- Standardization
- Correct number of clusters is often unknown
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating nonspherical, disjoint, or overlapping clusters of different sizes; the number of clusters can also be chosen after training by cutting the dendrogram
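
A minimal sketch of agglomerative (bottom-up) hierarchical clustering, assuming scikit-learn; scipy.cluster.hierarchy would give the full dendrogram. The linkage choice is illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# Single linkage can follow the nonspherical shape of these clusters.
hc = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print("cluster sizes:", np.bincount(hc.labels_))
```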
|
Spectral Clustering
|
Common Usage
|
Common Concerns
- Missing values
- Standardization
- Parameter tuning
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating a data-dependent number of arbitrarily shaped, disjoint, or overlapping clusters of different sizes
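
A minimal sketch, assuming scikit-learn, on two arbitrarily shaped (concentric) clusters; the affinity graph and its parameters are the main things to tune, and the values here are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)

# A nearest-neighbor affinity graph lets the clusters take arbitrary shapes.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(sc.labels_))
```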
|
Principal Components Analysis (PCA)
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
|
Suggested Scale
- Small to large data sets for traditional PCA and SVD
- Small to medium data sets for sparse PCA and kernel PCA
|
Interpretability
- Generally low, but higher for sparse PCA or rotated solutions
|
Suggested Usage
- Extracting a data-dependent number of linear, orthogonal features, where N >> p
- Extracted features can be rotated to increase interpretability, but orthogonality is usually lost
- Singular value decomposition (SVD) is often used instead of PCA on wide or sparse data
- Sparse PCA can be used to create more interpretable features, but orthogonality is lost
- Kernel PCA can be used to extract nonlinear features
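
A minimal sketch, assuming scikit-learn, of the three variants named above: plain PCA, sparse PCA for interpretability, and kernel PCA for nonlinear features.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA, SparsePCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Plain PCA: linear, orthogonal features ranked by explained variance.
pca = PCA(n_components=2).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Sparse PCA zeroes out loadings (more interpretable, no longer orthogonal);
# kernel PCA extracts nonlinear features.
X_sparse = SparsePCA(n_components=2, random_state=0).fit_transform(X)
X_kernel = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
print(X_sparse.shape, X_kernel.shape)
```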
|
Nonnegative Matrix Factorization (NMF)
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of features is often unknown
- Presence of negative values
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Extracting a known a priori number of interpretable, linear, oblique, nonnegative features
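
A minimal sketch, assuming scikit-learn and a nonnegative count matrix; the number of components must be chosen a priori, and the values here are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(100, 30)).astype(float)  # nonnegative counts

# X is approximated as W @ H with W, H >= 0, which keeps the extracted
# features additive and therefore relatively interpretable.
nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_
print(W.shape, H.shape, "reconstruction error:", round(nmf.reconstruction_err_, 2))
```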
|
Random Projections
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
|
Suggested Scale
- Medium to extremely large data sets
|
Interpretability
|
Suggested Usage
- Extracting a data-dependent number of linear, uninterpretable, randomly oriented features of equal importance
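
A minimal sketch, assuming scikit-learn; the output dimension can be taken from the Johnson-Lindenstrauss bound, and the data shape and distortion tolerance below are illustrative.

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

X = np.random.default_rng(0).normal(size=(500, 10_000))  # wide data

# Minimum dimension preserving pairwise distances within ~30 percent.
k = johnson_lindenstrauss_min_dim(n_samples=500, eps=0.3)
rp = GaussianRandomProjection(n_components=k, random_state=0)
X_small = rp.fit_transform(X)
print("projected from", X.shape[1], "to", X_small.shape[1], "dimensions")
```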
|
Factorization Machines
|
Common Usage
- Supervised regression
- Supervised classification
- Unsupervised feature extraction
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of features is often unknown
- Less suited for dense data
|
Suggested Scale
- Medium to extremely large sparse or transactional data sets
|
Interpretability
|
Suggested Usage
- Extracting a known a priori number of uninterpretable, oblique features from sparse or transactional data sets
- Can automatically account for variable interactions
- Creating models from a large number of sparse features; can outperform SVM for sparse data
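
A minimal numpy sketch of a degree-2 factorization machine's prediction function, showing how interactions are modeled through a factor matrix; real implementations (e.g., libFM) also learn w0, w, and V from data.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """y_hat = w0 + X @ w + pairwise interactions through factor matrix V.

    The identity sum_{i<j} <v_i, v_j> x_i x_j
    = 0.5 * sum_f [ (X @ V)_f**2 - (X**2 @ V**2)_f ]
    keeps the interaction term linear in the number of inputs,
    which is what makes FMs tractable on sparse data.
    """
    linear = w0 + X @ w
    interactions = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return linear + interactions

rng = np.random.default_rng(0)
n, k = 10, 3                                    # 10 inputs, 3 latent factors
X = (rng.random((5, n)) < 0.2).astype(float)    # sparse, transaction-like rows
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(X, w0, w, V))
```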
|