Appendix B. Machine Learning Quick Reference: Algorithms

Penalized Regression

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Missing Values
  • Outliers
  • Standardization
  • Parameter tuning

Suggested Scale

  • Small to large data sets

Interpretability

  • High

Suggested Usage

  • Modeling linear or linearly separable phenomena
  • Manually specifying nonlinear and explicit interaction terms
  • Well-suited to wide data, where inputs outnumber rows (N << p)
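
A minimal sketch in scikit-learn (the library, data, and parameter values here are illustrative assumptions, not part of this reference). ElasticNetCV combines the L1 and L2 penalties and tunes the penalty strength by cross-validation; inputs are standardized first because the penalties are scale-sensitive:

    # Penalized (elastic net) regression with cross-validated penalty strength.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import ElasticNetCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_regression(n_samples=200, n_features=50, noise=0.5, random_state=0)

    # l1_ratio balances L1 (sparsity) against L2 (shrinkage); alpha is tuned by CV.
    model = make_pipeline(StandardScaler(), ElasticNetCV(l1_ratio=0.5, cv=5))
    model.fit(X, y)
    print(model[-1].alpha_)  # selected penalty strength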

Naïve Bayes

Common Usage

  • Supervised classification

Common Concerns

  • Strong conditional independence assumption between input variables
  • Infrequent categorical levels

Suggested Scale

  • Small to extremely large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling linearly separable phenomena in large data sets
  • Well-suited for extremely large data sets where complex methods are intractable
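
A minimal sketch, assuming scikit-learn and numeric inputs (GaussianNB); MultinomialNB is the usual choice for count data such as word frequencies:

    # Gaussian naive Bayes classification on a small numeric data set.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = GaussianNB().fit(X_train, y_train)
    print(clf.score(X_test, y_test))  # holdout accuracy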

Decision Trees

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Instability with small training data sets
  • Gradient boosting can be unstable with noise or outliers
  • Overfitting
  • Parameter tuning

Suggested Scale

  • Medium to large data sets

Interpretability

  • Moderate

Suggested Usage

  • Modeling nonlinear and nonlinearly separable phenomena in large, dirty data
  • Interactions considered automatically, but implicitly
  • Missing values and outliers in input variables handled automatically in many implementations
  • Decision tree ensembles, e.g., random forests and gradient boosting, can increase prediction accuracy and decrease overfitting, but also decrease scalability and interpretability
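
A minimal sketch contrasting a single, more interpretable tree with a random forest ensemble (scikit-learn, the data set, and all parameters are illustrative assumptions):

    # Single decision tree versus a random forest ensemble.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # max_depth limits tree size to curb overfitting; the ensemble averages
    # many trees, trading interpretability for accuracy and stability.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(tree, X, y, cv=5).mean())
    print(cross_val_score(forest, X, y, cv=5).mean())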

k-Nearest Neighbors (kNN)

Common Usage

  • Supervised regression
  • Supervised classification

Common Concerns

  • Missing values
  • Overfitting
  • Outliers
  • Standardization
  • Curse of dimensionality

Suggested Scale

  • Small to medium data sets

Interpretability

  • Low

Suggested Usage

  • Modeling nonlinearly separable phenomena
  • Can be used to match the accuracy of more sophisticated techniques, but with fewer tuning parameters
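
A minimal sketch, assuming scikit-learn; inputs are standardized because kNN distances are scale-sensitive, and the single main tuning parameter, k, is chosen by grid search:

    # k-nearest neighbors with standardized inputs and a small grid over k.
    from sklearn.datasets import load_wine
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_wine(return_X_y=True)

    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
    grid = GridSearchCV(pipe, {"kneighborsclassifier__n_neighbors": [3, 5, 11]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)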

Support Vector Machines (SVM)

Common Usage

  • Supervised regression
  • Supervised classification
  • Anomaly detection

Common Concerns

  • Missing values
  • Overfitting
  • Outliers
  • Standardization
  • Parameter tuning
  • Accuracy relative to deep neural networks depends on the choice of nonlinear kernel; Gaussian and polynomial kernels are often less accurate

Suggested Scale

  • Small to large data sets for linear kernels
  • Small to medium data sets for nonlinear kernels

Interpretability

  • Low

Suggested Usage

  • Modeling linear or linearly separable phenomena by using linear kernels
  • Modeling nonlinear or nonlinearly separable phenomena by using nonlinear kernels
  • Anomaly detection with one-class SVM (OSVM)
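
A minimal sketch of the three usages above, assuming scikit-learn (the data and parameter values are illustrative):

    # SVM classification with linear and RBF (Gaussian) kernels, plus
    # one-class SVM for anomaly detection; inputs are standardized first.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC, OneClassSVM

    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

    for kernel in ("linear", "rbf"):
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
        print(kernel, clf.fit(X, y).score(X, y))

    # One-class SVM labels points outside the learned boundary as -1.
    oc = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05)).fit(X)
    print(np.sum(oc.predict(X) == -1), "points flagged as anomalies")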

Artificial Neural Networks (ANN)

Common Usage

  • Supervised regression
  • Supervised classification
  • Unsupervised clustering
  • Unsupervised feature extraction
  • Anomaly detection

Common Concerns

  • Missing values
  • Overfitting
  • Outliers
  • Standardization
  • Parameter tuning
  • Computationally intensive, often lengthy training

Suggested Scale

  • Medium to extremely large data sets

Interpretability

  • Low

Suggested Usage

  • Modeling nonlinear or nonlinearly separable phenomena
  • Unsupervised feature extraction and clustering, e.g., with autoencoders or self-organizing maps
  • Anomaly detection, e.g., by flagging inputs with high autoencoder reconstruction error
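
A minimal sketch of a feedforward network, assuming scikit-learn's MLPClassifier (the architecture and parameters are illustrative); standardized inputs and early stopping address the overfitting concerns listed above:

    # Multilayer perceptron classifier with early stopping.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    net = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), early_stopping=True,
                      max_iter=500, random_state=0),
    )
    print(net.fit(X_train, y_train).score(X_test, y_test))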

Association Rules

Common Usage

  • Supervised rule building
  • Unsupervised rule building

Common Concerns

  • Instability with small training data
  • Overfitting
  • Parameter tuning

Suggested Scale

  • Medium to large transactional data sets

Interpretability

  • Moderate

Suggested Usage

  • Building sets of complex rules by using the co-occurrence of items or events in transactional data sets
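
A minimal sketch, assuming the third-party mlxtend library for the Apriori algorithm (the calls follow mlxtend's documented API, which may differ across versions):

    # Frequent itemsets and association rules from toy market baskets.
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules
    from mlxtend.preprocessing import TransactionEncoder

    transactions = [["milk", "bread"], ["milk", "eggs"],
                    ["bread", "eggs"], ["milk", "bread", "eggs"]]

    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)

    # min_support and min_threshold are the key tuning parameters.
    itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
    print(rules[["antecedents", "consequents", "support", "confidence"]])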

k-Means

Common Usage

  • Unsupervised clustering

Common Concerns

  • Missing values
  • Outliers
  • Standardization
  • Correct number of clusters is often unknown
  • Highly sensitive to initialization
  • Curse of dimensionality

Suggested Scale

  • Small to large data sets

Interpretability

  • Moderate

Suggested Usage

  • Creating a known a priori number of spherical, disjoint, equally sized clusters
  • k-modes method can be used for categorical data
  • k-prototypes method can be used for mixed data
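
A minimal sketch, assuming scikit-learn; n_init reruns the algorithm from several random starts to reduce the initialization sensitivity noted above:

    # k-means on standardized inputs with multiple random initializations.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    X = StandardScaler().fit_transform(X)

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.inertia_, km.labels_[:10])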

Hierarchical Clustering

Common Usage

  • Unsupervised clustering

Common Concerns

  • Missing values
  • Standardization
  • Correct number of clusters is often unknown
  • Curse of dimensionality

Suggested Scale

  • Small data sets

Interpretability

  • Moderate

Suggested Usage

  • Creating nonspherical, disjoint, or overlapping clusters of different sizes; the number of clusters can be chosen after training by cutting the dendrogram, as in the sketch below
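
A minimal sketch, assuming SciPy; the linkage tree (dendrogram) is built once and can then be cut into any number of clusters:

    # Agglomerative (Ward) clustering; the cut point fixes the cluster count.
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler

    X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
    X = StandardScaler().fit_transform(X)

    Z = linkage(X, method="ward")                    # build the cluster tree
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
    print(labels[:10])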

Spectral Clustering

Common Usage

  • Unsupervised clustering

Common Concerns

  • Missing values
  • Standardization
  • Parameter tuning
  • Curse of dimensionality

Suggested Scale

  • Small data sets

Interpretability

  • Moderate

Suggested Usage

  • Creating a data-dependent number of arbitrarily shaped, disjoint, or overlapping clusters of different sizes
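
A minimal sketch, assuming scikit-learn; spectral clustering builds clusters from a similarity graph, so it can recover the arbitrarily shaped groups described above:

    # Spectral clustering on two interleaved half-moons.
    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                            random_state=0)
    print(sc.fit_predict(X)[:10])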

Principal Components Analysis (PCA)

Common Usage

  • Unsupervised feature extraction

Common Concerns

  • Missing values
  • Outliers

Suggested Scale

  • Small to large data sets for traditional PCA and SVD
  • Small to medium data sets for sparse PCA and kernel PCA

Interpretability

  • Generally low, but higher for sparse PCA or rotated solutions

Suggested Usage

  • Extracting a data-dependent number of linear, orthogonal features, where N >> p
  • Extracted features can be rotated to increase interpretability, but orthogonality is usually lost
  • Singular value decomposition (SVD) is often used instead of PCA on wide or sparse data
  • Sparse PCA can be used to create more interpretable features, but orthogonality is lost
  • Kernel PCA can be used to extract nonlinear features
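
A minimal sketch of PCA and two of the variants mentioned above, assuming scikit-learn (the data and component counts are illustrative):

    # PCA, truncated SVD (suited to wide or sparse data), and kernel PCA.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA, KernelPCA, TruncatedSVD
    from sklearn.preprocessing import StandardScaler

    X, _ = load_digits(return_X_y=True)
    X = StandardScaler().fit_transform(X)

    pca = PCA(n_components=0.95)  # keep components explaining 95% of variance
    print(pca.fit_transform(X).shape)

    svd = TruncatedSVD(n_components=10)              # also works on sparse input
    kpca = KernelPCA(n_components=10, kernel="rbf")  # nonlinear features
    print(svd.fit_transform(X).shape, kpca.fit_transform(X).shape)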

Nonnegative Matrix Factorization (NMF)

Common Usage

  • Unsupervised feature extraction

Common Concerns

  • Missing values
  • Outliers
  • Standardization
  • Correct number of features is often unknown
  • Presence of negative values

Suggested Scale

  • Small to large data sets

Interpretability

  • High

Suggested Usage

  • Extracting a known a priori number of interpretable, linear, oblique, nonnegative features
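
A minimal sketch, assuming scikit-learn; inputs must be nonnegative, which is why the raw pixel intensities are used here without centering:

    # NMF with a fixed, a priori number of nonnegative components.
    from sklearn.datasets import load_digits
    from sklearn.decomposition import NMF

    X, _ = load_digits(return_X_y=True)  # pixel intensities, all >= 0

    nmf = NMF(n_components=16, init="nndsvd", max_iter=500, random_state=0)
    W = nmf.fit_transform(X)  # nonnegative feature values per sample
    H = nmf.components_       # nonnegative, often interpretable parts
    print(W.shape, H.shape)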

Random Projections

Common Usage

  • Unsupervised feature extraction

Common Concerns

  • Missing values

Suggested Scale

  • Medium to extremely large data sets

Interpretability

  • Low

Suggested Usage

  • Extracting a data-dependent number of linear, uninterpretable, randomly oriented features of equal importance
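
A minimal sketch, assuming scikit-learn; with eps set, the output dimension is chosen automatically from the Johnson-Lindenstrauss bound:

    # Gaussian random projection of wide data to far fewer dimensions.
    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10_000))  # wide data: many features

    proj = GaussianRandomProjection(eps=0.5, random_state=0)
    print(proj.fit_transform(X).shape)  # randomly oriented features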

Factorization Machines

Common Usage

  • Supervised regression and classification
  • Unsupervised feature extraction

Common Concerns

  • Missing values
  • Outliers
  • Standardization
  • Correct number of features is often unknown
  • Less suited for dense data

Suggested Scale

  • Medium to extremely large sparse or transactional data sets

Interpretability

  • Moderate

Suggested Usage

  • Extracting a known a priori number of uninterpretable, oblique features from sparse or transactional data sets
  • Can automatically account for variable interactions
  • Creating models from a large number of sparse features; can outperform SVM for sparse data
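
Because factorization machines are not part of scikit-learn, below is a minimal NumPy sketch of the FM scoring function itself, y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, using the O(k*n) identity for the pairwise term; the weights are random stand-ins, where a real FM would learn w0, w, and V from data:

    # Factorization machine scoring for one sparse input vector.
    import numpy as np

    rng = np.random.default_rng(0)
    n_features, rank = 1000, 8

    w0 = 0.0                                             # global bias
    w = rng.normal(scale=0.01, size=n_features)          # linear weights
    V = rng.normal(scale=0.01, size=(n_features, rank))  # factor matrix

    x = np.zeros(n_features)
    x[[3, 42, 517]] = 1.0  # a sparse, one-hot-style input

    # Pairwise term: 0.5 * sum_f ((x @ V)_f ** 2 - ((x**2) @ (V**2))_f)
    pairwise = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    print(w0 + w @ x + pairwise)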