Penalized Regression
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Missing Values
- Outliers
- Standardization
- Parameter tuning
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Modeling linear or linearly separable phenomena
- Manually specifying nonlinear and explicit interaction terms
- Well-suited for N << p
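
A minimal sketch, assuming scikit-learn; the elastic net penalty, the synthetic N << p data, and all parameter values are illustrative choices, not a prescribed recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 50 rows, 500 columns: far more predictors than observations (N << p).
X, y = make_regression(n_samples=50, n_features=500, n_informative=10,
                       noise=5.0, random_state=0)

# Standardize inputs (a common concern above), then tune the penalty by
# cross-validation rather than fixing it by hand.
model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0))
model.fit(X, y)

coef = model.named_steps["elasticnetcv"].coef_
print("nonzero coefficients:", np.sum(coef != 0))  # penalty induces sparsity
```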
|
Naïve Bayes
|
Common Usage
- Supervised classification
|
Common Concerns
- Strong conditional independence assumption
- Infrequent categorical levels
|
Suggested Scale
- Small to extremely large data sets
|
Interpretability
|
Suggested Usage
- Modeling linearly separable phenomena in large data sets
- Well-suited for extremely large data sets where complex methods are intractable
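
A minimal sketch, assuming scikit-learn's Gaussian variant; MultinomialNB or CategoricalNB would be the analogous choices for count or categorical inputs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each feature is assumed conditionally independent given the class, so
# training is a single pass over the data; this is why the method scales
# to extremely large data sets.
clf = GaussianNB().fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```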
|
Decision Trees
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Instability with small training data sets
- Gradient boosting can be unstable with noise or outliers
- Overfitting
- Parameter tuning
|
Suggested Scale
- Medium to large data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinear and nonlinearly separable phenomena in large, dirty data
- Interactions considered automatically, but implicitly
- Missing values and outliers in input variables handled automatically in many implementations
- Decision tree ensembles, e.g., random forests and gradient boosting, can increase prediction accuracy and decrease overfitting, but also decrease scalability and interpretability
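
A minimal sketch, assuming scikit-learn, contrasting a single tree with a random forest ensemble; depths, estimator counts, and data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single, depth-limited tree stays interpretable but may underfit.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)

# An ensemble usually improves accuracy at the cost of interpretability.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("tree:  ", tree.score(X_test, y_test))
print("forest:", forest.score(X_test, y_test))
```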
|
k-Nearest Neighbors (kNN)
|
Common Usage
- Supervised regression
- Supervised classification
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Curse of dimensionality
|
Suggested Scale
- Small to medium data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinearly separable phenomena
- Can be used to match the accuracy of more sophisticated techniques, but with fewer tuning parameters
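
A minimal sketch, assuming scikit-learn; k is essentially the only tuning parameter, and standardization matters because distances mix the scales of all inputs.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=1_000, noise=0.3, random_state=0)  # nonlinearly separable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small k overfits, large k oversmooths; tune by cross-validation.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(pipe, {"kneighborsclassifier__n_neighbors": [1, 5, 15, 51]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```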
|
Support Vector Machines (SVM)
|
Common Usage
- Supervised regression
- Supervised classification
- Anomaly detection
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Parameter tuning
- Accuracy relative to deep neural networks depends on the choice of nonlinear kernel; Gaussian and polynomial kernels are often less accurate
|
Suggested Scale
- Small to large data sets for linear kernels
- Small to medium data sets for nonlinear kernels
|
Interpretability
|
Suggested Usage
- Modeling linear or linearly separable phenomena by using linear kernels
- Modeling nonlinear or nonlinearly separable phenomena by using nonlinear kernels
- Anomaly detection with one-class SVM (OSVM)
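
A minimal sketch, assuming scikit-learn: a linear and an RBF kernel on nonlinearly separable data, plus a one-class SVM for anomaly detection. Kernel and parameter choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM, SVC

X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

# A linear kernel struggles on concentric rings; an RBF kernel does not.
linear = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")).fit(X, y)
print("linear:", linear.score(X, y), " rbf:", rbf.score(X, y))

# A one-class SVM flags roughly a `nu` fraction of points as anomalies (-1).
osvm = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05)).fit(X)
print("flagged anomalies:", np.sum(osvm.predict(X) == -1))
```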
|
Artificial Neural Networks (ANN)
|
Common Usage
- Supervised regression
- Supervised classification
- Unsupervised clustering
- Unsupervised feature extraction
- Anomaly detection
|
Common Concerns
- Missing values
- Overfitting
- Outliers
- Standardization
- Parameter tuning
- Computationally intensive training; vanishing gradients in deep architectures
|
Suggested Scale
- Medium to extremely large data sets
|
Interpretability
|
Suggested Usage
- Modeling nonlinear and nonlinearly separable phenomena
- Autoencoders can be used for unsupervised feature extraction and for anomaly detection via reconstruction error
- Self-organizing maps can be used for unsupervised clustering
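
A minimal sketch of a feed-forward network, assuming scikit-learn's MLPClassifier; dedicated deep learning frameworks would be the usual choice at larger scale. Layer sizes and other settings are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=2_000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers; early stopping guards against overfitting, and
# standardization keeps gradient-based training well behaved.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), early_stopping=True,
                  max_iter=1000, random_state=0),
)
net.fit(X_train, y_train)
print("accuracy:", net.score(X_test, y_test))
```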
|
Association Rules
|
Common Usage
- Supervised rule building
- Unsupervised rule building
|
Common Concerns
- Instability with small training data
- Overfitting
- Parameter tuning
|
Suggested Scale
- Medium to large transactional data sets
|
Interpretability
|
Suggested Usage
- Building sets of complex rules by using the co-occurrence of items or events in transactional data sets
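
A minimal sketch, assuming the third-party mlxtend package and a tiny one-hot encoded transaction table; the items, thresholds, and data are illustrative.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Rows are transactions; columns mark whether an item appears in each one.
transactions = pd.DataFrame(
    [
        {"bread": 1, "butter": 1, "milk": 1, "beer": 0},
        {"bread": 1, "butter": 1, "milk": 0, "beer": 0},
        {"bread": 0, "butter": 0, "milk": 1, "beer": 1},
        {"bread": 1, "butter": 1, "milk": 1, "beer": 0},
        {"bread": 0, "butter": 0, "milk": 0, "beer": 1},
    ]
).astype(bool)

# Minimum support and confidence are the main tuning parameters.
itemsets = apriori(transactions, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```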
|
k-Means
|
Common Usage
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of clusters is often unknown
- Highly sensitive to initialization
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating a known a priori number of spherical, disjoint, equally sized clusters
- k-modes method can be used for categorical data
- k-prototypes method can be used for mixed data
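
A minimal sketch, assuming scikit-learn; k-modes and k-prototypes live in the separate kmodes package and are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=1_000, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)  # distances are scale sensitive

# n_init restarts mitigate sensitivity to initialization; k itself must be
# chosen up front, e.g., by inspecting inertia over a range of candidate k.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("inertia:", km.inertia_)
print("cluster sizes:", np.bincount(km.labels_))
```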
|
Hierarchical Clustering
|
Common Usage
|
Common Concerns
- Missing values
- Standardization
- Correct number of clusters is often unknown
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating nonspherical, disjoint, or overlapping clusters of different sizes; the number of clusters can also be chosen after training by cutting the dendrogram
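
A minimal sketch of agglomerative (bottom-up) hierarchical clustering, assuming scikit-learn; scipy.cluster.hierarchy would give the full dendrogram. The linkage choice is illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)

# Single linkage can follow the nonspherical shape of these clusters.
hc = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
print("cluster sizes:", np.bincount(hc.labels_))
```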
|
Spectral Clustering
|
Common Usage
|
Common Concerns
- Missing values
- Standardization
- Parameter tuning
- Curse of dimensionality
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Creating a data-dependent number of arbitrarily shaped, disjoint, or overlapping clusters of different sizes
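
A minimal sketch, assuming scikit-learn, on two arbitrarily shaped (concentric) clusters; the affinity graph and its parameters are the main things to tune, and the values here are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=0)

# A nearest-neighbor affinity graph lets the clusters take arbitrary shapes.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(sc.labels_))
```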
|
Principal Components Analysis (PCA)
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
|
Suggested Scale
- Small to large data sets for traditional PCA and SVD
- Small to medium data sets for sparse PCA and kernel PCA
|
Interpretability
- Generally low, but higher for sparse PCA or rotated solutions
|
Suggested Usage
- Extracting a data-dependent number of linear, orthogonal features, where N >> p
- Extracted features can be rotated to increase interpretability, but orthogonality is usually lost
- Singular value decomposition (SVD) is often used instead of PCA on wide or sparse data
- Sparse PCA can be used to create more interpretable features, but orthogonality is lost
- Kernel PCA can be used to extract nonlinear features
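
A minimal sketch, assuming scikit-learn, of the three variants named above: plain PCA, sparse PCA for interpretability, and kernel PCA for nonlinear features.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA, SparsePCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

# Plain PCA: linear, orthogonal features ranked by explained variance.
pca = PCA(n_components=2).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Sparse PCA zeroes out loadings (more interpretable, no longer orthogonal);
# kernel PCA extracts nonlinear features.
X_sparse = SparsePCA(n_components=2, random_state=0).fit_transform(X)
X_kernel = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)
print(X_sparse.shape, X_kernel.shape)
```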
|
Nonnegative Matrix Factorization (NMF)
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of features is often unknown
- Presence of negative values
|
Suggested Scale
|
Interpretability
|
Suggested Usage
- Extracting a known a priori number of interpretable, linear, oblique, nonnegative features
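
A minimal sketch, assuming scikit-learn and a nonnegative count matrix; the number of components must be chosen a priori, and the values here are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(100, 30)).astype(float)  # nonnegative counts

# X is approximated as W @ H with W, H >= 0, which keeps the extracted
# features additive and therefore relatively interpretable.
nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)
H = nmf.components_
print(W.shape, H.shape, "reconstruction error:", round(nmf.reconstruction_err_, 2))
```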
|
Random Projections
|
Common Usage
- Unsupervised feature extraction
|
Common Concerns
|
Suggested Scale
- Medium to extremely large data sets
|
Interpretability
|
Suggested Usage
- Extracting a data-dependent number of linear, uninterpretable, randomly oriented features of equal importance
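
A minimal sketch, assuming scikit-learn; the output dimension can be taken from the Johnson-Lindenstrauss bound, and the data shape and distortion tolerance below are illustrative.

```python
import numpy as np
from sklearn.random_projection import (GaussianRandomProjection,
                                       johnson_lindenstrauss_min_dim)

X = np.random.default_rng(0).normal(size=(500, 10_000))  # wide data

# Minimum dimension preserving pairwise distances within ~30 percent.
k = johnson_lindenstrauss_min_dim(n_samples=500, eps=0.3)
rp = GaussianRandomProjection(n_components=k, random_state=0)
X_small = rp.fit_transform(X)
print("projected from", X.shape[1], "to", X_small.shape[1], "dimensions")
```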
|
Factorization Machines
|
Common Usage
- Supervised regression
- Supervised classification
- Unsupervised feature extraction
|
Common Concerns
- Missing values
- Outliers
- Standardization
- Correct number of features is often unknown
- Less suited for dense data
|
Suggested Scale
- Medium to extremely large sparse or transactional data sets
|
Interpretability
|
Suggested Usage
- Extracting a known a priori number of uninterpretable, oblique features from sparse or transactional data sets
- Can automatically account for variable interactions
- Creating models from a large number of sparse features; can outperform SVM for sparse data
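
A minimal numpy sketch of a degree-2 factorization machine's prediction function, showing how interactions are modeled through a factor matrix; real implementations (e.g., libFM) also learn w0, w, and V from data.

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """y_hat = w0 + X @ w + pairwise interactions through factor matrix V.

    The identity sum_{i<j} <v_i, v_j> x_i x_j
    = 0.5 * sum_f [ (X @ V)_f**2 - (X**2 @ V**2)_f ]
    keeps the interaction term linear in the number of inputs,
    which is what makes FMs tractable on sparse data.
    """
    linear = w0 + X @ w
    interactions = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return linear + interactions

rng = np.random.default_rng(0)
n, k = 10, 3                                    # 10 inputs, 3 latent factors
X = (rng.random((5, n)) < 0.2).astype(float)    # sparse, transaction-like rows
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(X, w0, w, V))
```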
|