Model evaluation

A model is a generalized representation of data, together with the underlying algorithm used to learn that representation. Model evaluation is the process of assessing the built model against defined criteria to measure its performance. Model performance is usually expressed as a function that returns a numerical value, which helps us judge how effective a model is. During training, a cost or loss function is typically optimized to fit the model, and evaluation metrics are then used to quantify how well the resulting model performs.

The relevant evaluation metrics depend on the modeling technique used. For supervised methods, we usually rely on the following:

A confusion matrix built from model predictions versus actual values. It records the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), treating one of the classes as the positive class (usually the class of interest).
Metrics derived from the confusion matrix, which include accuracy (the fraction of all predictions that are correct), precision (the fraction of positive predictions that are correct), recall (the hit rate, or the fraction of actual positives that are identified), and the F1-score (the harmonic mean of precision and recall).
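The receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across classification thresholds, and the area under the curve (AUC) metric, which summarizes the ROC curve as a single number.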
R-squared (coefficient of determination), root mean square error (RMSE), the F-statistic, the Akaike information criterion (AIC), and p-values, specifically for regression models.
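
To make these definitions concrete, here is a minimal sketch in plain Python that computes the confusion matrix counts, the metrics derived from them, a pairwise estimate of the AUC, and two of the regression metrics. The function names and the toy data are assumptions made for illustration only; in practice, a library such as scikit-learn would normally be used instead.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def pairwise_auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half. Assumes both classes are present.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def regression_metrics(y_true, y_pred):
    """RMSE and R-squared. Assumes y_true is not constant."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return {"rmse": math.sqrt(ss_res / n), "r2": 1 - ss_res / ss_tot}

# Toy data (hypothetical): scores are thresholded at 0.5 to get class labels.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(labels, predictions)
clf = classification_metrics(labels, predictions)
auc = pairwise_auc(labels, scores)
reg = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 9.0])
print(counts, clf, auc, reg)
```

Note that the AUC is computed from the raw scores rather than from the thresholded predictions, which is why it can differ from the accuracy on the same data.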

Popular metrics for evaluating unsupervised methods such as clustering include the following:

Silhouette coefficients
Sum of squared errors
Homogeneity, completeness, and the V-measure
Calinski-Harabasz index
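
As an illustration of the first two metrics in this list, the following sketch computes the sum of squared errors and the mean silhouette coefficient for a given cluster assignment, using Euclidean distance. The helper names, the toy points, and the convention of scoring singleton clusters as 0 are assumptions made for this example, not a reference implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_by_label(labels):
    """Map each cluster label to the list of point indices assigned to it."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return clusters

def sum_of_squared_errors(points, labels):
    """Total squared distance from each point to its cluster centroid."""
    sse = 0.0
    for members in group_by_label(labels).values():
        cluster_points = [points[i] for i in members]
        centroid = [sum(dim) / len(cluster_points)
                    for dim in zip(*cluster_points)]
        sse += sum(euclidean(p, centroid) ** 2 for p in cluster_points)
    return sse

def silhouette_score(points, labels):
    """Mean silhouette coefficient. Assumes at least two clusters."""
    clusters = group_by_label(labels)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

sse_good = sum_of_squared_errors(points, good_labels)
sil_good = silhouette_score(points, good_labels)
sil_bad = silhouette_score(points, bad_labels)
print(sse_good, sil_good, sil_bad)
```

A silhouette coefficient close to 1 indicates compact, well-separated clusters, while values near 0 or below suggest overlapping or wrongly assigned points, as the second labeling in the example shows.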

Note that these lists cover the most popular and widely used metrics, but they are by no means an exhaustive catalog of model evaluation metrics.

Cross-validation is also an important part of the model evaluation process. Here, the training data is repeatedly split into training and validation sets according to a cross-validation strategy, and the validation sets are used to estimate model performance while we tune the hyperparameters of the model. You can think of hyperparameters as knobs that are set before training and can be adjusted to build more efficient and better performing models. The usage and details of these evaluation techniques will become clearer when we use them to evaluate our models in subsequent chapters with extensive hands-on examples.
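
The idea can be sketched in a few lines of plain Python. The example below uses k-fold cross-validation to compare candidate values of one hyperparameter, the number of neighbors in a simple one-dimensional nearest-neighbor classifier. The classifier, the function names, the toy data, and the candidate values are all assumptions chosen to keep the sketch self-contained.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = [train_y[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)

def cross_val_accuracy(xs, ys, n_neighbors, n_folds=5, seed=0):
    """Mean validation accuracy over the folds for one hyperparameter value."""
    fold_scores = []
    for validation in k_fold_indices(len(xs), n_folds, seed):
        held_out = set(validation)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
            for i in validation
        )
        fold_scores.append(correct / len(validation))
    return sum(fold_scores) / len(fold_scores)

# Toy data (hypothetical): class 1 whenever the feature is at least 1.0.
xs = [0.1 * i for i in range(20)]
ys = [0 if x < 1.0 else 1 for x in xs]

# Odd candidate values avoid tied votes in this two-class problem.
candidates = [1, 3, 5]
cv_scores = {k: cross_val_accuracy(xs, ys, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)
folds = k_fold_indices(len(xs), 5)
print(cv_scores, best_k)
```

Each sample is used for validation exactly once, so every candidate value is scored on data the model did not see during fitting. The value with the highest mean validation accuracy is then selected.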
