Model building and evaluation

Once the data is ready and prior to the creation of the model, we need to pick and select the features from the list of features available. This can be accomplished through several off-the-shelf feature selection techniques. Some ML algorithms (for example, XGBoost) have feature selection inbuilt within the algorithm, therefore we need not explicitly perform feature selection prior to carrying out the modeling activity.

A suite of ML algorithms is available to try and create models on the data. Additionally, models may be created through ensembling techniques as well. One needs to pick the algorithm(s) and create models using training datasets, then tune the hyperparameters of the model using validation datasets. Finally, the model that is created can be tested using the test dataset. Issues, such as selecting the right metric to evaluate model performance, overfitting, underfitting, and acceptable performance thresholds all need to be taken care of in the model-building step itself. It may be noted that if we do not obtain acceptable performance on the model, it is required to go back to previous steps in order to get more data or to create additional features and then repeat the model-building step once again to check if the model performance improves. This may be done as many times as required until the desired level of performance is achieved by the model.

At the end of the modeling step, we might end up with a suite of models each having its own performance measurement on the unseen test data. The model that has the best performance can be selected for use in production.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset