Summary

Feature engineering, feature selection, and feature construction are the three most commonly used steps when preparing the training and test sets for building a machine learning model. Usually, feature engineering is applied first to generate additional features from the available dataset. After that, feature selection is applied to eliminate irrelevant, missing or null, redundant, or highly correlated features so that high predictive accuracy can be achieved.
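
As a reminder of what the selection step does, the following is a minimal sketch in plain Python rather than the Spark API used in this chapter. The function name `select_features`, the two thresholds, and the sample columns are all illustrative assumptions; the sketch only shows the idea of dropping mostly-missing, constant, and highly correlated features.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, max_null_ratio=0.5, max_abs_corr=0.95):
    """Return the names of the features kept by simple filter rules.

    columns maps a feature name to a list of values (None = missing).
    Both thresholds are assumed values chosen for illustration.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Rule 1: drop features that are mostly missing.
        if len(values) - len(present) > max_null_ratio * len(values):
            continue
        # Rule 2: drop constant features, which carry no information.
        if len(set(present)) <= 1:
            continue
        # Rule 3: drop features highly correlated with one already kept.
        redundant = False
        for other in kept:
            pairs = [(a, b) for a, b in zip(values, columns[other])
                     if a is not None and b is not None]
            if len(pairs) >= 2:
                corr = pearson([a for a, _ in pairs], [b for _, b in pairs])
                if abs(corr) > max_abs_corr:
                    redundant = True
                    break
        if not redundant:
            kept.append(name)
    return kept


# Hypothetical sample data: one feature per key.
columns = {
    "height_cm": [170.0, 180.0, 165.0, 175.0],
    "height_in": [66.9, 70.9, 65.0, 68.9],    # same quantity in inches
    "mostly_null": [None, None, None, 1.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "age": [30.0, 25.0, 41.0, 38.0],
}
kept = select_features(columns)
print(kept)  # ['height_cm', 'age']
```

The order of the rules is a design choice in this sketch: cheap checks (missing values, constant columns) run first, so the pairwise correlation check only compares against features that have already survived.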

In contrast, feature construction is an advanced technique applied to construct new features that are either absent or trivial in the raw dataset.

Note that feature engineering and feature selection are not always necessary. Whether to perform feature selection and construction depends on the data you have or have collected, the ML algorithm you have picked, and the objective of the experiment itself.

In this chapter, we described all three steps in detail with practical Spark examples. In the next chapter, we will work through practical examples of supervised and unsupervised learning using two machine learning APIs: Spark MLlib and Spark ML.
