Removing anomalies from the data

For supervised datasets with only a few features, manual inspection works fine. As the feature count grows, manual inspection becomes impractical, and we need to apply feature selection techniques, such as the chi-square test or random forest feature importance, to deal with the volume of features. We can also use an autoencoder to narrow down the relevant features. Remember that each feature should contribute meaningfully to the prediction outcome. So, we need to remove noise features from the raw dataset and keep everything else as is, including any uncertain features. In this recipe, we will walk through the steps to identify anomalies in the data.
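
To make the chi-square idea concrete, here is a minimal sketch in plain Python. It is not the recipe's own implementation: the function names (`chi_square_score`, `rank_features`) and the toy dataset are hypothetical and exist only for illustration. It assumes categorical features and computes the Pearson chi-square statistic of each feature against the class label; a feature that scores near zero is a candidate noise feature.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for one categorical feature vs. the label."""
    n = len(labels)
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))

    score = 0.0
    for f, f_count in feature_counts.items():
        for c, c_count in label_counts.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def rank_features(rows, labels, feature_names):
    """Return (name, score) pairs sorted from most to least label-dependent."""
    scores = {}
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        scores[name] = chi_square_score(column, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: the first column tracks the label, the second does not.
rows = [
    ("yes", "a"),
    ("yes", "b"),
    ("no", "a"),
    ("no", "b"),
]
labels = [1, 1, 0, 0]

ranking = rank_features(rows, labels, ["informative", "noise"])
for name, score in ranking:
    print(name, score)
```

On real data, a library implementation would normally replace this hand-rolled version, and the cutoff below which a feature counts as noise is a judgment call that depends on the dataset.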
