There are multiple ways to perform normalization:
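- Min-max standardization: Min-max scaling preserves the shape of the original distribution and maps the feature values onto the range [0, 1], with 0 corresponding to the minimum value of the feature and 1 to the maximum value. The standardization is performed as follows:
x'=(x-min(x))/(max(x)-min(x))
Here, x' is the normalized value of the feature, and min(x) and max(x) are the minimum and maximum values of that feature. The method is sensitive to outliers in the dataset, because a single extreme value stretches the range and compresses the remaining values into a narrow band.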
- Decimal scaling: This form of scaling is useful when features span different orders of magnitude. For example, two features with different bounds can be brought to a similar scale by moving the decimal point as follows:
x'=x/10^n
Here, n is the smallest integer such that max(|x'|) < 1.
- Z-score: This transformation rescales the values so that the feature has zero mean and unit variance; it shifts and scales the data but does not change the shape of its distribution. The Z-score is computed as:
Z=(x-µ)/σ
Here, µ is the mean and σ is the standard deviation of the feature. Z-score scaling works best for features that are approximately Gaussian.
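The three transformations above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the handling of constant features (returning zeros) is an assumption, and σ is taken here to be the population standard deviation, which the text does not specify.

```python
import math
import statistics


def min_max_scale(values):
    """Min-max: x' = (x - min) / (max - min), mapping the feature onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula is undefined; returning zeros is an assumption.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def decimal_scale(values):
    """Decimal scaling: x' = x / 10**n, n the smallest integer with max(|x'|) < 1."""
    max_abs = max(abs(x) for x in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    n = math.floor(math.log10(max_abs)) + 1
    return [x / 10 ** n for x in values]


def z_score(values):
    """Z-score: Z = (x - mu) / sigma, giving zero mean and unit variance."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # Constant feature: returning zeros is an assumption.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    feature = [2.0, 4.0, 6.0, 8.0, 100.0]  # made-up values with one outlier
    print(min_max_scale(feature))
    print(decimal_scale(feature))
    print(z_score(feature))
```

Running the script on the made-up feature shows the outlier effect described above: under min-max scaling the value 100.0 maps to 1, while the other four values are squeezed close to 0.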
All the preceding methods are sensitive to outliers; there are other more robust approaches for normalization that you can explore, such as Median Absolute Deviation (MAD), tanh-estimator, and double sigmoid.
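As one illustration of a more robust approach, the sketch below centers a feature on its median and divides by the Median Absolute Deviation. It is a simplified variant under stated assumptions: the name mad_scale is hypothetical, no consistency constant (such as the commonly used 1.4826) is applied, and a zero MAD is handled by returning zeros.

```python
import statistics


def mad_scale(values):
    """Robust scaling: x' = (x - median) / MAD."""
    med = statistics.median(values)
    # MAD is the median of the absolute deviations from the median.
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values equal the median; returning zeros is an assumption.
        return [0.0 for _ in values]
    return [(x - med) / mad for x in values]


if __name__ == "__main__":
    feature = [1, 2, 3, 4, 100]  # made-up values with one outlier
    print(mad_scale(feature))
```

Because both the median and the MAD ignore the magnitude of extreme values, the outlier does not compress the scaled values of the remaining points the way it does under min-max scaling.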