Standardization is the process of converting the input so that it has a mean of 0
and standard deviation of 1.
If you are given a vector X, a mean of 0 and a standard deviation of 1 can be achieved by the following equation:

X_standard = (X - mean(X)) / std(X)
Let's see how this can be achieved in Python.
Let's import the necessary libraries to begin with. We will follow this with the generation of the input data:
# Load libraries
import numpy as np
from sklearn.preprocessing import scale

# Input data generation
np.random.seed(10)
x = [np.random.randint(10,25)*1.0 for i in range(10)]
We are now ready to demonstrate standardization:
x_centered = scale(x,with_mean=True,with_std=False)
x_standard = scale(x,with_mean=True,with_std=True)

print(x)
print(x_centered)
print(x_standard)
print("Orginal x mean = %0.2f, Centered x mean = %0.2f, Std dev of standard x =%0.2f" % (np.mean(x),np.mean(x_centered),np.std(x_standard)))
We will generate some random data using np.random:
x = [np.random.randint(10,25)*1.0 for i in range(10)]
We will perform standardization using the scale function from scikit-learn:
x_centered = scale(x,with_mean=True,with_std=False)
x_standard = scale(x,with_mean=True,with_std=True)
x_centered is centered using only the mean: the with_mean parameter is set to True and with_std is set to False.
x_standard is standardized using both the mean and the standard deviation.
Now let us look at the output.
The original data is as follows:
[19.0, 23.0, 14.0, 10.0, 11.0, 21.0, 22.0, 19.0, 23.0, 10.0]
Next, we will print x_centered, where we centered it with the mean value:
[ 1.8 5.8 -3.2 -7.2 -6.2 3.8 4.8 1.8 5.8 -7.2]
Finally, we will print x_standard, where we used both the mean and standard deviation:
[ 0.35059022 1.12967961 -0.62327151 -1.4023609 -1.20758855 0.74013492 0.93490726 0.35059022 1.12967961 -1.4023609 ]
Orginal x mean = 17.20, Centered x mean = 0.00, Std dev of standard x =1.00
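As a cross-check, the printed values can be reproduced by hand with plain NumPy: subtract the mean, then divide by the standard deviation. The sketch below copies the original data from the output above rather than regenerating it. It assumes the population standard deviation (NumPy's default, ddof=0), which is consistent with the printed numbers; it is an illustration of the arithmetic, not a replacement for scale.

```python
import numpy as np

# Values copied from the printed original data above.
x = np.array([19.0, 23.0, 14.0, 10.0, 11.0, 21.0, 22.0, 19.0, 23.0, 10.0])

# Centering: subtract the mean from every value.
x_centered = x - x.mean()

# Standardization: divide the centered values by the standard deviation.
# np.std defaults to ddof=0, that is, the population standard deviation.
x_standard = x_centered / x.std()

print(x_centered)
print(x_standard)
print("mean = %0.2f, std dev = %0.2f" % (x_standard.mean(), x_standard.std()))
```

Dividing by the sample standard deviation instead (ddof=1) would give slightly smaller magnitudes that do not match the printed output.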
Let's break the preceding equation into two parts: the numerator alone, which is called centering, and the whole equation, which is called standardization. Centering on the mean plays an important role in regression. Consider a dataset that has two attributes, weight and height, where weight is the predictor and height is the response. We center the data so that the predictor, weight, has a mean of 0. This makes the intercept easier to interpret: it is the expected height when the predictor is at its mean.
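The effect of centering on the intercept can be sketched with a simple least-squares line fit. The weight and height values below are invented purely for illustration, and np.polyfit stands in for whatever regression routine you would normally use. The fit is run once on the raw predictor and once on the centered predictor.

```python
import numpy as np

# Hypothetical data, invented for this illustration only.
weight = np.array([55.0, 60.0, 65.0, 70.0, 80.0])       # predictor
height = np.array([160.0, 165.0, 168.0, 172.0, 180.0])  # response

# Fit height = slope * weight + intercept on the raw predictor.
# np.polyfit returns coefficients from the highest degree down.
slope_raw, intercept_raw = np.polyfit(weight, height, 1)

# Center the predictor so that it has a mean of 0, then fit again.
weight_centered = weight - weight.mean()
slope_centered, intercept_centered = np.polyfit(weight_centered, height, 1)

print("raw:      slope = %0.3f, intercept = %0.3f" % (slope_raw, intercept_raw))
print("centered: slope = %0.3f, intercept = %0.3f" % (slope_centered, intercept_centered))
print("mean height = %0.3f" % height.mean())
```

The slope is the same in both fits. Only the intercept changes: with the centered predictor it equals the mean height, the expected height at the average weight, whereas the raw intercept describes the less meaningful case of a weight of 0.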