Normalizing numeric variables

When developing deep network models, we normalize numeric variables to bring them to a common scale. Different variables in the same dataset often have very different scales. For example, one variable might record the revenue earned by a company, with values in millions of dollars, while another records the dimension of a product in centimeters. Such extreme differences in scale create difficulties when training a network, and normalization helps to address this issue. For normalization, we will use the following code:

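# Normalize data
library(keras)                              # provides normalize()
data <- as.matrix(data)                     # convert the data frame to a matrix
dimnames(data) <- NULL                      # drop row and column names
data[, 1:21] <- normalize(data[, 1:21])     # normalize the 21 independent variables
data[, 22] <- as.numeric(data[, 22]) - 1    # recode NSP from 1, 2, 3 to 0, 1, 2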

As you can see from the preceding code, we first change the data to matrix format, and then we remove the default names by assigning NULL to the dimension names. After this step, the matrix no longer carries the original names of the 22 variables; generic names such as V1, V2, V3,..., V22 appear only if the matrix is converted back to a data frame. If you run str(data) at this stage, you will notice the change in format of the original data. We normalize the 21 independent variables using the normalize function, which is a part of the Keras package. With its default arguments (axis = -1, order = 2), normalize rescales values along the last axis of the matrix to unit L2 norm. When you run this line of code, you will notice that it uses TensorFlow as a backend. In the last line, as.numeric ensures that the target variable, NSP, is stored as numeric rather than integer, and subtracting 1 changes its values from 1, 2, and 3 to 0, 1, and 2 respectively, so that the class labels start at zero.
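To make the effect of these steps easier to inspect without a TensorFlow installation, the following is a minimal base R sketch on made-up data. The toy data frame and the l2_normalize_rows helper are hypothetical and are not part of the original code. The helper imitates what normalize is assumed to do with its default arguments (axis = -1, order = 2), namely dividing each row by its L2 norm; it is a stand-in for illustration, not the Keras implementation.

```r
# Toy stand-in for the real data: two predictors on very different scales
# plus a 3-class label coded 1, 2, 3 (all values are made up).
toy <- data.frame(
  revenue  = c(3e6, 4e6, 0),
  width_cm = c(4, 3, 5),
  NSP      = c(1L, 2L, 3L)
)

m <- as.matrix(toy)   # all-numeric data frame -> numeric matrix
dimnames(m) <- NULL   # drop row and column names

# Hypothetical helper (assumption): divide each row by its Euclidean
# (L2) length, mimicking normalize() with its default arguments.
l2_normalize_rows <- function(x) {
  norms <- sqrt(rowSums(x^2))
  norms[norms == 0] <- 1   # leave all-zero rows unchanged
  x / norms                # recycling divides row i by norms[i]
}

m[, 1:2] <- l2_normalize_rows(m[, 1:2, drop = FALSE])
m[, 3]   <- m[, 3] - 1     # shift labels 1, 2, 3 to 0, 1, 2

print(is.null(dimnames(m)))   # names are gone
print(rowSums(m[, 1:2]^2))    # squared row lengths after normalization
print(m[, 3])                 # recoded labels
```

Note that under this assumption each row, rather than each column, ends up with unit length, so a variable with very large values can still dominate the others within a row. If per-variable scaling is what you need, base R's scale function, which centers and scales each column, is one alternative worth comparing.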
