Deep autoencoders applied to handwritten digits using Keras

Deep autoencoders are demonstrated on the same handwritten digits data to show how this non-linear method differs from linear methods such as PCA and SVD. Non-linear methods generally perform much better, but they are essentially black-box models, and we cannot readily explain why they produce the results they do. Keras has been used to build the deep autoencoders here, as its layers work like Lego blocks, making it easy for users to experiment with different architectures and parameters of the model for better understanding:

# Deep Auto Encoders 
>>> import matplotlib.pyplot as plt 
>>> from sklearn.preprocessing import StandardScaler 
>>> from sklearn.datasets import load_digits 
 
>>> digits = load_digits() 
>>> X = digits.data 
>>> y = digits.target 
 
>>> print (X.shape) 
>>> print (y.shape) 
>>> x_vars_stdscle = StandardScaler().fit_transform(X) 
>>> print (x_vars_stdscle.shape) 
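
For the comparison with linear methods mentioned above, a 2-component PCA baseline on the same standardized data can be produced with a few lines of scikit-learn. The following block is an illustrative sketch added for reference and is not part of the original listing:

# Optional: 2-component PCA baseline for later comparison (illustrative sketch) 
>>> from sklearn.decomposition import PCA 
>>> pca_2d = PCA(n_components=2) 
>>> reduced_X_pca = pca_2d.fit_transform(x_vars_stdscle) 
>>> print (reduced_X_pca.shape) 
>>> print (pca_2d.explained_variance_ratio_) 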

Dense layers from Keras are used to construct the encoder-decoder architecture:

>>> from keras.layers import Input,Dense 
>>> from keras.models import Model

An NVIDIA GTX 1060 GPU has been used here, with the cuDNN and CNMeM libraries installed for a further speed-up of about 4x-5x on top of regular GPU performance. These libraries utilize 20 percent of the GPU memory, which leaves 80 percent of the memory for working on the data. Users need to be careful: if they have low-memory GPUs (3 GB to 4 GB), they may not be able to utilize these libraries.

One important point for the reader: the syntax of the Keras code remains the same in both CPU and GPU modes.
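
As an aside, and assuming the Theano backend (which is what CNMeM applies to), the GPU, cuDNN, and CNMeM settings are typically controlled through THEANO_FLAGS rather than in the model code; a minimal sketch for older Theano versions (the exact flag values depend on your setup) is:

# Illustrative sketch (assumes the Theano backend, older flag names); 
# these flags must be set before Keras/Theano are imported 
>>> import os 
>>> os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32,lib.cnmem=0.8" 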

The following few lines of code are the heart of the model. The input data has 64 columns, so the input layer is given a shape of 64. Names have also been assigned to each layer of the network; the reason for this will become clear in an upcoming section of the code. In the first hidden layer, 32 dense neurons are used, which means all 64 inputs from the input layer are connected to the 32 neurons in the first hidden layer. The entire flow of dimensions is 64, 32, 16, 2, 16, 32, 64. We compress the input down to two neurons in order to plot the components on a 2D plot; if we need to plot 3D data (which we will cover in the next section), we change the size of the third hidden layer from two to three. After training is complete, we use only the encoder section to predict the latent output.

# 2-Dimensional Architecture 
 
>>> input_layer = Input(shape=(64,),name="input") 
 
>>> encoded = Dense(32, activation='relu',name="h1encode")(input_layer) 
>>> encoded = Dense(16, activation='relu',name="h2encode")(encoded) 
>>> encoded = Dense(2, activation='relu',name="h3latent_layer")(encoded) 
 
>>> decoded = Dense(16, activation='relu',name="h4decode")(encoded) 
>>> decoded = Dense(32, activation='relu',name="h5decode")(decoded) 
>>> decoded = Dense(64, activation='sigmoid',name="h6decode")(decoded) 

To train the model, we need to pass the starting and ending points of the architecture. In the following code, we have provided the input as input_layer and the output as decoded, which is the last layer (named h6decode):

>>> autoencoder = Model(input_layer, decoded) 

Adam optimization has been used to minimize the mean squared error, as we want to reproduce the original input at the output layer of the network:

>>> autoencoder.compile(optimizer="adam", loss="mse") 

The network is trained for 100 epochs with a batch size of 256 observations per batch. A validation split of 20 percent is used to check performance on held-out validation data and ensure robustness, as training only on the training data can create an overfitting problem, which is very common with highly non-linear models:

# Fitting Encoder-Decoder model 
>>> autoencoder.fit(x_vars_stdscle, x_vars_stdscle, epochs=100,batch_size=256, shuffle=True,validation_split= 0.2 ) 

From the preceding results, we can see that the model has been trained on 1,437 training examples and validated on 360 examples. Looking at the loss values, the train and validation losses have decreased from 1.2314 to 0.9361 and from 1.0451 to 0.7326 respectively. Hence, we are moving in the right direction. However, readers are encouraged to try various architectures, numbers of iterations, batch sizes, and so on, to see how much the accuracy can be further improved.
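
One convenient way to track such experiments is to capture the History object that Keras's fit() returns and plot the loss curves. A minimal sketch, re-using the fit call above, might look like this:

# Illustrative sketch: capture and plot the training history 
>>> history = autoencoder.fit(x_vars_stdscle, x_vars_stdscle, epochs=100, batch_size=256, shuffle=True, validation_split=0.2) 
>>> plt.plot(history.history['loss'], label='train loss') 
>>> plt.plot(history.history['val_loss'], label='validation loss') 
>>> plt.xlabel('Epoch') 
>>> plt.ylabel('MSE loss') 
>>> plt.legend() 
>>> plt.show() 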

Once the encoder-decoder section has been trained, we need to take only the encoder section to compress the input features and obtain the latent features, which is the core idea of dimensionality reduction altogether! In the following code, we have constructed another model from the trained input layer up to the middle hidden layer (h3latent_layer). This is the reason for assigning names to each layer of the network.

# Extracting Encoder section of the Model for prediction of latent variables 
>>> encoder = Model(autoencoder.input,autoencoder.get_layer("h3latent_layer").output) 
 
The extracted encoder section of the whole model is then used to predict on the input variables and generate the compressed two-dimensional representation, which is performed with the following code:
  
# Predicting latent variables with extracted Encoder model 
>>> reduced_X = encoder.predict(x_vars_stdscle)  

Just to check the dimensions of the reduced input variables, we print the shape; we can see that for all observations there are now two dimensions, or two latent columns:

>>> print (reduced_X.shape) 

The following section of code is very similar to the code used for 2D PCA:

>>> zero_x, zero_y = [],[] ; one_x, one_y = [],[] 
>>> two_x,two_y = [],[]; three_x, three_y = [],[] 
>>> four_x,four_y = [],[]; five_x,five_y = [],[] 
>>> six_x,six_y = [],[]; seven_x,seven_y = [],[] 
>>> eight_x,eight_y = [],[]; nine_x,nine_y = [],[] 
 
# For 2-Dimensional data 
>>> for i in range(len(reduced_X)): 
...     if y[i] == 0: 
...         zero_x.append(reduced_X[i][0]) 
...         zero_y.append(reduced_X[i][1]) 
         
...     elif y[i] == 1: 
...         one_x.append(reduced_X[i][0]) 
...         one_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 2: 
...         two_x.append(reduced_X[i][0]) 
...         two_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 3: 
...         three_x.append(reduced_X[i][0]) 
...         three_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 4: 
...         four_x.append(reduced_X[i][0]) 
...         four_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 5: 
...         five_x.append(reduced_X[i][0]) 
...         five_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 6: 
...         six_x.append(reduced_X[i][0]) 
...         six_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 7: 
...         seven_x.append(reduced_X[i][0]) 
...         seven_y.append(reduced_X[i][1]) 
 
...     elif y[i] == 8: 
...         eight_x.append(reduced_X[i][0]) 
...         eight_y.append(reduced_X[i][1]) 

...     elif y[i] == 9: 
...         nine_x.append(reduced_X[i][0]) 
...         nine_y.append(reduced_X[i][1]) 
 
 
>>> zero = plt.scatter(zero_x, zero_y, c='r', marker='x',label='zero') 
>>> one = plt.scatter(one_x, one_y, c='g', marker='+') 
>>> two = plt.scatter(two_x, two_y, c='b', marker='s') 
 
>>> three = plt.scatter(three_x, three_y, c='m', marker='*') 
>>> four = plt.scatter(four_x, four_y, c='c', marker='h') 
>>> five = plt.scatter(five_x, five_y, c='r', marker='D') 
 
>>> six = plt.scatter(six_x, six_y, c='y', marker='8') 
>>> seven = plt.scatter(seven_x, seven_y, c='k', marker='*') 
>>> eight = plt.scatter(eight_x, eight_y, c='r', marker='x') 
 
>>> nine = plt.scatter(nine_x, nine_y, c='b', marker='D') 
 
 
>>> plt.legend((zero,one,two,three,four,five,six,seven,eight,nine), 
...  ('zero','one','two','three','four','five','six','seven','eight','nine'), 
...            scatterpoints=1,loc='lower right',ncol=3,fontsize=10) 
 
>>> plt.xlabel('Latent Feature 1',fontsize = 13) 
>>> plt.ylabel('Latent Feature 2',fontsize = 13) 
 
>>> plt.show() 
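
As a side note, the per-digit lists above can be avoided entirely with NumPy boolean masks, since y and reduced_X are arrays. The following compact sketch is an equivalent alternative producing the same plot, not part of the original listing:

# Compact alternative using boolean masks (illustrative sketch) 
>>> markers = ['x','+','s','*','h','D','8','*','x','D'] 
>>> colors = ['r','g','b','m','c','r','y','k','r','b'] 
>>> for d in range(10): 
...     mask = (y == d) 
...     plt.scatter(reduced_X[mask, 0], reduced_X[mask, 1], c=colors[d], marker=markers[d], label=str(d)) 

>>> plt.legend(loc='lower right', ncol=3, fontsize=10) 
>>> plt.xlabel('Latent Feature 1', fontsize=13) 
>>> plt.ylabel('Latent Feature 2', fontsize=13) 
>>> plt.show() 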

From the preceding plot, we can see that the data points are well separated, but the issue is the viewing direction: these latent features do not vary along mutually perpendicular axes, unlike principal components, which are orthogonal to each other. In the case of deep autoencoders, we need to change the viewing direction from (0, 0) to visualize this non-linear separation, which we will see in detail in the following 3D case.

The following is the code for 3D latent features. All the code remains the same apart from h3latent_layer, in which the number of neurons is changed from 2 to 3. This layer is the end of the encoder section, and it will be used to create the latent features that, eventually, will be used for plotting.

# 3-Dimensional architecture 
>>> input_layer = Input(shape=(64,),name="input") 
 
>>> encoded = Dense(32, activation='relu',name="h1encode")(input_layer) 
>>> encoded = Dense(16, activation='relu',name="h2encode")(encoded) 
>>> encoded = Dense(3, activation='relu',name="h3latent_layer")(encoded) 
 
>>> decoded = Dense(16, activation='relu',name="h4decode")(encoded) 
>>> decoded = Dense(32, activation='relu',name="h5decode")(decoded) 
>>> decoded = Dense(64, activation='sigmoid',name="h6decode")(decoded) 
 
>>> autoencoder = Model(input_layer, decoded) 
>>> autoencoder.compile(optimizer="adam", loss="mse") 
 
# Fitting Encoder-Decoder model 
>>> autoencoder.fit(x_vars_stdscle, x_vars_stdscle, epochs=100,batch_size=256, shuffle=True,validation_split= 0.2) 

From the preceding results we can see that, with three latent dimensions instead of two, the loss values obtained are lower than in the 2D case. The train and validation losses for two latent factors after 100 epochs are 0.9061 and 0.7326, and for three latent factors after 100 epochs they are 0.8032 and 0.6464. This signifies that, by including one more dimension, we can reduce the error significantly. In this way, the reader can change various parameters to determine the ideal architecture for dimensionality reduction.
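
To make this kind of comparison systematic, one option is to wrap the architecture in a small helper, parameterized by the bottleneck size, and record the final validation loss for each setting. The sketch below is an illustrative addition (the helper name build_and_score is hypothetical, and the training settings mirror those used above):

# Illustrative sketch: compare validation loss across bottleneck sizes 
>>> def build_and_score(latent_dim, epochs=100): 
...     inp = Input(shape=(64,)) 
...     enc = Dense(32, activation='relu')(inp) 
...     enc = Dense(16, activation='relu')(enc) 
...     enc = Dense(latent_dim, activation='relu')(enc) 
...     dec = Dense(16, activation='relu')(enc) 
...     dec = Dense(32, activation='relu')(dec) 
...     dec = Dense(64, activation='sigmoid')(dec) 
...     model = Model(inp, dec) 
...     model.compile(optimizer="adam", loss="mse") 
...     hist = model.fit(x_vars_stdscle, x_vars_stdscle, epochs=epochs, batch_size=256, shuffle=True, validation_split=0.2, verbose=0) 
...     return hist.history['val_loss'][-1] 

>>> for latent_dim in [2, 3, 4, 8]: 
...     print (latent_dim, build_and_score(latent_dim)) 

After settling on a bottleneck size, we extract the encoder section exactly as in the 2D case: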

# Extracting Encoder section of the Model for prediction of latent variables 
>>> encoder = Model(autoencoder.input,autoencoder.get_layer("h3latent_layer").output) 
 
# Predicting latent variables with extracted Encoder model 
>>> reduced_X3D = encoder.predict(x_vars_stdscle) 
 
>>> zero_x, zero_y,zero_z = [],[],[] ; one_x, one_y,one_z = [],[],[] 
>>> two_x,two_y,two_z = [],[],[]; three_x, three_y,three_z = [],[],[] 
>>> four_x,four_y,four_z = [],[],[]; five_x,five_y,five_z = [],[],[] 
>>> six_x,six_y,six_z = [],[],[]; seven_x,seven_y,seven_z = [],[],[] 
>>> eight_x,eight_y,eight_z = [],[],[]; nine_x,nine_y,nine_z = [],[],[] 
 
>>> for i in range(len(reduced_X3D)): 
     
...     if y[i] == 0: 
...         zero_x.append(reduced_X3D[i][0]) 
...         zero_y.append(reduced_X3D[i][1]) 
...         zero_z.append(reduced_X3D[i][2]) 
         
...     elif y[i] == 1: 
...         one_x.append(reduced_X3D[i][0]) 
...         one_y.append(reduced_X3D[i][1]) 
...         one_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 2: 
...         two_x.append(reduced_X3D[i][0]) 
...         two_y.append(reduced_X3D[i][1]) 
...         two_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 3: 
...         three_x.append(reduced_X3D[i][0]) 
...         three_y.append(reduced_X3D[i][1]) 
...         three_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 4: 
...         four_x.append(reduced_X3D[i][0]) 
...         four_y.append(reduced_X3D[i][1]) 
...         four_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 5: 
...         five_x.append(reduced_X3D[i][0]) 
...         five_y.append(reduced_X3D[i][1]) 
...         five_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 6: 
...         six_x.append(reduced_X3D[i][0]) 
...         six_y.append(reduced_X3D[i][1]) 
...         six_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 7: 
...         seven_x.append(reduced_X3D[i][0]) 
...         seven_y.append(reduced_X3D[i][1]) 
...         seven_z.append(reduced_X3D[i][2]) 
 
...     elif y[i] == 8: 
...         eight_x.append(reduced_X3D[i][0]) 
...         eight_y.append(reduced_X3D[i][1]) 
...         eight_z.append(reduced_X3D[i][2]) 
     
...     elif y[i] == 9: 
...         nine_x.append(reduced_X3D[i][0]) 
...         nine_y.append(reduced_X3D[i][1]) 
...         nine_z.append(reduced_X3D[i][2]) 
 
 
 
# 3- Dimensional plot 
>>> from mpl_toolkits.mplot3d import Axes3D 
>>> fig = plt.figure() 
>>> ax = fig.add_subplot(111, projection='3d') 
 
>>> ax.scatter(zero_x, zero_y,zero_z, c='r', marker='x',label='zero') 
>>> ax.scatter(one_x, one_y,one_z, c='g', marker='+',label='one') 
>>> ax.scatter(two_x, two_y,two_z, c='b', marker='s',label='two') 
 
>>> ax.scatter(three_x, three_y,three_z, c='m', marker='*',label='three') 
>>> ax.scatter(four_x, four_y,four_z, c='c', marker='h',label='four') 
>>> ax.scatter(five_x, five_y,five_z, c='r', marker='D',label='five') 
 
>>> ax.scatter(six_x, six_y,six_z, c='y', marker='8',label='six') 
>>> ax.scatter(seven_x, seven_y,seven_z, c='k', marker='*',label='seven') 
>>> ax.scatter(eight_x, eight_y,eight_z, c='r', marker='x',label='eight') 
 
>>> ax.scatter(nine_x, nine_y,nine_z, c='b', marker='D',label='nine') 
 
>>> ax.set_xlabel('Latent Feature 1',fontsize = 13) 
>>> ax.set_ylabel('Latent Feature 2',fontsize = 13) 
>>> ax.set_zlabel('Latent Feature 3',fontsize = 13) 
 
>>> ax.set_xlim3d(0,60) 
 
>>> plt.legend(loc='upper left', numpoints=1, ncol=3, fontsize=10, bbox_to_anchor=(0, 0)) 
 
>>> plt.show() 

The 3D plot from the deep autoencoder provides much better separated classes compared with the three principal components; here we get a clearer separation of the digits. One important point the reader should consider is that the preceding plot is a rotated view from (0, 0, 0), as the data separation does not happen across orthogonal planes (as with PCA); hence we need to view it from the origin in order to see this non-linear separation.
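
For readers who want to set that rotated view in code rather than dragging the figure interactively, matplotlib's Axes3D exposes view_init; a minimal sketch follows (the elevation and azimuth values are just example angles, and the call should be placed before plt.show()):

# Illustrative sketch: set the 3D viewing angle programmatically (example angles only) 
>>> ax.view_init(elev=20, azim=45) 
>>> plt.show() 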
