Autoencoders, or neural networks for dimensionality reduction

A little more than a decade ago, the main tool for dimensionality reduction with neural networks was the Kohonen map, or self-organizing map (SOM). These were neural networks that mapped data onto a discrete, low-dimensional embedded space (typically a 1D or 2D grid). Since then, with faster computers, it has become possible to use deep learning to create embedded spaces.

The trick is to have an intermediate layer that has fewer nodes than the input layer and an output layer that must reproduce the input layer. The data on this intermediate layer will give us the coordinates in an embedded space.

If we use regular dense layers without an activation function, we get a linear mapping from the input to the embedded layer, and from there to the output layer. Stacking more than one linear layer before the embedded layer will not change the result of the training, so we end up with a linear embedding similar to PCA (without the constraint of having an orthogonal basis in the embedded layer).
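
To illustrate this linear case, here is a minimal sketch using scikit-learn's PCA (an assumption: scikit-learn is not used elsewhere in this section, and the random data is purely for illustration). Projecting onto a 2D linear subspace and reconstructing from it is exactly what a linear autoencoder converges to, up to a change of basis:

import numpy as np
from sklearn.decomposition import PCA

# Any 3D data would do; random points are used purely for illustration
X = np.random.random((1000, 3))

pca = PCA(n_components=2).fit(X)
embedded = pca.transform(X)                      # coordinates in the 2D linear subspace
reconstructed = pca.inverse_transform(embedded)  # best linear reconstruction of X
print(((X - reconstructed) ** 2).mean())         # mean squared reconstruction error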

Adding a nonlinear activation function to the dense layers is what enables us to find manifolds in the data, instead of just hyperplanes. Tools such as Isomap try to match distances between data points (it is a variant of MDS that matches approximated geodesic distances instead of Euclidean distances), and Laplacian eigenmaps try to match similarities between data points. Autoencoders, by contrast, have no concept of what we are trying to keep; they simply attempt to reproduce whatever we provide at the input.
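
As a point of comparison, a minimal Isomap run on a Swiss Roll could look like the following sketch (assuming scikit-learn is installed; it uses scikit-learn's own Swiss Roll generator rather than the one we define below):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Isomap matches approximate geodesic distances between points; there is
# no notion of reconstructing the input, unlike the autoencoder below
X, t = make_swiss_roll(n_samples=1000)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)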

Neural networks can extract features from data, as we will see in the TensorFlow chapter, but we will keep things simple here by using a dataset that is features-only.

The dataset we will be considering is the Swiss Roll. It is one of the most famous datasets used in manifold learning, as it is a nonlinear dataset that can easily be understood by the human eye, but that is wrapped enough to make it difficult for algorithms to describe properly:

import numpy as np

max = 4

def generate_swissroll(n):
    """
    Generates data for the swissroll
    Returns the parameter space, the swissroll
    """
    orig = np.random.random((2, n)) * max
    return (orig.T, np.array((orig[1] * np.cos(orig[1]),
                              orig[1] * np.sin(orig[1]),
                              orig[0])).T)

def color_from_parameters(params):
    """
    Defines a color scheme for the swissroll
    """
    return np.array((params[:,0], params[:,1], max - params[:,1])).T / max

From these functions, we can generate new data along with a color code that will allow us to check that the embedded data matches the original parameters we used, as shown in the following diagram:
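
The plotting helpers used throughout this section (plot_2d, plot_3d, and save_png) are not reproduced in this excerpt; a minimal matplotlib version (an assumption, not the original helper code) could look like this:

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, registers the 3D projection

def plot_2d(data, colors):
    # Scatter plot of 2D embedded coordinates, colored by the parameter scheme
    plt.figure()
    plt.scatter(data[:, 0], data[:, 1], c=colors, s=5)

def plot_3d(data, colors):
    # Scatter plot of the 3D Swiss Roll (original or reconstructed)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=colors, s=5)

def save_png(name):
    plt.savefig(name + ".png", dpi=150)

Once swissroll and colors have been generated (as in the driver code later in this section), plot_3d(swissroll, colors) followed by plt.show() displays the data with this color scheme.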

It is now time to think about the architecture we will use. We will start with an input layer that feeds the data into the network, followed by two dense layers that do the heavy lifting of unwrapping the input data into the two-unit embedded layer. To rebuild the Swiss Roll, we will use another dense layer before ending on the three-unit output layer. To create the nonlinearities, each of the layers (except the input) will use a leaky_relu activation. The arrangement is shown in the following diagram:
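
Since the layer sizes only appear further down, the arrangement amounts to the following (nb_intermediate is set to 20 in the driver code):

# input (3 units)
#   -> dense, leaky_relu (nb_intermediate units)
#   -> dense, leaky_relu (nb_intermediate units)
#   -> embedded layer, leaky_relu (2 units)
#   -> dense, leaky_relu (nb_intermediate units)
#   -> output, leaky_relu (3 units)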

Let's create the scaffolding:

import tensorflow as tf

def tf_create_variables():
    swissroll_tf = tf.placeholder(tf.float32, (None, 3), name="swissroll")
    return swissroll_tf

def tf_create_dense_layer(x, size):
    return tf.layers.dense(x, size, activation=tf.nn.leaky_relu,
                           kernel_initializer=tf.contrib.layers.xavier_initializer())

This time, the autoencoder will be encapsulated in a class. The constructor will create the variables, and the train method will run the optimization, as well as create a few display images.

When we build the layers, we save the embedded layer variable, as this variable is the one we want to use to get the parameters of a new sample in the embedded space:

class Autoencoder(object):
    def __init__(self, swissroll, swissroll_test, nb_intermediate,
                 learning_rate):
        self.swissroll = swissroll
        self.swissroll_test = swissroll_test

        self.swissroll_tf = tf_create_variables()

        intermediate_input = tf_create_dense_layer(self.swissroll_tf,
                                                   nb_intermediate)
        intermediate_input = tf_create_dense_layer(intermediate_input,
                                                   nb_intermediate)
        self.encoded = tf_create_dense_layer(intermediate_input, 2)
        intermediate_output = tf_create_dense_layer(self.encoded,
                                                    nb_intermediate)
        self.output = tf_create_dense_layer(intermediate_output, 3)

        self.meansq = tf.reduce_mean(tf.squared_difference(
            self.output, self.swissroll_tf))
        self.train_step = tf.train \
            .GradientDescentOptimizer(learning_rate) \
            .minimize(self.meansq)

    def train(self, display, n_epochs, batch_size, **kwargs):
        n = len(self.swissroll)

        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())

            for i in range(n_epochs):
                permut = np.random.permutation(n)
                for j in range(0, n, batch_size):
                    samples = permut[j:j+batch_size]
                    batch = self.swissroll[samples]
                    sess.run(self.train_step,
                             feed_dict={self.swissroll_tf: batch})

                # step (the print frequency) is set at module level in the
                # driver code below
                if i % step == step - 1:
                    print("Epoch :%i Loss %f" %
                          (i, sess.run(self.meansq,
                                       feed_dict={self.swissroll_tf:
                                                  self.swissroll})))

            error = sess.run(self.meansq,
                             feed_dict={self.swissroll_tf: self.swissroll})
            error_test = sess.run(self.meansq,
                                  feed_dict={self.swissroll_tf:
                                             self.swissroll_test})

            if display:
                pred = sess.run(self.encoded,
                                feed_dict={self.swissroll_tf: self.swissroll})
                pred = np.asarray(pred)
                recons = sess.run(self.output,
                                  feed_dict={self.swissroll_tf: self.swissroll})
                recons = np.asarray(recons)
                recons_test = sess.run(self.output,
                                       feed_dict={self.swissroll_tf:
                                                  self.swissroll_test})
                recons_test = np.asarray(recons_test)

                print("Embedded manifold")
                plot_2d(pred, kwargs['colors'])
                save_png("swissroll_embedded")
                plt.show()
                print("Reconstructed manifold")
                plot_3d(recons, kwargs['colors'])
                save_png("swissroll_reconstructed")
                plt.show()
                print("Reconstructed test manifold")
                plot_3d(recons_test, kwargs['colors_test'])
                save_png("swissroll_test")
                plt.show()
        return error, error_test

We can run this autoencoder and check whether it also works on new data:

n = 5000
n_epochs = 2000
batch_size = 100
nb_intermediate = 20
learning_rate = 0.05
step = 100

params, swissroll = generate_swissroll(n)
params_test, swissroll_test = generate_swissroll(n)
colors = color_from_parameters(params)
colors_test = color_from_parameters(params_test)

model = Autoencoder(swissroll, swissroll_test,
                    nb_intermediate, learning_rate)

error, error_test = model.train(True, n_epochs, batch_size,
                                colors=colors, test=swissroll_test,
                                colors_test=colors_test)

Epoch :1599 Loss 0.001498
Epoch :1699 Loss 0.001008
Epoch :1799 Loss 0.000870
Epoch :1899 Loss 0.000952
Epoch :1999 Loss 0.000830

The embedded space for the training data is fine and respects the color scheme we used when generating the Swiss Roll. We can see a representation of it in the following diagram:

An interesting point here is that the embedded space is not directly linked to the parameters we used to create the data. The amplitude is different, and each new run will produce a different embedded space. We could add a regularization term to the mean squared cost function, just as we did in the chapter on regression; a sketch of this is shown below.
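
One possible form of such a penalty, written as it would appear in the constructor, is an L2 term over the dense-layer weights (reg_lambda is a hypothetical hyperparameter, and this is only a sketch, not code from this section):

# tf.layers.dense stores its weights in variables named ".../kernel"
reg_lambda = 1e-4
l2_penalty = tf.add_n([tf.nn.l2_loss(v)
                       for v in tf.trainable_variables()
                       if "kernel" in v.name])
regularized_loss = self.meansq + reg_lambda * l2_penalty
self.train_step = tf.train \
    .GradientDescentOptimizer(learning_rate) \
    .minimize(regularized_loss)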

A crucial point is to check that the output data also matches the input data. We saw that the loss was very low. The test data also showed that the reconstruction error was low, but a visual check is sometimes a good thing to do. The following diagram shows a graphical representation:

We can see that there are some bumps and discontinuities compared to the original Swiss Roll. Adding a second layer during reconstruction would help reduce these artifacts; we didn't do it here, to show that autoencoders don't have to use a symmetric architecture (a possible modification is sketched below).
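
For instance, the decoder side of the constructor could be extended with a second dense layer (a sketch only; this is not the network that produced the figures above):

# Two intermediate layers on the decoder side instead of one
intermediate_output = tf_create_dense_layer(self.encoded,
                                            nb_intermediate)
intermediate_output = tf_create_dense_layer(intermediate_output,
                                            nb_intermediate)
self.output = tf_create_dense_layer(intermediate_output, 3)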
