What is an autoencoder?

An autoencoder is an interesting unsupervised learning algorithm. Unlike most neural networks, which learn to predict a label or value for a given input, the objective of an autoencoder is to reconstruct its input; that is, the output of the autoencoder is the same as its input. It consists of two important components, called the encoder and the decoder.

The role of the encoder is to encode the input by learning a latent representation of it, and the role of the decoder is to reconstruct the input from the latent representation produced by the encoder. The latent representation is also called the bottleneck or code. As shown in the following diagram, an image is passed as input to the autoencoder. The encoder takes the image and learns its latent representation. The decoder then takes the latent representation and tries to reconstruct the image:

A simple vanilla autoencoder with a single hidden layer is shown in the following diagram; as you may notice, it consists of an input layer, a hidden layer, and an output layer. First, we feed the input to the input layer; the encoder then learns a representation of the input and maps it to the bottleneck. From the bottleneck, the decoder reconstructs the input:

You might wonder what the use of this is. Why do we need to encode and decode the inputs just to reconstruct them? Well, autoencoders have various applications, such as dimensionality reduction, data compression, image denoising, and more.

Since the autoencoder reconstructs its input, the number of nodes in the input and output layers is always the same. Let's assume we have a dataset with 100 input features, and a network with an input layer of 100 units, a hidden layer of 50 units, and an output layer of 100 units. When we feed the dataset to the autoencoder, the encoder learns the important features in the data and reduces the 100 features to 50, forming the bottleneck. The bottleneck holds the representations of the data, that is, the embeddings of the data, and encompasses only the necessary information. The bottleneck is then fed to the decoder to reconstruct the original input. If the decoder reconstructs the original input successfully, it means the encoder has learned good encodings or representations of the input; that is, it has successfully compressed the dataset of 100 features into a representation with only 50 features while capturing the necessary information.
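To make this concrete, here is a minimal sketch of such a 100-50-100 autoencoder in Keras (assuming TensorFlow 2.x is available; the random data is just a placeholder for a real dataset):

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

# Placeholder data: 1,000 samples with 100 features each, scaled to [0, 1]
x_train = np.random.rand(1000, 100).astype('float32')

inputs = Input(shape=(100,))
bottleneck = layers.Dense(50, activation='relu')(inputs)       # encoder: 100 -> 50
outputs = layers.Dense(100, activation='sigmoid')(bottleneck)  # decoder: 50 -> 100

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')

# The input doubles as the target, since the goal is reconstruction
autoencoder.fit(x_train, x_train, epochs=10, batch_size=32)
```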

So, essentially, the encoder learns to reduce the dimensionality of the data without losing useful information. We can think of autoencoders as similar to dimensionality reduction techniques such as Principal Component Analysis (PCA). In PCA, we project the data into a lower dimension using a linear transformation and discard the components that are not required. The difference between PCA and an autoencoder is that PCA uses a linear transformation for dimensionality reduction, while an autoencoder can use nonlinear transformations.
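For comparison, the same 100-to-50 reduction with PCA might look like the following sketch (using scikit-learn; the data is again a random placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA

x = np.random.rand(1000, 100)

# PCA performs the same 100 -> 50 reduction, but with a purely linear map
pca = PCA(n_components=50)
z = pca.fit_transform(x)          # analogous to the encoder's bottleneck
x_hat = pca.inverse_transform(z)  # analogous to the decoder's reconstruction

print(z.shape, x_hat.shape)  # (1000, 50) (1000, 100)
```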

Apart from dimensionality reduction, autoencoders are also widely used for removing noise from images, audio, and so on. We know that the encoder reduces the dimensionality of the data by learning only the necessary information, forming the bottleneck or code. Thus, when a noisy image is fed as input to the autoencoder, the encoder learns only the information that is necessary to represent the image; since noise is unwanted information, its representation is excluded from the bottleneck.

The bottleneck is therefore a representation of the image without any noise. When this learned representation is fed to the decoder, the decoder reconstructs the image from the encodings produced by the encoder. Since the encodings carry no noise, the reconstructed image will not contain any noise.
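A minimal sketch of this denoising setup follows (again assuming Keras, with random placeholder data): we corrupt the inputs with Gaussian noise but keep the clean data as the reconstruction targets.

```python
import numpy as np
from tensorflow.keras import Input, Model, layers

# Placeholder "clean" data and a noisy copy of it
x_clean = np.random.rand(1000, 100).astype('float32')
noise = np.random.normal(0.0, 0.1, size=x_clean.shape).astype('float32')
x_noisy = np.clip(x_clean + noise, 0.0, 1.0)

inputs = Input(shape=(100,))
bottleneck = layers.Dense(50, activation='relu')(inputs)
outputs = layers.Dense(100, activation='sigmoid')(bottleneck)
denoiser = Model(inputs, outputs)
denoiser.compile(optimizer='adam', loss='mse')

# Noisy inputs, clean targets: the bottleneck is forced to drop the noise
denoiser.fit(x_noisy, x_clean, epochs=10, batch_size=32)
```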

In a nutshell, autoencoders map high-dimensional data to a low-dimensional representation. This low-dimensional representation is called the latent representation, or bottleneck, and it holds only the meaningful and important features of the input.

Since the role of the autoencoder is to reconstruct its input, we use the reconstruction error as our loss function; that is, we measure how well the decoder has reconstructed the input. So, we can use the mean squared error (MSE) loss to quantify the performance of the autoencoder.
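As a sketch, the mean squared reconstruction error is simply the average squared difference between the input and its reconstruction:

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # Mean squared error between the original input x and its reconstruction x_hat
    return np.mean((x - x_hat) ** 2)
```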

Now that we have understood what autoencoders are, we will explore the architecture of autoencoders in the next section.
