Encoder

An encoder is basically an RNN with LSTM or GRU cells; it can also be a bidirectional RNN. We feed the input sentence to the encoder and, instead of taking its outputs, we take the hidden state from the final time step as the embedding. Let's better understand encoders with an example.
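
To make this concrete, the following is a minimal sketch of such an encoder, assuming PyTorch; the class name, variable names, and sizes are illustrative placeholders, not this book's own implementation. It embeds the input tokens, runs them through a GRU, and keeps only the final hidden state:

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, vocab_size, embed_dim, hidden_dim):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            # Setting bidirectional=True here would give the bidirectional variant
            self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
            outputs, final_hidden = self.gru(embedded)  # we discard the per-step outputs
            return final_hidden.squeeze(0)              # final hidden state = the embedding

    encoder = Encoder(vocab_size=10000, embed_dim=16, hidden_dim=32)
    thought_vector = encoder(torch.tensor([[4, 8, 15, 16]]))  # four token IDs, batch of 1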

Suppose we are using an RNN with a GRU cell and the input sentence is what are you doing. Let's represent the hidden state of the encoder at time step $t$ with $e_t$:

The preceding diagram shows how the encoder computes the thought vector; this is explained as follows (a runnable sketch of these four steps appears after the list):

  • In the first time step, $t=1$, we pass the input, $x_1$, which is the first word in the input sentence, what, to a GRU cell, and also the initial hidden state, $e_0$, which is randomly initialized. With these inputs, the GRU cell computes the first hidden state, $e_1$, as follows:

    $e_1 = \text{GRU}(x_1, e_0)$

  • In the next time step, $t=2$, we pass the input, $x_2$, which is the next word in the input sentence, are, to the encoder. Along with this, we also pass the previous hidden state, $e_1$, and compute the hidden state, $e_2$:

    $e_2 = \text{GRU}(x_2, e_1)$

  • In the next time step, $t=3$, we pass the input, $x_3$, which is the next word, you, to the encoder. Along with this, we also pass the previous hidden state, $e_2$, and compute the hidden state, $e_3$, as follows:

    $e_3 = \text{GRU}(x_3, e_2)$

  • In the final time step, $t=4$, we feed the input, $x_4$, which is the last word in the input sentence, doing. Along with this, we also pass the previous hidden state, $e_3$, and compute the hidden state, $e_4$:

    $e_4 = \text{GRU}(x_4, e_3)$
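
The four updates above can also be written as a short loop. The following is a minimal sketch, assuming PyTorch's GRUCell, with made-up sizes and random stand-ins for the word embeddings:

    import torch
    import torch.nn as nn

    cell = nn.GRUCell(input_size=16, hidden_size=32)  # sizes are made-up placeholders
    words = ["what", "are", "you", "doing"]
    x = torch.randn(len(words), 1, 16)   # stand-in embeddings x_1..x_4 (batch size 1)
    e = torch.randn(1, 32)               # e_0: the randomly initialized hidden state
    for t, word in enumerate(words, start=1):
        e = cell(x[t - 1], e)            # e_t = GRU(x_t, e_{t-1})
        print(f"t={t} ({word}): computed e_{t} with shape {tuple(e.shape)}")
    z = e                                # z = e_4, the thought/context vector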

Thus, $e_4$ is our final hidden state. We learned that the RNN captures the context of all the words it has seen so far in its hidden state. Since $e_4$ is the final hidden state, it holds the context of all the words that the network has seen, which will be all the words in our input sentence, that is, what, are, you, and doing.

Since the final hidden state, $e_4$, holds the context of all the words in our input sentence, it holds the context of the input sentence, and this essentially forms our embedding, $z$, which is otherwise called a thought or context vector, as follows:

    $z = e_4$

We feed the context vector, $z$, to the decoder to convert it to the target sentence.
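
How the decoder consumes $z$ is a separate topic; purely as a hedged illustration (this is one common choice, not necessarily the exact method used here), $z$ can serve as the decoder GRU's initial hidden state:

    import torch
    import torch.nn as nn

    embed_dim, hidden_dim = 16, 32        # made-up sizes matching the sketches above
    z = torch.randn(1, hidden_dim)        # thought vector produced by the encoder
    decoder_cell = nn.GRUCell(embed_dim, hidden_dim)
    y_start = torch.randn(1, embed_dim)   # stand-in embedding of a start-of-sentence token
    d_1 = decoder_cell(y_start, z)        # first decoder hidden state, seeded with z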

Thus, in the encoder, at every time step, $t$, we feed an input word, $x_t$, and, along with it, we also feed the previous hidden state, $e_{t-1}$, and compute the current hidden state, $e_t$. The hidden state at the final time step, $e_4$ in our example, holds the context of the input sentence, and it becomes the embedding, $z$, which is sent to the decoder to be converted into the target sentence.