Encoder-decoder with attention

The encoder-decoder architecture described in the previous section has one major shortcoming. Because the final encoder state has a fixed length, it can lose information. While this may not be a problem for short phrases, for longer source language inputs the encoder may fail to capture long-term dependencies, and as a result the decoder struggles to produce good translations. To overcome this problem, the attention mechanism was introduced by Bahdanau et al. in their paper Neural Machine Translation by Jointly Learning to Align and Translate. The following diagram, taken from their paper, illustrates the architecture:

The main idea behind attention is to focus on, or pay attention to, the important parts of the input source text while learning to translate. The attention mechanism, in effect, builds shortcut connections between the input source text and the target text through weights that are learned during training. These connections enhance the decoder's ability to translate longer inputs, thereby leading to more accurate translations.
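To make this concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention for a single decoder step. The function name, variable names, and dimensions are hypothetical and chosen for illustration; in a real model the projection matrices W_s, W_h, and the vector v would be trained jointly with the encoder and decoder rather than sampled at random.

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def bahdanau_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive (Bahdanau) attention over a sequence of encoder states.

    decoder_state:  (d_dec,)        previous decoder hidden state s_{t-1}
    encoder_states: (T_src, d_enc)  one hidden state h_j per source token
    W_s, W_h, v:    projection parameters (random here; learned in practice)
    Returns the context vector and the attention weights alpha.
    """
    # Alignment scores e_j = v^T tanh(W_s s_{t-1} + W_h h_j), one per source position
    scores = np.tanh(decoder_state @ W_s + encoder_states @ W_h) @ v   # (T_src,)
    # Normalize the scores into attention weights that sum to 1
    alpha = softmax(scores)                                            # (T_src,)
    # Context vector: attention-weighted sum of the encoder states
    context = alpha @ encoder_states                                   # (d_enc,)
    return context, alpha

# Toy example with hypothetical sizes: 5 source tokens, 8-dim encoder and
# decoder states, 10-dim attention space.
rng = np.random.default_rng(0)
T_src, d_enc, d_dec, d_att = 5, 8, 8, 10
encoder_states = rng.normal(size=(T_src, d_enc))
decoder_state  = rng.normal(size=(d_dec,))
W_s = rng.normal(size=(d_dec, d_att))
W_h = rng.normal(size=(d_enc, d_att))
v   = rng.normal(size=(d_att,))

context, alpha = bahdanau_attention(decoder_state, encoder_states, W_s, W_h, v)
print("attention weights:", np.round(alpha, 3))   # one weight per source token
print("context vector shape:", context.shape)

The attention weights alpha are the "shortcut connections" described above: at each decoding step, they tell the decoder how much each source position should contribute to the context vector used to predict the next target word.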
