The architecture that we will define to perform machine translation is as follows:
- Take a labeled dataset in which each English input sentence is paired with its French translation
- Tokenize the English and French texts and extract the frequent words in each:
  - To identify the frequent words, count how often each word occurs
  - The most frequent words that together account for the top 80% of the total cumulative frequency of all words are treated as the frequent words
  - Replace every word that is not among the frequent words with an unknown (unk) symbol
- Assign an ID to each word
- Build an encoder LSTM that encodes the input text into a vector
- Pass the encoded vector through a dense layer to obtain the probabilities of the decoded words at each time step
- Fit the model to minimize the loss at the output
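
The vocabulary steps above (counting words, keeping those that cover 80% of the cumulative frequency, replacing the rest with unk, and assigning IDs) can be sketched in plain Python. This is a minimal illustration rather than a definitive implementation: the function names `frequent_words`, `build_vocab`, and `encode` are hypothetical, and lowercasing, whitespace tokenization, and reserving ID 0 for padding are assumptions that the steps do not specify.

```python
from collections import Counter

UNK = "unk"  # assumed spelling of the unknown symbol


def frequent_words(sentences, coverage=0.8):
    """Return the most frequent words that together cover `coverage` of all tokens."""
    # Assumption: lowercase the text and split on whitespace to tokenize.
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    kept, running = [], 0
    for word, n in counts.most_common():
        if running >= coverage * total:
            break  # the words kept so far already reach the coverage target
        kept.append(word)
        running += n
    return kept


def build_vocab(sentences, coverage=0.8):
    """Map each frequent word to an integer ID; every other word shares the unk ID."""
    # Assumption: ID 0 is left free for padding, so unk gets ID 1.
    vocab = {UNK: 1}
    for word in frequent_words(sentences, coverage):
        vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to a list of IDs, using the unk ID for rare words."""
    return [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]


english = ["the cat sat", "the cat ran", "the dog sat", "the bird"]
en_vocab = build_vocab(english)
print(en_vocab)                         # {'unk': 1, 'the': 2, 'cat': 3, 'sat': 4, 'ran': 5}
print(encode("the dog sat", en_vocab))  # [2, 1, 4]
```

The same functions would be applied separately to the English and French sentences, giving one vocabulary per language. In this toy corpus "dog" and "bird" fall outside the 80% cutoff, so both map to the unk ID.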