Getting ready

The architecture that we will define to perform machine translation is as follows:

Take a labeled dataset in which each English input sentence is paired with its French translation
Tokenize the text and extract the words that are frequent in each of the English and French texts:
- To identify the frequent words, we will count the frequency of each word
- The most frequent words that together account for 80% of the total cumulative frequency of all words are considered the frequent words
Replace every word that is not among the frequent words with an unknown (unk) symbol
Assign an ID to each word
Build an encoder LSTM that encodes the input text into a vector
Pass the encoded vector through a dense layer to extract the probabilities of the decoded text at each time step
Fit a model to minimize the loss at the output
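
The preprocessing steps above (counting word frequencies, keeping the words that cover 80% of the cumulative frequency, replacing the rest with unk, and assigning IDs) can be sketched in plain Python as follows. This is only an illustrative sketch, not the recipe's actual code: the function names (frequent_words, replace_rare, build_vocab, encode) are hypothetical, tokenization is assumed to be lowercase whitespace splitting, and reserving ID 0 for padding is an assumption. The same functions would be applied separately to the English and the French sentences.

```python
from collections import Counter

UNK = "unk"  # symbol that stands in for every infrequent word


def frequent_words(sentences, coverage=0.8):
    """Return the most frequent words that together cover `coverage` of all tokens."""
    # Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    counts = Counter(word for s in sentences for word in s.lower().split())
    total = sum(counts.values())
    kept, running = set(), 0
    for word, n in counts.most_common():
        if running >= coverage * total:
            break  # the words kept so far already reach the coverage target
        kept.add(word)
        running += n
    return kept


def replace_rare(sentence, kept):
    """Replace every word that is not a frequent word with the unk symbol."""
    return " ".join(w if w in kept else UNK for w in sentence.lower().split())


def build_vocab(words):
    """Assign an integer ID to each word (ID 0 is left free for padding, by assumption)."""
    vocab = {UNK: 1}
    for word in sorted(words):
        vocab[word] = len(vocab) + 1
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of word IDs; unseen words map to the unk ID."""
    return [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]


# Toy corpus: "a" x4, "b" x3, "c" x2, "d" x1 (10 tokens in total)
corpus = ["a a a a b b b c c d"]
kept = frequent_words(corpus)
vocab = build_vocab(kept)
```

With this toy corpus, "a", "b", and "c" together account for 9 of the 10 tokens, so "d" falls outside the frequent words and is mapped to unk when a sentence is encoded. The resulting ID sequences are what the encoder LSTM would consume, typically after padding to a common length.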
