Word embeddings

As discussed in the chapter on word embeddings, we would like to build a dense vector that captures the semantic meaning of the context in which a word is used. In this task, however, we will build our word embeddings as a concatenation of pretrained embeddings extracted at the word level and trained embeddings at the character level. Hence, the word embedding, w, is composed of a pretrained word-level vector, w_word, and a trained character-level vector, w_char.

Although it is possible to encode the character-level vector as a one-hot encoding or use any other hand-crafted feature, such a feature might not be easily scalable to other datasets and languages. A robust method for encoding the character-level embedding is to learn it directly from the data. In this chapter, we will utilize a bidirectional LSTM to learn such an embedding:
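A minimal sketch of such a character-level encoder is shown below, here written in PyTorch; the class name, vocabulary size, and dimensions are illustrative assumptions, not the book's exact code. Each word is fed to the bidirectional LSTM as a sequence of character indices, and the final forward and backward hidden states are concatenated to form the character-level word vector:

```python
import torch
import torch.nn as nn

class CharBiLSTMEmbedding(nn.Module):
    """Learns a character-level word embedding with a bidirectional LSTM.

    The character vocabulary size and dimensions below are illustrative.
    """
    def __init__(self, n_chars=100, char_dim=25, hidden_dim=25):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.bilstm = nn.LSTM(char_dim, hidden_dim,
                              bidirectional=True, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len) integer character indices, one word per row
        x = self.char_emb(char_ids)        # (batch, len, char_dim)
        _, (h_n, _) = self.bilstm(x)       # h_n: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (batch, 2 * hidden_dim)

# Example: a batch of 3 words, each padded to 8 characters
emb = CharBiLSTMEmbedding()
chars = torch.randint(0, 100, (3, 8))
vec = emb(chars)
print(vec.shape)  # torch.Size([3, 50])
```

Taking the last hidden state in each direction means the forward pass has read the whole word left to right and the backward pass right to left, which is what lets the vector capture prefixes, suffixes, and casing patterns.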

Hence, every character, ci, in a word, w = [c1, c2, ..., ck], has an associated vector. Note that we do not perform any preprocessing on the data to remove punctuation or change the words to lowercase, as such characters have an impact on the meaning conveyed by a word in a given position. For instance, the word Apple in the sentence They had to get an Apple product to complete their tech eco-system refers to the organization (ORG), while the word apple in the sentence They had to get an apple a day to keep the doctor away refers to the fruit. The distinction between the two can be easily identified when the casing of the word is considered. The character-level embedding vector tries to learn the structure of words to arrive at the final forms in which they are used. In other words, a character-level vector learns the morphology of the word that it represents.

Finally, we will concatenate the character embedding, w_char, of dimension d2, with the word embedding, w_word, of dimension d1, to obtain the word representation, w, of dimension d = d1 + d2. Now that we have prepared our input, we will proceed to walk through the code to build our input and the model.
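The concatenation step itself is a single operation; the sketch below uses NumPy with illustrative dimensions (d1 = 100 for the pretrained word vector, d2 = 50 for the character-level vector) to show that the resulting representation has dimension d = d1 + d2:

```python
import numpy as np

d1, d2 = 100, 50                 # illustrative dimensions for the two vectors
word_vec = np.random.randn(d1)   # pretrained word-level vector (e.g. GloVe)
char_vec = np.random.randn(d2)   # trained character-level vector from the BiLSTM
w = np.concatenate([word_vec, char_vec])
print(w.shape)  # (150,)
```

Because the two halves are simply stacked, the downstream model is free to weight word-level and character-level evidence independently.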
