Word embedding

Now that we have defined our input placeholders, we will define a TensorFlow variable to hold the pretrained embeddings for the vocabulary in the data. The approach we will follow is that of an indexed array, from which the embedding corresponding to the word at index i can be fetched as pre_trained_embedding[i]. Since we would like to look words up in an embedding matrix, we will load the pretrained embedding array into the TensorFlow variable. The TensorFlow code is defined as follows:

# Define a lookup table using the pre-trained embedding matrix
lookup_word_mat = tf.Variable(embedding_matrix, dtype=tf.float32, trainable=False)

# Define an embedding lookup using TensorFlow's in-built function
pre_trained_embedding = tf.nn.embedding_lookup(lookup_word_mat, word_indices)

In the preceding code block, we defined the TensorFlow variable with trainable=False, as we do not want the algorithm to train our embeddings any further. This block defines the vector representation at the word level.
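The embedding_matrix fed into this variable is assumed to have been prepared beforehand from a pretrained vector file such as GloVe. As a rough sketch of how it could be assembled (the file name, the vocab dictionary, and the 300-dimensional size below are illustrative assumptions, not part of the original code):

import numpy as np

# Hypothetical word-to-index mapping built during preprocessing
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_dim = 300

# Rows default to zero vectors for words missing from the pretrained file
embedding_matrix = np.zeros((len(vocab), embedding_dim), dtype=np.float32)

# Each line of a GloVe-style file is: word value_1 value_2 ... value_d
with open("glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.strip().split()
        word, vector = parts[0], np.asarray(parts[1:], dtype=np.float32)
        if word in vocab:
            embedding_matrix[vocab[word]] = vector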

We will build the character-level vector representation in a similar manner, using two placeholders, as follows:

# Define input placeholder for character indices of shape = (batch size, maximum length of sentence, maximum length of a word)
char_indices = tf.placeholder(tf.int32, shape=[None, None, None])

# Placeholder for the word lengths of shape = (batch size, maximum length of a sentence)
word_lengths = tf.placeholder(tf.int32, shape=[None, None])

As you can see, we chose to set the sentence and word lengths dynamically, based on the data available in the batch being processed during each iteration. Having defined how we will build our word embeddings using characters, we will now look at how to train those embeddings.
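Before doing so, it is worth noting how a batch is prepared for these placeholders. Since their dimensions are left as None, each batch only needs to be padded up to its own maximum sentence and word lengths. The following is a minimal sketch of such a padding routine; the helper name pad_char_batch and the use of 0 as the padding index are illustrative assumptions, not part of the original code:

def pad_char_batch(batch_char_ids, pad_id=0):
    # batch_char_ids: a list of sentences, where each sentence is a list of words
    # and each word is a list of character indices
    max_sent_len = max(len(sentence) for sentence in batch_char_ids)
    max_word_len = max(len(word) for sentence in batch_char_ids for word in sentence)

    padded_chars, batch_word_lengths = [], []
    for sentence in batch_char_ids:
        # Record the true length of every word, padding short sentences with zero-length words
        word_lens = [len(word) for word in sentence] + [0] * (max_sent_len - len(sentence))
        # Pad each word to max_word_len and each sentence to max_sent_len
        sent = [word + [pad_id] * (max_word_len - len(word)) for word in sentence]
        sent += [[pad_id] * max_word_len] * (max_sent_len - len(sentence))
        padded_chars.append(sent)
        batch_word_lengths.append(word_lens)
    return padded_chars, batch_word_lengths

# The padded arrays can then be fed to the placeholders, for example:
# feed_dict = {char_indices: padded_chars, word_lengths: batch_word_lengths}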

Unlike in the earlier case, we do not have any pretrained embeddings at the character-level to look up in an embedding matrix. Hence, we will initialize the character embeddings by using random vectors, as follows:

# Define a variable lookup with default initialiser 
lookup_char_mat = tf.get_variable(name="character_embeddings", dtype=tf.float32, shape=[num_characters, dim_character])

# Define a character embedding lookup using TensorFlow's in-built function
character_embedding = tf.nn.embedding_lookup(lookup_char_mat, char_indices)

As shown in the previous figure, we will define a bidirectional LSTM that takes the characters from the data available in each batch as input. Since the bidirectional RNN expects three-dimensional inputs and a one-dimensional vector of sequence lengths, we first flatten the batch and sentence dimensions, so that every word is treated as an independent sequence of characters:

# Flatten the character embeddings to shape = (batch size x sentence length, maximum length of a word, dim_character)
char_embedding_reshaped = tf.reshape(character_embedding, shape=[-1, tf.shape(character_embedding)[2], dim_character])

# Flatten the word lengths into a vector of shape = (batch size x sentence length)
word_lengths_reshaped = tf.reshape(word_lengths, shape=[-1])

# Define the LSTM to accept characters for the forward RNN
bi_dir_cell_fw = tf.contrib.rnn.LSTMCell(char_hidden_dim, state_is_tuple=True)

# Define the LSTM to accept characters for the backward RNN
bi_dir_cell_bw = tf.contrib.rnn.LSTMCell(char_hidden_dim, state_is_tuple=True)

# Define the bidirectional LSTM, which takes the forward and backward cells, the reshaped inputs, and the sequence lengths; we keep only the final output states
_, ((_, out_fw), (_, out_bw)) = tf.nn.bidirectional_dynamic_rnn(bi_dir_cell_fw, bi_dir_cell_bw, char_embedding_reshaped, sequence_length=word_lengths_reshaped, dtype=tf.float32)

As we only need the output vectors, we do not store the other return values of the bidirectional_dynamic_rnn function. To derive the output, we concatenate the forward and backward outputs, resulting in an output dimension of twice the character hidden dimension. Finally, we obtain the character-based representation of each word by reshaping the concatenated output back to the batch and sentence dimensions:

# Concatenate the output vectors of the forward and backward RNNs, resulting in a tensor of shape = (batch size x sentence length, 2 x char_hidden_dim)
output_fw_bw = tf.concat([out_fw, out_bw], axis = -1)

# Obtain the character-based word representation by reshaping the output of the previous step, resulting in a tensor of shape = (batch size, sentence length, 2 x char_hidden_dim)
char_vector = tf.reshape(output_fw_bw, shape=[-1, tf.shape(character_embedding)[1], 2*char_hidden_dim])

# Final word embedding is obtained by concatenating the pre-trained word embedding and the character embedding obtained from the bidirectional LSTM
word_embedding = tf.concat([pre_trained_embedding, char_vector], axis = -1)
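
At this point, every word in the batch is represented by the concatenation of its pretrained embedding and its character-derived vector, that is, a vector of size word embedding dimension + 2 x char_hidden_dim. A quick way to sanity-check the graph is to run it on a tiny dummy batch; the indices and sizes below (a 300-dimensional pretrained embedding and char_hidden_dim = 100) are purely illustrative assumptions:

# Illustrative sanity check, assuming word_indices was defined earlier as
# tf.placeholder(tf.int32, shape=[None, None])
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output = sess.run(word_embedding, feed_dict={
        word_indices: [[1, 2, 0]],                 # one sentence of three words
        char_indices: [[[1, 2], [3, 0], [4, 5]]],  # character indices per word, padded
        word_lengths: [[2, 1, 2]],                 # true number of characters per word
    })
    print(output.shape)  # expected: (1, 3, 300 + 2 * 100) = (1, 3, 500)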