Decoding predictions

Suppose that we would like to classify each word into one of five classes; we would compute a five-dimensional vector, s. This vector can be interpreted as a distribution over the classes: the i-th component of s provides the probability, or the score, of the i-th class, given the word, w.

This dense vector can be computed in TensorFlow as follows:

import tensorflow as tf

# Weights matrix initialised with the default initialiser
W = tf.get_variable("W", shape=[2 * hidden_state_size, ntags], dtype=tf.float32)

# Bias vector initialised using a zero initialiser
b = tf.get_variable("b", shape=[ntags], dtype=tf.float32, initializer=tf.zeros_initializer())

# Getting the number of time steps
num_time_steps = tf.shape(semantic_representation)[1]

# Flattening the representation (using reshape) so that each time step becomes a row
semantic_representation_flatten = tf.reshape(semantic_representation, [-1, 2 * hidden_state_size])

# Using the flattened vector to calculate prediction scores
prediction = tf.matmul(semantic_representation_flatten, W) + b

# Reshaping the scores back to [batch, num_time_steps, ntags]
scores = tf.reshape(prediction, [-1, num_time_steps, ntags])

Given that we can compute a dense vector that provides the score that a given word belongs to the i-th class, there are two methods that we can use to make our prediction.

The first method is the obvious approach of using the softmax activation to normalize the scores into a probability vector, p. This is a non-linear activation, where the elements of the vector are calculated as follows:

p_i = \frac{e^{s_i}}{\sum_{j} e^{s_j}}

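As a minimal sketch of this first method (using the scores tensor computed above), the softmax probabilities and the predicted tag for each word can be obtained independently per time step:

# Normalising the per-time-step scores into class probabilities
probabilities = tf.nn.softmax(scores)

# Picking the highest-scoring tag for each word independently
predicted_tags = tf.argmax(scores, axis=-1)
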
The second method is a smarter way of labeling words that makes use of the neighboring tagging decisions. For example, if we consider the words New Delhi, the NER system could potentially classify New as the beginning of a location, based on the fact that it classified the neighboring word, Delhi, as a location. This method, known as a linear-chain conditional random field (CRF), defines a global score for the whole tag sequence that takes into consideration the transitions between neighboring tags, as well as the cost of beginning or ending with a given tag.
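
As a rough sketch of how this second method could look in the same TensorFlow setup (assuming labels and sequence_lengths tensors that hold the gold tags and the true sentence lengths; the surrounding training loop is not shown), the linear-chain CRF score and Viterbi decoding are available through tf.contrib.crf:

# Learning the tag-transition parameters and computing the sentence-level log-likelihood
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    scores, labels, sequence_lengths)

# Training objective: maximise the log-likelihood of the correct tag sequences
loss = tf.reduce_mean(-log_likelihood)

# Decoding the best tag sequence with the Viterbi algorithm
predicted_tags, best_score = tf.contrib.crf.crf_decode(
    scores, transition_params, sequence_lengths)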
