RNNs – sentiment analysis context

Now, let's recap the basic concepts of RNNs and discuss them in the context of the sentiment analysis application. As we mentioned in the RNN chapter, the basic building block of an RNN is a recurrent unit, as shown in this figure:

Figure 2: An abstract idea of an RNN unit

This figure is an abstraction of what goes on inside the recurrent unit. What we have here is the input, which would be a word, for example, good. Of course, it has to be converted to an embedding vector first, but we will ignore that for now. The unit also has a kind of memory state, and depending on the contents of this state and the input, we update the state and write new data into it. For example, imagine that we have previously seen the word not in the input; we write that to the state so that when we see the word good on one of the following inputs, we know from the state that we have just seen the word not. Now that we see the word good, we have to write into the state that we have seen the words not good together, which might indicate that the whole input text probably has a negative sentiment.

The mapping from the old state and the input to the new contents of the state is done through a so-called gate, and the way gates are implemented differs across versions of recurrent units. A gate is basically a matrix operation followed by an activation function but, as we will see in a moment, there is a problem with backpropagating gradients through it, so the RNN has to be designed in a special way so that the gradients are not distorted too much.
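To make this concrete, here is a minimal NumPy sketch of the state update in a plain (vanilla) recurrent unit; gated variants such as GRU and LSTM refine this basic pattern. The dimensions and weight names are illustrative, not values from this chapter's implementation:

    import numpy as np

    def step(state, x, Wx, Wh, b):
        # The "gate": a matrix operation on the old state and the input,
        # followed by an activation function, producing the new state.
        return np.tanh(x @ Wx + state @ Wh + b)

    # Toy dimensions: 8-dimensional word embeddings, 4-dimensional state.
    rng = np.random.default_rng(0)
    Wx = rng.normal(scale=0.1, size=(8, 4))  # input-to-state weights
    Wh = rng.normal(scale=0.1, size=(4, 4))  # state-to-state weights
    b = np.zeros(4)

    state = np.zeros(4)                 # the state starts at zero
    x = rng.normal(size=8)              # embedding of one input word
    state = step(state, x, Wx, Wh, b)   # the state now reflects that word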

In a recurrent unit, we have a similar gate for producing the output, and once again the output of the recurrent unit depends on the current contents of the state and the input we are seeing. What we can then do is unroll the processing that takes place in a recurrent unit:

Figure 3: Unrolled version of the recurrent neural net

Now, what we have here is just one recurrent unit, but the flow chart shows what happens at different time steps. So:

  • In time step 1, we input the word this to the recurrent unit, whose internal memory state is initialized to zero; TensorFlow does this whenever we start processing a new sequence of data. So, we see the word this while the state is 0, and we use the internal gate to update the memory state; the updated state is then used in time step 2, where we input the word is. There's not a whole lot of meaning in the word this, so the state probably stays close to 0.
  • There's also not a lot of meaning in is, so the state probably remains close to 0.
  • In the next time step, we see the word not, and this does carry meaning for what we ultimately want to predict, which is the sentiment of the whole input text. This is something we need to store in the memory state: the gate inside the recurrent unit sees that the state probably still contains near-zero values, and it now wants to record that we have just seen the word not, so it saves some nonzero values in the state.
  • Then, we move on to the next time step, where we have the word a; this also doesn't carry much information, so it's probably just ignored and the state is simply copied over.
  • Now, we have the word very, and this indicates that whatever sentiment exists might be a strong sentiment, so the recurrent unit now knows that we have seen not and very. It stores this somehow in its memory state.
  • In the next time step, we see the word good, so now the network knows not very good and it thinks, Oh, this is probably a negative sentiment! Hence, it stores that value in the internal state.
  • Then, in the final time step, we see movie, and this is not really relevant, so it's probably just ignored.
  • Next, we use the other gate inside the recurrent unit to output the contents of the memory state, which is then processed with the sigmoid function (not shown here) to give an output value between 0 and 1; the sketch after this list walks through these steps in code.
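The following self-contained sketch unrolls a vanilla recurrent unit over our example sentence. The weights here are random and untrained, so the output is not yet meaningful; all names and dimensions are again illustrative:

    import numpy as np

    def step(state, x, Wx, Wh, b):
        # New state from old state and current input (the internal gate).
        return np.tanh(x @ Wx + state @ Wh + b)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Toy, untrained setup: 8-dim embeddings, 4-dim state, random weights.
    rng = np.random.default_rng(1)
    Wx = rng.normal(scale=0.1, size=(8, 4))
    Wh = rng.normal(scale=0.1, size=(4, 4))
    b = np.zeros(4)
    Wo = rng.normal(scale=0.1, size=(4, 1))  # output gate weights

    words = ["this", "is", "not", "a", "very", "good", "movie"]
    embeddings = {w: rng.normal(size=8) for w in words}  # stand-in embeddings

    state = np.zeros(4)               # the state is initialized to zero
    for w in words:                   # unrolled: one time step per word
        state = step(state, embeddings[w], Wx, Wh, b)

    output = sigmoid(state @ Wo)      # final value between 0 and 1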

The idea, then, is that we want to train this network on many thousands of example movie reviews from the Internet Movie Database (IMDb), where, for each input text, we supply the true sentiment value of either positive or negative. We then want TensorFlow to find out what the gates inside the recurrent unit should be so that they accurately map the input text to the correct sentiment.
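One convenient way to obtain such labeled reviews is the IMDb dataset that ships with Keras; this chapter's implementation may load the data differently, but the snippet below shows the general idea:

    import tensorflow as tf

    # Each review is a sequence of word indices; each label is
    # 0 (negative) or 1 (positive). num_words keeps only the
    # 10,000 most frequent words.
    (x_train, y_train), (x_test, y_test) = \
        tf.keras.datasets.imdb.load_data(num_words=10000)

    print(len(x_train), "training reviews")  # 25000
    print(y_train[:5])                       # for example, [1 0 0 1 0]

The next figure shows the architecture that should learn this mapping: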

Figure 4: The architecture used for this chapter's implementation

The architecture we will be using in this implementation is an RNN with three recurrent layers. In the first layer, what we've just explained happens, except that we now need to output the value from the recurrent unit at each time step. This gives us a new sequence of data: the output of the first recurrent layer, which we can feed into the second recurrent layer, because recurrent units need sequences of input data. The values in this intermediate sequence are floating-point numbers whose meanings we don't really understand; they have a meaning inside the RNN, but it's not something we as humans can interpret. The second recurrent layer then does similar processing.
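In Keras terms, this "output at each time step" behavior corresponds to the return_sequences flag on a recurrent layer. The following sketch (using a GRU as one possible choice of recurrent unit, with made-up sizes) shows the shape difference:

    import tensorflow as tf

    x = tf.random.normal((1, 7, 8))  # (batch, time steps, embedding size)

    # Default: only the output for the last time step is returned.
    last_only = tf.keras.layers.GRU(16)(x)
    # With return_sequences=True: one output per time step, which is
    # exactly what the next recurrent layer needs as its input sequence.
    per_step = tf.keras.layers.GRU(16, return_sequences=True)(x)

    print(last_only.shape)  # (1, 16)
    print(per_step.shape)   # (1, 7, 16)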

So, first, we initialize the internal memory state of the second layer's recurrent unit to 0. Then we take the first output from the first recurrent layer and input it, process it with the gates inside this recurrent unit, and update the state. Next, we take the first layer's output for the second word, is, and use it as input together with the internal memory state, and we continue like this until we have processed the whole sequence. After that, we gather up all the outputs of the second recurrent layer and use them as inputs to the third recurrent layer, where we do similar processing. Here, however, we only want the output for the last time step, which acts as a kind of summary of everything the network has seen so far. We then pass it to a fully connected layer that we don't show here. Finally, we apply the sigmoid activation function, so we get a value between zero and one, where values toward zero represent negative sentiment and values toward one represent positive sentiment.
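Putting the pieces together, a minimal Keras sketch of this three-layer architecture might look as follows. The GRU is just one choice of recurrent unit, and the layer sizes are placeholders rather than the chapter's exact values:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # Convert word indices to embedding vectors (sizes are illustrative).
        layers.Embedding(input_dim=10000, output_dim=8),
        layers.GRU(16, return_sequences=True),  # layer 1: output every step
        layers.GRU(8, return_sequences=True),   # layer 2: output every step
        layers.GRU(4),                          # layer 3: last step only
        # The fully connected layer with sigmoid activation gives a
        # value between 0 and 1 (negative vs. positive sentiment).
        layers.Dense(1, activation='sigmoid'),
    ])

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

After padding the reviews to a common length, for example with tf.keras.preprocessing.sequence.pad_sequences, training is then a single call to model.fit on the padded sequences and their labels.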
