Updating the cell state

We just learned how all three gates work in an LSTM network. The question now is how we actually update the cell state: how do the gates help us add relevant new information to the cell state and delete information that is no longer required?

First, we will see how to add new relevant information to the cell state. To hold all the new information that can be added to the cell state (memory), we create a new vector called $g_t$. It is called a candidate state or internal state vector. Unlike the gates, which are regulated by the sigmoid function, the candidate state is regulated by the tanh function. Why? The sigmoid function returns values in the range of 0 to 1, that is, it is always positive. We need to allow the values of $g_t$ to be either positive or negative, so we use the tanh function, which returns values in the range of -1 to +1.
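The difference between the two ranges is easy to check numerically. The following is a minimal sketch using only the Python standard library; the helper name `sigmoid` and the sample inputs are illustrative assumptions, not part of any particular LSTM implementation:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative pre-activation values, both negative and positive.
inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]

gate_like = [sigmoid(z) for z in inputs]         # always positive
candidate_like = [math.tanh(z) for z in inputs]  # negative for negative inputs

for z, s, t in zip(inputs, gate_like, candidate_like):
    print(f"z={z:+.1f}  sigmoid={s:.3f}  tanh={t:+.3f}")
```

The sigmoid outputs stay strictly between 0 and 1 for every input, whereas tanh preserves the sign of its input, which is what lets the candidate state push a cell state value down as well as up.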

The candidate state, $g_t$, at time $t$ is expressed as follows:

$$g_t = \tanh(U_g x_t + W_g h_{t-1} + b_g)$$

Here, the following applies:

$U_g$ is the input-to-hidden weights of the candidate state
$W_g$ is the hidden-to-hidden weights of the candidate state
$b_g$ is the bias of the candidate state
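As a rough illustration, the candidate state can be computed by combining the current input and the previous hidden state through their respective weights, adding the bias, and applying tanh. The sketch below uses plain Python lists to stay dependency-free (in practice you would use NumPy or a deep learning framework). The names `matvec`, `candidate_state`, `x_t`, `h_prev`, `U_g`, `W_g`, and `b_g`, as well as all numeric values, are assumptions chosen for illustration:

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def candidate_state(x_t, h_prev, U_g, W_g, b_g):
    """Compute tanh(U_g x_t + W_g h_prev + b_g), one value per hidden unit."""
    input_part = matvec(U_g, x_t)       # input-to-hidden contribution
    hidden_part = matvec(W_g, h_prev)   # hidden-to-hidden contribution
    return [math.tanh(a + b + c)
            for a, b, c in zip(input_part, hidden_part, b_g)]

# Illustrative values only: 3 input features, 2 hidden units.
x_t = [1.0, 0.5, -1.0]
h_prev = [0.2, -0.4]
U_g = [[0.1, -0.3, 0.5],
       [0.7, 0.2, -0.1]]
W_g = [[0.4, -0.2],
       [0.3, 0.6]]
b_g = [0.0, 0.1]

g_t = candidate_state(x_t, h_prev, U_g, W_g, b_g)
print(g_t)
```

With these values, the first hidden unit gets a negative candidate value and the second a positive one, which a sigmoid-regulated vector could not express.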

Thus, the candidate state holds all the new information that can be added to the cell state (memory). The following diagram shows the candidate state:

How do we decide whether the information in the candidate state is relevant? How do we decide whether or not to add new information from the candidate state to the cell state? We learned that the input gate, $i_t$, is responsible for deciding whether to add new information, so if we multiply $i_t$ and $g_t$, we get only the relevant information that should be added to the memory.

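That is, we know the input gate returns 0 if the information is not required and 1 if the information is required. Say $i_t = 0$; then multiplying $g_t$ and $i_t$ gives us 0, which means the information in $g_t$ is not required and we don't want to update the cell state with $g_t$. When $i_t = 1$, multiplying $g_t$ and $i_t$ gives us $g_t$, which implies we can add the information in $g_t$ to the cell state. Because the gate is a sigmoid output, its values in practice lie between these two extremes and scale each element of $g_t$ accordingly.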
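The effect of the input gate can be seen with a small element-wise product. This is only a sketch with hand-picked values. The helper name `apply_gate` and the vectors `g_t` and `i_t` are illustrative assumptions, and real gate values come from a sigmoid rather than being set by hand:

```python
def apply_gate(gate_values, state):
    """Element-wise product of a gate vector and a state vector."""
    return [g * s for g, s in zip(gate_values, state)]

g_t = [0.8, 0.5, -0.3]   # candidate state (illustrative values)
i_t = [1.0, 0.0, 0.5]    # input gate: keep fully, drop, keep half

new_info = apply_gate(i_t, g_t)
print(new_info)
```

The first element passes through unchanged, the second is blocked entirely, and the third is scaled to half its size, which is the intermediate case a sigmoid gate usually produces.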

Adding the new information to the cell state with input gate and candidate state is shown in the following diagram:

Now, we will see how to remove information from the previous cell state that is no longer required.

We learned that the forget gate is used for removing information that is not required in the cell state. So, if we multiply the previous cell state, $c_{t-1}$, and the forget gate, $f_t$, then we retain only the relevant information in the cell state.

Say $f_t = 0$; then multiplying $c_{t-1}$ and $f_t$ gives us 0, which means the information in the previous cell state, $c_{t-1}$, is not required and should be removed (forgotten). When $f_t = 1$, multiplying $c_{t-1}$ and $f_t$ gives $c_{t-1}$, which implies that the information in the previous cell state is required and should not be removed.

Removing information from the previous cell state, $c_{t-1}$, with the forget gate, $f_t$, is shown in the following diagram:

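Thus, in a nutshell, we update our cell state by multiplying $g_t$ and $i_t$ to add new information, and multiplying $c_{t-1}$ and $f_t$ to remove information. We can express the cell state equation as follows, where $\odot$ denotes element-wise multiplication:

$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$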
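Putting the two steps together, the update keeps the forget-gated part of the previous cell state and adds the input-gated part of the candidate state. The following minimal sketch applies that rule element by element. The function name `update_cell_state` and all values are illustrative assumptions, with the gate values set by hand rather than computed from a sigmoid:

```python
def update_cell_state(c_prev, f_t, i_t, g_t):
    """Return f_t * c_prev + i_t * g_t, computed element-wise."""
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]

c_prev = [2.0, -1.0, 0.5]   # previous cell state
f_t = [1.0, 0.0, 0.5]       # forget gate: keep, forget, keep half
i_t = [0.0, 1.0, 0.5]       # input gate: ignore, add, add half
g_t = [0.9, 0.7, -0.4]      # candidate state

c_t = update_cell_state(c_prev, f_t, i_t, g_t)
print(c_t)
```

In this example, the first unit simply carries its old value forward, the second discards its old value and takes the candidate value instead, and the third blends half of the old value with half of the candidate.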
