LSTM networks

One type of RNN model is the LSTM. The precise implementation details of LSTMs are beyond the scope of this book. An LSTM is a special RNN architecture that was originally conceived by Hochreiter and Schmidhuber in 1997. This type of neural network has recently been rediscovered in the context of deep learning because it largely avoids the vanishing gradient problem and offers excellent results and performance. LSTM-based networks are ideal for the prediction and classification of temporal sequences, and they are replacing many traditional approaches to sequence modeling.

LSTM stands for Long Short-Term Memory; the name may sound odd, but it means exactly what it says: short-term patterns aren't forgotten in the long term. An LSTM network is composed of cells (LSTM blocks) linked to each other. Each LSTM block contains three types of gate: an input gate, an output gate, and a forget gate, which implement the functions of writing, reading, and resetting the cell memory, respectively. These gates are not binary but analog (generally managed by a sigmoid activation function mapped to the range (0, 1), where 0 indicates total inhibition and 1 indicates total activation).
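
To make the idea of an "analog" gate concrete, here is a minimal sketch in plain Scala (no library; the object and helper names are illustrative, not part of the book's code): a sigmoid squashes each pre-activation into (0, 1), and the result scales the memory element-wise instead of switching it on or off.

```scala
object GateSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Element-wise gating: values near 0.0 mean "block", values near 1.0 mean "pass through".
  def gate(preActivation: Array[Double], memory: Array[Double]): Array[Double] =
    preActivation.zip(memory).map { case (z, m) => sigmoid(z) * m }

  def main(args: Array[String]): Unit = {
    val kept = gate(Array(4.0, -4.0), Array(1.0, 1.0))
    println(kept.mkString(", ")) // ~0.982 (mostly kept), ~0.018 (mostly forgotten)
  }
}
```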

If you consider the LSTM cell as a black box, it can be used very much like a basic cell, except it will perform much better; training will converge faster, and it will detect long-term dependencies in the data. So how does an LSTM cell work? The architecture of a basic LSTM cell is shown in Figure 7:

Figure 7: Block diagram of an LSTM cell

Now, let's see the mathematical notation behind this architecture. If we don't look at what's inside the LSTM box, the LSTM cell itself looks exactly like a regular memory cell, except that its state is split into two vectors, h(t) and c(t):

  • c stands for "cell"
  • h(t) is the short-term state
  • c(t) is the long-term state

Now let's open the box! The key idea is that the network can learn what to store in the long-term state, what to throw away, and what to read from it. As the long-term state c(t-1) traverses the network from left to right, you can see that it first goes through a forget gate, dropping some memories, and then it adds some new memories via the addition operation (which adds the memories that were selected by an input gate). The resulting c(t) is sent straight out, without any further transformation.

So, at each time step, some memories are dropped and some memories are added. Moreover, after the addition operation, the long-term state is copied and passed through the tanh function, and the result is then filtered by the output gate. This produces the short-term state h(t) (which is equal to the cell's output for this time step, y(t)). Now let's look at where new memories come from and how the gates work. First, the current input vector x(t) and the previous short-term state h(t-1) are fed to four different fully connected layers.

The presence of these gates allows LSTM cells to remember information for an arbitrarily long time: if the input gate is below its activation threshold, the cell retains the previous state; otherwise, the current state is combined with the input value. As the name suggests, the forget gate resets the current state of the cell (when its value is cleared to 0), and the output gate decides whether the value of the cell should be carried to the output or not. The following equations are used to compute a cell's long-term state, its short-term state, and its output at each time step for a single instance:
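
(The equations below reproduce the standard LSTM formulation implied by the weight and bias names in the next paragraph; σ is the logistic sigmoid, ⊗ denotes element-wise multiplication, i, f, and o are the input, forget, and output gates, and g is the candidate memory layer.)

\[
\begin{aligned}
i_{(t)} &= \sigma\left(W_{xi}^{T}\, x_{(t)} + W_{hi}^{T}\, h_{(t-1)} + b_i\right)\\
f_{(t)} &= \sigma\left(W_{xf}^{T}\, x_{(t)} + W_{hf}^{T}\, h_{(t-1)} + b_f\right)\\
o_{(t)} &= \sigma\left(W_{xo}^{T}\, x_{(t)} + W_{ho}^{T}\, h_{(t-1)} + b_o\right)\\
g_{(t)} &= \tanh\left(W_{xg}^{T}\, x_{(t)} + W_{hg}^{T}\, h_{(t-1)} + b_g\right)\\
c_{(t)} &= f_{(t)} \otimes c_{(t-1)} + i_{(t)} \otimes g_{(t)}\\
h_{(t)} &= y_{(t)} = o_{(t)} \otimes \tanh\left(c_{(t)}\right)
\end{aligned}
\]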

In the preceding equations, Wxi, Wxf, Wxo, and Wxg are the weight matrices of each of the four layers for their connection to the input vector x(t). Similarly, Whi, Whf, Who, and Whg are the weight matrices of each of the four layers for their connection to the previous short-term state h(t-1). Finally, bi, bf, bo, and bg are the bias terms for each of the four layers.
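
The following is a self-contained sketch of one LSTM forward step in plain Scala, following the equations above. The object and helper names, the matrix shapes, and the calling convention are illustrative assumptions for this book's explanation, not the MXNet implementation used later.

```scala
object LstmCellSketch {
  type Vec = Array[Double]
  type Mat = Array[Array[Double]] // rows = hidden units, columns = input (or hidden) size

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Matrix-vector product: W * x
  def matVec(w: Mat, x: Vec): Vec =
    w.map(row => row.zip(x).map { case (a, b) => a * b }.sum)

  def add(a: Vec, b: Vec): Vec      = a.zip(b).map { case (x, y) => x + y }
  def hadamard(a: Vec, b: Vec): Vec = a.zip(b).map { case (x, y) => x * y }

  // One gate/candidate layer: act(Wx * x + Wh * h + b)
  def layer(wx: Mat, wh: Mat, b: Vec, x: Vec, h: Vec, act: Double => Double): Vec =
    add(add(matVec(wx, x), matVec(wh, h)), b).map(act)

  /** One time step: returns (h_t, c_t); the output y_t equals h_t. */
  def step(x: Vec, hPrev: Vec, cPrev: Vec,
           wxi: Mat, whi: Mat, bi: Vec,  // input gate
           wxf: Mat, whf: Mat, bf: Vec,  // forget gate
           wxo: Mat, who: Mat, bo: Vec,  // output gate
           wxg: Mat, whg: Mat, bg: Vec   // candidate memories
          ): (Vec, Vec) = {
    val i = layer(wxi, whi, bi, x, hPrev, sigmoid)
    val f = layer(wxf, whf, bf, x, hPrev, sigmoid)
    val o = layer(wxo, who, bo, x, hPrev, sigmoid)
    val g = layer(wxg, whg, bg, x, hPrev, math.tanh)
    val c = add(hadamard(f, cPrev), hadamard(i, g)) // c_t = f ⊗ c_(t-1) + i ⊗ g
    val h = hadamard(o, c.map(math.tanh))           // h_t = y_t = o ⊗ tanh(c_t)
    (h, c)
  }
}
```

Processing a whole sequence then amounts to folding step over the inputs x(1), ..., x(T), threading the (h, c) pair from one time step to the next; a framework such as MXNet does exactly this, but with learned parameters and efficient tensor operations.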

Now that we know how both RNNs and LSTM networks work, it's time to get hands-on: we will start implementing an MXNet- and Scala-based LSTM model for HAR.
