Chapter 16. Modeling Sequential Data Using Recurrent Neural Networks

In the previous chapter, we focused on Convolutional Neural Networks (CNNs) for image classification. In this chapter, we will explore Recurrent Neural Networks (RNNs) and see their application in modeling sequential data and a specific subset of sequential data—time-series data. As an overview, in this chapter, we will cover the following topics:

  • Introducing sequential data
  • RNNs for modeling sequences
  • Long Short-Term Memory (LSTM)
  • Truncated Backpropagation Through Time (T-BPTT)
  • Implementing a multilayer RNN for sequence modeling in TensorFlow
  • Project one – RNN sentiment analysis of the IMDb movie review dataset
  • Project two – RNN character-level language modeling with LSTM cells, using text data from Shakespeare's Hamlet
  • Using gradient clipping to avoid exploding gradients

Since this chapter is the last in our Python Machine Learning journey, we'll conclude with a summary of what we've learned about RNNs, and an overview of all the machine learning and deep learning topics that led us to RNNs across the journey of the book. We'll then sign off by sharing with you links to some of our favorite people and initiatives in this wonderful field so that you can continue your journey into machine learning and deep learning.

Introducing sequential data

Let's begin our discussion of RNNs by looking at the nature of sequential data, more commonly known as sequences. We'll take a look at the unique properties of sequences that make them different from other kinds of data. We'll then see how we can represent sequential data, and explore the various categories of models for sequential data, which are based on the input and output of a model. This will help us explore the relationship between RNNs and sequences later in this chapter.

Modeling sequential data – order matters

What makes sequences unique compared to other data types is that the elements in a sequence appear in a certain order and are not independent of each other.

If you recall from Chapter 6, Learning Best Practices for Model Evaluation and Hyperparameter Tuning, we discussed that typical machine learning algorithms for supervised learning assume that the input data is Independent and Identically Distributed (IID). For example, if we have n data samples, x^(1), x^(2), ..., x^(n), the order in which we use the data for training our machine learning algorithm does not matter.

However, this assumption is not valid anymore when we deal with sequences—by definition, order matters.
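
To make this distinction concrete, here is a minimal NumPy sketch (our own illustration, not code from the book's projects): shuffling IID data merely reorders independent samples, whereas shuffling a time series destroys the very structure we want to model:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # IID data: each (x, y) pair is independent, so shuffling the
    # training samples leaves the learning problem unchanged
    X = rng.normal(size=(5, 3))                 # 5 samples, 3 features
    y = rng.integers(0, 2, size=5)
    perm = rng.permutation(len(X))
    X_shuffled, y_shuffled = X[perm], y[perm]   # same dataset, new order

    # Sequential data: the order itself carries information, so
    # shuffling a time series destroys its temporal structure
    x_seq = np.cumsum(rng.normal(size=10))      # a simple random walk
    x_broken = rng.permutation(x_seq)           # no longer a random walk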

Representing sequences

We've established that the elements of a sequence appear in a particular order and are not independent of one another; we next need to find ways to leverage this valuable information in our machine learning models.

Throughout this chapter, we will represent sequences as <x^(1), x^(2), ..., x^(T)>. The superscript indices indicate the order of the instances, and the length of the sequence is T. For a sensible example of sequences, consider time-series data, where each sample point x^(t) belongs to a particular time t.

The following figure shows an example of time-series data where both x's and y's naturally follow the order according to their time axis; therefore, both x's and y's are sequences:

[Figure: example time-series data, with the x and y values ordered along a shared time axis]
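
In code, one convenient way to store such a sequence (a hypothetical illustration, not the book's code) is a NumPy array of shape (T, num_features), where row t-1 holds the sample point x^(t):

    import numpy as np

    # a toy sequence <x^(1), ..., x^(T)> with T=6 time steps and
    # two features per time step
    T, num_features = 6, 2
    x = np.arange(T * num_features, dtype=float).reshape(T, num_features)

    x_3 = x[2]         # the sample point x^(3), that is, the third time step
    print(x.shape)     # (6, 2): sequence length T=6, two features per step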

The standard neural network models that we have covered so far, such as MLPs and CNNs, are not capable of handling the order of input samples. Intuitively, one can say that such models do not have a memory of previously seen samples. For instance, the samples are passed through the feedforward and backpropagation steps, and the weights are updated independently of the order in which the samples are processed.

RNNs, by contrast, are designed for modeling sequences and are capable of remembering past information and processing new events accordingly.
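
To give a first flavor of this memory, the following is a minimal NumPy sketch of a basic (Elman-style) recurrent cell; this is our own illustration, and the exact RNN equations are discussed later in this chapter. The key point is that the hidden state at time step t depends on both the current input and the previous hidden state:

    import numpy as np

    def simple_rnn_forward(x_seq, W_xh, W_hh, b_h):
        """Forward pass of a basic recurrent cell over a sequence.

        x_seq has shape (T, num_features); one hidden state is
        returned per time step.
        """
        h = np.zeros(W_hh.shape[0])       # initial hidden state
        states = []
        for x_t in x_seq:                 # process the sequence in order
            # the new state mixes the current input with the old state
            h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
            states.append(h)
        return np.stack(states)

    rng = np.random.default_rng(0)
    T, num_features, hidden_size = 5, 3, 4
    h_states = simple_rnn_forward(
        rng.normal(size=(T, num_features)),
        rng.normal(size=(num_features, hidden_size)),   # W_xh
        rng.normal(size=(hidden_size, hidden_size)),    # W_hh
        np.zeros(hidden_size),                          # b_h
    )
    print(h_states.shape)   # (5, 4): one hidden state per time step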

The different categories of sequence modeling

Sequence modeling has many fascinating applications, such as language translation (perhaps from English to German), image captioning, and text generation.

However, we need to understand the different types of sequence modeling tasks to develop an appropriate model. The following figure, based on the explanations in the excellent article The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (http://karpathy.github.io/2015/05/21/rnn-effectiveness/), shows the different categories of relationships between input and output data:

[Figure: the different categories of sequence modeling (many-to-one, one-to-many, and synchronized or delayed many-to-many)]

So, let's consider the input and output data here. If neither the input nor the output data represents sequences, then we are dealing with standard data, and we can use any of the previous methods to model such data. But if either the input or the output is a sequence, the data will form one of the following three categories (a minimal code sketch of their typical input and output shapes follows the list):

  • Many-to-one: The input data is a sequence, but the output is a fixed-size vector, not a sequence. For example, in sentiment analysis, the input is text-based and the output is a class label.
  • One-to-many: The input data is in standard format, not a sequence, but the output is a sequence. An example of this category is image captioning—the input is an image; the output is an English phrase.
  • Many-to-many: Both the input and output arrays are sequences. This category can be further divided based on whether the input and output are synchronized. An example of a synchronized many-to-many modeling task is video classification, where each frame in a video is labeled. An example of a delayed many-to-many task would be translating one language into another. For instance, an entire English sentence must be read and processed by a machine before producing its translation into German.
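
The following sketch illustrates these categories using the high-level tf.keras API; it is our own illustration with arbitrary layer sizes, not the approach used in this chapter's projects. Note how return_sequences controls whether an output is produced at every time step or only at the last one:

    import tensorflow as tf

    seq_len, num_features, num_classes = 10, 8, 5

    # many-to-one: read a whole sequence, emit one fixed-size output
    # (for example, a sentiment label for a movie review)
    many_to_one = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(seq_len, num_features)),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])

    # one-to-many: a fixed-size input unrolled into a sequence
    # (for example, an image embedding expanded into a caption)
    one_to_many = tf.keras.Sequential([
        tf.keras.layers.RepeatVector(seq_len, input_shape=(num_features,)),
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])

    # synchronized many-to-many: one output per input time step
    # (for example, labeling every frame of a video)
    many_to_many = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, return_sequences=True,
                             input_shape=(seq_len, num_features)),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])

    print(many_to_one.output_shape)    # (None, 5)
    print(many_to_many.output_shape)   # (None, 10, 5)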

Now that we know about the categories of sequence modeling, we can move on to discussing the structure of an RNN.
