In the previous chapter, we focused on Convolutional Neural Networks (CNNs) for image classification. In this chapter, we will explore Recurrent Neural Networks (RNNs) and see their application in modeling sequential data and a specific subset of sequential data—time-series data. As an overview, in this chapter, we will cover the following topics:
Since this chapter is the last in our Python Machine Learning journey, we'll conclude with a summary of what we've learned about RNNs, and an overview of all the machine learning and deep learning topics that led us to RNNs across the journey of the book. We'll then sign off by sharing with you links to some of our favorite people and initiatives in this wonderful field so that you can continue your journey into machine learning and deep learning.
Let's begin our discussion of RNNs by looking at the nature of sequential data, more commonly known as sequences. We'll take a look at the unique properties of sequences that make them different from other kinds of data. We'll then see how we can represent sequential data, and explore the various categories of models for sequential data, which are based on the input and output of a model. This will help us explore the relationship between RNNs and sequences a little bit later on in the chapter.
What makes sequences unique compared to other data types is that their elements appear in a certain order and are not independent of each other.
If you recall from Chapter 6, Learning Best Practices for Model Evaluation and Hyperparameter Tuning, we discussed that typical machine learning algorithms for supervised learning assume that the input data is Independent and Identically Distributed (IID). For example, if we have n data samples, x^(1), x^(2), ..., x^(n), the order in which we use the data for training our machine learning algorithm does not matter.
However, this assumption is not valid anymore when we deal with sequences—by definition, order matters.
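To make this difference concrete, here is a small NumPy sketch (a toy illustration, not code from this book). An order-independent statistic, of the kind IID methods effectively rely on, is unchanged when we shuffle the samples, whereas an order-dependent quantity, which a sequence model must capture, is destroyed:

```python
import numpy as np

x = np.arange(10, dtype=float)                    # an ordered sequence: 0, 1, ..., 9
x_shuffled = x[[3, 0, 7, 1, 9, 4, 2, 8, 5, 6]]    # same values, order destroyed

# An order-independent statistic is unaffected by shuffling:
print(np.mean(x) == np.mean(x_shuffled))          # True

# The step-to-step differences, an order-dependent quantity,
# are not preserved under shuffling:
print(np.diff(x))                                 # all ones for the ordered sequence
print(np.allclose(np.diff(x), np.diff(x_shuffled)))   # False
```

In other words, shuffling a training set is harmless under the IID assumption, but for a sequence it throws away exactly the information we want to model.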
We've established that the elements of a sequence are ordered and not independent of one another; next, we need to find ways to leverage this valuable information in our machine learning models.
Throughout this chapter, we will represent sequences as x^(1), x^(2), ..., x^(T). The superscript indices indicate the order of the instances, and the length of the sequence is T. For an intuitive example of sequences, consider time-series data, where each sample point x^(t) belongs to a particular time t.
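In code, such a sequence is conveniently stored as an array whose first axis is the time axis. The following is a minimal sketch (the length and feature values here are made up for illustration):

```python
import numpy as np

# A toy time series of length T = 5 with one feature per time step.
# Row x[t] corresponds to x^(t+1) in the superscript notation above
# (Python indexing is 0-based).
x = np.array([[0.1],
              [0.3],
              [0.2],
              [0.5],
              [0.4]])       # shape: (T, num_features)

print(x.shape)              # (5, 1)
print(x[2])                 # the sample at time step t = 3
```

Keeping the time axis explicit like this is what allows a sequence model to process the samples in their natural order.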
The following figure shows an example of time-series data where both x's and y's naturally follow the order according to their time axis; therefore, both x's and y's are sequences:
The standard neural network models that we have covered so far, such as MLPs and CNNs, are not capable of handling the order of input samples. Intuitively, one can say that such models do not have a memory of previously seen samples. For instance, the samples are passed through the feedforward and backpropagation steps, and the weights are updated independently of the order in which the samples are processed.
RNNs, by contrast, are designed for modeling sequences and are capable of remembering past information and processing new events accordingly.
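The following from-scratch sketch illustrates this idea of memory (it is a simplified illustration with hypothetical sizes, not the RNN implementation we will build later in this chapter). At each time step, the hidden state h is computed from the current input and the previous hidden state, so the state at time t depends on all earlier elements of the sequence:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_features, hidden_size = 3, 4

# Randomly initialized weights for a single recurrent layer:
W_xh = rng.normal(scale=0.1, size=(hidden_size, num_features))  # input-to-hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # hidden-to-hidden
b_h = np.zeros(hidden_size)

def rnn_forward(x_seq):
    """Run a sequence of shape (T, num_features) through the layer."""
    h = np.zeros(hidden_size)            # initial hidden state
    states = []
    for x_t in x_seq:                    # iterate over time steps, in order
        # The new state mixes the current input with the previous state:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)              # shape: (T, hidden_size)

x_seq = rng.normal(size=(6, num_features))
h_all = rnn_forward(x_seq)
print(h_all.shape)                       # (6, 4)
```

Because the hidden state threads through every step, feeding the same samples in a different order, for example reversed, generally produces a different final state; this is precisely the order sensitivity that MLPs and CNNs lack.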
Sequence modeling has many fascinating applications, such as language translation (perhaps from English to German), image captioning, and text generation.
However, we need to understand the different types of sequence modeling tasks to develop an appropriate model. The following figure, based on the explanations in the excellent article The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (http://karpathy.github.io/2015/05/21/rnn-effectiveness/), shows several different relationship categories of input and output data:
So, let's consider the input and output data here. If neither the input nor the output data represents sequences, then we are dealing with standard data, and we can use any of the previous methods to model such data. But if either the input or the output is a sequence, the data will form one of the following three categories:

Many-to-one: the input data is a sequence, but the output is a fixed-size vector, not a sequence. For example, in sentiment analysis, the input is text-based and the output is a class label.

One-to-many: the input data is in a standard format, not a sequence, but the output is a sequence. An example is image captioning, where the input is an image and the output is an English phrase.

Many-to-many: both the input and output are sequences. This category can be further divided based on whether the input and output are synchronized (for example, video classification, where each frame is labeled) or delayed (for example, translating a sentence from one language into another).
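The categories are easiest to keep apart by thinking about the shapes of the input and output. The sketch below uses hypothetical sizes purely for illustration, with the sequence axis written first:

```python
# Illustrative (sequence_length, num_features) shapes for each category;
# all sizes here are made-up examples, not tied to a particular dataset.
many_to_one = {"input": (10, 3), "output": (1,)}       # e.g., a sentiment label from 10 words
one_to_many = {"input": (3,), "output": (10, 5)}       # e.g., a 10-word caption from an image vector
many_to_many = {"input": (10, 3), "output": (10, 5)}   # e.g., a label for every time step

for name, shapes in [("many-to-one", many_to_one),
                     ("one-to-many", one_to_many),
                     ("many-to-many", many_to_many)]:
    print(name, shapes)
```

Whichever category a task falls into determines where in the network the recurrence is needed, which is why we identify it before choosing a model.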
Now that we know about the categories of sequence modeling, we can move on to discussing the structure of an RNN.