Chapter 1. Toward deep learning: a machine-learning introduction

This chapter covers:

  • Machine learning and its differences from traditional programming
  • Problems that can and can’t be solved with machine learning
  • Machine learning’s relationship to artificial intelligence
  • The structure of a machine-learning system
  • Disciplines of machine learning

As long as computers have existed, programmers have been interested in artificial intelligence (AI): implementing human-like behavior on a computer. Games have long been a popular subject for AI researchers. During the personal computer era, AIs have overtaken humans at checkers, backgammon, chess, and almost all classic board games. But the ancient strategy game Go remained stubbornly out of reach for computers for decades. Then in 2016, Google DeepMind’s AlphaGo AI challenged 14-time world champion Lee Sedol and won four out of five games. The next revision of AlphaGo was completely out of reach for human players: it won 60 straight games, taking down just about every notable Go player in the process.

AlphaGo’s breakthrough was enhancing classical AI algorithms with machine learning. More specifically, AlphaGo used modern techniques known as deep learning—algorithms that can organize raw data into useful layers of abstraction. These techniques aren’t limited to games at all. You’ll also find deep learning in applications for identifying images, understanding speech, translating natural languages, and guiding robots. Mastering the foundations of deep learning will equip you to understand how all these applications work.

Why write a whole book about computer Go? You might suspect that the authors are die-hard Go nuts—OK, guilty as charged. But the real reason to study Go, as opposed to chess or backgammon, is that a strong Go AI requires deep learning. A top-tier chess engine such as Stockfish is full of chess-specific logic; you need a certain amount of knowledge about the game to write something like that. With deep learning, you can teach a computer to imitate strong Go players, even if you don’t understand what they’re doing. And that’s a powerful technique that opens up all kinds of applications, both in games and in the real world.

Chess and checkers AIs are designed around reading out the game further and more accurately than human players can. There are two problems with applying this technique to Go. First, you can’t read far ahead, because the game has too many moves to consider. Second, even if you could read ahead, you don’t know how to evaluate whether the result is good. It turns out that deep learning is the key to unlocking both problems.

This book provides a practical introduction to deep learning by covering the techniques that powered AlphaGo. You don’t need to study the game of Go in much detail to do this; instead, you’ll look at the general principles of the way a machine can learn. This chapter introduces machine learning and the kinds of problems it can (and can’t) solve. You’ll work through examples that illustrate the major branches of machine learning, and see how deep learning has brought machine learning into new domains.

1.1. What is machine learning?

Consider the task of identifying a photo of a friend. This is effortless for most people, even if the photo is badly lit, or your friend got a haircut or is wearing a new shirt. But suppose you want to program a computer to do the same thing. Where would you even begin? This is the kind of problem that machine learning can solve.

Traditionally, computer programming is about applying clear rules to structured data. A human developer programs a computer to execute a set of instructions on data, and out comes the desired result, as shown in figure 1.1. Think of a tax form: every box has a well-defined meaning, and detailed rules indicate how to make various calculations from them. Depending on where you live, these rules may be extremely complicated. It’s easy for people to make a mistake here, but this is exactly the kind of task that computer programs excel at.

Figure 1.1. The standard programming paradigm that most software developers are familiar with. The developer identifies the algorithm and implements the code; the users supply the data.

In contrast to the traditional programming paradigm, machine learning is a family of techniques for inferring a program or algorithm from example data, rather than implementing it directly. So, with machine learning, you still feed your computer data, but instead of imposing instructions and expecting output, you provide the expected output and let the machine find an algorithm by itself.

To build a computer program that can identify who’s in a photo, you can apply an algorithm that analyzes a large collection of images of your friend and generates a function that matches them. If you do this correctly, the generated function will also match new photos that you’ve never seen before. Of course, the program will have no knowledge of its purpose; all it can do is identify things that are similar to the original images you fed it.

In this situation, the images you provide to the machine are called training data, and the names of the people in the pictures are called labels. After you’ve trained an algorithm for your purpose, you can use it to predict labels on new data in order to test it. Figure 1.2 displays this example alongside a schema of the machine-learning paradigm.

Figure 1.2. The machine-learning paradigm: during development, you generate an algorithm from a data set, and then incorporate that into your final application.

Machine learning comes in when rules aren’t clear; it can solve problems of the “I’ll know it when I see it” variety. Instead of programming the function directly, you provide data that indicates what the function should do, and then methodically generate a function that matches your data.

In practice, you usually combine machine learning with traditional programming to build a useful application. For our face-detection app, you have to instruct the computer on how to find, load, and transform the example images before you can apply a machine-learning algorithm. Beyond that, you might use hand-rolled heuristics to separate headshots from photos of sunsets and latte art; then you can apply machine learning to put names to faces. Often a mixture of traditional programming techniques and advanced machine-learning algorithms will be superior to either one alone.

1.1.1. How does machine learning relate to AI?

Artificial intelligence, in the broadest sense, refers to any technique for making computers imitate human behavior. AI includes a huge range of techniques, including the following:

  • Logic production systems, which apply formal logic to evaluate statements
  • Expert systems, in which programmers try to directly encode human knowledge into software
  • Fuzzy logic, which defines algorithms to help computers process imprecise statements

These sorts of rules-based techniques are sometimes called classical AI or GOFAI (good old-fashioned AI).

Machine learning is just one of many fields in AI, but today it’s arguably the most successful one. In particular, the subfield of deep learning is behind some of the most exciting breakthroughs in AI, including tasks that eluded researchers for decades. In classical AI, researchers would study human behavior and try to encode rules that match it. Machine learning and deep learning flip the problem on its head: now you collect examples of human behavior and apply mathematical and statistical techniques to extract the rules.

Deep learning is so ubiquitous that some people in the community use AI and deep learning interchangeably. For clarity, we’ll use AI to refer to the general problem of imitating human behavior with computers, and machine learning or deep learning to refer to mathematical techniques for extracting algorithms from examples.

1.1.2. What you can and can’t do with machine learning

Machine learning is a specialized technique. You wouldn’t use machine learning to update database records or render a user interface. Traditional programming should be preferred in the following situations:

  • Traditional algorithms solve the problem directly. If you can directly write code to solve a problem, it’ll be easier to understand, maintain, test, and debug.
  • You expect perfect accuracy. All complex software contains bugs. But in traditional software engineering, you expect to methodically identify and fix bugs. That’s not always possible with machine learning. You can improve machine-learning systems, but focusing too much on a specific error often makes the overall system worse.
  • Simple heuristics work well. If you can implement a rule that’s good enough with just a few lines of code, do so and be happy. A simple heuristic, implemented clearly, will be easy to understand and maintain. Functions that are implemented with machine learning are opaque and require a separate training process to update. (On the other hand, if you’re maintaining a complicated sequence of heuristics, that’s a good candidate to replace with machine learning.)

Often there’s a fine line between problems that are feasible to solve with traditional programming and problems that are virtually impossible to solve, even with machine learning. Detecting faces in images versus tagging faces with names is just one example we’ve seen. Determining what language a text is written in versus translating that text into a given language is another such example.

We often resort to traditional programming in situations where machine learning might help—for instance, when the complexity of the problem is extremely high. When confronted with highly complex, information-dense scenarios, humans tend to settle for rules of thumb and narratives: think macroeconomics, stock-market predictions, or politics. Process managers and so-called experts can often vastly benefit from enhancing their intuition with insights gained from machine learning. Often, real-world data has more structure than anticipated, and we’re just beginning to harvest the benefits of automation and augmentation in many of these areas.

1.2. Machine learning by example

The goal of machine learning is to construct a function that would be hard to implement directly. You do this by selecting a model, a large family of generic functions. Then you need a procedure for selecting a function from that family that matches your goal; this process is called training or fitting the model. You’ll work through a simple example.

Let’s say you collect the height and weight of some people and plot those values on a graph. Figure 1.3 shows some data points that were pulled from the roster of a professional soccer team.

Figure 1.3. A simple example data set. Each point on the graph represents a soccer player’s height and weight. Your goal is to fit a model to these points.

Suppose you want to describe these points with a mathematical function. First, notice that the points, more or less, make a straight line going up and to the right. If you think back to high school algebra, you may recall that functions of the form f(x) = ax + b describe straight lines. You might suspect that you could find values of a and b so that ax + b matches your data points fairly closely. The values of a and b are the parameters, or weights, that you need to figure out. This is your model. You can write Python code that can generate any function in this family:

class GenericLinearFunction:
    """Represents any straight line of the form f(x) = a * x + b."""

    def __init__(self, a, b):
        self.a = a  # slope
        self.b = b  # intercept

    def evaluate(self, x):
        return self.a * x + self.b

How would you find out the right values of a and b? You can use rigorous algorithms to do this, but for a quick and dirty solution, you could just draw a line through your graph with a ruler and try to work out its formula. Figure 1.4 shows such a line that follows the general trend of the data set.

Figure 1.4. First you note that your data set roughly follows a linear trend, then you find the formula for a specific line that fits the data.

If you eyeball a couple of points that the line passes through, you can calculate a formula for the line; you’ll get something like f(x) = 4.2x – 137. Now you have a specific function that matches your data. If you measure the height of a new person, you could then use your formula to estimate that person’s weight. It won’t be exactly right, but it may be close enough to be useful. You can turn your GenericLinearFunction into a specific function:

height_to_weight = GenericLinearFunction(a=4.2, b=-137)
height_of_new_person = 73
estimated_weight = height_to_weight.evaluate(height_of_new_person)

This should be a pretty good estimate, so long as your new person is also a professional soccer player. All the people in your data set are adult men, in a fairly narrow age range, who train for the same sport every day. If you try to apply your function to female soccer players, or Olympic weightlifters, or babies, you’ll get wildly inaccurate results. Your function is only as good as your training data.
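By the way, the rigorous algorithms mentioned earlier are readily available. As a sketch, NumPy’s polyfit performs a least-squares fit of a straight line; the sample heights and weights below are made-up stand-ins for the data in figure 1.3:

import numpy as np

# Made-up stand-ins for the data points in figure 1.3.
heights = np.array([70.0, 72.0, 73.0, 75.0, 76.0])
weights = np.array([160.0, 165.0, 170.0, 178.0, 182.0])

# Fit a degree-1 polynomial (a straight line) by least squares.
a, b = np.polyfit(heights, weights, 1)
height_to_weight = GenericLinearFunction(a=a, b=b)

The fitted a and b then play the same role as the values you read off the graph, without any eyeballing.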

This is the basic process of machine learning. Here, your model is the family of all functions that look like f(x) = ax + b. And in fact, even something that simple is a useful model that statisticians use all the time. As you tackle more-complex problems, you’ll use more-sophisticated models and more-advanced training techniques. But the core idea is the same: first describe a large family of possible functions and then identify the best function from that family.

Python and machine learning

All the code samples in this book are written in Python. Why Python? First, Python is an expressive high-level language for general application development. In addition, Python is among the most popular languages for machine learning and mathematical programming. This combination makes Python a natural choice for an application that integrates machine learning.

Python is popular for machine learning because of its amazing collection of numerical computing packages. Packages we use in this book include the following:

  • NumPy: This library provides efficient data structures to represent numerical vectors and arrays, and an extensive library of fast mathematical operations. NumPy is the bedrock of Python’s numerical computing ecosystem: every notable library for machine learning or statistics integrates with NumPy.
  • TensorFlow and Theano: These are two graph computation libraries (graph in the sense of a network of connected steps, not graph as in diagram). They allow you to specify complex sequences of mathematical operations and then generate highly optimized implementations.
  • Keras: This is a high-level library for deep learning. It provides a convenient way for you to specify neural networks and relies on TensorFlow or Theano to handle the raw computation.

We wrote the code examples in this book with Keras 2.2 and TensorFlow 1.8 in mind. You should be able to use any Keras version in the 2.x series with minimal modifications.

1.2.1. Using machine learning in software applications

In the previous section, you looked at a purely mathematical model. How can you apply machine learning to a real software application?

Suppose you’re working on a photo-sharing app, in which users have uploaded millions of pictures with tags. You’d like to add a feature that suggests tags for a new photo. This feature is a perfect candidate for machine learning.

First, you have to be specific about the function you’re trying to learn. Say you had a function like this:

def suggest_tags(image_data):
    """Recommend tags for an image.

    Input: image_data is a photo in bitmap format

    Returns: a ranked list of suggested tags
    """

Then the rest of the work is relatively straightforward. But it’s not at all obvious how to start implementing a function like suggest_tags. That’s where machine learning comes in.

If this were an ordinary Python function, you’d expect it to take some kind of Image object as input and perhaps return a list of strings as output. Machine-learning algorithms aren’t so flexible about their inputs and outputs; they generally work on vectors and matrices. So as a first step, you need to represent your input and output mathematically.

If you resize the input photo to a fixed size—say, 128 × 128 pixels—then you can encode it as a matrix with 128 rows and 128 columns: one float value per pixel. What about the output? One option is to restrict the set of tags you’ll identify; you could select perhaps the 1,000 most popular tags on the app. The output could then be a vector of size 1,000, where each element of the vector corresponds to a particular tag. If you allow the output values to vary anywhere between 0 and 1, you can generate ranked lists of suggested tags. Figure 1.5 illustrates this sort of mapping between concepts in your application and mathematical structures.

Figure 1.5. Machine-learning algorithms operate on mathematical structures, such as vectors and matrices. Your photo tags are stored in a standard computer data structure: a list of strings. This is one possible scheme for encoding that list as a mathematical vector.

The data preprocessing you just carried out is an integral part of every machine-learning system. Usually, you load the data in a raw format and carry out preprocessing steps to create features: input data that can be fed into a machine-learning algorithm.
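As a rough sketch of this preprocessing, here’s one way you might encode tag lists as vectors and decode model output back into ranked tags. The three-tag vocabulary and the helper names are illustrative assumptions, not code from a real app:

import numpy as np

# Hypothetical vocabulary; a real app would use its 1,000 most popular tags.
VOCAB = ['beach', 'sunset', 'dog']
TAG_TO_INDEX = {tag: i for i, tag in enumerate(VOCAB)}

def encode_tags(tags):
    """Encode a list of tag strings as a vector with one slot per known tag."""
    vector = np.zeros(len(VOCAB))
    for tag in tags:
        if tag in TAG_TO_INDEX:
            vector[TAG_TO_INDEX[tag]] = 1.0
    return vector

def decode_tags(vector, num_suggestions=2):
    """Map the largest values in a model's output vector back to tags, ranked."""
    ranked = np.argsort(vector)[::-1]
    return [VOCAB[i] for i in ranked[:num_suggestions]]

For example, encode_tags(['dog', 'beach']) gives the vector [1.0, 0.0, 1.0], and decode_tags applied to a model output of [0.2, 0.9, 0.7] suggests 'sunset' and 'dog'.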

1.2.2. Supervised learning

Next, you need an algorithm for training your model. In this case, you have millions of correct examples already—all the photos that users have already uploaded and manually tagged in your app. You can learn a function that attempts to match these examples as closely as possible, and you hope that it’ll generalize to new photos in a sensible way. This technique is known as supervised learning, so-called because the labels of human-curated examples provide guidance for the training process.

When training is complete, you can deliver the final learned function with your application. Every time a user uploads a new photo, you pass it into the trained model function and get a vector back. You can match each value in the vector back to the tag it represents; then you can select the tags with the largest values and show them to the user. Schematically, the procedure you just outlined can be represented as shown in figure 1.6.

Figure 1.6. A machine-learning pipeline for supervised learning

How do you test your trained model? The standard practice is to set aside some of your original labeled data for that purpose. Before starting training, you can set aside a chunk of your data, say 10%, as a validation set. The validation set isn’t included as part of the training data in any way. Then you can apply your trained model to the images in the validation set and compare the suggested tags to the known good tags. This lets you compute the accuracy of your model. If you want to experiment with different models, you have a consistent metric for measuring which is better.
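A minimal sketch of such a split, assuming your labeled examples fit in a single Python list (labeled_photos is a hypothetical list of image-tags pairs):

import random

def train_validation_split(examples, validation_fraction=0.1):
    """Shuffle the examples and hold out a fraction as a validation set."""
    shuffled = list(examples)
    random.shuffle(shuffled)
    split = int(len(shuffled) * validation_fraction)
    return shuffled[split:], shuffled[:split]

labeled_photos = [('photo1.png', ['dog']), ('photo2.png', ['beach'])]
training_set, validation_set = train_validation_split(labeled_photos)

Shuffling before splitting matters: if your examples are stored in some meaningful order, a naive “last 10%” split can give you an unrepresentative validation set.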

In game AI, you can extract labeled training data from records of human games. And online gaming is a huge boon for machine learning: when people play a game online, the game server may save a computer-readable record. Examples of how to apply supervised learning to games are as follows:

  • Given a collection of complete records of chess games, represent the game state in vector or matrix form and learn to predict the next move from data.
  • Given a board position, learn to predict the likelihood of winning for that state.

1.2.3. Unsupervised learning

In contrast to supervised learning, the subfield of machine learning called unsupervised learning doesn’t come with any labels to guide the learning process. In unsupervised learning, the algorithm has to learn to find patterns in the input data on its own. The only difference from figure 1.6 is that you’re missing the labels, so you can’t evaluate your predictions the way you did before. All other components stay the same.

An example of this is outlier detection: identifying data points that don’t fit with the general trend of the data set. In the soccer-player data set, outliers would indicate players who don’t match the typical physique of their teammates. For instance, you could come up with an algorithm that measures the distance of a height-weight pair from the line you eyeballed. If a data point lies more than a certain distance from the line, you declare it an outlier.
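As a sketch, reusing the GenericLinearFunction from section 1.2 (the 15-pound threshold is an arbitrary assumption):

def is_outlier(height, weight, line, threshold=15.0):
    """Flag a data point whose weight is far from the fitted line."""
    predicted_weight = line.evaluate(height)
    return abs(weight - predicted_weight) > threshold

line = GenericLinearFunction(a=4.2, b=-137)
print(is_outlier(73, 230, line))   # True: well above the general trend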

In board-game AI, a natural question to ask is which pieces on the board belong together or form a group. In the next chapter, you’ll see what this means for the game of Go in more detail. Finding groups of pieces that have a relationship is sometimes called clustering or chunking. Figure 1.7 shows an example of what this could look like for chess.

Figure 1.7. An unsupervised machine-learning pipeline for finding clusters or chunks of chess pieces

1.2.4. Reinforcement learning

Supervised learning is powerful, but finding quality training data can be a major obstacle. Suppose you’re building a house-cleaning robot. The robot has various sensors that can detect when it’s near obstacles, and motors that let it scoot around the floor and steer left or right. You need a control system: a function that can analyze the sensor input and decide how it should move. But supervised learning is impossible here. You have no examples to use as training data—your robot doesn’t even exist yet.

Instead, you can apply reinforcement learning, a sort of trial-and-error approach. You start with an inefficient or inaccurate control system, and then you let the robot attempt its task. During the task, you record all the inputs your control system sees and the decisions it makes. When it’s done, you need a way to evaluate how well it did, perhaps by calculating the fraction of the floor it vacuumed and how far it drained its battery. That whole experience gives you a small chunk of training data, and you can use it to improve the control system. By repeating the whole process over and over, you can gradually home in on an efficient control function. Figure 1.8 shows this process as a flowchart.

Figure 1.8. In reinforcement learning, agents learn to interact with their environment by trial and error. You repeatedly have your agent attempt its task to get a supervised signal to learn from. With every cycle, you can make an incremental improvement.
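In Python-flavored pseudocode, the cycle in figure 1.8 might look like the following sketch. The agent and environment interfaces here are hypothetical; later chapters develop real versions of this idea for the game of Go:

def reinforcement_learning(agent, environment, num_cycles):
    """Trial-and-error improvement loop, as in figure 1.8."""
    for _ in range(num_cycles):
        experience = []                       # everything the agent saw and did
        state = environment.reset()
        while not environment.is_done():
            action = agent.choose_action(state)
            experience.append((state, action))
            state = environment.step(action)
        reward = environment.evaluate()       # e.g., floor cleaned minus battery drained
        agent.improve(experience, reward)     # nudge the control function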

1.3. Deep learning

This book is made up of sentences. The sentences are made of words; the words are made of letters; the letters are made of lines and curves; and, ultimately, those lines and curves are made of tiny dots of ink. When teaching a child to read, you start with the smallest parts and work your way up: first letters, then words, then sentences, and finally complete books. (Normally, children learn to recognize lines and curves on their own.) This kind of hierarchy is the natural way for people to learn complex concepts. At each level, you ignore some detail, and the concepts become more abstract.

Deep learning applies the same idea to machine learning. Deep learning is a subfield of machine learning that uses a specific family of models: sequences of simple functions chained together. These chains of functions are known as neural networks because they were loosely inspired by the structure of natural brains. The core idea of deep learning is that these sequences of functions can analyze a complex concept as a hierarchy of simpler ones. The first layer of a deep model can learn to take raw data and organize it in basic ways—for example, grouping dots into lines. Each successive layer organizes the previous layer into more-advanced and more-abstract concepts. The process of learning these abstract concepts is called representation learning.
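To make “sequences of simple functions chained together” concrete, here’s a minimal sketch in Keras, the library described in section 1.2. It reuses the 128 × 128 image and 1,000-tag encoding from the photo app; the layer sizes are arbitrary assumptions:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# First layer: takes the raw pixel values and learns basic features.
model.add(Dense(128, activation='relu', input_shape=(128 * 128,)))
# Each further layer organizes the previous one into more abstract concepts.
model.add(Dense(128, activation='relu'))
# Output layer: one score between 0 and 1 for each of the 1,000 tags.
model.add(Dense(1000, activation='sigmoid'))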

The amazing thing about deep learning is that you don’t need to know what the intermediate concepts are in advance. If you select a model with enough layers and provide enough training data, the training process will gradually organize the raw data into increasingly high-level concepts. But how does the training algorithm know what concepts to use? It doesn’t; it just organizes the input in any way that helps it to better match the training examples. There’s no guarantee that this representation matches the way humans would think about the data. Figure 1.9 shows how representation learning fits into the supervised learning flow.

Figure 1.9. Deep learning and representation learning

All this power comes with a cost. Deep models have huge numbers of weights to learn. Recall the simple ax + b model you used for your height and weight data set; that model had just two weights to learn. A deep model suitable for your image-tagging app could have a million weights. As a result, deep learning demands larger data sets, more computing power, and a more hands-on approach to training. Both techniques have their place. Deep learning is a good choice in the following circumstances:

  • Your data is in an unstructured form. Images, audio, and written language are good candidates for deep learning. It’s possible to apply simple models to that kind of data, but it generally requires sophisticated preprocessing.
  • You have large amounts of data available or have a plan for acquiring more. In general, the more complex your model is, the more data you need to train it.
  • You have plenty of computing power or plenty of time. Deep models involve more calculation for both training and evaluation.

You should prefer traditional models with fewer parameters in the following cases:

  • You have structured data. If your inputs look more like database records, you can often apply simple models directly.
  • You want a descriptive model. With simple models, you can look at the final learned function and examine how an individual input affects the output. This can give you insight about how the real-world system you’re studying works. In deep models, the connection between a specific piece of the input and the final output is long and winding; it’s difficult to interpret the model.

Because deep learning refers to the type of model you use, you can apply deep learning to any of the major machine-learning branches. For example, you can do supervised learning with a deep model or a simple model, depending on the type of training data you have.

1.4. What you’ll learn in this book

This book provides a practical introduction to deep learning and reinforcement learning. To get the most out of this book, you should be comfortable reading and writing Python code, and have some familiarity with linear algebra and calculus. In this book, we teach the following:

  • How to design, train, and test neural networks by using the Keras deep-learning library
  • How to set up supervised deep-learning problems
  • How to set up reinforcement-learning problems
  • How to integrate deep learning with a useful application

Throughout the book, we use a concrete and fun example: building an AI that plays Go. Our Go bot combines deep learning with standard computer algorithms. We’ll use straightforward Python to enforce the rules of the game, track the game state, and look ahead through possible game sequences. Deep learning will help the bot identify which moves are worth examining and evaluate who’s ahead during a game. At each stage, you can play against your bot and watch it improve as you apply more-sophisticated techniques.

If you’re interested in Go specifically, you can use the bot you’ll build in the book as a starting point for experimenting with your own ideas. You can adapt the same techniques to other games. You’ll also be able to add features powered by deep learning to other applications beyond games.

1.5. Summary

  • Machine learning is a family of techniques for generating functions from data instead of writing them directly. You can use machine learning to solve problems that are too ambiguous to solve directly.
  • Machine learning generally involves first choosing a model—a generic family of mathematical functions. Next you train the model—apply an algorithm to find the best function in that family. Much of the art of machine learning lies in selecting the right model and transforming your particular data set to work with it.
  • Three of the major areas of machine learning are supervised learning, unsupervised learning, and reinforcement learning.
  • Supervised learning involves learning a function from examples you already know to be correct. When you have examples of human behavior or knowledge available, you can apply supervised learning to imitate them on a computer.
  • Unsupervised learning involves extracting structure from data without knowing what the structure is in advance. A common application is splitting a data set into logical groups.
  • Reinforcement learning involves learning a function through trial and error. If you can write code to evaluate how well a program achieves a goal, you can apply reinforcement learning to incrementally improve a program over many trials.
  • Deep learning is machine learning with a particular type of model that performs well on unstructured inputs, such as images or written text. It’s one of the most exciting fields in computer science today; it’s constantly expanding our ideas about what computers can do.