Appendix A. Mathematical foundations

You can’t do machine learning without math. In particular, linear algebra and calculus are essential. The goal of this appendix is to provide enough mathematical background to help you understand the code samples in the book. We don’t have nearly enough space to cover these massive topics thoroughly; if you want to understand these subjects better, we provide some suggestions for further reading.

If you’re already comfortable with linear algebra and calculus, you can safely skip this appendix altogether.

Further reading

In this book, we have room to cover only a few mathematical basics. If you’re interested in learning more about the mathematical foundations of machine learning, here are some suggestions:

  • For a thorough treatment of linear algebra, we suggest Sheldon Axler’s Linear Algebra Done Right (Springer, 2015).
  • For a complete and practical guide to calculus, including vector calculus, we like James Stewart’s Calculus: Early Transcendentals (Cengage Learning, 2015).
  • If you’re serious about understanding the mathematical theory of how and why calculus works, it’s hard to beat Walter Rudin’s classic Principles of Mathematical Analysis (McGraw-Hill, 1976).

Vectors, matrices, and beyond: a linear algebra primer

Linear algebra provides tools for handling arrays of data known as vectors, matrices, and tensors. You can represent all of these objects in Python with NumPy’s array type.

Linear algebra is fundamental to machine learning. This section covers only the most basic operations, with a focus on how to implement them in NumPy.

Vectors: one-dimensional data

A vector is a one-dimensional array of numbers. The size of the array is the dimension of the vector. You use NumPy arrays to represent vectors in Python code.

Note

This isn’t the true mathematical definition of a vector, but for the purposes of our book, it’s close enough.

You can convert a list of numbers into a NumPy array with the np.array function. The shape attribute lets you check the dimension:

>>> import numpy as np
>>> x = np.array([1, 2])
>>> x
array([1, 2])
>>> x.shape
(2,)
>>> y = np.array([3, 3.1, 3.2, 3.3])
>>> y
array([3. , 3.1, 3.2, 3.3])
>>> y.shape
(4,)

Note that shape is always a tuple; this is because arrays can be multidimensional, as you’ll see in the next section.

You can access individual elements of a vector, just as if it were a Python list:

>>> x = np.array([5, 6, 7, 8])
>>> x[0]
5
>>> x[1]
6

Vectors support a few basic algebraic operations. You can add two vectors of the same dimension. The result is a third vector of the same dimension. Each element of the sum vector is the sum of the matching elements in the original vectors:

>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([5, 6, 7, 8])
>>> x + y
array([ 6,  8, 10, 12])

Similarly, you can multiply two vectors of the same dimension element-wise with the * operator. (Here, element-wise means you multiply each pair of corresponding elements separately.)

>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([5, 6, 7, 8])
>>> x * y
array([ 5, 12, 21, 32])

The element-wise product is also called the Hadamard product.

You can also multiply a vector by a single number, or scalar. In this case, you multiply each value in the vector by the scalar:

>>> x = np.array([1, 2, 3, 4])
>>> 0.5 * x
array([0.5, 1. , 1.5, 2. ])

Vectors support a third kind of multiplication: the dot product, or inner product. To compute the dot product, you multiply each pair of corresponding elements and sum the results, so the dot product of two vectors is a single number. The NumPy function np.dot calculates the dot product. In Python 3.5 and later, the @ operator does the same thing. (In this book, we use np.dot.)

>>> x = np.array([1, 2, 3, 4])
>>> y = np.array([4, 5, 6, 7])
>>> np.dot(x, y)
60
>>> x @ y
60

Matrices: two-dimensional data

A two-dimensional array of numbers is called a matrix. You can also represent matrices with NumPy arrays. In this case, if you pass a list of lists into np.array, you get back a two-dimensional array representing a matrix:

>>> x = np.array([
  [1, 2, 3],
  [4, 5, 6]
 ])
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.shape
(2, 3)

Note that the shape of a matrix is a two-element tuple: first the number of rows, then the number of columns. You can access single elements with double-subscript notation: first row, then column. Alternatively, NumPy lets you pass both indices in a single [row, column] subscript. The two are equivalent:

>>> x = np.array([
  [1, 2, 3],
  [4, 5, 6]
 ])
>>> x[0][1]
2
>>> x[0, 1]
2
>>> x[1][0]
4
>>> x[1, 0]
4

You can also pull out a whole row from a matrix and get a vector:

>>> x = np.array([
  [1, 2, 3],
  [4, 5, 6]
 ])
>>> y = x[0]
>>> y
array([1, 2, 3])
>>> y.shape
(3,)

To pull out a column, you can use the funny-looking notation [:, n]. If it helps, think of : as Python’s list-slicing operator; so [:, n] means “get me all rows, but only column n.” Here’s an example:

>>> x = np.array([
  [1, 2, 3],
  [4, 5, 6]
 ])
>>> z = x[:, 1]
>>> z
array([2, 5])

Just like vectors, matrices support element-wise addition, element-wise multiplication, and scalar multiplication:

>>> x = np.array([
  [1, 2, 3],
  [4, 5, 6]
 ])
>>> y = np.array([
  [3, 4, 5],
  [6, 7, 8]
 ])
>>> x + y
array([[ 4,  6,  8],
       [10, 12, 14]])
>>> x * y
array([[ 3,  8, 15],
       [24, 35, 48]])
>>> 0.5 * x
array([[0.5, 1. , 1.5],
       [2. , 2.5, 3. ]])

Rank 3 tensors

Go is played on a grid; so are chess, checkers, and a variety of other classic games. Any point on the grid can contain one of a variety of different game pieces. How do you represent the contents of the board as a mathematical object? One solution is to represent the board as a stack of matrices, where each matrix is the size of the game board.

Each individual matrix in the stack is called a plane, or channel. Each channel can represent a single type of piece that can be on the game board. In Go, you might have one channel for black stones and a second channel for white stones; figure A.1 shows an example. In chess, maybe you have a channel for pawns, another channel for bishops, one for knights, and so forth. You can represent the whole stack of matrices as a single three-dimensional array; this is called a rank 3 tensor.

Figure A.1. Representing a Go game board with a two-plane tensor. This is a 5 × 5 board. You use one channel for black stones and a separate channel for white stones. So you use a 2 × 5 × 5 tensor to represent the board.
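
To make this concrete, here’s a small sketch of building such a board tensor by hand, following the convention from figure A.1: channel 0 marks black stones, and channel 1 marks white stones. The stone positions here are made up for illustration:

>>> board = np.zeros((2, 5, 5))
>>> board[0, 2, 2] = 1
>>> board[1, 2, 3] = 1
>>> board.shape
(2, 5, 5)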

Another common case is representing an image. Let’s say you want to represent a 128 × 64 pixel image with a NumPy array. In that case, you start with a grid corresponding to the pixels in the image. In computer graphics, you typically break a color into red, green, and blue components. So you can represent that image with a 3 × 128 × 64 tensor: you have a red channel, a green channel, and a blue channel.
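
Here’s a quick sketch of that layout; an all-zeros array stands in for real pixel data. Pulling out channel 0 gives you the red plane as an ordinary matrix:

>>> image = np.zeros((3, 128, 64))
>>> image.shape
(3, 128, 64)
>>> red = image[0]
>>> red.shape
(128, 64)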

Once again, you can use np.array to construct a tensor. The shape will be a tuple with three components, and you can use subscripting to pull out individual channels:

>>> x = np.array([
 [[1, 2, 3],
  [2, 3, 4]],
 [[3, 4, 5],
  [4, 5, 6]]
])
>>> x.shape
(2, 2, 3)
>>> x[0]
array([[1, 2, 3],
       [2, 3, 4]])
>>> x[1]
array([[3, 4, 5],
       [4, 5, 6]])

As with vectors and matrices, tensors support element-wise addition, element-wise multiplication, and scalar multiplication.

If you have an 8 × 8 grid with three channels, you could represent it with a 3 × 8 × 8 tensor or an 8 × 8 × 3 tensor. The only difference is in the way you index it. When you process the tensors with library functions, you must make sure the functions are aware of which indexing scheme you chose. The Keras library, which you use for designing neural networks, calls these two options channels_first and channels_last. For the most part, the choice doesn’t matter: you just need to pick one and stick to it consistently. In this book, we use the channels_first format.
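
If you ever do need to convert between the two layouts, NumPy’s moveaxis function handles it; here’s a sketch that moves the channel axis from the front to the back:

>>> x = np.zeros((3, 8, 8))
>>> y = np.moveaxis(x, 0, -1)
>>> y.shape
(8, 8, 3)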

Note

If you need a motivation to pick a format, certain NVIDIA GPUs have special optimizations for the channels_first format.

Rank 4 tensors

In many places in the book, we use a rank 3 tensor to represent a game board. For efficiency, you may want to pass many game boards to a function at once. One solution is to pack the board tensors into a four-dimensional NumPy array: this is a tensor of rank 4. You can think of this four-dimensional array as a list of rank 3 tensors, each of which represents a single board.
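
For example, a batch of eight 5 × 5 Go boards, each encoded with two channels as before, would be an 8 × 2 × 5 × 5 tensor. A minimal sketch:

>>> batch = np.zeros((8, 2, 5, 5))
>>> batch.shape
(8, 2, 5, 5)
>>> batch[0].shape
(2, 5, 5)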

Matrices and vectors are just special cases of tensors: a matrix is a rank 2 tensor, and a vector is a rank 1 tensor. And a rank 0 tensor is a plain old number.
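
In NumPy, the ndim attribute reports the rank directly:

>>> np.array(5.0).ndim
0
>>> np.array([1, 2]).ndim
1
>>> np.zeros((2, 3)).ndim
2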

Rank 4 tensors are the highest-order tensors you’ll see in this book, but NumPy can handle tensors of any rank. Visualizing high-dimensional tensors is hard, but the algebra is the same.

Calculus in five minutes: derivatives and finding maxima

In calculus, the rate of change of a function is called its derivative. Table A.1 lists a few real-world examples.

Table A.1. Examples of derivatives

Quantity                       Derivative
How far you’ve traveled        How fast you moved
How much water is in a tub     How fast the water drained out
How many customers you have    How many customers you gain (or lose)

A derivative is not a fixed quantity: it’s another function that varies over time or space. On a trip in a car, you drive faster or slower at various times. But your speed is always connected to the distance you cover. If you had a precise record of where you were over time, you could go back and work out how fast you were traveling at any point in the trip. That is the derivative.
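
You can approximate this numerically. Here’s a sketch using made-up trip data, where np.diff estimates the speed over each interval as the change in distance divided by the change in time:

>>> times = np.array([0.0, 1.0, 2.0, 3.0])
>>> distances = np.array([0.0, 50.0, 120.0, 150.0])
>>> np.diff(distances) / np.diff(times)
array([50., 70., 30.])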

When a function is increasing, its derivative is positive. When a function is decreasing, its derivative is negative. Figure A.2 illustrates this concept. With this knowledge, you can use the derivative to find a local maximum or a local minimum. Any place the derivative is positive, you can move to the right a little bit and find a larger value. If you go past the maximum, the function must now be decreasing, and its derivative is negative. In that case, you want to move a little bit to the left. At the local maximum, the derivative will be exactly zero. The logic for finding a local minimum is identical, except you move in the opposite direction.

Figure A.2. A function and its derivative. Where the derivative is positive, the function is increasing. Where the derivative is negative, the function is decreasing. When the derivative is exactly zero, the function is at a local minimum or maximum. With this logic, you can use the derivative to find local minima or maxima.
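
Here’s a minimal sketch of that procedure in code, using a made-up function with its maximum at x = 2. Each step moves x a small amount in the direction the derivative’s sign indicates:

>>> f = lambda x: -(x - 2) ** 2
>>> df = lambda x: -2 * (x - 2)
>>> x = 0.0
>>> for _ in range(100):
...     x += 0.1 * df(x)
...
>>> round(x, 4)
2.0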

Many functions that show up in machine learning take a high-dimensional vector as input and compute a single number as output. You can extend the same idea to maximize or minimize such a function. The derivative of such a function is a vector of the same dimension as its input, called a gradient. For every element of the gradient, the sign tells you which direction to move that coordinate. Following the gradient to maximize a function is called gradient ascent; if you’re minimizing, the technique is called gradient descent.

In this case, it may help to imagine the function as a contoured surface. At any point, the gradient points in the direction of the steepest upward slope of the surface.
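
The update rule looks the same in higher dimensions; you just add a scaled copy of the gradient vector at each step. Here’s a sketch with a made-up two-dimensional function whose maximum sits at (1, 3):

>>> grad = lambda v: -2 * (v - np.array([1.0, 3.0]))
>>> v = np.zeros(2)
>>> for _ in range(100):
...     v += 0.1 * grad(v)
...
>>> np.round(v, 4)
array([1., 3.])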

To use gradient ascent, you must have a formula for the derivative of the function you’re trying to maximize. Most simple algebraic functions have a known derivative; you can look them up in any calculus textbook. If you define a complicated function by chaining many simple functions together, a formula known as the chain rule describes how to calculate the derivative of the complicated function. Libraries like TensorFlow and Theano take advantage of the chain rule to automatically calculate the derivative of complicated functions. If you define a complicated function in Keras, you don’t need to figure out the formula for the gradient yourself: Keras will hand off the work to TensorFlow or Theano.
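
As a tiny worked example of the chain rule, take g(x) = x² inside f(u) = 3u + 1. The derivative of f(g(x)) = 3x² + 1 is 6x, and the chain rule rebuilds it from the derivatives of the pieces:

>>> g = lambda x: x ** 2
>>> dg = lambda x: 2 * x
>>> f = lambda u: 3 * u + 1
>>> df = lambda u: 3
>>> x = 2.0
>>> df(g(x)) * dg(x)
12.0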
