Chapter 7. Other Important Deep Learning Libraries

In this chapter, we'll look at other deep learning libraries, especially libraries built around programming languages other than Java. The following are the most famous and well-developed of them:

  • Theano
  • TensorFlow
  • Caffe

You'll briefly learn about each of them. Since the examples here are written mainly in Python, you can skip this chapter if you are not a Python developer. All the libraries introduced in this chapter support GPU computation and have other distinctive features, so let's dig into them.

Theano

Theano was developed for deep learning, but it is not actually a deep learning library; it is a Python library for scientific computing. The documentation is available at http://deeplearning.net/software/theano/. Several characteristics are introduced on that page, such as the use of a GPU, but the most striking feature is that Theano supports automatic differentiation, which ND4J, the Java scientific computing library, does not. This means that, with Theano, we don't have to calculate the gradients of the model parameters ourselves; Theano does it automatically. Since Theano takes care of the most complicated part of the algorithm, implementing the math expressions becomes less difficult.

Let's see how Theano computes gradients. To begin with, we need to install Theano on the machine. Installation can be done simply with pip install Theano or easy_install Theano. Then, the following lines import Theano so we can use it:

import theano
import theano.tensor as T

With Theano, all variables are processed as tensors. Types such as scalar, vector, and matrix are available, with a prefix that indicates the data type: d for double, l for long, and so on (for example, dscalar or dvector). Generic functions such as sin, cos, log, and exp are also defined under theano.tensor. Therefore, as shown previously, we often use the alias of tensor, T.
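
For instance, a few symbolic declarations and a generic function look like the following; this is just a minimal sketch to illustrate the naming convention:

import theano.tensor as T

a = T.dscalar('a')  # double-precision scalar
v = T.dvector('v')  # double-precision vector
M = T.dmatrix('M')  # double-precision matrix

s = T.sin(a)        # generic functions are defined under theano.tensor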

As a first step towards grasping Theano implementations, consider a very simple parabola. The implementation is saved in DLWJ/src/resources/theano/1_1_parabola_scalar.py for your reference. First, we define x as follows:

x = T.dscalar('x')

This definition is unusual for ordinary Python code because x doesn't hold a value; it's just a symbol. In this case, x is a scalar of type d (double). Then we can define y and its gradient very intuitively. The implementation is as follows:

y = x ** 2
dy = T.grad(y, x)

So, dy should contain 2x. Let's check whether we get the correct answers. The only additional thing we need to do is compile the expression into a callable Theano function:

f = theano.function([x], dy)

Then you can easily compute the value of the gradients:

print f(1)  # => 2.0
print f(2)  # => 4.0

Very simple! This is the power of Theano. Here x is a scalar, but you can easily implement vector (and matrix) calculations as well just by defining x as follows:

x = T.dvector('x')
y = T.sum(x ** 2)

We won't walk through these in detail here (a brief sketch of the vector version follows), but you can find the complete code in DLWJ/src/resources/theano/1_2_parabola_vector.py and DLWJ/src/resources/theano/1_3_parabola_matrix.py.
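
The vector version follows exactly the same pattern as the scalar one; a minimal sketch (the files above contain the complete versions) looks like this:

import theano
import theano.tensor as T

x = T.dvector('x')
y = T.sum(x ** 2)
dy = T.grad(y, x)             # the gradient is 2x, element-wise

f = theano.function([x], dy)
print f([1., 2., 3.])         # => [ 2.  4.  6.]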

When we consider implementing deep learning algorithms with Theano, we can find some very good examples on GitHub in Deep Learning Tutorials (https://github.com/lisa-lab/DeepLearningTutorials). In this chapter, we'll look at an overview of the standard MLP implementation so that you understand Theano better. A forked snapshot of the repository is available at https://github.com/yusugomori/DeepLearningTutorials. First, let's take a look at mlp.py. The model parameters of the hidden layer are the weights and the bias:

W = theano.shared(value=W_values, name='W', borrow=True)
b = theano.shared(value=b_values, name='b', borrow=True)

Both parameters are defined using theano.shared so that they can be accessed and updated through the model. The activation can be represented as follows:

h = s(Wx + b)

Here, s denotes the activation function, that is, the hyperbolic tangent in this code, and h is the output of the hidden layer. Therefore, the corresponding code is written as follows:

lin_output = T.dot(input, self.W) + self.b
self.output = (
    lin_output if activation is None
    else activation(lin_output)
)

Here, linear activation (the case where activation is None) is also supported. Likewise, the parameters W and b of the output layer, that is, the logistic regression layer, are defined and initialized in logistic_sgd.py:

self.W = theano.shared(
    value=numpy.zeros(
        (n_in, n_out),
        dtype=theano.config.floatX
    ),
    name='W',
    borrow=True
)

self.b = theano.shared(
    value=numpy.zeros(
        (n_out,),
        dtype=theano.config.floatX
    ),
    name='b',
    borrow=True
)

The activation function of multi-class logistic regression is the softmax function, and we can define the output simply as follows:

self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

We can write the predicted values as:

self.y_pred = T.argmax(self.p_y_given_x, axis=1)

In terms of training, since the backpropagation equations are derived from the loss function and its gradients, all we need to do is define the function to be minimized, that is, the negative log likelihood:

def negative_log_likelihood(self, y):
    return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

Here, the mean, not the sum, is computed so that the loss is evaluated per example across the mini-batch.
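
The indexing expression [T.arange(y.shape[0]), y] picks out the log probability of the correct class for each example in the mini-batch. The same trick is easier to see with plain NumPy arrays; the numbers below are made up purely for illustration:

import numpy

# log probabilities for a mini-batch of 3 examples and 4 classes
log_p = numpy.log(numpy.array([[0.7, 0.1, 0.1, 0.1],
                               [0.2, 0.5, 0.2, 0.1],
                               [0.1, 0.1, 0.1, 0.7]]))
y = numpy.array([0, 1, 3])  # correct class of each example

# picks log_p[0, 0], log_p[1, 1], log_p[2, 3], then averages and negates
nll = -numpy.mean(log_p[numpy.arange(y.shape[0]), y])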

With the preceding values and definitions, we can implement the MLP. Here again, what we need to do is define the equations and symbols of the MLP. The following is an extract of the code:

class MLP(object):
    def __init__(self, rng, input, n_in, n_hidden, n_out):
        # self.hiddenLayer = HiddenLayer(...)
        # self.logRegressionLayer = LogisticRegression(...)

        # L1 norm
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of MLP
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )

        # the parameters of the model
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
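
The two commented-out lines construct the hidden layer and the output layer. In the tutorial's mlp.py they are wired up roughly as follows; this is only a sketch for orientation, so refer to the repository for the exact code:

self.hiddenLayer = HiddenLayer(
    rng=rng,
    input=input,
    n_in=n_in,
    n_out=n_hidden,
    activation=T.tanh
)

self.logRegressionLayer = LogisticRegression(
    input=self.hiddenLayer.output,
    n_in=n_hidden,
    n_out=n_out
)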

Then you can build and train the model. Let's look at the code in test_mlp(). Once you load the dataset and construct the MLP, you can evaluate the model by defining the cost:

cost = (
    classifier.negative_log_likelihood(y)
    + L1_reg * classifier.L1
    + L2_reg * classifier.L2_sqr
)

With this cost, we get the gradients of the model parameters with just a single line of code:

gparams = [T.grad(cost, param) for param in classifier.params]

The following code builds the update rules for the parameters:

updates = [
    (param, param - learning_rate * gparam)
    for param, gparam in zip(classifier.params, gparams)
]

The expression param - learning_rate * gparam in each pair follows the standard gradient descent update:

θ ← θ − η ∂E/∂θ

where θ is a model parameter, η is the learning rate, and E is the cost to be minimized.

Then, finally, we define the actual function for the training:

train_model = theano.function(
    inputs=[index],
    outputs=cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    }
)
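
The index, x, and y used in givens are symbolic variables declared earlier in test_mlp(); their declarations look roughly like the following (a sketch based on the tutorial):

index = T.lscalar()  # index to a mini-batch
x = T.matrix('x')    # input data
y = T.ivector('y')   # labels, a 1D vector of integer labels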

The mini-batch of inputs and labels selected by index is substituted for x and y through givens, so when an index is given, the parameters are updated according to updates. Therefore, we can train the model by iterating over training epochs and mini-batches:

while (epoch < n_epochs) and (not done_looping):
    epoch = epoch + 1
    for minibatch_index in xrange(n_train_batches):
        minibatch_avg_cost = train_model(minibatch_index)

The original code also has test and validation parts, but what we just covered is the essential structure. With Theano, the gradient equations no longer have to be derived by hand.
