Training neural networks efficiently with high-level TensorFlow APIs

In this section, we will take a look at two high-level TensorFlow APIs—the Layers API (tensorflow.layers or tf.layers) and the Keras API (tensorflow.contrib.keras).

Keras can be installed as a separate package. It supports either Theano or TensorFlow as a backend (for more information, refer to the official website of Keras at https://keras.io/).

However, since the release of TensorFlow 1.1.0, Keras has been added to the TensorFlow contrib submodule. It is very likely that the Keras subpackage will be moved out of the experimental contrib submodule and become one of the main TensorFlow submodules soon.

Building multilayer neural networks using TensorFlow's Layers API

To see what neural network training via the tensorflow.layers (tf.layers) high-level API looks like, let's implement a multilayer perceptron to classify the handwritten digits from the MNIST dataset, which we introduced in the previous chapter. The MNIST dataset can be downloaded from http://yann.lecun.com/exdb/mnist/ in four parts, as listed here (a short download-and-unzip sketch follows the list):

  • Training set images: train-images-idx3-ubyte.gz (9.5 MB)
  • Training set labels: train-labels-idx1-ubyte.gz (32 KB)
  • Test set images: t10k-images-idx3-ubyte.gz (1.6 MB)
  • Test set labels: t10k-labels-idx1-ubyte.gz (8.0 KB)
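If you prefer to fetch and unzip these archives programmatically rather than by hand, the following is a minimal sketch of one way to do it (this snippet is not part of the book's code; the base URL and the file names are the ones listed above):

import os
import gzip
import shutil
from urllib.request import urlretrieve

# a minimal sketch: fetch the four archives listed above into ./mnist/
# and unzip them next to the downloaded .gz files
base_url = 'http://yann.lecun.com/exdb/mnist/'
files = ['train-images-idx3-ubyte.gz', 'train-labels-idx1-ubyte.gz',
         't10k-images-idx3-ubyte.gz', 't10k-labels-idx1-ubyte.gz']

os.makedirs('mnist', exist_ok=True)
for name in files:
    gz_path = os.path.join('mnist', name)
    urlretrieve(base_url + name, gz_path)
    # strip the .gz suffix to obtain the unzipped file name
    with gzip.open(gz_path, 'rb') as f_in, \
            open(gz_path[:-3], 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)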

Note

Note that TensorFlow also provides the same dataset as follows:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

However, we work with the MNIST dataset as an external dataset here to learn all the steps of data preprocessing separately. This way, you will learn what you need to do with your own dataset.

After downloading and unzipping the archives, we place the files in the mnist directory in our current working directory so that we can load the training as well as the test dataset, using the load_mnist(path, kind) function we implemented previously in Chapter 12, Implementing a Multilayer Artificial Neural Network from Scratch.
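In case you do not have that function at hand, here is a rough sketch of a load_mnist implementation that is consistent with how it is used in this section (it returns the images as an (n_samples, 784) array and the labels as a 1D array); the exact version from Chapter 12 may differ in details such as pixel scaling:

import os
import struct
import numpy as np

def load_mnist(path, kind='train'):
    """Load the unzipped MNIST ubyte files for `kind` ('train' or 't10k')."""
    labels_path = os.path.join(path, '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join(path, '%s-images-idx3-ubyte' % kind)

    with open(labels_path, 'rb') as lbpath:
        # skip the 8-byte header (magic number and number of items)
        struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)

    with open(images_path, 'rb') as imgpath:
        # skip the 16-byte header (magic number, image count, rows, columns)
        struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath,
                             dtype=np.uint8).reshape(len(labels), 784)

    return images, labels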

Then, the dataset will be loaded as follows:

>>> import numpy as np
>>>
>>> ## loading the data
>>> X_train, y_train = load_mnist('./mnist/', kind='train')
>>> print('Rows: %d,  Columns: %d' %(X_train.shape[0],
...                                  X_train.shape[1]))
Rows: 60000,  Columns: 784
>>> X_test, y_test = load_mnist('./mnist/', kind='t10k')
>>> print('Rows: %d,  Columns: %d' %(X_test.shape[0],
...                                  X_test.shape[1]))
Rows: 10000,  Columns: 784
>>> ## mean centering and normalization:
>>> mean_vals = np.mean(X_train, axis=0)
>>> std_val = np.std(X_train)
>>>
>>> X_train_centered = (X_train - mean_vals)/std_val
>>> X_test_centered = (X_test - mean_vals)/std_val
>>>
>>> del X_train, X_test
>>>
>>> print(X_train_centered.shape, y_train.shape)
(60000, 784) (60000,)
>>> print(X_test_centered.shape, y_test.shape)
(10000, 784) (10000,)

Now we can start building our model. We will start by creating two placeholders, named tf_x and tf_y, and then build a multilayer perceptron as in Chapter 12, Implementing a Multilayer Artificial Neural Network from Scratch, but with three fully connected layers.

However, we will replace the logistic units in the hidden layer with hyperbolic tangent activation functions (tanh), replace the logistic function in the output layer with softmax, and add an additional hidden layer.

Note

The tanh and softmax functions are new activation functions. We will learn more about these activation functions in the next section: Choosing activation functions for multilayer neural networks.
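Ahead of that section, here is a purely illustrative snippet (it is not part of the model code below) showing that softmax turns a vector of raw scores into probabilities that sum to one, while tanh squashes its input into the range (-1, 1):

import numpy as np

# illustrative only: softmax maps raw scores (logits) to class probabilities
logits = np.array([2.0, 1.0, 0.1])
probs = np.exp(logits) / np.sum(np.exp(logits))
print(probs)         # approx. [0.659 0.242 0.099]
print(probs.sum())   # 1.0

# tanh squashes its input into the open interval (-1, 1)
print(np.tanh(np.array([-5.0, 0.0, 5.0])))   # approx. [-0.9999  0.  0.9999]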

import tensorflow as tf
import numpy as np

n_features = X_train_centered.shape[1]
n_classes = 10
random_seed = 123
np.random.seed(random_seed)

g = tf.Graph()
with g.as_default():
    tf.set_random_seed(random_seed)
    tf_x = tf.placeholder(dtype=tf.float32,
                       shape=(None, n_features),
                       name='tf_x')

    tf_y = tf.placeholder(dtype=tf.int32,
                        shape=None, name='tf_y')
    y_onehot = tf.one_hot(indices=tf_y, depth=n_classes)

    h1 = tf.layers.dense(inputs=tf_x, units=50,
                         activation=tf.tanh,
                         name='layer1')

    h2 = tf.layers.dense(inputs=h1, units=50,
                         activation=tf.tanh,
                         name='layer2')

    logits = tf.layers.dense(inputs=h2,
                             units=10,
                             activation=None,
                             name='layer3')

    predictions = {
        'classes' : tf.argmax(logits, axis=1,
                              name='predicted_classes'),
        'probabilities' : tf.nn.softmax(logits,
                              name='softmax_tensor')
    }

Next, we define the cost function and add an operator for initializing the model variables, as well as an optimization operator:

## define cost function and optimizer:
with g.as_default():
    cost = tf.losses.softmax_cross_entropy(
            onehot_labels=y_onehot, logits=logits)

    optimizer = tf.train.GradientDescentOptimizer(
            learning_rate=0.001)

    train_op = optimizer.minimize(
            loss=cost)

    init_op = tf.global_variables_initializer()

Before we start training the network, we need a way to generate batches of data. For this, we implement the following function that returns a generator:

def create_batch_generator(X, y, batch_size=128, shuffle=False):
    X_copy = np.array(X)
    y_copy = np.array(y)
    
    if shuffle:
        data = np.column_stack((X_copy, y_copy))
        np.random.shuffle(data)
        X_copy = data[:, :-1]
        y_copy = data[:, -1].astype(int)
    
    for i in range(0, X.shape[0], batch_size):
        yield (X_copy[i:i+batch_size, :], y_copy[i:i+batch_size])
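As a quick, optional sanity check (not part of the original code), we can peek at the shapes of the first mini-batch that the generator yields:

# illustrative check: inspect the first mini-batch produced by the generator
batch_gen = create_batch_generator(X_train_centered, y_train,
                                   batch_size=64, shuffle=True)
batch_X, batch_y = next(batch_gen)
print(batch_X.shape, batch_y.shape)   # expected: (64, 784) (64,)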

Next, we can create a new TensorFlow session, initialize all the variables in our network, and train it. We also display the average training loss after each epoch to monitor the learning process:

>>> ## create a session to launch the graph
>>> sess =  tf.Session(graph=g)
>>> ## run the variable initialization operator
>>> sess.run(init_op)
>>>
>>> ## 50 epochs of training:
>>> for epoch in range(50):
...     training_costs = []
...     batch_generator = create_batch_generator(
...             X_train_centered, y_train,
...             batch_size=64, shuffle=True)
...     for batch_X, batch_y in batch_generator:
...         ## prepare a dict to feed data to our network:
...         feed = {tf_x:batch_X, tf_y:batch_y}
...         _, batch_cost = sess.run([train_op, cost], feed_dict=feed)
...         training_costs.append(batch_cost)
...     print(' -- Epoch %2d  '
...           'Avg. Training Loss: %.4f' % (
...               epoch+1, np.mean(training_costs)
...     ))


 -- Epoch  1  Avg. Training Loss: 1.5573
 -- Epoch  2  Avg. Training Loss: 1.2532
 -- Epoch  3  Avg. Training Loss: 1.0854
 -- Epoch  4  Avg. Training Loss: 0.9738

 […]
 -- Epoch 49  Avg. Training Loss: 0.3527
 -- Epoch 50  Avg. Training Loss: 0.3498

The training process may take a couple of minutes. Finally, we can use the trained model to make predictions on the test dataset:

>>> ## do prediction on the test set:
>>> feed = {tf_x : X_test_centered}
>>> y_pred = sess.run(predictions['classes'],
...                   feed_dict=feed)
>>>
>>> print('Test Accuracy: %.2f%%' % (
...       100*np.sum(y_pred == y_test)/y_test.shape[0]))

Test Accuracy: 93.89%

We can see that by leveraging high-level APIs, we can quickly build a model and test it. Therefore, a high-level API is very useful for prototyping our ideas and quickly checking the results.

Next, we will develop a similar classification model for MNIST using Keras, which is another high-level TensorFlow API.

Developing a multilayer neural network with Keras

The development of Keras started in the early months of 2015. As of today, it has evolved into one of the most popular and widely used libraries built on top of Theano and TensorFlow.

Similar to TensorFlow, Keras allows us to utilize our GPUs to accelerate neural network training. One of its prominent features is its very intuitive and user-friendly API, which allows us to implement neural networks in only a few lines of code.

Keras was first released as a standalone API that could leverage Theano as a backend, and support for TensorFlow was added later. Keras has also been integrated into TensorFlow since version 1.1.0; therefore, if you have TensorFlow version 1.1.0 or newer, no separate installation of Keras is needed. For more information about Keras, visit the official website at http://keras.io.

Currently, Keras is part of the contrib module (which contains packages developed by contributors to TensorFlow and is considered experimental code). In future releases of TensorFlow, it may be moved to become a separate module in the TensorFlow main API. For more information, visit the documentation on the TensorFlow website at https://www.tensorflow.org/api_docs/python/tf/contrib/keras.

Note

Note that you may have to change the code from import tensorflow.contrib.keras as keras to import tensorflow.keras as keras in future versions of TensorFlow in the following code examples.

On the following pages, we will walk through the code examples for using Keras step by step. Using the same functions described in the previous section, we need to load the data as follows:

>>> X_train, y_train = load_mnist('mnist/', kind='train')
>>> print('Rows: %d,  Columns: %d' %(X_train.shape[0],
...                                  X_train.shape[1]))
Rows: 60000,  Columns: 784
>>> X_test, y_test = load_mnist('mnist/', kind='t10k')
>>> print('Rows: %d,  Columns: %d' %(X_test.shape[0],
...                                  X_test.shape[1]))
Rows: 10000,  Columns: 784
>>>
>>> ## mean centering and normalization:
>>> mean_vals = np.mean(X_train, axis=0)
>>> std_val = np.std(X_train)
>>>
>>> X_train_centered = (X_train - mean_vals)/std_val
>>> X_test_centered = (X_test - mean_vals)/std_val
>>>
>>> del X_train, X_test
>>>
>>> print(X_train_centered.shape, y_train.shape)
(60000, 784) (60000,)
>>> print(X_test_centered.shape, y_test.shape)
(10000, 784) (10000,)

First, let's set the random seed for NumPy and TensorFlow so that we get consistent results:

>>> import tensorflow as tf
>>> import tensorflow.contrib.keras as keras
# in TensorFlow >= 1.4, use
# >>> import tensorflow.keras as keras
# instead of `import tensorflow.contrib.keras as keras`
>>> np.random.seed(123)
>>> tf.set_random_seed(123)

To continue with the preparation of the training data, we need to convert the class labels (integers 0-9) into the one-hot format. Fortunately, Keras provides a convenient tool for this:

>>> y_train_onehot = keras.utils.to_categorical(y_train)
>>>
>>> print('First 3 labels: ', y_train[:3])
First 3 labels:  [5 0 4]
>>> print('\nFirst 3 labels (one-hot):\n', y_train_onehot[:3])
First 3 labels (one-hot):
 [[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
  [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
  [ 0.  0.  0.  0.  1.  0.  0.  0.  0.  0.]]

Now, we can get to the interesting part and implement a neural network. Briefly, we will have three layers, where the first two layers each have 50 hidden units with the tanh activation function, and the last layer has 10 units for the 10 class labels and uses softmax to give the probability of each class. Keras makes these tasks very simple, as you can see in the following code implementation:

model = keras.models.Sequential()

model.add(
    keras.layers.Dense(
        units=50,
        input_dim=X_train_centered.shape[1],
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        activation='tanh'))

model.add(
    keras.layers.Dense(
        units=50,
        input_dim=50,
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        activation='tanh'))

model.add(
    keras.layers.Dense(
        units=y_train_onehot.shape[1],
        input_dim=50,
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        activation='softmax'))


sgd_optimizer = keras.optimizers.SGD(
        lr=0.001, decay=1e-7, momentum=.9)

model.compile(optimizer=sgd_optimizer,
              loss='categorical_crossentropy')

First, we initialize a new model using the Sequential class to implement a feedforward neural network. Then, we can add as many layers to it as we like. However, since the first layer that we add is the input layer, we have to make sure that the input_dim attribute matches the number of features (columns) in the training set (here, 784 features or pixels).

Also, we have to make sure that the number of output units (units) and input units (input_dim) of two consecutive layers match. In the preceding example, we added two hidden layers with 50 hidden units plus one bias unit each. The number of units in the output layer should be equal to the number of unique class labels—the number of columns in the one-hot-encoded class label array.
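As an optional check (not shown in the original text), calling model.summary() prints each layer's output shape and parameter count, which makes mismatched units/input_dim settings between consecutive layers easy to spot:

# optional check: list layer output shapes and parameter counts
model.summary()
# expected parameter counts for the architecture above:
#   layer 1: 784*50 + 50 = 39,250
#   layer 2:  50*50 + 50 =  2,550
#   layer 3:  50*10 + 10 =    510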

Note

Note that we used a new initialization algorithm for weight matrices by setting kernel_initializer= 'glorot_uniform'. Glorot initialization (also known as Xavier initialization) is a more robust way of initialization for deep neural networks (Understanding the difficulty of training deep feedforward neural networks, Xavier Glorot and Yoshua Bengio, in Artificial Intelligence and Statistics, volume 9, pages: 249-256. 2010). The biases are initialized to zero, which is more common, and in fact the default setting in Keras. We will discuss this weight initialization scheme in more detail in Chapter 14, Going Deeper - The Mechanics of TensorFlow.
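To make the scheme a bit more concrete, here is a small illustrative sketch (independent of Keras) of how Glorot uniform initialization draws weights from a uniform distribution whose limit depends on the number of input and output units of a layer:

import numpy as np

# Glorot (Xavier) uniform initialization draws weights from
# U(-limit, +limit) with limit = sqrt(6 / (fan_in + fan_out))
fan_in, fan_out = 784, 50          # first hidden layer of the model above
limit = np.sqrt(6.0 / (fan_in + fan_out))
W = np.random.uniform(-limit, limit, size=(fan_in, fan_out))
print(limit)       # approx. 0.0848
print(W.shape)     # (784, 50)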

Before we can compile our model, we also have to define an optimizer. In the preceding example, we chose stochastic gradient descent optimization, which we are already familiar with from previous chapters. Furthermore, we can set values for the weight decay constant and momentum learning to adjust the learning rate at each epoch, as discussed in Chapter 12, Implementing a Multilayer Artificial Neural Network from Scratch. Lastly, we set the cost (or loss) function to categorical_crossentropy.

Binary cross-entropy is just a technical term for the cost function in logistic regression, and categorical cross-entropy is its generalization for multiclass predictions via softmax, which we will cover in the section Estimating class probabilities in multiclass classification via the softmax function later in this chapter.
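To illustrate the loss itself (this snippet is purely illustrative and not the Keras implementation), the categorical cross-entropy for a single sample is the negative log-probability that the model assigns to the true class:

import numpy as np

# categorical cross-entropy for one sample: -sum(y_true * log(y_pred))
y_true = np.array([0., 0., 1.])        # one-hot label for class 2
y_pred = np.array([0.1, 0.2, 0.7])     # softmax output of the network
loss = -np.sum(y_true * np.log(y_pred))
print(loss)    # approx. 0.357 (= -log(0.7))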

After compiling the model, we can now train it by calling the fit method. Here, we are using mini-batch stochastic gradient descent with a batch size of 64 training samples per batch. We train the MLP over 50 epochs, and we can follow the optimization of the cost function during training by setting verbose=1.

The validation_split parameter is especially handy since it will reserve 10 percent of the training data (here, 6,000 samples) for validation after each epoch so that we can monitor whether the model is overfitting during training:

>>> history = model.fit(X_train_centered, y_train_onehot,
...                     batch_size=64, epochs=50,
...                     verbose=1,
...                     validation_split=0.1)

Train on 54000 samples, validate on 6000 samples
Epoch 1/50
54000/54000 [==============================] - 3s - loss: 0.7247 - val_loss: 0.3616
Epoch 2/50
54000/54000 [==============================] - 3s - loss: 0.3718 - val_loss: 0.2815
Epoch 3/50
54000/54000 [==============================] - 3s - loss: 0.3087 - val_loss: 0.2447

[…]
Epoch 50/50
54000/54000 [==============================] - 3s - loss: 0.0485 - val_loss: 0.1174

Printing the value of the cost function is extremely useful during training because we can quickly spot whether the cost is decreasing; if it is not, we can stop the algorithm early and tune the hyperparameter values.
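Since fit returns a History object that stores the per-epoch losses, we could also plot them to inspect convergence; the following is an optional sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# plot training vs. validation loss per epoch to spot overfitting
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()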

To predict the class labels, we can then use the predict_classes method to return the class labels directly as integers:

>>> y_train_pred = model.predict_classes(X_train_centered, verbose=0)
>>> print('First 3 predictions: ', y_train_pred[:3])
First 3 predictions:  [5 0 4]

Finally, let's print the model accuracy on training and test sets:

>>> y_train_pred = model.predict_classes(X_train_centered,
...                                      verbose=0)
>>> correct_preds = np.sum(y_train == y_train_pred, axis=0)
>>> train_acc = correct_preds / y_train.shape[0]
>>>
>>> print('First 3 predictions: ', y_train_pred[:3])
First 3 predictions:  [5 0 4]
>>>
>>> print('Training accuracy: %.2f%%' % (train_acc * 100))
Training accuracy: 98.88%
>>>
>>> y_test_pred = model.predict_classes(X_test_centered,
...                                     verbose=0)
>>> correct_preds = np.sum(y_test == y_test_pred, axis=0)
>>> test_acc = correct_preds / y_test.shape[0]
>>> print('Test accuracy: %.2f%%' % (test_acc * 100))
Test accuracy: 96.04%

Note that this is just a very simple neural network without optimized tuning parameters. If you are interested in playing more with Keras, feel free to further tweak the learning rate, momentum, weight decay, and number of hidden units.
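As a starting point for such experiments, the following sketch (with arbitrarily chosen values) shows where these hyperparameters live, namely in the Dense layers and the SGD optimizer defined earlier:

# illustrative variant with wider hidden layers and a larger learning rate;
# the values here are arbitrary and only mark where the knobs are
model = keras.models.Sequential()
model.add(keras.layers.Dense(units=100,
                             input_dim=X_train_centered.shape[1],
                             activation='tanh'))
model.add(keras.layers.Dense(units=100, activation='tanh'))
model.add(keras.layers.Dense(units=y_train_onehot.shape[1],
                             activation='softmax'))
model.compile(optimizer=keras.optimizers.SGD(lr=0.01, decay=1e-6,
                                             momentum=0.9),
              loss='categorical_crossentropy')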
