Chapter 5. From Novice to Master Predictor: Maximizing Convolutional Neural Network Accuracy

In Chapter 1, we looked at the importance of responsible AI development. One of the aspects we discussed was the importance of robustness of our models. Users can trust what we build only if they can be assured that the AI they encounter on a day-to-day basis is accurate and reliable. Obviously, the context of the application matters a lot. It would be okay for a food classifier to misclassify pasta as bread on occasion. But it would be dangerous for a self-driving car to misclassify a pedestrian as a street lane. The main goal of this chapter is thus a rather important one—to build more accurate models.

In this chapter, you will develop an intuition for recognizing opportunities to improve your model’s accuracy the next time you begin training one. We first look at the tools that will ensure you won’t be going in blind. After that, for a good chunk of this chapter, we take a very experimental approach: setting up a baseline, isolating individual parameters to tweak, and observing their effect on model performance and training speed. Much of the code used in this chapter is aggregated in a single Jupyter Notebook, along with an actionable checklist of interactive examples. It is meant to be highly reusable, should you choose to incorporate it into your next training script.

We explore several questions that tend to come up during model training:

  • I am unsure whether to use transfer learning or to build my own network from scratch. What is the preferred approach for my scenario?

  • What is the least amount of data that I can supply to my training pipeline to get acceptable results?

  • I want to ensure that the model is learning the correct thing and not picking up spurious correlations. How can I get visibility into that?

  • How can I ensure that I (or someone else) will obtain the same results from my experiments every single time they are run? In other words, how do I ensure reproducibility of my experiments?

  • Does changing the aspect ratio of the input images have an impact on the predictions?

  • Does reducing input image size have a significant effect on prediction results?

  • If I use transfer learning, what percentage of layers should I fine-tune to achieve my preferred balance of training time versus accuracy?

  • Alternatively, if I were to train from scratch, how many layers should I have in my model?

  • What is the appropriate “learning rate” to supply during model training?

  • There are too many things to remember. Is there a way to automate all of this work?

We will try to answer these questions one by one in the form of experiments on a few datasets. Ideally, you should be able to look at the results, read the takeaways, and gain some insight into the concept that the experiment was testing. If you’re feeling more adventurous, you can choose to perform the experiments yourself using the Jupyter Notebook.

Tools of the Trade

One of the main priorities of this chapter is to reduce the code and effort involved during experimentation while trying to gain insights into the process in order to reach high accuracy. An arsenal of tools exists that can assist us in making this journey more pleasant:

TensorFlow Datasets

Quick and easy access to around 100 datasets in a performant manner. All of the well-known datasets are available, from the smallest, MNIST (a few megabytes), to the largest, such as MS COCO, ImageNet, and Open Images (several hundred gigabytes). Additionally, medical datasets such as Colorectal Histology and Diabetic Retinopathy are also available.

TensorBoard

Close to 20 easy-to-use methods to visualize many aspects of training, including visualizing the graph, tracking experiments, and inspecting the images, text, and audio data that pass through the network during training.

What-If Tool

Run experiments in parallel on separate models and tease out differences in them by comparing their performance on specific data points. Edit individual data points to see how that affects the model training.

tf-explain

Analyze decisions made by the network to identify bias and inaccuracies in the dataset. Additionally, use heatmaps to visualize what parts of the image the network activated on.

Keras Tuner

A library built for tf.keras that enables automatic tuning of hyperparameters in TensorFlow 2.0.

AutoKeras

Automates Neural Architecture Search (NAS) across different tasks like image, text, and audio classification and image detection.

AutoAugment

Utilizes reinforcement learning to improve the amount and diversity of data in an existing training dataset, thereby increasing accuracy.

Let’s now explore these tools in greater detail.

TensorFlow Datasets

TensorFlow Datasets is a collection of nearly 100 ready-to-use datasets that can quickly help build high-performance input data pipelines for training TensorFlow models. Instead of downloading and manipulating data sets manually and then figuring out how to read their labels, TensorFlow Datasets standardizes the data format so that it’s easy to swap one dataset with another, often with just a single line of code change. As you will see later on, doing things like breaking the dataset down into training, validation, and testing is also a matter of a single line of code. We will additionally be exploring TensorFlow Datasets from a performance point of view in the next chapter.

You can list all of the available datasets by using the following command (in the interest of conserving space, only a small subset of the full output is shown in this example):

import tensorflow_datasets as tfds
print(tfds.list_builders())
['amazon_us_reviews', 'bair_robot_pushing_small', 'bigearthnet', 'caltech101',
'cats_vs_dogs', 'celeb_a', 'imagenet2012', … , 'open_images_v4',
'oxford_flowers102', 'stanford_dogs','voc2007', 'wikipedia', 'wmt_translate',
'xnli']

Let’s see how simple it is to load a dataset. We will plug this into a full working pipeline later:

# Import necessary packages
import tensorflow as tf
import tensorflow_datasets as tfds

# Downloading and loading the dataset
dataset = tfds.load(name="cats_vs_dogs", split=tfds.Split.TRAIN)

# Building a performant data pipeline
dataset = dataset.map(preprocess).cache().repeat().shuffle(1024) \
                 .batch(32).prefetch(tf.data.experimental.AUTOTUNE)

model.fit(dataset, ...)
Tip

tfds generates a lot of progress bars, and they take up a lot of screen space—using tfds.disable_progress_bar() might be a good idea.

TensorBoard

TensorBoard is a one-stop-shop for all of your visualization needs, offering close to 20 tools to understand, inspect, and improve your model’s training.

Traditionally, to track experiment progress, we save the values of loss and accuracy per epoch and then, when training is done, plot them using Matplotlib. The downside of that approach is that it’s not real time; our only option during training is to watch the progress scroll by as text, and afterward we need to write additional code to make the Matplotlib graph. TensorBoard solves these and other pressing issues by offering a real-time dashboard (Figure 5-1) that visualizes all of our logs (such as train/validation accuracy and loss) to help us understand the progression of training. Another benefit it offers is the ability to compare our current experiment’s progress with that of a previous experiment, so we can see how a change in parameters affected the overall accuracy.

Figure 5-1. TensorBoard default view showcasing real-time training metrics (the lightly shaded lines represent the accuracy from the previous run)

To enable TensorBoard to visualize our training and models, we need to log information about our training with the help of summary writer:

summary_writer = tf.summary.create_file_writer('./log')
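
As a minimal sketch (assuming TensorFlow 2.x), here is how a custom scalar could be logged with such a writer so that it shows up under the Scalars tab; the metric name and values are purely illustrative:

import tensorflow as tf

summary_writer = tf.summary.create_file_writer('./log')

# Log one value per step; TensorBoard plots these as they are written
with summary_writer.as_default():
    for epoch, val_loss in enumerate([0.9, 0.6, 0.4]):  # illustrative values
        tf.summary.scalar('custom_val_loss', val_loss, step=epoch)

In practice, the tf.keras.callbacks.TensorBoard callback used later in this chapter writes the standard training logs for us; the manual writer is handy for anything custom.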

To follow our training in real time, we need to load TensorBoard before the model training begins. We can load TensorBoard by using the following commands:

# Get TensorBoard to run
%load_ext tensorboard

# Start TensorBoard
%tensorboard --logdir ./log

As more TensorFlow components need a visual user interface, they reuse TensorBoard by becoming embeddable plug-ins within it. You’ll notice the Inactive drop-down menu on TensorBoard; that’s where you can see all the different profiles or tools that TensorFlow offers. Table 5-1 showcases a handful of the wide variety of tools available.

Table 5-1. Plugins for TensorBoard

Default Scalar: Visualize scalar values such as classification accuracy.
Custom Scalar: Visualize user-defined custom metrics; for example, different weights for different classes, which might not be a readily available metric.
Image: View the output from each layer by clicking the Images tab.
Audio: Visualize audio data.
Debugging tools: Allows debugging visually and setting conditional breakpoints (e.g., tensor contains NaN or Infinity).
Graphs: Shows the model architecture graphically.
Histograms: Shows how the weight distribution in the layers of a model changes as training progresses. This is especially useful for checking the effect of compressing a model with quantization.
Projector: Visualize projections using t-SNE, PCA, and others.
Text: Visualize text data.
PR curves: Plot precision-recall curves.
Profile: Benchmark the speed of all operations and layers in a model.
Beholder: Visualize the gradients and activations of a model in real time during training, filter by filter; they can also be exported as images or even a video.
What-If Tool: Investigate the model by slicing and dicing the data and checking its performance. Especially helpful for discovering bias.
HParams: Find out which parameters, and at what values, are the most important; allows logging of the entire parameter search (discussed in detail in this chapter).
Mesh: Visualize 3D data (including point clouds).

It should be noted that TensorBoard is not TensorFlow specific and can be used with other frameworks such as PyTorch, scikit-learn, and more, depending on the plug-in used. To make a plug-in work, we need to write the specific metadata that we want to visualize. For example, TensorBoard embeds the TensorFlow Embedding Projector tool, which clusters images, text, or audio using t-SNE (which we examined in detail in Chapter 4). Apart from calling TensorBoard, we need to write metadata such as the feature embeddings of our images so that the Embedding Projector can use it for clustering, as demonstrated in Figure 5-2.

Figure 5-2. TensorFlow Embedding Projector showcasing data in clusters (can be run as a TensorBoard plugin)
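
As a concrete illustration, the following is a minimal sketch of writing such metadata for the Projector plug-in. It assumes TensorFlow 2.x with the tensorboard package installed, and the embeddings and labels are randomly generated placeholders standing in for real image feature vectors:

import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

log_dir = './log/projector'
os.makedirs(log_dir, exist_ok=True)

# Placeholder embeddings: 100 vectors of 64 features each, with fake labels
embeddings = np.random.rand(100, 64).astype('float32')
labels = ['class_%d' % (i % 5) for i in range(100)]

# Metadata file: one label per line, in the same order as the embeddings
with open(os.path.join(log_dir, 'metadata.tsv'), 'w') as f:
    for label in labels:
        f.write(label + '\n')

# Save the embeddings as a checkpointed variable
embedding_var = tf.Variable(embeddings)
checkpoint = tf.train.Checkpoint(embedding=embedding_var)
checkpoint.save(os.path.join(log_dir, 'embedding.ckpt'))

# Point the Projector plug-in at the checkpoint and the metadata
config = projector.ProjectorConfig()
embedding_config = config.embeddings.add()
embedding_config.tensor_name = 'embedding/.ATTRIBUTES/VARIABLE_VALUE'
embedding_config.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(log_dir, config)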

What-If Tool

What if we could inspect our AI model’s predictions with the help of visualizations? What if we could find the best threshold for our model to maximize precision and recall? What if we could slice and dice the data along with the predictions our model made to see what it’s great at and where there are opportunities to improve? What if we could compare two models to figure out which is indeed better? What if we could do all this and more, with a few clicks in the browser? Sounds appealing for sure! The What-If Tool (Figure 5-3 and Figure 5-4) from Google’s People + AI Research (PAIR) initiative helps open up the black box of AI models to enable model and data explainability.

Figure 5-3. What-If Tool’s datapoint editor makes it possible to filter and visualize data according to annotations of the dataset and labels from the classifier
Figure 5-4. PR curves in the Performance and Fairness section of the What-If Tool help to interactively select the optimal threshold to maximize precision and recall

To use the What-If Tool, we need the dataset and a model. As we just saw, TensorFlow Datasets makes downloading and loading the data (in TFRecord format) relatively easy. All we need to do is locate the data file. Additionally, we want to save the model in the same directory:

# Save model for What If Tool
tf.saved_model.save(model, "/tmp/model/1/")
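
If the examples are not already in a .tfrecord file, a rough sketch for serializing a tfds dataset into one follows. The dataset name, feature keys ('image/encoded', 'image/class/label'), and output filename here are illustrative and must match whatever your served model and What-If Tool configuration expect:

import tensorflow as tf
import tensorflow_datasets as tfds

dataset = tfds.load(name="colorectal_histology", split=tfds.Split.TRAIN)

def to_example(image, label):
    # Encode the image as JPEG bytes and pack it with its label
    image_bytes = tf.io.encode_jpeg(tf.cast(image, tf.uint8)).numpy()
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image_bytes])),
        'image/class/label': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[int(label)])),
    }))

with tf.io.TFRecordWriter('colo.tfrec') as writer:
    for sample in dataset.take(500):   # a small subset is enough to inspect
        writer.write(to_example(sample['image'],
                                sample['label']).SerializeToString())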

It’s best to run the following lines of code on a local system rather than in a Colab notebook, because the integration between Colab and the What-If Tool is still evolving.

Let’s start TensorBoard:

$ mkdir tensorboard
$ tensorboard --logdir ./log --alsologtostderr

Now, in a new terminal, let’s make a directory for all of our What-If Tool experiments:

$ mkdir what-if-stuff

Move the trained model and TFRecord data here. The overall directory structure looks something like this:

$ tree .
├── colo
│   └── model
│       └── 1
│           ├── assets
│           ├── saved_model.pb
│           └── variables

We’ll serve the model using Docker within the newly created directory:

$ sudo docker run -p 8500:8500 \
  --mount type=bind,source=/home/{your_username}/what-if-stuff/colo/model/,target=/models/colo \
  -e MODEL_NAME=colo -t tensorflow/serving

A word of caution: the port must be 8500 and all parameters must be spelled exactly as shown in the preceding example.

Next, at the far right, click the settings button (the gray gear icon) and add the values listed in Table 5-2.

Table 5-2. Configurations for the What-If Tool

Inference address: ip_addr:8500
Model name: /models/colo
Model type: Classification
Path to examples: /home/{your_username}/what-if-stuff/colo/models/colo.tfrec (note: this must be an absolute path)

We can now open the What-If Tool in the browser within TensorBoard, as depicted in Figure 5-5.

Figure 5-5. Setup window for the What-If Tool

The What-If Tool can also be used to visualize datasets according to different bins, as shown in Figure 5-6. We can also use the tool to determine the better performing model out of multiple models on the same dataset using the set_compare_estimator_and_feature_spec function.

from witwidget.notebook.visualization import WitConfigBuilder

# test_examples are the examples we want to load into the tool;
# features is the corresponding feature spec
models = [model2, model3, model4]
config_builder = WitConfigBuilder(test_examples).set_estimator_and_feature_spec(
    model1, features)

for each_model in models:
    config_builder = config_builder.set_compare_estimator_and_feature_spec(
        each_model, features)
Figure 5-6. The What-If tool enables using multiple metrics, data visualization, and many more things under the sun

Now, we can load TensorBoard, and then, in the Visualize section, choose the model we want to compare, as shown in Figure 5-7. This tool has many features to explore!

Figure 5-7. Choose the model to compare using the What-If Tool

tf-explain

Deep learning models have traditionally been black boxes; until now, we have usually judged their performance by watching class probabilities and validation accuracies. To make these models more interpretable and explainable, heatmaps come to the rescue. By highlighting, with greater intensity, the areas of an image that led to a prediction, heatmaps help us visualize what the model is learning. For example, an animal often photographed in snowy surroundings might be predicted with high accuracy, but if the dataset only ever shows that animal against snow, the model might be paying attention to the snow as the distinctive pattern rather than the animal. Such a dataset demonstrates bias, making the predictions less robust (and potentially dangerous!) when the classifier is put out in the real world. Heatmaps can be especially useful for exposing such bias, because spurious correlations can seep in if the dataset is not carefully curated.

tf-explain (by Raphael Meudec) helps us understand the results and inner workings of a neural network with the help of such visualizations, removing the veil on bias in datasets. We can add multiple types of callbacks while training, or use its core API to generate TensorFlow events that can later be loaded into TensorBoard. For inference, all we need to do is pass an image and its ImageNet object ID, along with a model, into tf-explain’s functions. We must supply the object ID because tf-explain needs to know which class’s activations to visualize. A few different visualization approaches are available with tf-explain:

Grad CAM

The Gradient-weighted Class Activation Mapping (Grad CAM) visualizes how parts of the image affect the neural network’s output by looking into the activation maps. A heatmap (illustrated in Figure 5-8) is generated based on the gradients of the object ID from the last convolutional layer. Grad CAM is largely a broad-spectrum heatmap generator given that it is robust to noise and can be used on an array of CNN models.

Occlusion Sensitivity

Occludes a part of the image (using a small square patch placed randomly) to establish how robust the network is. If the prediction is still correct, on average, the network is robust. The area in the image that is the warmest (i.e., red) has the most effect on the prediction when occluded.

Activations

Visualizes the activations for the convolutional layers.

Figure 5-8. Visualizations on images using MobileNet and tf-explain

As demonstrated in the code example that follows, such visualizations can be built with very little code. By taking a video, generating individual frames, and running tf-explain with Grad CAM and joining them together, we can build a detailed understanding of how these neural networks would react to moving camera angles.

import tensorflow as tf
from tf_explain.core.grad_cam import GradCAM
from tensorflow.keras.applications import MobileNet

model = MobileNet(weights='imagenet', include_top=True)

# Set up the Grad CAM explainer
explainer = GradCAM()

# Image processing
IMAGE_PATH = 'dog.jpg'
dog_index = 263
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)
data = ([img], None)

# Passing the image through Grad CAM
grid = explainer.explain(data, model, 'conv1', dog_index)
name = IMAGE_PATH.split(".jpg")[0]
explainer.save(grid, '/tmp', name + '_grad_cam.png')
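
Extending this to the video idea mentioned earlier, a rough sketch might extract frames with OpenCV and reuse the same explainer on each one. This assumes OpenCV (cv2) is installed; the video filename, sampling rate, and class index are illustrative:

import cv2
import tensorflow as tf
from tf_explain.core.grad_cam import GradCAM

model = tf.keras.applications.MobileNet(weights='imagenet', include_top=True)
explainer = GradCAM()
dog_index = 263

capture = cv2.VideoCapture('dog_video.mp4')
frame_id = 0
while True:
    success, frame = capture.read()
    if not success:
        break
    if frame_id % 30 == 0:  # roughly one frame per second for a 30 fps video
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (224, 224))
        data = ([frame.astype('float32')], None)
        grid = explainer.explain(data, model, 'conv1', dog_index)
        explainer.save(grid, '/tmp', 'frame_%05d_grad_cam.png' % frame_id)
    frame_id += 1
capture.release()

The saved heatmap frames can then be stitched back into a video with any standard tool such as ffmpeg.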

Common Techniques for Machine Learning Experimentation

The first few chapters focused on training the model. The following sections, however, contain a few more things to keep in the back of your mind while running your training experiments.

Data Inspection

The first big hurdle in data inspection is determining the structure of the data. TensorFlow Datasets makes this step relatively easy because all of the available datasets come in the same format and structure and can be used in a performant way. All we need to do is load the dataset into the What-If Tool and use the options already present to inspect the data. As an example, on the SMILE dataset, we can visualize the data according to its annotations, such as images of people wearing eyeglasses and those without, as illustrated in Figure 5-9. We observe that a much larger portion of the dataset consists of images of people without eyeglasses, revealing a bias caused by the unbalanced dataset. This can be addressed by modifying the weights of the metrics accordingly through the tool, or by reweighting the classes during training, as shown in the sketch after Figure 5-9.

Figure 5-9. Slicing and dividing the data based on predictions and real categories
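
The following is a minimal sketch of such class reweighting with Keras’s class_weight argument; the class counts are made up for illustration:

# Hypothetical counts obtained from inspecting the dataset
num_with_eyeglasses = 1000
num_without_eyeglasses = 9000
total = num_with_eyeglasses + num_without_eyeglasses

# Inverse-frequency weights: the rarer class gets the larger weight
class_weight = {
    0: total / (2.0 * num_without_eyeglasses),  # label 0: no eyeglasses
    1: total / (2.0 * num_with_eyeglasses),     # label 1: eyeglasses
}

# model.fit(train, epochs=NUM_EPOCHS, class_weight=class_weight, ...)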

Breaking the Data: Train, Validation, Test

Splitting a dataset into train, validation, and test sets is important because we want to report results on data the classifier has never seen before (i.e., the test set). TensorFlow Datasets makes it easy to download, load, and split a dataset into these three parts. Some datasets already come with three default splits; alternatively, the data can be split by percentages. The following code showcases using a default split:

dataset_name = "cats_vs_dogs"
train, info_train = tfds.load(dataset_name, split=tfds.Split.TRAIN,
                    with_info=True)

The cats_vs_dogs dataset in tfds has only the train split predefined. Similarly, some other datasets in TensorFlow Datasets lack a validation split; for those, we take a small percentage of samples from the predefined training set and treat it as the validation set. To top it all off, splitting the dataset using weighted splits takes care of randomizing and shuffling data between the splits:

# Load the dataset
dataset_name = "cats_vs_dogs"

# Dividing data into train (80), val (10) and test (10)
split_train, split_val, split_test = tfds.Split.TRAIN.subsplit(
    weighted=[80, 10, 10])
train, info_train = tfds.load(dataset_name, split=split_train , with_info=True)
val, info_val = tfds.load(dataset_name, split=split_val, with_info=True)
test, info_test = tfds.load(dataset_name, split=split_test, with_info=True)
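
Note that newer releases of TensorFlow Datasets replace the subsplit API shown above with a slicing syntax. Assuming such a version, a rough equivalent of the 80/10/10 split is:

import tensorflow_datasets as tfds

train = tfds.load("cats_vs_dogs", split="train[:80%]")
val = tfds.load("cats_vs_dogs", split="train[80%:90%]")
test = tfds.load("cats_vs_dogs", split="train[90%:]")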

Early Stopping

Early stopping helps avoid overtraining the network by keeping a lookout for the number of epochs that show limited improvement. Suppose that a model is set to train for 1,000 epochs and reaches 90% accuracy at the 10th epoch with no further improvement over the next 10 epochs; it might be a waste of resources to train any further. If the number of epochs without improvement exceeds a predefined threshold called patience, training is stopped even if there are more epochs left to train. In other words, early stopping decides the point at which training would no longer be useful and stops it there. We can change the monitored metric using the monitor parameter and add early stopping to the model’s list of callbacks:

# Define the early stopping callback
earlystop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy',
                                                      min_delta=0.0001,
                                                      patience=10)

# Add it to the model's training callbacks
model.fit_generator(..., callbacks=[earlystop_callback])

Reproducible Experiments

Train a network once. Then train it again, without changing any code or parameters. You might notice that the accuracies of the two runs come out slightly different, even though nothing changed. This is due to randomness: the initialization of model weights, the shuffling of data, and so on all rely on randomization algorithms. To make experiments reproducible across runs, we want to control this randomization. We know that random number generators can be made reproducible by initializing a seed, and that’s exactly what we will do. Various frameworks have their own ways of setting a random seed, some of which are shown here:

# Seed for TensorFlow
tf.random.set_seed(1234)

# Seed for NumPy
import numpy as np
np.random.seed(1234)

# Seed for Keras (e.g., ImageDataGenerator-based pipelines)
seed = 1234
fit(train_data, augment=True, seed=seed)
flow_from_dataframe(train_dataframe, shuffle=True, seed=seed)
Note

It is necessary to set a seed in all the frameworks and subframeworks that are being used, as seeds are not transferable between frameworks.
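
A small helper along these lines keeps all of the seeds in one place (a sketch; extend it with whichever additional libraries your own pipeline uses):

import os
import random
import numpy as np
import tensorflow as tf

def set_global_seeds(seed=1234):
    os.environ['PYTHONHASHSEED'] = str(seed)  # hash-based operations in Python
    random.seed(seed)                         # Python's built-in RNG
    np.random.seed(seed)                      # NumPy
    tf.random.set_seed(seed)                  # TensorFlow

set_global_seeds(1234)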

End-to-End Deep Learning Example Pipeline

Let’s combine several tools and build a skeletal backbone that will serve as our pipeline, to which we will add and remove parameters, layers, functionality, and various other add-ons to really understand what is happening. By following the code on the book’s GitHub website (see http://PracticalDeepLearning.ai), you can interactively run this code on more than 100 datasets in your browser with Colab. Additionally, you can modify it for most classification tasks.

Basic Transfer Learning Pipeline

First, let’s build this end-to-end example for transfer learning.

# Import necessary packages
import tensorflow as tf
import tensorflow_datasets as tfds

# tfds makes a lot of progress bars, which takes up a lot of screen space, hence
# disabling them
tfds.disable_progress_bar()

tf.random.set_seed(1234)

# Variables
BATCH_SIZE = 32
NUM_EPOCHS= 20
IMG_H = IMG_W = 224
IMG_SIZE = 224
LOG_DIR = './log'
SHUFFLE_BUFFER_SIZE = 1024
IMG_CHANNELS = 3

dataset_name = "oxford_flowers102"

def preprocess(ds):
  x = tf.image.resize_with_pad(ds['image'], IMG_SIZE, IMG_SIZE)
  x = tf.cast(x, tf.float32)
  x = (x/127.5) - 1
  return x, ds['label']

def augmentation(image,label):
  image = tf.image.random_brightness(image, .1)
  image = tf.image.random_contrast(image, lower=0.0, upper=1.0)
  image = tf.image.random_flip_left_right(image)
  return image, label

def get_dataset(dataset_name):
  split_train, split_val = tfds.Split.TRAIN.subsplit(weighted=[9,1])
  train, info_train = tfds.load(dataset_name, split=split_train , with_info=True)
  val, info_val = tfds.load(dataset_name, split=split_val, with_info=True)
  NUM_CLASSES = info_train.features['label'].num_classes
  assert NUM_CLASSES >= info_val.features['label'].num_classes
  NUM_EXAMPLES = info_train.splits['train'].num_examples * 0.9
  IMG_H, IMG_W, IMG_CHANNELS = info_train.features['image'].shape
  train = train.map(preprocess).cache().repeat() \
               .shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
  train = train.map(augmentation)
  train = train.prefetch(tf.data.experimental.AUTOTUNE)
  val = val.map(preprocess).cache().repeat().batch(BATCH_SIZE)
  val = val.prefetch(tf.data.experimental.AUTOTUNE)
  return (train, info_train, val, info_val, IMG_H, IMG_W, IMG_CHANNELS,
          NUM_CLASSES, NUM_EXAMPLES)

(train, info_train, val, info_val, IMG_H, IMG_W, IMG_CHANNELS, NUM_CLASSES,
 NUM_EXAMPLES) = get_dataset(dataset_name)

# Allow TensorBoard callbacks
tensorboard_callback = tf.keras.callbacks.TensorBoard(LOG_DIR,
                                                      histogram_freq=1,
                                                      write_graph=True,
                                                      write_grads=True,
                                                      batch_size=BATCH_SIZE,
                                                      write_images=True)

 
def transfer_learn(train, val, unfreeze_percentage, learning_rate):
  mobile_net = tf.keras.applications.ResNet50(
      input_shape=(IMG_SIZE, IMG_SIZE, IMG_CHANNELS), include_top=False)
  mobile_net.trainable = False
  # Unfreeze some of the layers according to the dataset being used
  num_layers = len(mobile_net.layers)
  for layer_index in range(int(num_layers - unfreeze_percentage * num_layers),
                           num_layers):
    mobile_net.layers[layer_index].trainable = True
  model_with_transfer_learning = tf.keras.Sequential([
      mobile_net,
      tf.keras.layers.GlobalAveragePooling2D(),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(64),
      tf.keras.layers.Dropout(0.3),
      tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')])
  model_with_transfer_learning.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
      loss='sparse_categorical_crossentropy',
      metrics=["accuracy"])
  model_with_transfer_learning.summary()
  earlystop_callback = tf.keras.callbacks.EarlyStopping(
      monitor='val_accuracy',
      min_delta=0.0001,
      patience=5)
  model_with_transfer_learning.fit(
      train,
      epochs=NUM_EPOCHS,
      steps_per_epoch=int(NUM_EXAMPLES / BATCH_SIZE),
      validation_data=val,
      validation_steps=1,
      validation_freq=1,
      callbacks=[tensorboard_callback, earlystop_callback])
  return model_with_transfer_learning

# Start TensorBoard
%tensorboard --logdir ./log

# Select the last % layers to be trained while using the transfer learning
# technique. These layers are the closest to the output layers.
unfreeze_percentage = .33
learning_rate = 0.001

model = transfer_learn(train, val, unfreeze_percentage, learning_rate)

Basic Custom Network Pipeline

Apart from transfer learning on state-of-the-art models, we can also experiment and develop better intuitions by building our own custom network. Only the model needs to be swapped in the previously defined transfer learning code:

def create_model():
  model = tf.keras.Sequential([
     tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
          input_shape=(IMG_SIZE, IMG_SIZE, IMG_CHANNELS)),
     tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
     tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
     tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
     tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
     tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
     tf.keras.layers.Dropout(rate=0.3),
     tf.keras.layers.Flatten(),
     tf.keras.layers.Dense(128, activation='relu'),
     tf.keras.layers.Dropout(rate=0.3),
     tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')
  ])
  return model 

def scratch(train, val, learning_rate):
  model = create_model()
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])

  earlystop_callback = tf.keras.callbacks.EarlyStopping(
                 monitor='val_accuracy', 
                 min_delta=0.0001, 
                 patience=5)

  model.fit(train,
           epochs=NUM_EPOCHS,
           steps_per_epoch=int(NUM_EXAMPLES/BATCH_SIZE),
           validation_data=val, 
           validation_steps=1,
           validation_freq=1,
           callbacks=[tensorboard_callback, earlystop_callback])
  return model

Now, it’s time to use our pipeline for various experiments.

How Hyperparameters Affect Accuracy

In this section, we aim to modify various parameters of a deep learning pipeline one at a time—from the number of layers fine-tuned, to the choice of the activation function used—and see its effect primarily on validation accuracy. Additionally, when relevant, we also observe its effect on the speed of training and time to reach the best accuracy (i.e., convergence).

Our experimentation setup is as follows:

  • To reduce experimentation time, we have used a faster architecture—MobileNet—in this chapter.

  • We reduced the input image resolution to 128 x 128 pixels to further speed up training. In general, we would recommend using a higher resolution (at least 224 x 224) for production systems.

  • Early stopping is applied to stop experiments if they don’t increase in accuracy for 10 consecutive epochs.

  • For training with transfer learning, we generally unfreeze the last 33% of the layers.

  • Learning rate is set to 0.001 with Adam optimizer.

  • We’re mostly using the Oxford Flowers 102 dataset for testing, unless otherwise stated. We chose this dataset because it is reasonably difficult to train on due to the large number of classes it contains (102) and the similarities between many of the classes that force networks to develop a fine-grained understanding of features in order to do well.

  • To make apples-to-apples comparisons, we take the maximum accuracy value in a particular experiment and normalize all other accuracy values within that experiment with respect to this maximum value.

Based on these and other experiments, we have compiled a checklist of actionable tips to implement in your next model training adventure. These are available on the book’s GitHub (see http://PracticalDeepLearning.ai) along with interactive visualizations. If you have more tips, feel free to tweet them @PracticalDLBook or submit a pull request.

Transfer Learning Versus Training from Scratch

Experimental setup

Train two models: one using transfer learning, and one from scratch on the same dataset.

Datasets used

Oxford Flowers 102, Colorectal Histology

Architectures used

Pretrained MobileNet, Custom model

Figure 5-10 shows the results.

Figure 5-10. Comparing transfer learning versus training a custom model on different datasets

Here are the key takeaways:

  • Transfer learning leads to a quicker rise in accuracy during training by reusing previously learned features.

  • Although one might expect transfer learning (based on models pretrained on ImageNet) to work only when the target dataset also consists of natural imagery, the patterns learned in the early layers of a network transfer surprisingly well to datasets beyond ImageNet. That does not necessarily mean transfer learning will yield the best possible results, but it can get close. The closer the target images are to the kinds of real-world images the model was pretrained on, the quicker the improvement in accuracy.

Effect of Number of Layers Fine-Tuned in Transfer Learning

Experimental setup

Vary the percentage of trainable layers from 0 to 100%

Dataset used

Oxford Flowers 102

Architecture used

Pretrained MobileNet

Figure 5-11 shows the results.

Figure 5-11. Effect of % layers fine-tuned on model accuracy

Here are the key takeaways:

  • The higher the number of layers fine-tuned, the fewer epochs it took to reach convergence and the higher the accuracy.

  • The higher the number of layers fine-tuned, the more time it took per epoch for training, due to more computation and updates involved.

  • For a dataset that required fine-grained understanding of images, making more layers task specific by unfreezing them was the key to a better model.

Effect of Data Size on Transfer Learning

Experimental setup

Add one image per class at a time

Dataset used

Cats versus dogs

Architecture used

Pretrained MobileNet

Figure 5-12 shows the results.

Figure 5-12. Effect of the amount of data per category on model accuracy

Here are the key takeaways:

  • Even with only three images in each class, the model was able to predict with close to 90% accuracy. This shows how powerful transfer learning can be in reducing data requirements.

  • Because ImageNet contains several classes of cats and dogs, networks pretrained on ImageNet suit our dataset particularly well. More difficult datasets such as Oxford Flowers 102 might require a much higher number of images to achieve similar accuracies.

Effect of Learning Rate

Experimental setup

Vary the learning rate between .1, .01, .001, and .0001

Dataset used

Oxford Flowers 102

Architecture used

Pretrained MobileNet

Figure 5-13 shows the results.

Figure 5-13. Effect of learning rate on model accuracy and speed of convergence

Here are the key takeaways:

  • Too high a learning rate, and the model might never converge.

  • Too low a learning rate, and convergence takes a very long time.

  • Striking the right balance is crucial for training quickly; one practical way to manage it is shown in the sketch after this list.
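
For instance, rather than committing to a single value, we can start relatively high and let Keras reduce the learning rate when progress stalls. This is a sketch using the ReduceLROnPlateau callback; the factor, patience, and floor values are illustrative:

import tensorflow as tf

reduce_lr_callback = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_accuracy',  # metric to watch for a plateau
    factor=0.2,              # multiply the learning rate by 0.2 when it stalls
    patience=3,              # epochs without improvement before reducing
    min_lr=1e-6)             # never go below this learning rate

# model.fit(train, ..., callbacks=[reduce_lr_callback])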

Effect of Optimizers

Experimental setup

Experiment with available optimizers including AdaDelta, AdaGrad, Adam, Gradient Descent, Momentum, and RMSProp

Dataset used

Oxford Flowers 102

Architecture used

Pretrained MobileNet

Figure 5-14 shows the results.

Figure 5-14. Effect of different optimizers on the speed of convergence

Here are the key takeaways:

  • Adam is a great choice for faster convergence to high accuracy.

  • RMSProp is usually better for RNN tasks.

Effect of Batch Size

Experimental setup

Vary batch sizes in powers of two

Dataset used

Oxford Flowers 102

Architecture used

Pretrained

Figure 5-15 shows the results.

Figure 5-15. Effect of batch size on accuracy and speed of convergence

Here are the key takeaways:

  • The higher the batch size, the more instability in results from epoch to epoch, with bigger rises and drops. But a higher batch size also leads to more efficient GPU utilization, and hence faster speed per epoch.

  • Too low a batch size slows the rise in accuracy.

  • 16, 32, and 64 are good batch sizes to start with.

Effect of Resizing

Experimental setup

Change image size to 128x128, 224x224

Dataset used

Oxford Flowers 102

Architecture used

Pretrained

Figure 5-16 shows the results.

Figure 5-16. Effect of image size on accuracy

Here are the key takeaways:

  • Even with a third of the pixels, there wasn’t a significant difference in validation accuracy. On the one hand, this shows the robustness of CNNs. On the other, it might partly be because the Oxford Flowers 102 dataset contains close-ups of flowers. For datasets in which the object of interest occupies a much smaller portion of the image, the results might be lower.

Effect of Change in Aspect Ratio on Transfer Learning

Experimental Setup

Take images of various aspect ratios (width:height ratio) and resize them to a square (1:1 aspect ratio).

Dataset Used

Cats vs. Dogs

Architecture Used

Pretrained

Figure 5-17 shows the results.

Figure 5-17. Distribution of aspect ratio and corresponding accuracies in images

Here are the key takeaways:

  • The most common aspect ratio is 4:3 (i.e., 1.33), whereas our neural networks are generally trained at a 1:1 ratio.

  • Neural networks are relatively robust to the minor changes in aspect ratio brought about by resizing to a square shape; even ratios up to 2.0 give decent results. The sketch after this list contrasts plain resizing with aspect-preserving padding.
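
As a reference, here is a small sketch contrasting the two resizing strategies with TensorFlow’s image ops; the input tensor is a stand-in for a real 4:3 photograph:

import tensorflow as tf

# A fake 4:3 image (height 300, width 400)
image = tf.ones((300, 400, 3))

# Option 1: stretch to a square, distorting the aspect ratio
stretched = tf.image.resize(image, (224, 224))

# Option 2: preserve the aspect ratio and pad the shorter side
padded = tf.image.resize_with_pad(image, 224, 224)

print(stretched.shape, padded.shape)  # both are (224, 224, 3)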

Tools to Automate Tuning for Maximum Accuracy

As we have seen ever since the industrial age of the nineteenth century, automation leads to increases in productivity. In this section, we investigate tools that can help us automate the search for the best model.

Keras Tuner

With so many potential combinations of hyperparameters to tune, coming up with the best model can be a tedious process. Often two or more parameters might have correlated effects on the overall speed of convergence as well as validation accuracy, so tuning one at a time might not lead to the best model. And if curiosity gets the best of us, we might want to experiment on all the hyperparameters together.

Keras Tuner comes in to automate this hyperparameter search. We define a search algorithm, the potential values that each parameter can take (e.g., discrete values or a range), our target objective to maximize (e.g., validation accuracy), and then sit back to watch the program start training. Keras Tuner conducts multiple experiments, changing the parameters on our behalf and storing the metadata of the best model. The following code example, adapted from the Keras Tuner documentation, showcases searching through different model architectures (varying the number of layers between 2 and 10) as well as varying the learning rate (between 0.1 and 0.001):

from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

from kerastuner.engine.hyperparameters import HyperParameters
from kerastuner.tuners import RandomSearch

# Input data
(x, y), (val_x, val_y) = keras.datasets.mnist.load_data()
x = x.astype('float32') / 255.
val_x = val_x.astype('float32') / 255.

# Defining hyper parameters
hp = HyperParameters()
hp.Choice('learning_rate', [0.1, 0.001])
hp.Int('num_layers', 2, 10)

# Defining model with expandable number of layers
def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten(input_shape=(28, 28)))
    for _ in range(hp.get('num_layers')):
        model.add(layers.Dense(32, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(hp.get('learning_rate')),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model

hypermodel = RandomSearch(
             build_model,
             max_trials=20, # Number of combinations allowed
             hyperparameters=hp,
             allow_new_entries=False,
             objective='val_accuracy')

hypermodel.search(x=x,
             y=y,
             epochs=5,
             validation_data=(val_x, val_y))

# Show summary of overall best model
hypermodel.results_summary()

Each experiment will show values like this:

 > Hp values:
  |-learning_rate: 0.001
  |-num_layers: 6
┌──────────────┬────────────┬───────────────┐
│ Name         │ Best model │ Current model │
├──────────────┼────────────┼───────────────┤
│ accuracy     │ 0.9911     │ 0.9911        │
│ loss         │ 0.0292     │ 0.0292        │
│ val_loss     │ 0.227      │ 0.227         │
│ val_accuracy │ 0.9406     │ 0.9406        │
└──────────────┴────────────┴───────────────┘

At the end of the experiments, the results summary gives a snapshot of the trials conducted so far and saves additional metadata.

Hypertuning complete - results in ./untitled_project
[Results summary]
 |-Results in ./untitled_project
 |-Ran 20 trials
 |-Ran 20 executions (1 per trial)
 |-Best val_accuracy: 0.9406

Another big benefit is the ability to track experiments online in real time and get notifications on their progress by visiting http://keras-tuner.appspot.com, getting an API key (from Google App Engine), and entering the following line in our Python program along with the real API key:

tuner.enable_cloud(api_key=api_key)

Due to the potentially large combinatorial space, random search is preferred over grid search as a more practical way to reach a good solution on a limited experimentation budget. But there are faster methods, including Hyperband (Lisha Li et al.), whose implementation is also available in Keras Tuner; a sketch of using it follows.
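
As a rough sketch, reusing the build_model function and hp object defined earlier (the epoch budget here is illustrative), swapping in Hyperband looks something like this:

from kerastuner.tuners import Hyperband

hypermodel = Hyperband(
    build_model,
    hyperparameters=hp,
    objective='val_accuracy',
    max_epochs=10)       # upper bound on epochs per configuration

hypermodel.search(x=x, y=y, validation_data=(val_x, val_y))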

For computer-vision problems, Keras Tuner includes ready-to-use tunable applications like HyperResNet.
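
A brief sketch of plugging one in might look as follows; the input shape, class count, and trial budget are illustrative, and the training arrays are assumed to exist:

from kerastuner.applications import HyperResNet
from kerastuner.tuners import RandomSearch

hypermodel = HyperResNet(input_shape=(128, 128, 3), classes=102)
tuner = RandomSearch(hypermodel,
                     objective='val_accuracy',
                     max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))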

AutoAugment

Another example of a hyperparameter is the set of augmentations to use. Which augmentations should we apply? How strongly should we apply them? Would combining too many make matters worse? Instead of leaving these decisions to humans, we can let AI decide. AutoAugment utilizes reinforcement learning to come up with the best combination of augmentations (such as translation, rotation, and shearing), along with the probabilities and magnitudes with which to apply them, in order to maximize validation accuracy. (The method was applied by Ekin D. Cubuk et al. to produce new state-of-the-art ImageNet validation numbers.) By learning the best combination of augmentation parameters on ImageNet, we can readily apply it to our own problem.

Applying the prelearned augmentation strategy from ImageNet is pretty simple:

from PIL import Image
from autoaugment import ImageNetPolicy
img = Image.open("cat.jpg")
policy = ImageNetPolicy()
imgs = [policy(img) for _ in range(8) ]

Figure 5-18 displays the results.

Figure 5-18. Output of augmentation strategies learned by reinforcement learning on the ImageNet dataset

AutoKeras

With AI automating more and more jobs, it is no surprise that it can finally automate the design of AI architectures, too. Neural Architecture Search (NAS) approaches utilize reinforcement learning to join together mini architectural blocks until they maximize the objective function; in other words, our validation accuracy. The current state-of-the-art networks are all based on NAS, leaving human-designed architectures in the dust. Research in this area started showing promising results in 2017, with a bigger focus in 2018 on making the search train faster. Now, with AutoKeras (Haifeng Jin et al.), we can apply this state-of-the-art technique to our particular datasets in a relatively accessible manner.

Generating new model architectures with AutoKeras is a matter of supplying our images and associated labels as well as a time limit by which to finish running the jobs. Internally, it implements several optimization algorithms, including a Bayesian optimization approach to search for an optimal architecture:

!pip3 install autokeras
!pip3 install graphviz
from keras.datasets import mnist
from autokeras.image.image_supervised import ImageClassifier

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.reshape(x_test.shape + (1,))

clf = ImageClassifier(path=".",verbose=True, augment=False)
clf.fit(x_train, y_train, time_limit= 30 * 60) # 30 minutes
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
y = clf.evaluate(x_test, y_test)
print(y)

# Save the model as a pickle file
clf.export_autokeras_model("model.pkl")

visualize('.')

After training, we are all eager to see what the new model architecture looks like. Unlike the cleaner-looking architecture diagrams we are used to seeing, the generated architecture will look pretty obfuscated and hard to interpret or print. What we can take comfort in, though, is that it yields high accuracy.

Summary

In this chapter, we saw a range of tools and techniques to help investigate opportunities to improve our CNN accuracy. Building a case for iterative experimentation, you learned how tuning hyperparameters can bring about optimal performance. And with so many hyperparameters to choose from, we then looked at automated approaches, including AutoKeras, AutoAugment, and Keras Tuner. Best of all, the core code for this chapter combining multiple tools in a single Colab file is available online on the book’s GitHub (see http://PracticalDeepLearning.ai) and can easily be tuned to more than 100 datasets with a single line change and run online in the browser. Additionally, we compiled a checklist of actionable tips along with interactive experiments hosted online to help give your model a little extra edge. We hope that the material covered in this chapter will help make your models more robust, reduce bias, make them more explainable, and ultimately contribute to the responsible development of AI.
