Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 9
Machine Learning Development Lifecycle

In the previous chapter, we deployed an AI model packaged with a web application that responded to HTTP requests. We saw how we developed, trained, and validated this model with sample data. Then this model was deployed inside a web application packaged together as a Docker container. This container was then deployed on Kubernetes as a microservice with the platform, providing infrastructure features like scaling, fail‐over, and load balancing. This approach is highly customized and needs tight coupling between the application code and the model. Software engineers need to know exactly how to call the model and need to manage the runtime for the model. A better approach is to deploy the model as an independent microservice and let the application call this microservice with agreed‐upon lightweight protocols. This way, the application has its own development lifecycle and the model has its own. This Machine Learning development lifecycle is gaining lot of popularity in the industry.

We will talk about the steps involved in the Machine Learning development lifecycle. We will talk about some best practices used by data scientists around different steps in working on a data science problem like data collection, cleansing, and structuring. We will explore policies for selecting the best modeling technique based on the type of data and the problem being solved. Finally, we will talk about deployment of the model in production both on the Cloud and on the edge. We will understand the hardware accelerators available that can make our model training and inference much faster on edge devices.

Machine Learning Model Lifecycle

After a Machine Learning project is conceptualized and the problem domain is understood, the model‐development process should kick off. Figure 9.1 shows the typical steps involved in the model development lifecycle. You may see different versions of this in other books and websites; however, the essence should be the same.

Block diagram depicting the typical steps involved in the Machine Learning development lifecycle. — **Figure 9.1**: Steps in a Machine Learning development lifecycle

Data scientists typically follow these steps while building an AI‐powered system. There are many time‐consuming and manual activities involved in this overall ML lifecycle. We need to empower our data scientists with tools that take care of most of these manual, repetitive, and time‐consuming parts of the process. These tools should help automate major portions of the entire flow of collecting data and building useful models—often referred to as the ML model pipeline.

In this chapter, we talk about each step of the ML lifecycle and introduce tools that can help make your life easier. The last step in this process—deployment to production—requires active collaboration between data scientists and software developers. We need tools that can automate not only the job of data scientists but also that of the developer. As you may have already figured, Kubernetes is one such tool that can help deploy software as a microservice, thus making it easier to manage and scale. Kubernetes takes care of many infrastructure concerns, like scalability, fail‐over, and load balancing. Using some special plug‐ins or extensions, Kubernetes can help you directly deploy ML models packaged as microservices. We will see examples of this using a special solution built on top of Kubernetes, called Kubeflow.

Modern software applications no longer only depend on fixed rules or logic programmed into code. We see more and more applications leveraging data‐driven models that learn patterns from data and make predictions. ML models are creating major breakthroughs and modern software development often includes a step to integrate ML models with existing code. Most times, these integrations tend to be highly custom and less reusable. They need very tight coordination between the data scientist and software developer.

Today, the effort is in building tooling that can help automate these steps, not different from how continuous integration (CI) and continuous delivery (CD) tools automated the software development lifecycle (SDLC). Specific to ML, we are seeing the emergence of Machine Learning or data science platforms that are geared toward making life easy for data scientists. Examples of these platforms are Amazon Web Services (AWS) SageMaker, Einstein platform from SalesForce, FBLearner flow from Facebook, Google AutoML, and Azure ML Studio. You may have heard some of these names in news articles or even played with some of these. They provide a highly user‐friendly web‐based environment where data scientists can connect to data sources, work on their data, and build and train ML models ready for deployment.

In the next chapter, we look at some of the best‐in‐class tools in each step in the ML lifecycle and building ML pipelines on Kubernetes. Before looking at these tools, first let's talk about each step in the ML model lifecycle.

Step 1: Define the Problem, Establish the Ground Truth

The first step, as in solving any engineering problem, is to clearly define the problem that you are trying to solve. Many times, we see projects that start with a set of data that is readily available and define a problem around that. You may get away with it and the data you have will give you relevant insights. However, it is highly recommended that you take a step back before jumping into collecting and processing data. Clearly define the problem you are trying to solve and what success means to you. If you start with the data‐first approach instead of the problem‐first approach, you tend to get biased by the data (just like a model gets biased, as in Chapter 2).

With AI and Machine Learning becoming so popular and easily accessible in the form of libraries and Python code, it's very easy to go with the data‐first approach. I see many folks get some easily available data and then try to apply AI to see what problems they solve. You may be lucky and find a good problem that has value in solving. But usually I recommend taking some time understanding your system and what problem areas exist that you can solve.

I recommend that you clearly understand the problem domain, meet with users and system experts, and ask as many questions as you can. Figure out what factors affect the problem that you are facing. Figure out what elements of the system you are studying you can measure. Determine what metrics exist and what new measurements need to be added. It may be recommended to consider this in terms of the dependent and independent variables we discussed in Chapter 2. Try to frame your problem in terms of dependent variables and find the independent variables that will affect these. Sometimes you may feel that existing data sources may not give you the full dependency of the problem you are solving. In that case, maybe you can recommend a new measurement in the system. However, for most systems, you will have to work with what data is available.

Also, once you build an AI system, you will need to measure it against something. It is highly recommended at the start to clearly define what the ground truth is. This is what you will measure your AI performance against.

For example, say you are building an AI system that looks at security camera video footage to monitor cars entering and leaving a parking garage. Your aim is to have a system that is as good as a human at detecting cars, maybe recording the license plate number and keeping a count of how many vehicles enter and leave the lot. Each of these actions is a problem statement on which you will build your specific ML solutions or models. Now how do you know if your system is as good or better than a human at solving these problems? For that, you need the ground truth as a reference.

You could take historical video footage of cars from the same lot and have a human sit and manually annotate when a car appears on‐screen, record the license plate, and keep a count of cars moving in and out. As you can see, this is a pretty laborious activity. It is highly recommended to clearly establish the ground truth you will use as a reference for your AI problem and plan to collect information about it.

Step 2: Collect, Cleanse, and Prepare the Data

If you spend enough effort on the previous step and define the problem and establish the ground truth, you will have a pretty good idea what data sources are available in your system. These could be sensors, flat files, databases, historians, cameras, websites, etc. Your data will be used to train the model, so a Garbage‐In‐Garbage‐Out (GIGO) principle is very much applicable. If you give it bad data, you'll have a bad model that does not generalize well on real field data.

Many times, you may feel that the current data sources will not give you a good estimation of the problem you are trying to solve. As in the earlier example of cars entering and leaving a garage, if your cameras don't face the entry and exit gates, you will not have good video that you can use to analyze and track the cars. In this case, before doing much analysis, you may need to propose the right mounting locations and angles for cameras.

Once you have the right data being collected, it is important to gauge the noise in the data and cleanse it. A typical step is to collect a sample from your data source and apply descriptive statistics to it. You may look at statistical summaries or charts in Excel or tools like MATLAB and R. If your data is unstructured like images and video, you may spend time manually checking for noise in data. Noisy data will have a major negative impact on the performance of your AI model.

Data cleansing is a very important step for getting your field data in a clean state that can be used for training your AI model. Cleansing is the removal or replacement of bad or missing data from your dataset. Bad or missing data may be due to failure in the sensing equipment in case of monitoring sensors, loss of communication when data has to be sent over a network to your analytic, human error when entering data in a database, and many more. Cleansing may involve either deletion of the bad/missing records or imputation (replacement) of those data points with new values. The third option is to raise a fault and not process the data when it is bad. This is usually done for mission‐critical systems. This can be done with basic tools like Excel, may involve sophisticated programming in MATLAB or Python, or may even be done with dedicated cleansing tools. The level of sophistication you need in the cleansing method will depend on the impact the noisy data has on your results.

Let's say you are collecting room temperature values from a thermostat. Your data is structured as a series of values over time (a timeseries), with each data point representing an event in time. Now, say for certain times, you get noisy or bad data—like temperature readings of –9999 or 9999 or NULL. Depending on your data collection system, these values will indicate bad data due to sensing equipment failure. Now, you can filter these data points out and ignore them. So essentially you are not letting your model consider these events due to failure to get good data. This option of deletion is usually employed when you have lots of data points and specific points don't matter. The caveat here is that during these ignored data points, the system may be undergoing some significant change that will not be captured by the system.

Another option is to impute the missing data points. This is usually better when you have missing data for continuous periods. For example, say you are recording temperature from a thermostat and it gives bad data for two hours due to a dead battery. You may fill that data with the average room temperature before and after that event. Or you may fill those data points with the average room temperature for that day. Depending on your problem domain, you may choose the strategy to impute missing data.

If your problem is highly critical and the missing data may cause major issues, you might flag that as a fault in the system rather than attempt to do any prediction with bad data. For example, if you are measuring the heartbeat of a patient and you get bad data, it is highly recommended to flag a fault rather than try to interpolate.

Once you start collecting data and have a data cleansing strategy in place, the next step is to prepare the data for consumption by your model. This involves feature engineering and separating the data into training and validation sets. Feature engineering is extracting relevant features from the raw data so that these features can be used for building your model. If you have structured data like a timeseries, feature engineering involves trying to identify features of interest and possibly eliminating redundant and duplicate data. For unstructured data, feature engineering may involve many specialized techniques depending on the datatype. For example, for image data you may want to extract only the relevant features (pixel values) by converting images to grayscale, resizing, cropping, etc. These methods will reduce the size of your images and only keep relevant data that will help in your prediction model.

I see many Machine Learning projects with limited data tend to use all of it for training. Then they don't have any way to validate if their model has overfit on the training data. You need to make sure that you collect data for both training and validation and keep them separate.

We may use techniques like data augmentation to increase the volume of our data. We saw this example in our logo image‐classification problem. It is usually recommended to use augmentation techniques or ways to generate non‐natural data for training sets. You would be better off keeping your validation dataset as close to the real data as possible.

One way to think about this is if you were a teacher. During your normal school curriculum, you will train students on different topics. But there will be some challenging problems you will want to keep for the examination to really test if the students have learned the topic. These questions would be something out of the book so you can verify if your class actually learned the topic. In the same way, you want to keep your verification data quite challenging so that if you get a good precision score on this data you know you have a good model at hand.

Many times, the data available in the field or data stores may not be in the format you desire to train the models. You may have to do some format conversions to get the data in the format you want to do the training. For example, video data is often stored in a highly compressed H.264 format. However, for use in a computer vision or Deep Learning application, this will need to be decoded using the H.264 codec and converted to the three‐dimensional pixel array for analysis. The data format is something that needs to be considered in the model development cycle.

Step 3: Build and Train the Model

Now that you have your problem defined, data sources identified, data cleansed, relevant features isolated, and your dataset separated into training and validation, we get to the fun part of building the model and training it. It is important to give considerable thought to these steps before jumping into model building in order to save on rework.

We saw in Chapters 2 and 4 different ML and DL modeling techniques. Figure 9.2 shows a high‐level strategy you can follow for selecting your model. You, as a data scientist, may (and should) find your own methods for planning this strategy, but you can use this figure as a reference.

Flowchart presenting a high-level strategy of an unofficial generic guideline for selecting a model for Machine Learning. — **Figure 9.2**: An unofficial generic guideline for model selection

The first step is to understand the type of data—structured or unstructured. With structured data, every feature or column has a significance related to our problem. This type of data will usually be in tabular format like database tables, or in timeseries format like sensor readings. Unstructured data may be images, text, audio, or video—it is represented in a computer's memory as arrays or sequence of arrays. Here each column of data does not have significance—it is usually pixel intensity values for images or word embeddings for text. These numbers only gain significance when they are seen as a whole in the image or text sequence.

For both structured and unstructured data, you can do some feature engineering. Here we try to remove features that are not significant or run some computer vision or natural language processing methods to extract valuable features. For example, in the earlier example of monitoring cars coming in and out of a parking garage, we could crop a large image into a smaller window that only shows the parking gate where a car is likely to be present. The rest of the image data is not relevant and can be eliminated. Feature engineering is particularly important with structured data.

After feature engineering, you can apply the supervised or unsupervised Machine Learning techniques we discussed in Chapter 2. Supervised is where you have labeled data to guide your training and unsupervised is where you are trying to find patterns without any knowledge of existing labels.

Now you can technically skip feature engineering and use the Deep Learning techniques we talked about in Chapter 4. Deep Learning can help us build end‐to‐end models that can take data in raw formats and automatically extract features of importance. This is of particular importance with unsupervised data. You can pass raw data in the form of images or text to Deep Learning models and, through the many layers, the model extracts important features. Starting with the lowest level of features like pixel values, at each layer you try to extract high‐level features. This way you map a complex three‐dimensional array of pixels to an array of 10 numbers indicating 10 classes the image may belong to.

Depending on the type of data you are processing, there are certain neural network architectures that have been standardized. For image analysis, convolutional networks are pretty much universally accepted as the chosen architecture. For a sequence of data like text or audio, the standard in the industry is the recurrent neural network (RNN)—particularly of type long short‐term memory (LSTM). For converting one sequence to another, such as text from one language to another or text to speech, we have a newer architecture called sequence‐to‐sequence models. You may look at a popular architecture for a neural network that has been used by others to solve similar problems. For example, a particular type of Convolutional Neural Networks (CNN) architecture called VGG‐16 is very popular for image recognition. If you have a similar problem, you can build your model with that particular architecture and train it on your data. Another option is to take an existing model with weights and use transfer learning to train your data. We saw examples of this in Chapter 4.

To actually build the model, you may use the common programmatic approach. Here you build the model using your preferred data science language like Python, R, or MATLAB and then store the model in a binary format for deployment. More recently many AI workbenches have come into the limelight that allow data scientists to build models by writing minimal or no code. We saw Google Colaboratory, which helps us run Python code without installing any software and on Cloud CPU and GPUs. With AI workbenches like H2O and DataRobot, even the model development can be automated. H2O.ai provides a web interface, as shown in Figure 9.3, which allows for uploading data from CSV files and databases and helps us build Machine Learning models through configuration alone.

Screenshot of the H2O Artificial Intelligence workbench that allows codeless model development. — **Figure 9.3**: The H2O AI workbench allows codeless model development

Step 4: Validate the Model, Tune the Hyper‐Parameters

After you build a model, it needs to be trained and validated against your datasets. It is very rare that you would get good precision numbers on training and validation datasets on the first attempt. You will most likely have to tune many knobs to improve these numbers. After the obvious initial decisions are made, like what ML technique to use or what deep architecture to adopt, most of the data science effort goes in tuning these hyper‐parameters. By changing values of hyper‐parameters like the number of layers, the neurons in a layer, the learning rate, the activation function types, etc., you can understand how to improve the precision of your model. Although most of these decisions will depend on your domain and the dataset, there are certain rules of thumb that expert data scientists use after years of practice. AI workbenches like H2O try to capture these best practices and help users modify the values accordingly.

More recently, a new technique is becoming very popular for tuning model hyper‐parameters—it is called AutoML. AutoML is still evolving but it essentially provides an automated way of building and training your models. The idea is that to a given dataset under study, many different shallow and Deep Learning models are applied simultaneously. Each is applied with many hyper‐parameters often decided by best practices followed by data scientists. Using these combinations in parallel, the best combination of model and hyper‐parameters is identified for that particular problem.

Google has been aggressively marketing AutoML as its technique where neural networks build new neural networks. The H2O workbench we saw earlier also has support for AutoML. When we run AutoML in H2O for a given problem—with training and validation data—it tries several model and parameter combinations in parallel. It then shows a leaderboard with results showing the top models and their rankings, as shown in Figure 9.4.

Screenshot of the H2O Artificial Intelligence example of an AutoML leaderboard, with the models sorted in the order of median_residual_deviance. — **Figure 9.4**: H2O AI example of an AutoML leaderboard

Step 5: Deploy to Production

After your model is trained and validated with acceptable precision numbers, you can deploy it to production. As we saw in the previous chapter, this could be done as a web application with data being fed to the model collected from a user interface. The thing to keep in mind here is that any preprocessing done to the data during training should also be done now during inference. For example, for image data we divide by 255, so that we can normalize the values between 0 and 1. The same thing has to be done in the web application before feeding to the model. The result from the model must then be evaluated.

Some environments like MATLAB and R have a way for the model to be packaged as an executable and deployed on a system. More recently, Cloud‐based model deployment is getting a lot of attention. One example is the Amazon Web Services SageMaker. AWS SageMaker gives developers a Jupyter Notebook to build their model. Data can be pulled from the web or from AWS S3 (Simple Storage Service), which can store any type of file. After training and validation using code, the model can be automatically deployed in the Cloud and scaled to run on multiple machines.

In our earlier example, we packaged the model as a microservice in a Docker container and deployed it on a Kubernetes cluster. The scaling, fail‐over, and load balancing is taken care of by Kubernetes. However, you have to write the application code to wrap the model file. Also, the inputs entered by the users must be formatted and fed to the model, which is invoked from the code. There is an open source solution developed by Google called TensorFlow‐Serving that allows for automated packaging of your model files into microservices and deployment. This can now be called with a REST API using HTTP calls. TensorFlow‐Serving also supports Google's high‐performance Remote Procedure Calls (RPC) protocol called gRPC. We will talk about this more in the next chapter.

Feedback and Model Updates

Keep in mind that deploying the model in production is not the end of the story. A constant feedback mechanism needs to be in place to see how well the model is performing with real data. Many times, the model fails to get accurate numbers with real field data, due to several reasons. The model may need to be recalibrated and fine‐tuned with new data and redeployed. The part of the ML lifecycle from building the model to deployment in production may involve several iterations. This iterative nature should be accounted for in the ML platform and we should have automated tools that can monitor performance, rebuild the model, retrain it, and deploy a new version to production.

It is also possible that the performance of your model will degrade over time. This could be due to changes in the environment, incorrect calibration, etc. Or it could be that the data you collected for training and validating your model is no longer valid. The system has changed and it needs retraining. Retraining is something that you should carefully consider as part of your software process. You will not be able to release a single universal model that will solve your problem forever. After a few times you will need to modify and retrain the model on new data and deploy it again. Your development process should incorporate this change management step. This way you have a defined process to collect new data, validate your model, and retrain and deploy a newer version.

Kubernetes can greatly help you in your model retraining and redeployment process. New workflow tools like Kubeflow are evolving that can help you build ML pipelines that include provisions to test models on new data, build new models, and deploy them to production. These systems integrate with existing continuous integration tools to make deployment very straightforward. We will discuss these newer tools in the next chapter.

Deployment on Edge Devices

So far, we have talked about deployment in the Cloud or on‐premise servers using platforms like Kubernetes. However, many times you need to analyze data close to the source and provide results to take immediate action. Deployment at the edge on specialized hardware has its own constraints. The models are packaged as binary files and are usually invoked by embedded code written in C or C++. Another way of deploying an AI model is packaging it as a mobile app and deploying on a relatively low‐powered (as compared to Cloud servers) mobile device.

These mobile and edge devices are usually limited in processing power and memory. Hence, the models need to be extremely efficient and lightweight to run on these devices. Also, these devices often use hardware acceleration to make the models run faster. These models typically are meant for real‐time alerting of specific activities happening in field. For example, if you want to control the gate of the parking garage using a camera that sees cars entering, this will need a model that detects cars running on an edge device and makes a real‐time call to the circuitry that opens the gate when a car approaches.

Modern edge devices are supported by hardware acceleration chips to support Deep Learning models. The most popular chip among these is NVIDIA GPU—Graphics Processing Unit. GPUs started off as specialized chips to render complex graphics on‐screen very quickly. The graphics cards that are used for laptops and game consoles have embedded GPU chips. These chips could support massively parallel linear algebra calculations. They have thousands of processing cores that can do these operations in parallel and render an image on‐screen.

It turns out that for advanced Deep Learning also we need massive parallel linear algebra calculations to be done. NVIDIA started extending its graphics cards for computing and they became very popular. Now NVIDIA makes dedicated GPU cards for Deep Learning. It also develops high‐end systems like the DGX‐1, which has multiple such GPU cards functioning as a unit and can solve complex Deep Learning problems very quickly. The idea behind GPUs is pretty straightforward. A CPU chip is a general‐purpose chip that can do complex types of operations very quickly, but sequentially. Using a multi‐core CPU, we could get parallelism but it would be pretty limited. GPUs extend these basic cores to thousands of cores. Thus, we get the true benefit of running calculations in parallel.

More recently (as of 2018), other companies started getting into this Deep Learning chipset space. Google launched a Tensor Processing Unit (TPU), which runs on the same principle as GPU but claims to consume less power. Microsoft is investing in a technology called FPGA (field‐programmable gate array), which allows for programmatic development of processors. Microsoft claims using FPGAs gives them bigger benefits of parallel computing similar to GPUs.

This technology is evolving continuously. Though NVIDIA is the market leader with GPUs, the competition is catching up. I believe in couple of years we will be able to say for sure that a particular technology is the leader and a particular kind of chip is best for deploying Deep Learning models at the edge.

To actually show how GPU and TPU improve your Deep Learning model training times compared to a CPU, let's run the same code on different systems and analyze the performance. The easiest way to do this is to build a Jupyter Notebook in Google Colaboratory. This lets us switch the runtimes among a dual‐core CPU, NVIDIA K80 GPU, and a Google TPU. This way, we can test our code separately on these three environments. Let's first see the code in Listing 9.1.

First we will determine what device is connected to our Google Colab instance—a GPU, TPU, or only a CPU. Keep in mind that both GPU and TPU are supplementary chips; the machine will still need a CPU to run the main OS.

Listing 9.1: Code to Check if GPU, TPU Is Attached

     # import the necessary libraries
import tensorflow as tf
import os
 
# Check if GPU exists
gpu_exists = (tf.test.gpu_device_name() != '')
# Check if TPU exists
tpu_exists = (os.getenv('COLAB_TPU_ADDR') is not None)
 
# if GPU device is attached
if gpu_exists:
    print('GPU device found: ', tf.test.gpu_device_name())
# if TPU device is attached
elif tpu_exists:
    print('TPU device found: ', os.getenv('COLAB_TPU_ADDR'))
else:
    print('No GPU or TPU. We have to reply on good old CPU!')
 
print ('----------------')
print ()
print ()
print('---------- CPU configuration --- START ----------')
command = 'cat /proc/cpuinfo'
print (os.popen(command).read().strip())
print('---------- CPU configuration --- END ----------')
 
print('---------- Memory configuration --- START ----------')
command = 'cat /proc/meminfo'
print (os.popen(command).read().strip())
print('---------- Memory configuration --- END ----------')

You can create a new Google Colaboratory Notebook and enter this code in the cellblock. Then, one by one, select among the three runtime options provided. After selecting each runtime, click Connect to commission a cloud virtual machine and when this machine is ready, run this code. You will know the configuration of the machine and be able to distinguish between GPU and TPU. This code is particular to Google Colaboratory, but can easily be modified for specific edge hardware you have.

Under the Runtime menu, you can select the Change Runtime Type option and then select between GPU, TPU, or None (see Figure 9.5). None means only the CPU will be available—no hardware accelerator. Then you can connect to that runtime and run this code block on different instances to see what hardware accelerator you have.

Screenshot displaying the Notebook Settings after changing from CPU to GPU runtime in Google Colaboratory. — **Figure 9.5**: Changing from CPU to GPU runtime in Google Colaboratory

We will train a Convolutional Neural Network on the standard CIFAR dataset that comes with Keras. Then we'll change the runtime and see how the training time varies. We see that to run on GPU and pure CPU, the same code works on both. For Google's TPU, some modifications are needed. However, I feel as the TPU technology evolves, it will be able to run the same code on TPU. Ideally your hardware acceleration chip should not affect your code. The same code should be able to run on multiple environments as long as you have the right drivers configured—for GPU or TPU. After all, that's the power a platform like TensorFlow and Keras bring. See Listing 9.2.

Listing 9.2: Code to Load the Dataset, Display Some Sample Images, and Create the Model

     # import libraries
from tensorflow import keras
import numpy as np
# configure plotting
import matplotlib.pyplot as plt
%matplotlib inline
 
# import the dataset
dataset = keras.datasets.cifar10
 
# collect training and testing data
(train_images, train_labels), (test_images, test_labels) = dataset
.load_data()
 
# define the class names for CIFAR 10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
'frog', 'horse', 'ship', 'truck']
 
# plot some sample images
plt.figure(figsize=(8,8))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i][0]])
 
# preprocess the training and testing data
x_train, x_test = train_images / 255.0, test_images / 255.0
y_train, y_test = train_labels, test_labels
 
# build the Convolutional Neural Network Model
model = tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(32, (3, 3), padding='same', input_
shape=x_train.shape[1:]),
            tf.keras.layers.Activation('relu'),
            tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
            tf.keras.layers.Dropout(0.25),
 
            tf.keras.layers.Conv2D(64, (3, 3), padding='same'),
            tf.keras.layers.Activation('relu'),
            tf.keras.layers.Conv2D(64, (3, 3)),
            tf.keras.layers.Activation('relu'),
            tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
            tf.keras.layers.Dropout(0.25),
 
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(512, activation=tf.nn.relu),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax)
            ])
 
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
 
model.summary()

Figure 9.6 shows the sample images.

Illustration of sample images of truck models, animals, birds, horses, etc., from the CIFAR-10 dataset. — **Figure 9.6**: Sample images from the CIFAR‐10 dataset

Now that we have loaded the data and defined the model, we will train the model on our dataset. The model development portion of code for the GPU and TPU environments is the same. The model execution code for TPU is slightly different, so we use the tpu_exists flag, which tells us if TPU is attached. See Listing 9.3.

Listing 9.3: Check if TPU Is Attached and Run Code to Train Model—Capture Times

import datetime
 
# capture start time
st_time = datetime.datetime.now()
 
# we will train for 10 epochs
num_epochs = 10
 
# if not TPU then run simple train command
if not tpu_exists:
    model.fit(x_train, y_train, epochs=num_epochs)
 
# for TPU we have to use custom data structures
else:
        tpu_url = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
      model, strategy=tf.contrib.tpu.TPUDistributionStrategy(
             tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_url)
          )
      )
    tpu_model.compile(
          optimizer=tf.train.AdamOptimizer(learning_rate=1e-3, ),
          loss=tf.keras.losses.sparse_categorical_crossentropy,
          metrics=['sparse_categorical_accuracy']
      )
 
    # define a training function
    def train_gen(batch_size):
        while True:
            offset = np.random.randint(0, x_train.shape[0] - batch_size)
        yield x_train[offset:offset+batch_size], y_train[offset:offset +
batch_size]
 
    # fit the model on TPU
    tpu_model.fit_generator(
        train_gen(1024),
        epochs=num_epochs,
        steps_per_epoch=100,
        validation_data=(x_test, y_test),
        )
 
# record time after training
end_time = datetime.datetime.now()
 
print('Training time = %s'%(end_time-st_time))

Now let's change the runtimes from GPU to TPU to CPU and record the training times. We will see how the hardware acceleration helps in training. Training is usually the more time consuming of the Machine Learning tasks. You will most likely see the same performance improvement in inference times. See Listing 9.4 and Figure 9.7.

Listing 9.4: Result of Running Training on GPU, TPU, and CPU

GPU device found: /device:GPU:0
 
----------------
Epoch 1/10
50000/50000 [==============================] - 22s 430us/step - loss:
1.4497 - acc: 0.4754
Epoch 2/10
50000/50000 [==============================] - 19s 372us/step - loss:
1.0527 - acc: 0.6242
Epoch 3/10
50000/50000 [==============================] - 19s 386us/step - loss:
0.9037 - acc: 0.6807
Epoch 4/10
50000/50000 [==============================] - 19s 370us/step - loss:
0.8085 - acc: 0.7163
Epoch 5/10
50000/50000 [==============================] - 19s 376us/step - loss:
0.7259 - acc: 0.7443
Epoch 6/10
50000/50000 [==============================] - 18s 370us/step - loss:
0.6556 - acc: 0.7687
Epoch 7/10
50000/50000 [==============================] - 19s 375us/step - loss:
0.6067 - acc: 0.7864
Epoch 8/10
50000/50000 [==============================] - 19s 373us/step - loss:
0.5561 - acc: 0.8038
Epoch 9/10
50000/50000 [==============================] - 19s 375us/step - loss:
0.5156 - acc: 0.8187
Epoch 10/10
50000/50000 [==============================] - 19s 379us/step - loss:
0.4776 - acc: 0.8319
----------------
 
Training time = 0:03:11.096094
 
TPU device found: 10.12.160.114:8470
 
----------------
Epoch 1/10
100/100 [==============================] - 24s 243ms/step - loss: 1.6977
- sparse_categorical_accuracy: 0.3873 - val_loss: 1.4215 - val_sparse_
categorical_accuracy: 0.4956
Epoch 2/10
100/100 [==============================] - 16s 162ms/step - loss: 1.3143
- sparse_categorical_accuracy: 0.5318 - val_loss: 1.1858 - val_sparse_
categorical_accuracy: 0.5812
Epoch 3/10
100/100 [==============================] - 15s 151ms/step - loss: 1.1498
- sparse_categorical_accuracy: 0.5938 - val_loss: 1.0693 - val_sparse_
categorical_accuracy: 0.6247
Epoch 4/10
100/100 [==============================] - 16s 157ms/step - loss: 1.0443
- sparse_categorical_accuracy: 0.6324 - val_loss: 0.9734 - val_sparse_
categorical_accuracy: 0.6594
Epoch 5/10
100/100 [==============================] - 15s 152ms/step - loss: 0.9380
- sparse_categorical_accuracy: 0.6722 - val_loss: 0.9119 - val_sparse_
categorical_accuracy: 0.6779
Epoch 6/10
100/100 [==============================] - 14s 144ms/step - loss: 0.8462
- sparse_categorical_accuracy: 0.7031 - val_loss: 0.8745 - val_sparse_
categorical_accuracy: 0.6959
Epoch 7/10
100/100 [==============================] - 15s 148ms/step - loss: 0.7809
- sparse_categorical_accuracy: 0.7281 - val_loss: 0.8322 - val_sparse_
categorical_accuracy: 0.7050
Epoch 8/10
100/100 [==============================] - 15s 147ms/step - loss: 0.7181
- sparse_categorical_accuracy: 0.7507 - val_loss: 0.8213 - val_sparse_
categorical_accuracy: 0.7170
Epoch 9/10
100/100 [==============================] - 15s 148ms/step - loss: 0.6556
- sparse_categorical_accuracy: 0.7708 - val_loss: 0.7956 - val_sparse_
categorical_accuracy: 0.7236
Epoch 10/10
100/100 [==============================] - 14s 145ms/step - loss: 0.5934
- sparse_categorical_accuracy: 0.7922 - val_loss: 0.7902 - val_sparse_
categorical_accuracy: 0.7333
----------------
 
Training time = 0:02:58.394083
 
No GPU or TPU. We have to reply on good old CPU!
 
----------------
Epoch 1/10
50000/50000 [==============================] - 206s 4ms/step - loss:
1.4893 - acc: 0.4583
Epoch 2/10
50000/50000 [==============================] - 203s 4ms/step - loss:
1.1087 - acc: 0.6055
Epoch 3/10
50000/50000 [==============================] - 204s 4ms/step - loss:
0.9576 - acc: 0.6615
Epoch 4/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.8492 - acc: 0.7010
Epoch 5/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.7750 - acc: 0.7285
Epoch 6/10
50000/50000 [==============================] - 202s 4ms/step - loss:
0.7060 - acc: 0.7523
Epoch 7/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.6430 - acc: 0.7733
Epoch 8/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.5984 - acc: 0.7884
Epoch 9/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.5564 - acc: 0.8027
Epoch 10/10
50000/50000 [==============================] - 203s 4ms/step - loss:
0.5184 - acc: 0.8172
----------------
 
 
Training time = 0:33:54.456107

Screenshot of the Notebook Settings dialog box to change the setting to use GPU. — **Figure 9.7**: Change the setting to use GPU

Now we will change the settings in colab to include a TPU and back to only CPU. Figures 9.8 and 9.9 show these settings.

Screenshot of the Notebook Settings dialog box to change the setting to use TPU. — **Figure 9.8**: Change the setting to use TPU

Screenshot of the Notebook Settings dialog box to change the setting to use CPU only. — **Figure 9.9**: Change the setting to use CPU only

We see that GPU (time 3:11) and TPU (time 2:58) give significantly better performance for training the Deep Learning model as compared to CPU (time 33:54). We get a 10x improvement—that is, a 10X reduction in training time using a GPU or TPU. These 30 minutes for basic model training are very valuable. Especially when a data scientist has to try different scenarios and train hundreds of models, saving 30 minutes per model is extremely valuable. Hence, the GPU hardware is pretty expensive. However, if your team is involved in training many models with different configurations, you will definitely get a good return on your investment.

Between GPU and TPU, it's not really an apples‐to‐apples comparison, since the technology is evolving rapidly. The new NVIDIA GPUs can give better performance than K80. At the same time, Google will come up with better TPU options. You can use this code to test new devices as they become available to validate the performance.

Summary

In this chapter, we looked at the Machine Learning model development lifecycle. We saw the steps involved in procuring and cleansing the data. We saw a workflow for selecting the best model‐building technique based on the type of data. We saw the hyper‐parameter tuning process and upcoming AutoML technology that helps find the best hyper‐parameters. Finally, we talked about model deployment to production. We also talked about deployment at the edge and using hardware accelerators, like GPUs and TPUs, to improve training and inference performance.

In the next chapter, we get specific about deploying Machine Learning models to production and talk about some of the best‐in‐class tools available. We will discuss examples of open source tools for different stages of the ML lifecycle and how we can combine them to form a Machine Learning pipeline using Kubernetes. We will talk about the H2O AI workbench with an example of building a regression model. We will explore TensorFlow‐serving to deploy models packaged as microservices in Docker containers. We will explore Kubeflow, which helps build ML pipelines for establishing a CI process for data science.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for CHAPTER 9: Machine Learning Development Lifecycle

Create new playlist

Sign In

Sign Up