CHAPTER 8
Deploying AI Models as Microservices

In the previous chapter, we talked about Cloud computing, containers, and microservices. We saw how Kubernetes extends beyond a Containers-as-a-Service (CaaS) platform into a full ecosystem for deploying software applications packaged as microservices. We also saw an example of deploying an application on Kubernetes by using abstractions like pods, deployments, and services.

In this chapter, we get into more details of building applications using Kubernetes. We build a simple web application using Python, package it as a Docker container, and deploy it to a Kubernetes cluster. Then we modify this application to invoke a Deep Learning model and show the results on a web page. Here we start connecting the Keras and Kubernetes worlds. We see how to build production-quality Deep Learning applications, combining the best of these two technologies.

Building a Simple Microservice with Docker and Kubernetes

Let's get started by building a simple microservice application and then packaging it into a container. The idea of microservices is that the application is self-contained, so it can be deployed and scaled independently as a container instance. To start with, our application will simply read a text string and display a message. Later, we will do some processing on that text string.

We will use Python to build this web application. Python was traditionally used more for scripting and data science applications. However, in recent years, it has gained huge popularity for developing all sorts of software, including web applications. Many web application frameworks are available for Python that help you quickly build applications. Two of the best known are Django and Flask; we will use Flask.

Instead of Python, you can build web applications in languages like Java and NodeJS (JavaScript). Whatever the language, you will need some framework that will form the backbone of your application. The most popular frameworks for NodeJS and Java (as of 2018) are ExpressJS and Spring, respectively. These web app frameworks take care of a lot of the underlying details of building your application and communication over HTTP. Ultimately, you end up writing very basic and focused code specific to your application and don't have to worry about the plumbing for the whole app.

Let's look at an example in Python. You will need Python 2.7 (or above) or Python 3.3 (or above) installed. Most modern machines will have Python installed; you can download it from python.org if not. We will install the Flask web framework using the Python package installer, pip. You will also need the Docker engine, which can be installed from docker.com. The commands in Listing 8.1 let you check your environment for the necessary installations, including Python, Flask, and Docker. We will also create a new folder with the name of your app (such as simple-app) and run commands to build a basic skeleton for the app. We will add the details in the next sections.
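If you don't have the book's listing in front of you, the commands will look roughly like the following sketch (the folder name simple-app is just an example; adjust it for your own app):

$ python --version                      # confirm Python is installed
$ pip install Flask                     # install the Flask web framework
$ docker --version                      # confirm the Docker engine is installed
$ mkdir simple-app && cd simple-app     # create a folder for the app
$ touch app.py requirements.txt Dockerfile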

The touch command at the end of Listing 8.1 creates empty files that serve as a skeleton for our web app. We create three files in this example. Let's look at what each file will contain:

  • app.py: The main application logic in Python; it creates the HTTP endpoints for the app.
  • requirements.txt: Contains the Python libraries that are dependencies for this app.
  • Dockerfile: Contains the instructions to package the app into a Docker container.

Now let's populate these three files with the logic of our app. We will start with the app.py file containing the application that we will develop. Our application needs some boilerplate code to use the Flask framework; I will highlight it so you can just copy it directly. We will create an HTTP endpoint that responds to incoming requests from clients. Clients will use a web browser to make HTTP GET or POST calls to our endpoint, and it will respond as per the code we add. This will be the logic of our web application. Listing 8.2 shows the Python file we open in a text editor. The lines starting with # are comments.
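As a rough sketch of what app.py looks like at this stage (the exact code is in Listing 8.2), a minimal Flask application with a single endpoint is along these lines:

# app.py -- minimal Flask application (sketch; see Listing 8.2 for the full code)
from flask import Flask

# boilerplate: create the Flask application object
app = Flask(__name__)

# HTTP endpoint that responds to requests at /hello
@app.route('/hello')
def hello():
    return 'Hello World!'

if __name__ == '__main__':
    # listen on all interfaces on port 1234
    app.run(host='0.0.0.0', port=1234)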

Listing 8.3 shows the contents of the requirements.txt file. We add the libraries we need as dependencies. Here we need Flask for running our web application. We also include TensorFlow and Keras; these libraries will be used later when we add the DL code. You can update the versions to match the ones you have tested with.

Our microservice application will have a single HTTP endpoint, which will respond with a Hello World! message. We can test our application by running the Python interpreter on it and seeing the result in a web browser, as shown in Listing 8.4.

You may be asked for permission to open ports on your machine. The app is opening an HTTP port and listening for messages coming in on that port. When new messages arrive from clients on this port, our function code is invoked and returns a nice message.

Figure 8.1 shows what you will see in a browser by opening http://localhost:1234/hello.


Figure 8.1: What you see in the browser

Now we will add a new HTTP endpoint called process to read a text parameter. Here is the application logic we will have. When no parameter is passed, we will show a simple HTML page with a big textbox (a TEXTAREA in HTML terms). We will have an HTML SUBMIT button so that we can submit the text back to the same process endpoint. Now when the form is submitted with a value for the TEXTAREA parameter (text_input), we will just display this on‐screen. That's it.

Keep in mind that in a real-world application you would use stylesheets to beautify this page and keep the HTML code in separate files called templates. You would also normally have multiple pages: one for showing the input form and one for the submission results.

However, to keep the logic crisp and simple we have a single block of code. Let's look at the new code we add to our app.py file. In Listing 8.5, I show the full code for app.py, but the older code is grayed out so you can focus on the new code only. Listing 8.6 shows running the new app.py file.
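As a sketch of the new code (Listing 8.5 has the full, highlighted version), the /process endpoint might look something like this, assuming the form field is named text_input as described:

# sketch of the new /process endpoint added to app.py
from flask import request

FORM_HTML = '''
<form method="POST" action="/process">
  <textarea name="text_input" rows="5" cols="60"></textarea><br>
  <input type="submit" value="Submit">
</form>
'''

@app.route('/process', methods=['GET', 'POST'])
def process():
    # read the TEXTAREA value if the form was submitted
    in_text = request.form.get('text_input', '')
    if not in_text:
        # no parameter passed: show the input form
        return FORM_HTML
    # form submitted: echo the text back inside a bold tag
    return 'You entered: <b>' + in_text + '</b>'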

In your web browser, go to http://localhost:1234/process and you should see something similar to Figure 8.2.


Figure 8.2: The new app.py file shown in a browser

Enter the text and press Submit. You will get the page shown in Figure 8.3.


Figure 8.3: The result after pressing Submit

You can see that the text you entered was submitted to the endpoint as a parameter named text_input (the name you gave the HTML TEXTAREA field). Of course, the text is URL-encoded in transit; spaces, commas, and other special characters are escaped so that it can be transmitted over HTTP properly. It is then decoded and shown as HTML inside the bold <B> tag on the results page.

Adding AI Smarts to Your App

So there we have it: we developed a new application that accepts text input, although we are not yet doing any processing on it. Let's process our text using the Natural Language Processing (NLP) sentiment analysis model we created earlier in Python and Keras. If you remember from Chapter 5 (“Advanced Deep Learning”), we used Keras to build a recurrent neural network using LSTM layers. This model was trained on samples of positive and negative sentiment texts. We will now use that model in this web application.

The NLP model was saved as an H5 binary file. We will load this in Keras at the beginning when our web application loads. This instance of the model is saved in memory as long as the application is running. If we scale this application on three real or virtual machines, each machine will have an instance of the model and make predictions in its own process and memory space. That's how scaling will help Deep Learning models. We do not bog down a single node's resources but distribute our workloads to multiple machines.

Listing 8.7 shows the code to load the model; we will create a function in Python that processes the text you provide and returns a 0 (a positive sentiment) or 1 (a negative sentiment). So after we apply this Deep Learning model to the text in our web application, we will have an Artificial Intelligence system that can read text input and tell us whether the sentiment it expresses is positive or negative.

First, we need to place the imdb_nlp.h5 binary model file in the same folder as our app.py file. The Python code shown in Listing 8.7 will load this file and create a function that we can call to get the sentiment of the input text. Again, I will highlight the new code; older code is in gray.

Take a moment to go through the code in Listing 8.7. It builds on the code we have been developing for our test web app. We take the input from the HTML form as an in_text variable as we saw earlier. But instead of simply writing that back, we feed that to a newly created function called predict_sentiment. This function calls our NLP model already loaded from a binary file. The function converts our text sequence to a sequence of integers using the same vocabulary we used for the training data.

As a reminder, the vocabulary is basically a mapping of every word in your domain to an integer. Typically, this integer value corresponds to how often the word appears in your collection of documents, so the most common words have lower integer values, while less frequent words have higher ones. The vocabulary we use is built from the IMDB dataset that Keras provides for testing NLP models.
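As an illustrative sketch only (the exact preprocessing in Listing 8.7 depends on how the model was trained in Chapter 5, so details like the index offsets may differ), the model loading and prediction function look roughly like this:

# sketch: load the saved NLP model and classify the sentiment of input text
from keras.models import load_model
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

# load the trained model once, when the web application starts
model = load_model('imdb_nlp.h5')

# vocabulary: word -> integer index, built from the Keras IMDB dataset
word_index = imdb.get_word_index()

MAX_WORDS = 10   # the model was trained on the first 10 words of each review

def predict_sentiment(in_text):
    # convert words to integers using the same vocabulary as the training data
    seq = [word_index.get(word, 0) for word in in_text.lower().split()]
    # pad/truncate to the fixed length the model expects
    seq = pad_sequences([seq], maxlen=MAX_WORDS)
    score = model.predict(seq)[0][0]
    # per the chapter's convention: > 0.5 is negative (1), otherwise positive (0)
    return 1 if score > 0.5 else 0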

We have a new route called process, which is mapped to the HTTP endpoint with the same name. Here we take the input text passed on to our HTML form and pass that to the function. Depending on the output of our NLP model, we determine if it's a negative sentiment (output > 0.5) or a positive one (output < 0.5). Keep in mind that the model is only as good as the training data we provide. Our training data is from the IMDB movie review text database and we choose the first 10 words of the review to classify the sentiment. The accuracy will increase if you use more words or use a bigger text database. For now, Listing 8.8 shows the results.

In your web browser, go to http://localhost:1234/process. You should see the image shown in Figure 8.4.


Figure 8.4: The new app demo in the browser

Type in a phrase and click Submit. Here we typed the phrase “its a wonderful life.” Figure 8.5 shows the result.


Figure 8.5: What you get after pressing Submit

That's it; this simple app reads your text and tells you what sentiment the phrase expresses. Let's try another example, the phrase “my whole body hurts,” shown in Figures 8.6 and 8.7.


Figure 8.6: Entering a new phrase


Figure 8.7: This phrase results in a negative sentiment

You can try different phrases. The program is not guaranteed to get it right, but as you build better models, you will see the accuracy increase a great deal.

There you have it. You have developed a Natural Language Processing (NLP) model using Keras. This was a recurrent neural network model using LSTM layers. We trained this model on the publicly available IMDB movie reviews dataset for sentiment analysis. We got an accuracy of 95% on the training data and 70% on the validation set. We saved this model as an H5 file in HDF5 format.

We created a Python web application using the Flask framework. The application showed an HTML form where we could input text data and this would be sent to our application. When we get this data through our HTTP endpoint, we run the NLP model on this text and predict the sentiment. Based on the prediction, we tell the user if it's a positive or negative sentiment.

This is just a basic application. You can use the wonders of CSS and JavaScript to make this fancy, with special types of widgets for inputting and displaying data. Maybe instead of a bland text message, you want to show smileys with happy and sad emotions after submission. Maybe you want to process the text in real time as keys are entered into the textbox. As long as you have a solid Deep Learning model and a good connection established to invoke it from the HTML data, you can explore all these outcomes. The code in this chapter has hopefully given you the framework for building such awesome applications!

Packaging the App as a Container

Now let's build a Docker container with our app. If you remember, the Docker container will bundle everything the app needs: the AI model, our source code, the application server, and the operating system libraries. Of course, these are not all copied into the container as one monolithic blob; they are referenced as individual layers of the image.

First, we will fill the requirements.txt file with the dependencies we want to install when we build our container. In this case, we need Flask to build the web application, and TensorFlow and Keras to run our Deep Learning model, so let's include these. You can provide a version for each library; otherwise, the latest version will be deployed. It is usually recommended to pin the same versions of the libraries that you have tested with, to avoid any surprises. To get the current versions of the libraries you have installed, run pip freeze.

Listing 8.9 shows the requirements.txt file.
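A sketch of that file follows. The Flask version matches the build output later in this chapter; the TensorFlow and Keras versions shown here are placeholders, so pin whatever pip freeze reports on your machine:

Flask==1.0.2
tensorflow==1.12.0
keras==2.2.4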

Now we will populate the Dockerfile with the following instructions. These are the instructions we will use for Python; you can find equivalent instructions for other platforms like NodeJS and Java on the Internet. As we saw earlier, the Dockerfile has to be in the same folder as our app.py file.

The Dockerfile will have a set of commands to create your application environment from scratch. You can run this on any machine and the exact same Docker container will be created, and your app will run inside this environment. This is the power of Docker. Since you are building the whole environment from scratch you can be sure all dependencies will be taken care of. Again, in reality the whole installation does not occur, but layers are incrementally added to build the environment. Inside the Dockerfile, you will see Linux‐like commands and lines starting with a hash (#), which are the comments.

Let's go through the steps for building this container—see Listing 8.10.
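The eight build steps shown in the output later in this section tell us what the Dockerfile contains, so Listing 8.10 is along these lines (explanatory comments added here):

# start from the latest Ubuntu base image
FROM ubuntu:latest
# update the package index and install Python, pip, and build tools
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential
# copy the application files into the image and set the working directory
COPY . .
WORKDIR .
# install the Python dependencies from requirements.txt
RUN pip install -r requirements.txt
# start the app by running "python app.py" when the container starts
ENTRYPOINT [ "python" ]
CMD [ "app.py" ]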

Hopefully my comments are self‐explanatory and you can follow each step. We start with the OS we want for our container—here we choose the latest version of Ubuntu. We run updates and install Python and some build tools. Then we copy all the files from the existing folder to the container and run pip to install all Python dependencies. Finally, we start our application by running the Python command with our file as the parameter.

Now, with this build script in Dockerfile, we will create our container image. This image is the template for our container and once we have it, we can spin off as many containers as needed.

Here is the command to build the image. I am calling my image dattarajrao/simple‐nlp‐app. You can give yours any name, but I prefer using the convention <<docker account>> / <<image name>>. That way, you can upload your images to the Docker images repository very easily. See Listing 8.11.

First, let's take a look at the current folder. We have the application's Python file, the Dockerfile, the requirements file, and our NLP model binary file. A more elaborate application would have more files, such as HTML, CSS, and JS files, but here we have a very simple app. Let's build the container:

$ docker build -t dattarajrao/simple-nlp-app .

Here is the consolidated output of this command. We have eight steps defined in our Dockerfile, and Docker runs each one and shows its status. If any of the steps fail, you may need to search for the updated command, since these commands can change between versions. The build will take a few minutes depending on your Internet connection, because it downloads the dependent layers needed to build the image:

Sending build context to Docker daemon  32.34MB
Step 1/8 : FROM ubuntu:latest
 
      << will take some time to download image >>
 
 ---> 113a43faa138
 
Step 2/8 : RUN apt-get update -y
 
      << will take some time to run command >>
 
 ---> a497349f5615
 
Step 3/8 : RUN apt-get install -y python-pip python-dev build-essential
 
      << will take some time to run command >>
 
 ---> dd4b73ae6437
 
Step 4/8 : COPY . .
 ---> 6cedbaa3a50a
 
Step 5/8 : WORKDIR .
 ---> Running in 1f83ed6e49b3
Removing intermediate container 1f83ed6e49b3
 ---> 87faae5504c6
 
Step 6/8 : RUN pip install -r requirements.txt
 ---> Running in e4aa8eeff06d
Collecting Flask==1.0.2 (from -r requirements.txt (line 1))
  Downloading 
 
      << will take time to download,install dependencies >>
 
Removing intermediate container e4aa8eeff06d
---> 1729975b6f07
 
Step 7/8 : ENTRYPOINT [ "python" ]
---> Running in 24dec1c6e94b
Removing intermediate container 24dec1c6e94b
---> c1d02422f07
 
Step 8/8 : CMD [ "app.py" ]
---> Running in 53db54348f94
Removing intermediate container 53db54348f94
---> 9f879249c172
 
Successfully built 9f879249c172
Successfully tagged dattarajrao/simple-nlp-app:latest 

You have now created a Docker image that appears in the images list. The image is tagged with the name dattarajrao/simple-nlp-app:latest. This is the name we will use to refer to the image and build containers from it. We will also use this name to push the image to a central container repository, like DockerHub. Let's first see the list of images on our machine:

$ docker images
 
REPOSITORY                   TAG      IMAGE ID          CREATED        SIZE
dattarajrao/simple-nlp-app   latest   9f879249c172      25 minutes ago 1.11GB
ubuntu                       latest   113a43faa138      5 months ago   81.2MB 

We see two images. One is the application image we created. Docker also downloaded the latest Ubuntu image and made it available on our machine; this image was used as the base on which our application image was built.

Now we will create a container by running this image. The container will be an instance of this image and will act like a virtual machine. Only it will be created much faster (in milliseconds) and will be much smaller in size. Once created, the container will have its own IP address and will, for all practical purposes, act like a separate machine. See Listing 8.12.
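The command in Listing 8.12 is essentially a docker run that maps the port, something like:

$ docker run -p 1234:1234 dattarajrao/simple-nlp-app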

This command will create a container with our Docker image as a template. Since the container is a separate machine with an IP address, we need a way to access our application. So we map the port 1234 from our machine to the container port using the ‐p option. The container will start and will run the Python application that will run the Flask application. Since we are loading the NLP model initially in our application, Keras will download the IMDB dataset to get the vocabulary for feeding data to the model. Here is the typical output we will see:

Using TensorFlow backend.
 
Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.json
1654784/1641221 [==============================] - 9s 5us/step
 
* Serving Flask app "app" (lazy loading)
* Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://0.0.0.0:1234/ (Press CTRL+C to quit) 

Don't worry about the development server warnings. Flask ships with a built-in development web server, which is good for demos but should not be used in production. You should typically plug your application into a full web server like NGINX. You can look up how to do this in the Flask documentation.

Since we have mapped port 1234 from the local machine to the container, we should be able to see our application on localhost.

In your web browser, go to http://localhost:1234/process. You should see the screen in Figure 8.8.


Figure 8.8: Demo on the local host

Type in a phrase and click Submit. Here, we typed the phrase “its a wonderful life.” Figure 8.9 shows the result.


Figure 8.9: Result shown on the local host

Pushing a Docker Image to a Repository

Now we will push this container image to a common Docker image repository called DockerHub. Organizations may maintain their private repositories for images as needed. For our example, we will use DockerHub.

Before pushing an image, you will need an account. Log in or create an account at https://hub.docker.com and then use the following command to push your image. While pushing an image, the tag name of the image should match your DockerHub account. In my case, my DockerHub account name is dattarajrao, so I can push my image with the command shown in Listing 8.13.

Log in with your Docker ID to push and pull images from DockerHub. If you don't have a Docker ID, head over to https://hub.docker.com to create one:

Username: dattarajrao
Password: ***********
Login Succeeded
 
$ docker push dattarajrao/simple-nlp-app
b0a427d5d2a8: Pushed 
dcf3294d230a: Pushed 
435464f9dced: Pushed 
fff2973abf54: Pushed 
b6f13d447e00: Mounted from library/ubuntu 
a20a262b87bd: Mounted from library/ubuntu 
904d60939c36: Mounted from library/ubuntu 
3a89e0d8654e: Mounted from library/ubuntu 
db9476e6d963: Mounted from library/ubuntu 
latest: digest: sha256:5a1216dfd9489afcb1dcdc1d7780de44a28df59934da7fc3a
02cabddcaadd62c size: 2207 

The image is now pushed onto the Docker repository and others can access it. You will notice that the push also happens layer by layer. This way, only the modified changes are overwritten instead of writing the whole image every time. We can now use this in our Kubernetes deployments.

Deploying the App on Kubernetes as a Microservice

Now that we have our application packaged along with our AI model and all dependencies as a Docker container, we can deploy it in the Kubernetes ecosystem. Just like the regular web app we saw in the previous chapter, we will now create a deployment for this application containing an AI model.

Let's start by creating a YAML file for the deployment, as shown in Listing 8.14.
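A sketch of what such a deployment YAML looks like follows. The deployment name, image, replica count, and container port come from this chapter; the label names and apiVersion shown here are typical values and may differ from Listing 8.14:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-nlp-app-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: simple-nlp-app
  template:
    metadata:
      labels:
        app: simple-nlp-app
    spec:
      containers:
      - name: simple-nlp-app
        # the container image we pushed to DockerHub earlier
        image: dattarajrao/simple-nlp-app
        ports:
        # the port our Flask app listens on
        - containerPort: 1234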

This YAML file looks very similar to the simple‐app.yaml file in the previous chapter. Since all our AI logic is captured in the Docker container, our Kubernetes deployment remains very standard. The only major changes are in the name of the Docker image and the container port. We will now create a deployment using this YAML file. See Listing 8.15.

Create a deployment from this YAML file using kubectl create -f. It creates pods with containers specified by the image dattarajrao/simple-nlp-app. You can check the deployment status:

$ kubectl get deployments
NAME                         DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
simple-nlp-app-deployment    3         3         3            3           58s 

The bigger your Keras model, the bigger the container image, so creating the containers may take some time while each node downloads the image from the repository. After some time, you will see all the pods running:

$ kubectl get pods
NAME                                     READY   STATUS    RESTARTS   AGE
simple-nlp-app-deployment-98d66d5b5-5l8x6   1/1     Running   0        1m
simple-nlp-app-deployment-98d66d5b5-95c9m   1/1     Running   0        1m
simple-nlp-app-deployment-98d66d5b5-bvnq5   1/1     Running   0        1m 

We could use a YAML file to define a service to expose our deployment, as we did earlier. A quicker way to create a service exposing the deployment is to use the expose deployment command:

$ kubectl expose deployment simple-nlp-app-deployment --type=NodePort 

Now we will see a service with the same name as the deployment we created. If we are using Minikube, we can get the URL of the service quickly using the following command:

$ minikube service simple-nlp-app-deployment --url
http://192.168.99.100:32567

The result will differ based on your setup. If you are connecting to a full Kubernetes cluster, you should be able to get an external IP address for your service. Once you have that, you can access your application using that link in the browser. See Figure 8.10.


Figure 8.10: Accessing the application as a Docker app

There you have it; your NLP analytics application is now packaged as a Docker container and running inside the Kubernetes ecosystem. Now you can take advantage of all the infrastructure features that Kubernetes provides, like scaling, fail‐over, load balancing, etc.

Summary

In this chapter, we developed a web application using Python and the Flask framework. We packaged it as a Docker container and pushed the image to a common container registry. We updated the application to invoke a Deep Learning NLP model and display the results on a web page. We moved beyond command lines and data science notebooks and learned how to push models into the wild and have them running alongside web applications. Now we can leverage the power of the Kubernetes platform to scale and load balance these AI applications and make them secure and robust.

Getting a model deployed within a web application is just scratching the surface. We need to be able to incorporate the data science steps involved in building AI models into the software development lifecycle. Using state-of-the-art agile practices like continuous integration and delivery, we need to be able to not just integrate and deliver code, but also deliver Deep Learning models. This is what we will talk about in the next chapter. We will cover the typical Machine Learning model lifecycle and development process, and we will explore some best practices and tools to make deployment easier and automated.
