In the previous chapters, we designed a basic chatbot framework from scratch and explored integration options with third-party services and other backend systems. We also explained how to expose the IRIS chatbot framework as a Spring Boot REST API.
In this chapter, we will discuss different ways in which IRIS can be deployed on a remote server. We will also discuss how to integrate IRIS with Alexa in less than 5 minutes. At the end of the chapter, we will discuss how IRIS can be extended to be part of a continuous improvement framework by implementing a self-learning module and bringing a human into the loop.
Deployment to the Cloud
The IRIS framework exposed via RESTful APIs can be deployed to a remote server in multiple ways. In this section, we will discuss three different ways.
As a Stand-Alone Spring Boot JAR on AWS EC2
This is the most basic installation and deployment of a Spring Boot JAR. We follow a few steps to get the JAR running on the EC2 machine on port 8080.
- 1. We choose an AMI (Amazon Machine Image). We use the Amazon Linux 2 AMI (HVM), SSD Volume Type, 64-bit x86.
- 2. We choose the instance type. We select t2.micro (also free tier eligible if you are using this AWS service for the first time). The t2.micro instance has one vCPU and 1 GB of memory, which is enough for the APIs to run.
- 3. The next step requires configuring instance details. We can enable termination protection here to guard against accidental termination. This step is optional.
- 4. We add storage details in the next step. By default, we get 8GB of SSD, and the volume is attached to the instance. However, we can add more volumes or increase the storage of the default volume if we want. This step is also optional, and 8GB of storage is enough for the demo deployment.
- 5. We add tags to the instance and storage volume for better management of EC2 resources. This is also optional.
- 6. This step, as shown in Figure 9-3, requires configuring a security group. A security group is a set of firewall rules that control the traffic for an instance.
- 7. We review the configuration and launch the instance. Each EC2 instance requires a key-pair PEM file, which we need in order to log into the instance securely. We will be asked to generate a new file, or we can use an existing one.
Now once the instance is launched, it will have a public DNS name or IPv4 Public IP that we can use to log in.
- 1. The login command to log in from any Unix machine:
ssh -i chatbot-iris.pem [email protected]
- 2. Once we log in, we can then copy our Spring Boot JAR from local using the SCP command:
scp -i chatbot-iris.pem /path/to/iris.jar [email protected]:/path/to/your/jarfile
- 3. Once the JAR is copied, we can run it by issuing the following command (a Spring Boot executable JAR already declares its main class, so no class name is needed):
java -jar path/to/your/jarfile.jar
- 4. By default, the server starts on port 8080. However, if we want to change the port, we can set server.port as a system property using a command-line option such as -Dserver.port=8090, or add an application.properties file in src/main/resources/ with server.port=8090.
As a Docker Container on AWS EC2
Docker performs operating-system-level virtualization and is used to run software packages called containers. Docker makes operations easier because it packages the code, libraries, and runtime components together as Docker images that can be deployed with ease. For more details on Docker, visit www.docker.com/ . The following steps install Docker on the EC2 instance and build and run the IRIS image.
- 1. We update the installed packages and package cache on the instance:
sudo yum update -y
- 2. We install the most recent Docker Community Edition package:
sudo amazon-linux-extras install docker
- 3. We start the Docker service:
sudo service docker start
- 4. We add the ec2-user to the docker group in order to execute Docker commands without using sudo:
sudo usermod -a -G docker ec2-user
- 5. We log out and log back in again to pick up the new Docker group permissions. To do so, we close the current SSH terminal window and reconnect to the instance in a new one. The new SSH session will have the appropriate Docker group permissions.
- 6. We verify that the ec2-user can run Docker commands without sudo.
- 7. We create a Dockerfile in the root directory of the code base. A Dockerfile is a manifest that describes the base image to use for the Docker image and whatever is installed and running on it. This Dockerfile uses the openjdk:8-jdk-alpine image because we are building an image of a Java application. The VOLUME instruction creates a mount point with the specified name and marks it as holding externally mounted volumes from the native host or other containers. The ARG instruction defines a variable that users can pass at build time to the builder. The JAR is named app.jar, and an ENTRYPOINT allows us to configure a container that will run as an executable. It contains the command to run the JAR:
FROM openjdk:8-jdk-alpine
VOLUME /tmp
ARG JAR_FILE
ADD ${JAR_FILE} app.jar
ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]
- 8. We build a Docker image by issuing the following command:
docker build -t iris --build-arg JAR_FILE="JAR_NAME" .
The following is the output from the build command executed on a machine:
Sending build context to Docker daemon 21.9MB
Step 1/5 : FROM openjdk:8-jdk-alpine
8-jdk-alpine: Pulling from library/openjdk
bdf0201b3a05: Pull complete
9e12771959ad: Pull complete
c4efe34cda6e: Pull complete
Digest: sha256:2a52fedf1d4ab53323e16a032cadca89aac47024a8228dea7f862dbccf169e1e
Status: Downloaded newer image for openjdk:8-jdk-alpine
 ---> 3675b9f543c5
Step 2/5 : VOLUME /tmp
 ---> Running in dc2934059ab8
Removing intermediate container dc2934059ab8
 ---> 0c3b61b6f027
Step 3/5 : ARG JAR_FILE
 ---> Running in 36701bf0a68e
Removing intermediate container 36701bf0a68e
 ---> da1c1f51c29d
Step 4/5 : ADD ${JAR_FILE} app.jar
 ---> 0aacdba5baf0
Step 5/5 : ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]
 ---> Running in f40f7a276e18
Removing intermediate container f40f7a276e18
 ---> 493abfce6e8c
Successfully built 493abfce6e8c
Successfully tagged iris:latest
- 9. We run the newly created Docker image, mapping the container's port 8080 (where the Spring Boot app listens by default) to the host:
docker run -t -i -p 8080:8080 iris
As an ECS Service
Amazon ECS makes it easy to deploy, manage, and scale Docker containers running applications, services, and batch processes. Amazon ECS places containers across your cluster based on your resource needs and is integrated with familiar features like elastic load balancing, EC2 security groups, EBS volumes, and IAM roles. More details on ECS can be found at https://aws.amazon.com/ecs/ .
- 1. When discussing how to deploy a JAR as a Docker container on AWS EC2, we created a Docker image. We need to push this previously created Docker image to ECR. Amazon Elastic Container Registry (ECR) is a fully managed container registry that makes it easy for developers to store, manage, and deploy container images.
- 2. Then we need to define the container definition. In the ECS service in the AWS management console, under Get Started, we can choose a container definition to use. We need to provide the ECR repository URL and the Docker image name and tag, as shown in Figure 9-6.
- 3. We define the task definition. A task definition is a blueprint for an application and describes one or more containers through attributes. Some attributes are configured at the task level, but the majority of attributes are configured per container. In Figure 9-7, we create a task definition for IRIS.
- 4. We define a service. A service allows us to run and maintain a specified number (the desired count) of simultaneous instances of a task definition in an ECS cluster. See Figure 9-8.
- 5. We configure a cluster. The infrastructure in a Fargate cluster is fully managed by AWS. Our containers run without us managing and configuring individual Amazon EC2 instances. See Figure 9-9.
- 6. Once we review and click Create, we should see the progress of the ECS cluster creation. Once the cluster is set up and the task definitions are complete, the Spring Boot service should be up and running.
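To make the task definition in step 3 more concrete, here is a minimal sketch of the JSON that the ECS console generates behind the scenes. The family name, CPU/memory values, and image URI are illustrative placeholders, not values from this chapter:

```json
{
  "family": "iris-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "iris",
      "image": "<account-id>.dkr.ecr.<region>.amazonaws.com/iris:latest",
      "essential": true,
      "portMappings": [
        { "containerPort": 8080, "protocol": "tcp" }
      ]
    }
  ]
}
```

The container port matches the Spring Boot default of 8080; in a real deployment the console also attaches an execution role so the task can pull the image from ECR.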
Smart IRIS Alexa Skill Creation in Less Than 5 Minutes
Creating a custom Alexa skill involves the following:
- Invocation name
- Intents, samples, and slots
- Building an interaction model
- Setting up a web service endpoint
In our example use case of IRIS, since we have already defined the possible intents, intent slots, and dialogs modeled as a state machine, we aim to redirect the user's utterance from Alexa to the IRIS backend API so that IRIS can process the utterance and respond.
Next, we define a custom intent and a custom slot type so that all of the user’s utterances are matched to this intent and slot type. The aim is to redirect the utterance to IRIS and not do any intent classification-related processing on the Alexa layer.
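Such a catch-all interaction model might look like the following sketch. The intent name IrisCatchAllIntent and the slot type IRIS_UTTERANCE are illustrative names, and the sample slot values are only hints for Alexa's speech recognizer; the structure follows the standard Alexa interaction model schema:

```json
{
  "interactionModel": {
    "languageModel": {
      "invocationName": "iris",
      "intents": [
        {
          "name": "IrisCatchAllIntent",
          "slots": [
            { "name": "utterance", "type": "IRIS_UTTERANCE" }
          ],
          "samples": [ "{utterance}" ]
        }
      ],
      "types": [
        {
          "name": "IRIS_UTTERANCE",
          "values": [
            { "name": { "value": "get me a life insurance quote" } },
            { "name": { "value": "what is my account balance" } }
          ]
        }
      ]
    }
  }
}
```

Because the single sample is just "{utterance}", whatever the user says after the invocation name is captured in the slot and can be forwarded verbatim to the IRIS backend. (Alexa also requires the built-in intents such as AMAZON.StopIntent, which the developer console adds automatically.)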
The details on hosting a custom skill as a web service are available at https://developer.amazon.com/docs/custom-skills/host-a-custom-skill-as-a-web-service.html .
Continuous Improvement Framework
In practice, it is quite possible that a user's utterance is not classified or understood by our intent engine, for several reasons: the utterance may belong to an external intent that is not part of the intent engine, or the engine may not be confident due to a low intent match score. In a production environment, a fair number of user utterances are either misunderstood or not understood by the engine at all. We propose a framework that can help IRIS become smarter and more intelligent, moving toward mimicking a natural human conversation.
The framework consists of three techniques:
- Intent confirmation (double-check)
- Next intent prediction
- A human in the loop
Intent Confirmation (Double-Check)
When we match the user utterance against the list of possible intents shown in Figure 9-21, we get a list of intents and their respective match scores. The intent engine module of IRIS returns an intent match only when the match score is above 0.75. We call this the minimum threshold score, below which an intent match is not considered in the response. In the example of "life insurance," LIFE_INSURANCE_QUOTE_INTENT is returned in the response from the intent engine.
An optimization to this implementation could be to introduce a minimum match score that is below the threshold score but still relevant enough for further processing. We previously stated that the minimum threshold score is the score below which the intent classification engine does not return an intent match. The minimum match score is the score above which an intent is still considered for further processing even though it falls below the minimum threshold score.
In this example, the scores are below the minimum threshold score, so in the current implementation the user utterance defaults to search, since no explicit intent was returned by the intent classification engine. If we consider a minimum match score of 0.5, the intent LIFE_INSURANCE_QUOTE_INTENT could be considered for further confirmation.
These scores of 0.75 (minimum threshold score) and 0.5 (minimum match score) should be derived from training and test datasets, and could also change later based on actual user utterance data and the performance of the intent classification engine.
Hence, we could change IRIS to prompt for confirmation if the utterance was classified with a score between 0.5 and 0.75.
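The double-check decision described above can be sketched as follows. This is a minimal illustration, not IRIS's actual implementation; the class and method names (IntentResolver, resolve) are hypothetical:

```java
import java.util.Map;
import java.util.Optional;

// A sketch of the double-check logic: accept a confident match, ask the
// user to confirm a borderline match, and fall back to search otherwise.
public class IntentResolver {

    static final double MIN_THRESHOLD_SCORE = 0.75; // above: accept directly
    static final double MIN_MATCH_SCORE = 0.50;     // above: ask user to confirm

    enum Resolution { ACCEPT, CONFIRM, FALLBACK_TO_SEARCH }

    // scores maps intent name -> match score from the classification engine
    static Resolution resolve(Map<String, Double> scores) {
        Optional<Map.Entry<String, Double>> best = scores.entrySet().stream()
                .max(Map.Entry.comparingByValue());
        if (!best.isPresent()) {
            return Resolution.FALLBACK_TO_SEARCH;
        }
        double score = best.get().getValue();
        if (score >= MIN_THRESHOLD_SCORE) {
            return Resolution.ACCEPT;           // e.g. proceed with the quote flow
        }
        if (score >= MIN_MATCH_SCORE) {
            return Resolution.CONFIRM;          // e.g. "Did you mean a life insurance quote?"
        }
        return Resolution.FALLBACK_TO_SEARCH;   // default to search
    }

    public static void main(String[] args) {
        // 0.62 lies between 0.5 and 0.75, so IRIS should double-check.
        System.out.println(resolve(Map.of("LIFE_INSURANCE_QUOTE_INTENT", 0.62))); // CONFIRM
    }
}
```

Keeping the two cutoffs as named constants makes it easy to retune them later from real utterance data, as suggested above.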
Predict Next Intent
We can also use techniques such as path prediction, association rules, and frequent itemsets to predict the most likely next user intent.
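As a minimal sketch of this idea, we can count how often each intent follows another in logged conversations (pairwise frequent itemsets over ordered intent sequences) and suggest the most frequent successor. The class name NextIntentPredictor and the intent names in the example are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Counts intent transitions from conversation logs and predicts the most
// frequent next intent for a given current intent.
public class NextIntentPredictor {

    // current intent -> (next intent -> number of times observed)
    private final Map<String, Map<String, Integer>> successorCounts = new HashMap<>();

    // Feed one conversation's ordered intent sequence into the model.
    public void addConversation(List<String> intents) {
        for (int i = 0; i + 1 < intents.size(); i++) {
            successorCounts
                    .computeIfAbsent(intents.get(i), k -> new HashMap<>())
                    .merge(intents.get(i + 1), 1, Integer::sum);
        }
    }

    // Return the most frequently observed next intent, or null if unseen.
    public String predictNext(String currentIntent) {
        Map<String, Integer> counts = successorCounts.get(currentIntent);
        if (counts == null || counts.isEmpty()) return null;
        return counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    public static void main(String[] args) {
        NextIntentPredictor p = new NextIntentPredictor();
        p.addConversation(List.of("LIFE_INSURANCE_QUOTE_INTENT", "ACCOUNT_BALANCE_INTENT"));
        p.addConversation(List.of("LIFE_INSURANCE_QUOTE_INTENT", "CLAIM_STATUS_INTENT"));
        p.addConversation(List.of("LIFE_INSURANCE_QUOTE_INTENT", "CLAIM_STATUS_INTENT"));
        System.out.println(p.predictNext("LIFE_INSURANCE_QUOTE_INTENT")); // CLAIM_STATUS_INTENT
    }
}
```

With the predicted next intent in hand, IRIS could proactively offer it to the user ("Would you also like to check your claim status?") instead of waiting for the next utterance.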
A Human in the Loop
The third improvement to the framework we introduced in this chapter was a human in the loop. Figure 9-20 shows the various functional components for continuous improvement. Regardless of the techniques we use to make IRIS understand intent better, there will always be some conversations that IRIS will not be able to understand. This is for the simple reason that IRIS does not have all the information in the universe and will always be designed to fulfill only a known set of functionalities.
We know that IRIS is designed to perform certain operations like calculating insurance eligibility, providing account balance, claim status, etc. Let’s assume that a certain percentage of users are asking IRIS for a change of address of their insurance policy. This is not supported by IRIS today, and it is challenging for machines to interpret this kind of new information.
There will also be utterances about unrelated topics, such as users asking about cricket match scores, details on Brexit, or train timings. These are logged but do not need further processing and will be ignored by the subject matter experts enhancing the IRIS feedback loop.
Summary
In this concluding chapter, we discussed various ways to deploy a chatbot to the cloud, demonstrated how to integrate IRIS with Alexa in less than 5 minutes, and discussed the continuous improvement of IRIS through feedback loops via log files and humans in the loop.
In this book, we have kept a fine balance among three pillars: business context, theoretical foundations of machines' handling of natural language, and real-world development of a chatbot from scratch. We believe these three pillars will help build a truly enterprise-grade chatbot with a well-defined ROI. Additionally, we also focused on ethical concerns in using personal data and how countries in the European Union have agreed upon the GDPR regulations to safeguard people's privacy.