Chapter 5. Voice assistant with Amazon Alexa

In the last chapter, we made our bot conversational, allowing users to interact with it using natural language messages, just as they would with another human. Another common way we communicate with each other is, of course, speech. Pretty much every big tech company has software with a voice-based conversational interface: Apple has Siri, Microsoft has Cortana, and Google has Google Assistant. Another such voice assistant is Amazon Alexa, which powers Amazon Echo.

Amazon Echo is a smart, always-online, hands-free speaker that can be controlled by voice. Powered by the Alexa voice service, it comes with many built-in features, or skills, like the ability to play music, set alarms, manage to-do lists, and more. Amazon also provides the Alexa Skills Kit, which allows developers to build their own Alexa skills. In this chapter you will learn how to build a custom Amazon Alexa skill.

Alexa Skills Kit

Amazon provides a collection of APIs and tools called the Alexa Skills Kit for building custom skills for Alexa. It allows you to build different types of skills: smart home skills to control smart home devices, flash briefing skills to provide content like news for a user’s flash briefing, or custom skills that connect to a cloud-based service and can provide almost any feature or functionality.

In the last two chapters we built a Facebook Messenger bot that allows users to search for T-shirts. We offered a ‘Buy Now’ button to redirect users to our online store, where they can make a purchase. In this chapter we will build a custom Alexa skill that will allow users to check their order status using an order number.

Before we start, let’s quickly go through how our bot engine will interact with the Alexa voice service. When a user invokes our skill with a voice command, Echo sends the audio to the Alexa voice service for processing. Alexa processes the voice input and extracts the user’s intent and any associated data, known as slots, from the input. It then sends the intent and optional slots in JSON format to our service. Our bot engine processes the request and generates a JSON response, which Alexa reads back to the user.

Update the Store API

We will start by updating our store API to add another endpoint that takes an order number and returns the order status. To keep things simple, let’s add some logic so that if the order number is in the range 1000-1999, we return a response that the order is in a processing state. For an order number in the range 2000-2999, we return a response that the order has shipped, with an expected delivery date. Any other order number returns a response that the order was not found.

app.get('/orders/:orderNumber/status', function (req, res) {
  const orderNumber = parseInt(req.params.orderNumber, 10); // route params are strings; parse for the range checks

  if (orderNumber >= 1000 && orderNumber <= 1999) {
    res.status(200).send({
      status: 'PROCESSING'
    });
  } else if (orderNumber >= 2000 && orderNumber <= 2999) {
    res.status(200).send({
      status: 'SHIPPED',
      estimatedDeliveryDate: new Date()
    });
  } else {
    res.status(404).send({ code: 404, message: 'NOT_FOUND' });
  }
  }
});
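The range logic above can also be exercised in isolation as a plain function (a sketch; `getOrderStatus` is an illustrative helper name, not part of our actual store API):

```javascript
// Pure-function sketch of the order-status logic above.
function getOrderStatus(orderNumber) {
  const n = parseInt(orderNumber, 10); // route params arrive as strings
  if (n >= 1000 && n <= 1999) {
    return { status: 'PROCESSING' };
  }
  if (n >= 2000 && n <= 2999) {
    return { status: 'SHIPPED', estimatedDeliveryDate: new Date() };
  }
  return null; // order not found
}

console.log(getOrderStatus('1234').status); // 'PROCESSING'
console.log(getOrderStatus('2500').status); // 'SHIPPED'
console.log(getOrderStatus('99'));          // null
```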

Set up a new Alexa Skill

Now that we have an endpoint, let’s set up a new skill in the Amazon Developer Portal.

Go to the Amazon Developer Portal and create an account if you do not have one.

Select the Alexa option from the top menu, then click the Alexa Skills Kit ‘Get Started’ button. Then click the ‘Add a New Skill’ button.

Skill Information

We will start by setting up general details about the skill. Since we are building a custom skill, select the ‘Custom Interaction Model’ skill type. Select a language and set a name and an invocation name for the skill. The name is what will be displayed in the Alexa app and can be 2-50 characters long. The invocation name is what users will say to activate our skill. It is recommended to keep the invocation name short, up to three words. The invocation name cannot contain words like Alexa, Echo, or Amazon, should not be numerical or contain special characters, and should be 2-50 characters long.

Please refer to the Invocation Name Guidelines for more details.

Since our skill will not use any AudioPlayer directives, select ‘No’ under the Audio Player option.

Interaction model

Next, we need to set up the voice interface users will use to interact with our skill. The interaction model maps user voice input to actions our service can perform. It is very similar to the stories we defined in Wit.ai.

Intent Schema

An Alexa intent represents an action the user wants to perform and our service can handle. It is defined with a name and an optional list of arguments, known as slots, associated with the intent. The intent schema defines, in JSON format, all of the intents our service can handle.

Let’s define the intent for our skill:

{
  "intents": [
    {
      "intent": "GetOrderStatus",
      "slots": [
        {
          "name": "OrderNumber",
          "type": "AMAZON.NUMBER"
        }
      ]
    }
  ]
}

Let’s say our user says, “Alexa, ask the Awesome Tshirt Store what is the status of my order number 1234.” The word ‘Alexa’ wakes the Alexa voice service; it is followed by our skill’s invocation name, ‘Awesome Tshirt Store’, to invoke our skill, and finally the user’s intent to get the order status for order number ‘1234’. The Alexa voice service will map this voice input and send a request to our bot engine with the intent name GetOrderStatus and the OrderNumber slot value ‘1234’.

The Alexa Skills Kit also provides built-in intents for common actions, like asking our skill for help, or cancelling or stopping an action. Alexa will automatically map the common ways a user asks for help or asks to cancel an action to the corresponding built-in intent, without us needing to provide a mapping between user input and our intents, also known as a sample utterance, which we will discuss in the next section.

You can read more about a built-in intent here.

Let’s update our intent schema to also include the built-in help, cancel, and stop intents.

{
  "intents": [
    {
      "intent": "GetOrderStatus",
      "slots": [
        {
          "name": "OrderNumber",
          "type": "AMAZON.NUMBER"
        }
      ]
    },
    {
      "intent": "AMAZON.CancelIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "AMAZON.StopIntent"
    }
  ]
}

Slots

To process a user intent, our service may need additional data. For example, to retrieve a customer’s order status, we need an order number. These additional arguments are known as slots. Each intent can have a list of slots, and each slot object contains a name and a type. In our intent schema, the GetOrderStatus intent has a slot OrderNumber whose type is AMAZON.NUMBER, one of the built-in slot types.

Alexa supports many built-in slot types like date, number, and time. It also provides a built-in list type, which is a list of items with possible values.

We can also define our own custom slot type. For example, we can define a slot type TSHIRT_SIZE with possible values like SMALL, MEDIUM, or LARGE. To define a custom slot type, we need to provide a type name and a list of possible values. We can then use the custom slot type in our intent schema, just like using a built-in slot type.
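As a sketch (the TSHIRT_SIZE type and the OrderTshirt intent here are illustrative, not part of the skill we are building), a custom slot type is referenced in the intent schema exactly like a built-in one:

```json
{
  "intents": [
    {
      "intent": "OrderTshirt",
      "slots": [
        {
          "name": "Size",
          "type": "TSHIRT_SIZE"
        }
      ]
    }
  ]
}
```

The TSHIRT_SIZE type itself would be defined in the developer console as a type name plus a list of values, one per line: SMALL, MEDIUM, LARGE.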

Sample Utterances

For Alexa to map user input to one of the intents in our schema, we need to provide sample utterances. Let’s look at the sample utterances for the GetOrderStatus intent:

GetOrderStatus what is the status of my order number {OrderNumber}
GetOrderStatus what is the status of my order {OrderNumber}
GetOrderStatus my order {OrderNumber} status
GetOrderStatus status of order {OrderNumber}

Here we have four phrases that map to the GetOrderStatus intent with the OrderNumber slot. A sample utterance is a phrase a user might use to invoke a specific intent. Each sample utterance starts with the name of the intent the phrase should map to, with slots defined within curly brackets. When a user speaks a matching phrase, the Alexa voice service maps it to the correct intent and sets the values for the intent’s slots.

Remember, we do not have to provide sample utterances for built-in intents. The Alexa voice service will automatically match commonly used words and phrases to built-in intents.

Service URL and SSL Certificate

We need to configure the URL of our cloud-based service, or bot engine. While Amazon recommends using AWS Lambda, its serverless computing platform, to run the backend service for an Alexa skill, we can use any cloud-based hosting service. Since we already have a bot engine deployed on Heroku, we can simply add another endpoint for Alexa service requests instead of writing a new application.

Select the ‘HTTPS’ option.

Next, we need to select the geographical region closest to our target users. The options are ‘North America’ and ‘Europe’. We can select both options and provide a separate service URL for each region. We will add a new endpoint/route, '/alexa-webhook’, to our bot engine for all incoming requests from Alexa, so our service URL will be our Heroku app URL with an '/alexa-webhook’ suffix. This is similar to our Facebook Messenger Platform webhook, but instead of '/webhook', the endpoint is '/alexa-webhook’.

Next is the “Do you allow users to create an account or link to an existing account with you?” option. Since we do not need users to create or link an account, select the ‘No’ option and click the ‘Next’ button.

Since our app is deployed on Heroku and we are using a free sub-domain provided by Heroku, select the “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority” option under SSL certificate and click Next.

Update bot engine

Since we are not using AWS Lambda to host our service, we need to do a little extra work. To make our skill accessible to the public, we need to submit it for certification, and to pass certification our bot engine has to verify that all incoming requests are valid and are coming from Amazon. This is to prevent someone else from calling and using our service while pretending to be Amazon. To verify an incoming request, we need to check the validity of the signing certificate, the request timestamp, and the request signature. You can read more about it in the ‘Alexa Skills Kit Security Testing’ section.

Thankfully, there are Node modules we can use to verify Alexa requests. For a Node.js app we can use the alexa-verifier module to verify all incoming requests, but since we are using Express, we can use its Express middleware wrapper, alexa-verifier-middleware.

Install and add ‘alexa-verifier-middleware’ to package.json:

npm install --save alexa-verifier-middleware

Next, update index.js to use alexa-verifier-middleware:

...
const avm = require('alexa-verifier-middleware');

app.use(avm());

app.set('port', (process.env.PORT || 5000));
app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());

...

Now all incoming requests will be verified. Remember to load ‘alexa-verifier-middleware’ before any other body-parsing middleware.
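Note that app.use(avm()) applies the verifier to every route, which would also reject non-Alexa requests such as our Facebook Messenger '/webhook'. One way around this (a sketch based on the module’s documented router usage; adapt it to your own setup) is to attach the verifier to a dedicated router:

```javascript
// Sketch: scope the Alexa verifier to its own router so that other
// routes (e.g. the Messenger '/webhook') are left untouched.
const express = require('express');
const avm = require('alexa-verifier-middleware');

const app = express();
const alexaRouter = express.Router();

alexaRouter.use(avm());               // verify only requests on this router
app.use('/alexa-webhook', alexaRouter);

alexaRouter.post('/', function (req, res) {
  // handle verified Alexa requests here
});
```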

Next, we need to add the new endpoint, '/alexa-webhook’, which we used in our skill’s service URL configuration.

Add the following code in index.js:

app.post('/alexa-webhook', function (req, res) {
  console.log('Incoming request', req.body.request);
  res.send({
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: 'Hello World' },
      shouldEndSession: true
    },
    sessionAttributes: {}
  });
});

Here we added a new endpoint to handle POST requests on the '/alexa-webhook’ route and respond with a simple hello world message.

Let’s quickly look at the response. The version field is the Alexa API version, set to ’1.0’. The sessionAttributes field, defined here as an empty object, can hold a list of key-value pairs and can be used to maintain data between different requests in the same session.

In the response object, the outputSpeech object is what Alexa voices back to the user. In our sample response, we set the outputSpeech type to ‘PlainText.’ The other supported type is ‘SSML’ (Speech Synthesis Markup Language), a markup language for speech synthesis that offers finer control over how Alexa converts text to speech. When using the ‘PlainText’ type, the outputSpeech object should have a ‘text’ field; similarly, an ‘SSML’ type outputSpeech should have an ’ssml’ field with the output markup. The response object also includes a boolean shouldEndSession flag to end or keep the session active.
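For example, an SSML outputSpeech (the markup here is illustrative) looks like this:

```json
{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Your order number is <say-as interpret-as=\"digits\">1234</say-as></speak>"
  }
}
```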

Other than the fields we defined in our sample response, the response object can have a card object that defines a card to render in the Amazon Alexa app, a reprompt object that defines the outputSpeech used to re-prompt the user to respond, and a directives array with device-level actions to take using a particular interface.

Please refer to the Alexa Skills Kit Interface Reference for more details.

Let’s update our bot engine to handle these requests.

First, let’s update our store-api.js and add another method to retrieve the order status:

const retrieveOrderStatus = (orderNumber) => axios.get(`${API_BASE_URL}orders/${orderNumber}/status`);

module.exports = {
  getSizes: sizes,
  getGender: genders,
  retriveProducts,
  retrieveOrderStatus
}

Next, let’s create a new file, alexa-bot-engine.js, with all of the logic to handle incoming requests from Alexa:

const storeApi = require('./store-api');

const ALEXA_API_VERSION = '1.0';

const buildResponse = (outputSpeech, shouldEndSession = true, sessionAttributes = {}) => {
    return {
        version: ALEXA_API_VERSION,
        response: {
            outputSpeech,
            shouldEndSession,
        },
        sessionAttributes
    }
};

const sendAlexaHelpMessage = (res) => {
    res.send(buildResponse({ type: 'PlainText', text: 'Hi, the Awesome Tshirt Store Alexa skill can help you check your order status. What is your order number?' }));
}

const sendAlexaThankYouMessage = (res) => {
    res.send(buildResponse({ type: 'PlainText', text: 'Thank you for using the Awesome Tshirt Store Alexa skill. See you next time.' }));
}

const sendAlexaOrderStatus = (res, slots) => {
    const orderNumber = slots.OrderNumber.value;
    if (orderNumber) {
        storeApi.retrieveOrderStatus(orderNumber)
            .then(response => {
                const orderStatus = response.data;
                let responseText;
                if (orderStatus.status === 'PROCESSING') {
                    responseText = 'We are processing your order. Please check the status again later';
                } else {
                    responseText = `Your order is shipped and will be delivered on ${new Date(orderStatus.estimatedDeliveryDate).toLocaleDateString()}`;
                }
                res.send(buildResponse({ type: 'PlainText', text: responseText }));
            })
            .catch(function (error) {
                console.error('Unable to retrieve order status', error);
                res.send(buildResponse({ type: 'PlainText', text: `No order found with order number ${orderNumber}` }));
            });
    } else {
        sendAlexaHelpMessage(res);
    }
}

const intentHandler = {
    'GetOrderStatus': sendAlexaOrderStatus,
    'AMAZON.HelpIntent': sendAlexaHelpMessage,
    'AMAZON.StopIntent': sendAlexaThankYouMessage,
    'AMAZON.CancelIntent': sendAlexaThankYouMessage
}

const handleIncomingMessage = function (req, res) {
    const { request } = req.body;

    if (request.type === 'IntentRequest') {
        const { intent } = request;

        if (intentHandler[intent.name]) {
            intentHandler[intent.name](res, intent.slots);
        }
    } else if (request.type === 'LaunchRequest') {
        sendAlexaHelpMessage(res);
    }
}

module.exports = {
    handleIncomingMessage
}

Let’s quickly go through the code.

We have a handleIncomingMessage function that handles incoming requests from the Alexa voice service. We expose this method using module.exports to make it accessible to other modules.

The Alexa request body contains many details, like the Alexa API version and the sessionId. It includes information about the application and the user, session information that provides additional context about the request, and a context object describing the current state of the Alexa service and the device at the time of the request. You can read about the Alexa skill request in detail here.

We are interested in the request object, which is part of the request body. This includes the request type, id, timestamp, and more. An Alexa request can be one of the following types:

  • LaunchRequest - When a user invokes our skill by its invocation name and the voice input does not map to any intent.
  • IntentRequest - When a user says something that maps to one of our intents.
  • SessionEndedRequest - When a user ends the session or an error occurred.

In the handleIncomingMessage function, we first check the request type. If the request is a LaunchRequest, we send the user our static help message. If the request type is IntentRequest, we also receive an intent object as part of the request, which contains the intent name and a slots object. The slots object has a field for each slot, with the value of the field set to a name-value pair for that slot.

Here is an example intent object, which you would receive if the user says something that invokes the GetOrderStatus intent with the OrderNumber slot value ’2222’:

"intent": {
    "name": "GetOrderStatus",
    "slots": {
        "OrderNumber": {
            "name": "OrderNumber",
            "value": "2222"
        }
    }
}

For the request type IntentRequest, we invoke different handler functions based on the intent name. The GetOrderStatus intent is handled by the sendAlexaOrderStatus function. We first check whether the slot required to process the request is available; in this case, we check if the request contains the OrderNumber slot with a valid value. If the value is missing, we send the help message to the user. If there is an order number, we call the storeApi to retrieve the order status and build an outputSpeech response accordingly.

We defined a constant intentHandler and used an object literal pattern to map each intent name to the method that handles it. So, for the GetOrderStatus intent, we invoke the sendAlexaOrderStatus function. When our service receives a Cancel or Stop intent request, we send a static thank-you message using the sendAlexaThankYouMessage function. Similarly, when we receive a Help intent request, we send a static help message using the sendAlexaHelpMessage function.
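The object-literal dispatch pattern can be sketched in isolation (handler bodies here are illustrative placeholders, not the real bot-engine functions):

```javascript
// Minimal, self-contained sketch of the object-literal dispatch pattern.
const handlers = {
  'GetOrderStatus': () => 'order status handler',
  'AMAZON.HelpIntent': () => 'help handler'
};

function dispatch(intentName) {
  const handler = handlers[intentName];
  return handler ? handler() : 'no handler';
}

console.log(dispatch('GetOrderStatus'));    // 'order status handler'
console.log(dispatch('SomethingElse'));     // 'no handler'
```

Compared to a long if/else chain, adding a new intent only requires adding one entry to the map.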

We use the buildResponse function to build the JSON response expected by the Alexa service. This function takes the outputSpeech object, the shouldEndSession boolean flag, and the sessionAttributes object as parameters, and builds the response object. If not provided, shouldEndSession defaults to true and sessionAttributes to an empty object.

Now, let’s update the '/alexa-webhook’ endpoint in index.js to pass incoming requests to the alexa-bot-engine.

const alexaBotEngine = require('./src/alexa-bot-engine');

app.post('/alexa-webhook', function (req, res) {
    alexaBotEngine.handleIncomingMessage(req, res);
});

Now deploy these changes to Heroku and we are ready to test our skill.

Test skills

Amazon offers two tools to test our skill. Under your Alexa skill, go to the ‘Test’ tab.

  • Voice Simulator - The Voice Simulator can be used to test how Alexa will speak a given piece of text, in plain text or SSML.
  • Service Simulator - The Service Simulator can be used to test our service.

Let’s use the Service Simulator to test our bot engine. Enter the text “what is the status of my order one one zero zero” in the input text field and click ‘Ask Awesome Tshirt Store.’ Alexa will generate a service request based on the entered utterance and display the response received from our bot engine under Service Response.

You can also use echosim.io to test your skill. Echosim is an Alexa skill testing tool: a browser-based interface to Alexa. Simply log in to echosim.io with your Amazon account, then click and hold the microphone button and speak the command you would like to test. Echosim.io will process it and respond once you release the button.

Publishing skills

To make our skill accessible to the public, we need to submit the skill for certification. Simply go to the ‘Publishing Information’ tab and fill in the form:

  • Category - Select the category that best describes the skill; Shopping in this case.
  • Testing Instructions - Provide instructions to test the skill.
  • Countries & Region - Select the countries and regions you would like the skill to be available in.
  • Short Skill Description - The short description is displayed in the skill list in the Alexa app and can be up to 160 characters.
  • Full Skill Description - The full description includes details about the skill, like its features, purpose, and how it works.
  • Example Phrases - Provide some sample utterances to help users understand how to use the skill.
  • Keywords - Provide comma-separated keywords that describe the skill and help users find it.
  • Images - Provide small (108 x 108) and large (512 x 512) PNG icons that will be displayed in the Alexa app.

Once you fill in the form, click the ‘Submit for Certification’ button. The Alexa team will review and test the skill, and once approved, it will be available for the public to use.

Summary

In Chapter 4 we implemented a text-based conversational bot, and in this chapter we implemented a voice-based one. We built a custom Alexa skill that lets users check an order status with voice commands, completely hands-free, via the Alexa voice service. We learned that for a text- or voice-based natural language interface, we need a way for our service to map user input to the different functions our service can perform.

We hope that this book was helpful and will enable you to build amazing software with a conversational interface.
