In the last chapter, we made our bot conversational, allowing users to interact with it using natural language messages, just like interacting with another human. Another common way we communicate with each other is, of course, speech. Pretty much every big tech company has software with a voice-based conversational interface: Apple has Siri, Microsoft has Cortana, and Google has Google Assistant. Another of these voice assistants is Amazon Alexa, which powers the Amazon Echo.
Amazon Echo is a smart, always-online, hands-free speaker that can be controlled by voice. Powered by the Alexa voice service, it comes with many built-in features, or skills, like the ability to play music, set alarms, manage to-do lists, and more. Amazon provides the Alexa Skills Kit, which allows developers to build their own Alexa skills. In this chapter you will learn how to build a custom Amazon Alexa skill.
Amazon provides a collection of APIs and tools called the Alexa Skills Kit to build custom skills for Alexa. It allows you to build different types of skills: smart home skills to control smart home devices, flash briefing skills to provide content like news for a user's flash briefing, or custom skills that can connect to a cloud-based service and provide a wide range of features.
In the last two chapters we built a Facebook Messenger bot that allows users to search for T-shirts. We offered a ‘Buy Now’ button to redirect users to our online store, where they can make a purchase. In this chapter we will build an Alexa custom skill that will allow users to check their order status using an order number.
Before we start, let’s quickly go through how our bot engine will interact with the Alexa voice service. When a user invokes our skill using a voice-based command, Echo will send the audio to the Alexa voice service for processing. Alexa will process the voice input and extract the user’s intent and associated data, also known as slots, from the input. It will then send the intent and optional slots in JSON format to our service. Our bot engine will then process the request and generate a JSON response. Alexa reads the response back to the user.
We will start by updating our store API to add another end point that will take the order number and return the order status. To keep things simple, let’s add some logic so that if the order status is in a range of 1000-1999, we will return a response that the order is in a processing state. For an order number in the range of 2000-2999, we will return a response that the order is shipped with an expected delivery date. Any other order number will return a response that the order is not found.
app.get('/orders/:orderNumber/status', function (req, res) {
  const orderNumber = req.params.orderNumber;
  if (orderNumber >= 1000 && orderNumber <= 1999) {
    res.status(200).send({ status: 'PROCESSING' });
  } else if (orderNumber >= 2000 && orderNumber <= 2999) {
    res.status(200).send({ status: 'SHIPPED', estimatedDeliveryDate: new Date() });
  } else {
    res.status(404).send({ code: 404, message: 'NOT_FOUND' });
  }
});
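The range check above is really just a classification of the order number. Here is a minimal standalone sketch of the same logic, extracted as a pure function for clarity (the function name is our own; it is not part of the app):

```javascript
// Standalone sketch of the order-status range logic used by the endpoint above.
const classifyOrder = (orderNumber) => {
  const n = Number(orderNumber); // route params arrive as strings
  if (n >= 1000 && n <= 1999) return 'PROCESSING';
  if (n >= 2000 && n <= 2999) return 'SHIPPED';
  return 'NOT_FOUND';
};

console.log(classifyOrder(1500)); // PROCESSING
console.log(classifyOrder(2500)); // SHIPPED
console.log(classifyOrder(42));   // NOT_FOUND
```

Note that the Express handler compares the raw string parameter with numbers directly, relying on JavaScript's implicit coercion; converting explicitly, as above, makes the intent clearer.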
Now that we have an endpoint, let’s set up a new app in the Amazon Developer Portal.
Go to the Amazon Developer Portal and create an account if you do not have one.
Select the Alexa option from the top menu and then click on the Alexa Skills Kit ‘Get Started’ button. Click on the ‘Add a New Skill’ button.
We will start by setting up general details about the skill. Since we are building a custom skill, select the ‘Custom Interaction Model’ skill type. Select a language and set a name and an invocation name for our skill. The name is what will be displayed in the Alexa app and can be 2-50 characters long. The invocation name will be used by users to activate our skill. It is recommended to keep the invocation name short, up to three words long. The invocation name cannot contain words like Alexa, Echo, or Amazon, should not be numerical or contain any special characters, and should be 2-50 characters long.
Please refer to the Invocation Name Guidelines for more details.
Since our skill will not use any Audio player directive, select ‘no’ under the Audio player option.
Next, we need to set up the voice interface for users to interact with our skill. The interaction model maps user voice input to actions our service can perform. It is very similar to the stories we defined in Wit.ai.
An Alexa intent represents an action that the user intends to perform and our service can handle. It is defined with a name and an optional list of arguments associated with the intent, known as slots. The intent schema defines, in JSON format, all of the intents our service can handle.
Let’s define the intent for our skill:
{
  "intents": [
    {
      "intent": "GetOrderStatus",
      "slots": [
        {
          "name": "OrderNumber",
          "type": "AMAZON.NUMBER"
        }
      ]
    }
  ]
}
Let’s say our user says, “Alexa, ask the Awesome t-shirt store what is the status of my order number 1234.” The word ‘Alexa’ will invoke the Alexa voice service, which is followed by our skill’s invocation name ‘Awesome t-shirt Store’ to invoke our skill, and finally the user’s intent to get the order status for order number ’1234.’ The Alexa voice service will map this voice input and send a request to our bot engine with the intent name GetOrderStatus and the OrderNumber slot value as ’1234.’
The Alexa Skills Kit also provides some built-in intents for common actions, like asking our skill for help, or cancelling or stopping an action. Alexa will automatically map common ways a user asks for help or asks to cancel an action to the corresponding built-in intent, without any need to provide a mapping between user input and our intents, also known as a sample utterance, which we will discuss in the next section.
You can read more about built-in intents here.
Let’s update our intent schema to also include the built-in Help, Cancel, and Stop intents.
{
  "intents": [
    {
      "intent": "GetOrderStatus",
      "slots": [
        {
          "name": "OrderNumber",
          "type": "AMAZON.NUMBER"
        }
      ]
    },
    {
      "intent": "AMAZON.CancelIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    },
    {
      "intent": "AMAZON.StopIntent"
    }
  ]
}
To process some user intents, our service may need additional data. For example, to retrieve a customer's order status, we need an order number. These additional arguments are known as slots. Each intent can have a list of slots, and each slot object contains a name and a type. In our intent schema, the GetOrderStatus intent has an OrderNumber slot of type AMAZON.NUMBER, which is one of the built-in slot types.
Alexa supports many built-in slot types like date, number, and time. It also provides a built-in list type, which is a list of items with possible values.
We can also define our own custom slot type. For example, we can define a slot type TSHIRT_SIZE with possible values like SMALL, MEDIUM, or LARGE. To define a custom slot type, we need to provide a type name and a list of possible values. We can then use the custom slot type in our intent schema, just like a built-in slot type.
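As a sketch, such a custom slot type would be entered in the developer portal as a type name plus one value per line, roughly like this (the TSHIRT_SIZE name is our own example, not something the skill in this chapter uses):

```text
Type:   TSHIRT_SIZE
Values: SMALL
        MEDIUM
        LARGE
```

A slot referencing it in the intent schema would then use "type": "TSHIRT_SIZE" exactly where we used "AMAZON.NUMBER" above.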
For Alexa to map user input to one of the intents in our schema, we need to provide sample utterances. Let’s look at the sample utterances for the GetOrderStatus intent:
GetOrderStatus what is the status of my order number {OrderNumber}
GetOrderStatus what is the status of my order {OrderNumber}
GetOrderStatus my order {OrderNumber} status
GetOrderStatus status of order {OrderNumber}
Here we have four phrases that map to the GetOrderStatus intent with the OrderNumber slot. A sample utterance is a sample phrase that a user might use to invoke a specific intent. Each sample utterance starts with the name of the intent the phrase should be mapped to, with slots written inside curly brackets. When a user speaks a matching phrase, the Alexa voice service will map it to the correct intent and set values for the intent's slots.
Remember, we do not have to provide sample utterances for built-in intents. The Alexa voice service will automatically match commonly used words and phrases to built-in intents.
We need to configure the URL of our cloud-based service, or bot engine. While Amazon recommends using AWS Lambda, their serverless computing platform, to run the backend service for an Alexa skill, we can use any cloud-based hosting service. Since we already have a bot engine deployed on Heroku, we can simply add another endpoint for Alexa service requests instead of writing a new application.
Next, we need to select the geographical region closer to our target users. The options are ‘North America’ and ‘Europe’. We can select both options and provide separate service URLs based on region. We will add a new endpoint/route in our bot engine, '/alexa-webhook’, for all incoming requests from Alexa, and our service URL will be our Heroku app URL with a '/alexa-webhook’ suffix. This is similar to our Facebook Messenger Platform webhook, but instead of '/webhook', the endpoint is '/alexa-webhook’.
Next is the “Do you allow users to create an account or link to an existing account with you?” option. Since we do not need users to create or link an account, select the ‘no’ option and click on the ‘Next’ button.
Since our app is deployed at Heroku and we are using a free sub-domain provided by Heroku, select the “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority” option under the SSL certificate and click Next.
Since we are not using AWS Lambda to host our service, we need to do a little extra work. To make our skill accessible to the public, we need to submit it for certification, and to pass certification our bot engine has to verify that all incoming requests are valid and are coming from Amazon. This prevents someone else from calling and using our service while pretending to be Amazon. To verify an incoming request, we need to check the validity of the certificate, the request timestamp, and the request signature. You can read more about it in the ‘Alexa Skills Kit Security Testing’ section.
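As a rough illustration of one of these checks, timestamp verification simply rejects requests whose timestamp is too far from the current time (Amazon's docs describe a tolerance of 150 seconds; the function name below is our own sketch, not a library API):

```javascript
// Sketch of the request-timestamp freshness check, one of the three
// verification steps. The 150-second window reflects Amazon's documented tolerance.
const TOLERANCE_MS = 150 * 1000;

const isTimestampValid = (timestamp, now = Date.now()) =>
  Math.abs(now - new Date(timestamp).getTime()) <= TOLERANCE_MS;

console.log(isTimestampValid(new Date().toISOString())); // true
```

The certificate and signature checks are considerably more involved, which is why we delegate all three to a module in the next step.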
Thankfully, there are Node modules that we can use with our service to verify Alexa requests. For a Node.js app we can use the alexa-verifier module to verify all incoming requests, but since we are using Express, we can use its Express middleware wrapper, alexa-verifier-middleware.
Install and add ‘alexa-verifier-middleware’ to package.json:
npm install --save alexa-verifier-middleware
Next, update index.js to use alexa-verifier-middleware:
...
const avm = require('alexa-verifier-middleware');

app.use(avm());
app.set('port', (process.env.PORT || 5000));
app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());
...
Now, all incoming requests will be verified. Remember to load ‘alexa-verifier-middleware’ before any other body-parsing middleware.
Next, we need to add the new endpoint '/alexa-webhook’, which we used in our skill's service URL configuration.
Add the following code in index.js:
app.post('/alexa-webhook', function (req, res) {
  console.log('Incoming request', req.body.request);
  res.send({
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'PlainText',
        text: 'Hello World'
      },
      shouldEndSession: true
    },
    sessionAttributes: {}
  });
});
Here we added a new endpoint to handle POST requests on the '/alexa-webhook’ route and respond with a simple hello world message.
Let’s quickly look at the response. The version field is the Alexa API version, set to ’1.0’. The sessionAttributes field, defined here as an empty object, can take a list of key-value pairs and can be used to maintain data between different requests in the same session.
In the response object, the outputSpeech object is used by Alexa to voice the response. In our sample response, we set the outputSpeech type to ‘PlainText.’ The other supported type is ‘SSML’ (Speech Synthesis Markup Language), a markup language for speech synthesis. SSML offers better control over how Alexa converts text to speech. When using the ‘PlainText’ type, the outputSpeech object should have a ‘text’ field; similarly, for the ‘SSML’ type, outputSpeech should have an ’ssml’ field with the output text. The response object also includes a boolean shouldEndSession field to end or keep the session active.
Other than the fields we defined in our sample response, the response object can have a card object that defines a card to render in the Amazon Alexa app, a reprompt object that defines the outputSpeech used to re-prompt the user to respond, and a directives array with specific device-level actions to take using a particular interface.
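For instance, an SSML outputSpeech object might look like the following sketch; the say-as tag makes Alexa read the order number digit by digit, and the whole string must be wrapped in a speak tag:

```json
{
  "outputSpeech": {
    "type": "SSML",
    "ssml": "<speak>Your order number is <say-as interpret-as=\"digits\">1234</say-as></speak>"
  }
}
```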
Please refer to the Alexa Skills Kit Interface Reference for more details.
Let’s update our bot engine to handle these requests.
First, let’s update our store-api.js and add another method to retrieve the order status:
const retrieveOrderStatus = (orderNumber) =>
  axios.get(`${API_BASE_URL}orders/${orderNumber}/status`);

module.exports = {
  getSizes: sizes,
  getGender: genders,
  retriveProducts,
  retrieveOrderStatus
}
Next, let’s create a new file, alexa-bot-engine.js, with all of the logic to handle incoming requests from Alexa:
const axios = require('axios');
const storeApi = require('./store-api');

const ALEXA_API_VERSION = '1.0';

const buildResponse = (outputSpeech, shouldEndSession = true, sessionAttributes = {}) => {
  return {
    version: ALEXA_API_VERSION,
    response: {
      outputSpeech,
      shouldEndSession
    },
    sessionAttributes
  }
};

const sendAlexaHelpMessage = (res) => {
  res.send(buildResponse({
    type: 'PlainText',
    text: 'Hi, Awesome Tshirt Store Alexa skill can help you check your order status. What is your order number?'
  }));
}

const sendAlexaThankYouMessage = (res) => {
  res.send(buildResponse({
    type: 'PlainText',
    text: 'Thank you for using the Awesome Tshirt Store Alexa skill. See you next time'
  }));
}

const sendAlexaOrderStatus = (res, slots) => {
  const orderNumber = slots.OrderNumber.value;
  if (orderNumber) {
    storeApi.retrieveOrderStatus(orderNumber)
      .then(response => {
        const orderStatus = response.data;
        let responseText;
        if (orderStatus.status === 'PROCESSING') {
          responseText = 'We are processing your order. Please check the status again later';
        } else {
          responseText = `Your order is shipped and will be delivered on ${new Date(orderStatus.estimatedDeliveryDate).toLocaleDateString()}`;
        }
        res.send(buildResponse({
          type: 'PlainText',
          text: responseText
        }));
      })
      .catch(function (error) {
        console.error('Unable to retrieve order status', error);
        res.send(buildResponse({
          type: 'PlainText',
          text: `No order found with order number ${orderNumber}`
        }));
      });
  } else {
    sendAlexaHelpMessage(res);
  }
}

const intentHandler = {
  'GetOrderStatus': sendAlexaOrderStatus,
  'AMAZON.HelpIntent': sendAlexaHelpMessage,
  'AMAZON.StopIntent': sendAlexaThankYouMessage,
  'AMAZON.CancelIntent': sendAlexaThankYouMessage
}

const handleIncomingMessage = function (req, res) {
  const { request } = req.body;
  if (request.type === 'IntentRequest') {
    const { intent } = request;
    if (intentHandler[intent.name]) {
      intentHandler[intent.name](res, intent.slots);
    }
  } else if (request.type === 'LaunchRequest') {
    sendAlexaHelpMessage(res);
  }
}

module.exports = {
  handleIncomingMessage
}
Let’s quickly go through the code.
We have a handleIncomingMessage function to handle incoming requests from the Alexa voice service. We expose this method using module.exports, making it accessible to other modules.
The Alexa request body contains many details, like the Alexa API version and a sessionId. It includes information about the application and the user, a session object that provides additional context about the request, and a context object that describes the current state of the Alexa service and the device at the time of the request. You can read about the Alexa skill request in detail here.
We are interested in the request object, which is part of the request body. This includes the request type, id, timestamp, and more. Alexa requests can be one of the following types: LaunchRequest, IntentRequest, or SessionEndedRequest.
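A trimmed-down skeleton of an IntentRequest body looks roughly like this (fields abbreviated with '...' placeholders; the full payload carries more detail):

```json
{
  "version": "1.0",
  "session": { "sessionId": "...", "attributes": {}, "user": { "userId": "..." } },
  "context": { },
  "request": {
    "type": "IntentRequest",
    "requestId": "...",
    "timestamp": "...",
    "intent": { "name": "GetOrderStatus", "slots": { } }
  }
}
```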
In the handleIncomingMessage function, we first check the request type. If the request is a LaunchRequest type, we send the user our static help message. If the request type is IntentRequest, we also receive an intent object as part of the request, which contains the intent name and a slots object. The slots object will have a field for each slot, with its value set to a name-value pair for that slot.
Here is an example intent object. You will receive this if the user says something to invoke the GetOrderStatus intent with the OrderNumber slot value ’2222’:
"intent"
:
{
"name"
:
"GetOrderStatus"
,
"slots"
:
{
"OrderNumber"
:
{
"name"
:
"OrderNumber"
,
"value"
:
"2222"
}
}
}
For the request type IntentRequest, we invoke different handler functions based on the intent name. The GetOrderStatus intent will be handled by the sendAlexaOrderStatus function. We check whether the slot required to process the request is available. In this case, we check if the request contains the OrderNumber slot with a valid value. If the value is missing, we send the help message to the user. If there is an order number, we call the storeApi to retrieve the order status and build an outputSpeech response accordingly.
We defined a constant intentHandler and use an object literal pattern to map each intent name to the method that handles it. So, for the intent GetOrderStatus, we invoke the sendAlexaOrderStatus function. When our service receives a Cancel or Stop intent request, we send a static thank-you message to the user using the sendAlexaThankYouMessage function. Similarly, when we receive a Help intent request, we send a static help message to the user using the sendAlexaHelpMessage function.
We use the buildResponse function to build the JSON response expected by the Alexa service. This function takes the outputSpeech object, the shouldEndSession boolean flag, and the sessionAttributes object as parameters, and builds the response object. If not provided, it sets shouldEndSession to true and sessionAttributes to an empty object.
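The default-parameter behavior is easy to check in isolation. Here is a minimal copy of buildResponse showing what callers get when they pass only the outputSpeech argument:

```javascript
// Minimal copy of buildResponse to demonstrate its default parameters.
const ALEXA_API_VERSION = '1.0';

const buildResponse = (outputSpeech, shouldEndSession = true, sessionAttributes = {}) => ({
  version: ALEXA_API_VERSION,
  response: { outputSpeech, shouldEndSession },
  sessionAttributes
});

const r = buildResponse({ type: 'PlainText', text: 'Hello' });
console.log(r.response.shouldEndSession);            // true
console.log(Object.keys(r.sessionAttributes).length); // 0
```

Handlers that need a multi-turn conversation would pass false for shouldEndSession and carry state in sessionAttributes; our skill answers in a single turn, so the defaults suffice.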
Now, let’s update our index.js '/alexa-webhook’ to pass the incoming request to the alexa-bot-engine.
const alexaBotEngine = require('./src/alexa-bot-engine');

app.post('/alexa-webhook', function (req, res) {
  alexaBotEngine.handleIncomingMessage(req, res);
});
Now deploy these changes to Heroku, and we are ready to test our skill.
Amazon offers two tools to test our skill. Under your Alexa skill, go to the ‘Test’ tab.
Let’s use the service simulator to test our bot engine by entering the text “What is the status of my order, one one zero zero?” in the input text field and clicking on ‘Ask Awesome Tshirt Store.’ Alexa will generate a service request based on the entered utterance and display the response received from our bot engine under the service response.
You can also use echosim.io to test your skill. Echosim is an Alexa skill testing tool: a browser-based interface to Alexa. Simply log in to echosim.io using your Amazon account. Then click and hold the microphone button and speak a command you would like to test. Echosim.io will process it and respond once you release the button.
To make our skill accessible to the public, we need to submit the skill for certification. Simply go to the ‘Publishing Information’ tab and fill in the form:
Once you fill in the form, click on the ‘Submit for Certification’ button. The Alexa team will review and test the skill and once approved it will be available for the public to use.
In Chapter 4 we implemented a text-based conversational bot, and in this chapter we implemented a voice-based one. We built a custom Alexa skill that lets us check an order status completely hands-free, using voice commands through the Alexa voice service. We learned that for a text- or voice-based interface using natural language, we need a way for our service to map user input to the different functions our service can perform.
We hope that this book was helpful and will enable you to build amazing software with a conversational interface.