Amazon Alexa – Part 0, The Basics
In November 2014 Amazon announced its Amazon Echo, and along with it Alexa. According to David Limp, the Amazon senior vice president who oversees Alexa and all of Amazon’s devices, Alexa was inspired by the computer voice and conversational system on board the starship Enterprise in the science fiction TV series and movies.
Alexa is still far from being able to hold full conversations. Right now it’s basically a question-answer system, but quite a good one at that.
It comes with some built-in capabilities, like looking up weather info, but the real key to its success is Alexa’s ability to extend its knowledge base and functionality through third-party “Apps”, called Skills.
Alexa isn’t just available on Amazon’s Echo devices; you can even use a Raspberry Pi to interact with Alexa.
This series of posts will give you a minimal understanding of the structure of an Amazon Alexa Skill, the components it is made up of, and finally how to quickly set up your own skill.
An Alexa Skill consists of two components you’ll have to provide: the Skill configuration and the Skill backend.
Skill Configuration, Interaction Model, Voice User Interface
This is the part that resides completely in Amazon’s cloud. You create this configuration in the Amazon developer portal, and it defines the Voice User Interface. It contains all the information about your skill, including the Interaction Model and the backend / endpoint configuration.
The Interaction Model consists of:
- Intents, which represent actions that users can perform with your skill; they are the core functionality of your skill.
- Custom Slot Types, which define a list of possible values for a parameter (Slot) of an action.
- Sample utterances, which specify the words and phrases users can say to invoke those intents.
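To make these three building blocks a bit more tangible, here is a minimal sketch of how they relate to each other, written as plain Java data types. This is not the format the developer portal expects and it does not use the Alexa Skills SDK. All names (`InteractionModelSketch`, `GetWeatherIntent`, `LIST_OF_CITIES`, the `City` slot) are made up for illustration, and the `{City}` placeholder notation in the utterances is an assumption of this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory picture of an Interaction Model; not Amazon's actual format. */
final class InteractionModelSketch {

    /** An Intent: a named action, its Slots (slot name -> slot type) and sample utterances. */
    record Intent(String name, Map<String, String> slots, List<String> sampleUtterances) {}

    /** A Custom Slot Type: a named list of possible values. */
    record SlotType(String name, List<String> values) {}

    // Matches placeholders such as {City} inside a sample utterance (assumed notation).
    private static final Pattern SLOT_PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Example slot type with a few possible values. */
    static SlotType cityType() {
        return new SlotType("LIST_OF_CITIES", List.of("Berlin", "Hamburg", "Munich"));
    }

    /** Example intent that takes one parameter, the City slot. */
    static Intent weatherIntent() {
        return new Intent(
                "GetWeatherIntent",
                Map.of("City", "LIST_OF_CITIES"),
                List.of("what is the weather in {City}",
                        "how is the weather in {City}"));
    }

    /** Returns true if every placeholder used in the utterances is a declared slot. */
    static boolean usesOnlyDeclaredSlots(Intent intent) {
        for (String utterance : intent.sampleUtterances()) {
            Matcher matcher = SLOT_PLACEHOLDER.matcher(utterance);
            while (matcher.find()) {
                if (!intent.slots().containsKey(matcher.group(1))) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The small consistency check at the end is only there to show the relationship: an utterance refers to slots, and each slot refers to a slot type.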
The voice user interface is the hardest part for beginners to figure out: most users are not used to talking to a computer, and you have to find a way to get all the necessary information from the user without making them feel uncomfortable.
The Skill Backend
The backend is written by you and can run on AWS Lambda or on your own infrastructure, although the configuration is a lot easier if you’re using Lambda.
Amazon offers a free tier of Lambda as long as your usage stays below a documented threshold, so it’s a great place to run your first tests. Lambda supports a range of runtimes, which means you can use your preferred language (Java, Node.js, C# or Python), although the Alexa Skills SDK samples only come in Java and Node.js. In my articles I’ll use Java, as most examples in the currently available documentation refer to the Java samples.
The data flow can be described as follows:
- The user asks a question using the voice user interface, i.e. says something to an Alexa-enabled device to invoke an action, like “Alexa, what’s the weather like today?”.
- The device sends the question to the Alexa service, where the heavy-duty work is done. The service translates the question uttered in natural language into something your backend understands, meaning it tries to find out which Intent to invoke and what the parameters (put into Slots) are.
- The Alexa service sends the Intent data as JSON via HTTPS to your backend.
- Using the information provided in the Intent, your backend decides what to do and checks whether all necessary parameters are available (if not, it has to respond with a question). It then returns the sentences Alexa uses to answer the user.
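The last step of this flow can be sketched in a few lines of plain Java. This is only an illustration of the decision the backend has to make, not code from the Alexa Skills SDK: `IntentRequest` and `SpeechResponse` are simplified, hypothetical stand-ins for the JSON data exchanged with the Alexa service, and the intent name `GetWeatherIntent` and the `City` slot are made up as well.

```java
import java.util.Map;

/** Hypothetical backend logic in plain Java; the real SDK types look different. */
final class WeatherBackendSketch {

    /** Simplified stand-in for the Intent data: intent name plus the filled Slots. */
    record IntentRequest(String intentName, Map<String, String> slots) {}

    /** Simplified stand-in for the reply: text to speak and whether the session ends. */
    record SpeechResponse(String text, boolean endSession) {}

    static SpeechResponse handle(IntentRequest request) {
        // Decide what to do based on the Intent the Alexa service identified.
        if (!"GetWeatherIntent".equals(request.intentName())) {
            return new SpeechResponse("Sorry, I don't know how to help with that.", true);
        }

        String city = request.slots().get("City");
        if (city == null || city.isBlank()) {
            // A necessary parameter is missing: respond with a question
            // and keep the session open so the user can answer.
            return new SpeechResponse("For which city would you like the weather?", false);
        }

        // All parameters are available: return the sentence Alexa should say.
        // A real skill would look up actual weather data here.
        return new SpeechResponse("Here is the weather for " + city + ".", true);
    }
}
```

Calling `handle` with an empty slot map yields the follow-up question, while passing `Map.of("City", "Berlin")` yields the final answer and ends the session.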
In part 1 we’ll discuss Intents and Utterances and how they help Alexa understand what you intend to do.