Custom Alexa Skill for Tracking Car Use

Over the last several weeks, I have been adding various home automation technologies to the house: Arlo for home security, Wemo and Lutron Caseta for automated lighting, and Amazon Echo/Alexa for voice control. Out of the box, Alexa’s integration with other smart home technologies is pretty good. It doesn’t take any custom work to be able to use your voice to turn lights on and off, and integrating Alexa with Arlo was fairly straightforward using the IFTTT service, which allows for basic “if this, then that” style applets that can be triggered via voice through Alexa.

However, in order to build a true smart home, I wanted to be able to write my own applications, executed within my IoT ecosystem, that would serve needs specific to me. A few of my initial ideas were:

  • Wine Cellar Integration: I want to be able to ask Alexa if we have a particular bottle in stock, and if so how many bottles we have. This would require integrating an Alexa skill with Vinocell, a wine cellar management application that I use.
  • Madison Restaurant Ideas: My wife and I frequently are indecisive about where to eat dinner. I want to be able to ask Alexa for ideas, tailored to our specific preferences and location, beyond what an app like Urban Spoon could provide.
  • Car Tracking: As a sports car collector, I have many cars. I often find myself wondering, when was the last time I actually drove the Porsche? How often in the last month or two have I driven the Porsche?

This post will focus on the last idea. It struck me as a fairly good first Alexa project, since it wouldn’t involve integrations with any third party APIs, just APIs that I’d have to develop to store the requisite data.

Requirements

I typically interact with Alexa every morning on my way out the door. I ask “what’s new?” to get my daily news briefing, ask what is on my calendar, ask about the weather, and ask about my commute. The goal for the skill is to be able to say, “Alexa, tell Hardin Home that I’m driving the Mercedes today.” Alexa will record a timestamp that I drove the Mercedes that day in a database, and retrieve that information when I say “Alexa, ask Hardin Home when I last drove the Mercedes” in the form of a sentence like, “You last drove the Mercedes six days ago.”

Architecture

This project would involve several components: an Alexa skill, which would call a function on AWS Lambda (written in node.js), which would in turn call a series of very simple PHP APIs hosted on an Ubuntu/Apache EC2 instance, with a MySQL database storing the data about the cars. The EC2 instance would be placed inside a VPC, with an AWS security group limiting access on port 80 solely to the VPC. This allows me to grant the Lambda function access to the VPC, so that it (and only it) can interact with my API, sparing me from implementing the additional security measures I'd need if the EC2 instance were open to the outside world.

[Screenshot: architecture diagram of the skill's components]

API Implementation

To implement the API, I created a t2.small EC2 instance and assigned it an Elastic IP. I set up a security group that opened all ports within the VPC, and then granted access from my home IP on ports 80 and 22, allowing me to connect to the server to deploy code, as well as to test the web services from a browser:

[Screenshot: EC2 security group configuration]

Once this was done, I SSH'ed into my server and installed a basic LAMP stack:

sudo apt-get update
sudo apt-get install lamp-server^

After this, I installed phpMyAdmin and created a database called hardin_home. I added two simple tables: cars and cars_driven. The cars table holds information about each car (which will be used later in a sample conversational query with Alexa), and the cars_driven table holds a list of timestamps for when each car was driven:

[Screenshot: cars and cars_driven table structures in phpMyAdmin]
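Reconstructed from the services below, the schema looks approximately like this (column types are assumptions on my part; the original tables were built by hand in phpMyAdmin and may differ in detail):

```sql
-- Approximate schema inferred from the PHP services that follow.
CREATE TABLE cars (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL,
    description TEXT
);

CREATE TABLE cars_driven (
    id INT AUTO_INCREMENT PRIMARY KEY,
    car VARCHAR(64) NOT NULL,
    driven_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

The DEFAULT CURRENT_TIMESTAMP on driven_timestamp matters, since drive.php inserts only the car name and relies on MySQL to record when the row was created.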

I implemented three quick-and-dirty PHP services that can be called by the Lambda implementation. An obvious refactor would be to rebuild the API on a proper framework or microframework, but in this case I wanted to be able to crank out the API calls in five minutes, so they are just manual PHP. They are:

drive.php

<?php
// drive.php: record that a car was driven (the timestamp column defaults to now).
$con = mysqli_connect("localhost", "XXXXX", "XXXXX", "hardin_home");
$result = $con->query("insert into cars_driven (car) values ('" . $con->real_escape_string($_REQUEST['car']) . "')");

mysqli_close($con);

last_driven.php

<?php
// last_driven.php: report when the given car was last driven.
$con = mysqli_connect("localhost", "XXXXX", "XXXXX", "hardin_home");

$result = $con->query("select * from cars_driven where car = '" . $con->real_escape_string($_REQUEST['car']) . "' order by id desc limit 1");
$row = $result->fetch_assoc();

$timestamp = $row['driven_timestamp'];

if (date('Ymd') == date('Ymd', strtotime($timestamp)))
{
    echo "You last drove the " . $_REQUEST['car'] . " today.";
}
else
{
    echo "You last drove the " . $_REQUEST['car'] . " " . humanTiming(strtotime($timestamp)) . " ago, on " . date('F j, Y', strtotime($timestamp)) . ".";
}

mysqli_close($con);
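The humanTiming() function called above isn't defined in the snippet. A minimal sketch of such a helper, which turns a past Unix timestamp into a phrase like "six days", could look like the following (the original implementation may differ):

```php
<?php
// Hypothetical sketch of the humanTiming() helper used by last_driven.php:
// converts a past Unix timestamp into a coarse "N units" phrase.
function humanTiming($timestamp)
{
    $diff = time() - $timestamp;

    // Largest unit first; return the first one that fits.
    $units = array(
        31536000 => 'year',
        2592000  => 'month',
        604800   => 'week',
        86400    => 'day',
        3600     => 'hour',
        60       => 'minute',
        1        => 'second'
    );

    foreach ($units as $seconds => $name)
    {
        if ($diff >= $seconds)
        {
            $count = floor($diff / $seconds);
            return $count . ' ' . $name . ($count > 1 ? 's' : '');
        }
    }

    return 'moments';
}
```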

more_info.php

<?php
// more_info.php: return the stored description for the given car.
$con = mysqli_connect("localhost", "XXXXX", "XXXXX", "hardin_home");

$result = $con->query("select * from cars where name = '" . $con->real_escape_string($_REQUEST['car']) . "'");
$row = $result->fetch_assoc();

echo $row['description'];

mysqli_close($con);

Lambda Implementation

According to Amazon, “AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.” Currently, Lambda supports node.js, Python, and Java. For this implementation, I selected node.js. First, I needed to configure a Lambda function to use node.js, and assign it a role to access the VPC that I set up earlier:

[Screenshot: Lambda function configuration with the node.js runtime]

Under advanced settings, I gave it explicit access to my VPC (and thus to my PHP services on my EC2 instance):

[Screenshot: Lambda advanced settings granting VPC access]

Prior to writing and deploying my node.js application package to Lambda, I needed to set up how the Lambda function would be triggered. For this implementation, the trigger would obviously be an Alexa call:

[Screenshot: Alexa Skills Kit selected as the Lambda trigger]

Typically, applications are deployed to Lambda by uploading a ZIP file of the Lambda project. My project has a very simple file structure:

  • AlexaSkill.js: A base class provided by Amazon that I can inherit from
  • index.js: My application
  • node_modules: Any third party node.js modules

For this project, I didn’t have any third party node.js modules, so node_modules is empty. In my index.js file, I started with the following:

'use strict';

// Link to our Alexa Skill (see next section):
var APP_ID = "amzn1.ask.skill.8b0b2dac-5031-4257-961d-3daccb68642f";

// The AlexaSkill prototype and helper functions:
var AlexaSkill = require('./AlexaSkill');

// Include the HTTP lib so we can call our PHP API:
var http = require('http');

// Our implementation:
var HardinHome = function () {
    AlexaSkill.call(this, APP_ID);
};

// Extend AlexaSkill:
HardinHome.prototype = Object.create(AlexaSkill.prototype);
HardinHome.prototype.constructor = HardinHome;

HardinHome.prototype.eventHandlers.onSessionStarted = function (sessionStartedRequest, session)
{
    // Any session init logic would go here...
};

HardinHome.prototype.eventHandlers.onLaunch = function (launchRequest, session, response)
{
    getWelcomeResponse(response);
};

HardinHome.prototype.eventHandlers.onSessionEnded = function (sessionEndedRequest, session)
{
    // Any session cleanup logic would go here...
};

Now that our base implementation is set up, we need to define our intent handlers. These are hooks that receive calls from the Alexa SDK when Alexa matches a particular speech pattern, as defined below in our Alexa SDK implementation:

HardinHome.prototype.intentHandlers =
{
    "CarsDriven": function (intent, session, response)
    {
        getCarsDriven(intent, session, response);
    },
 
    "CarsDrive": function (intent, session, response)
    {
        getCarsDrive(intent, session, response);
    },
 
    "CarsMoreDetail": function (intent, session, response)
    {
        getCarsMoreDetail(intent, session, response);
    },

    "CarsNoMoreDetail": function (intent, session, response)
    {
        response.tell("");
    },

    "AMAZON.HelpIntent": function (intent, session, response)
    {
        helpTheUser(intent, session, response);
    },

    "AMAZON.StopIntent": function (intent, session, response)
    {
        var speechOutput = "Goodbye";
        response.tell(speechOutput);
    },

    "AMAZON.CancelIntent": function (intent, session, response)
    {
        var speechOutput = "Goodbye";
        response.tell(speechOutput);
    }
};

From there, I needed to actually define the three key functions that are called in the block above: getCarsDriven, getCarsDrive, and getCarsMoreDetail. The first asks Alexa when I last drove a car, the second tells Alexa I drove a car, and the third asks Alexa for more information about a car. That last call was something I implemented purely to experiment with Alexa’s conversational abilities, where she could ask me if I wanted more information about a car and could provide it if I responded yes.

getCarsDriven

function getCarsDriven(intent, session, response)
{
    var speechText = "",
    repromptText = "",
    speechOutput,
    repromptOutput;
 
    var car = intent.slots.Car.value;
    session.attributes['car'] = car;
 
    var request_car = "";
 
    if (car.toLowerCase() == "mercedes")
    {
        request_car = "Mercedes";
    }
    else if (car.toLowerCase() == "porsche")
    {
        request_car = "Porsche";
    }
    else if (car.toLowerCase() == "jaguar")
    {
        request_car = "Jaguar";
    }
    else
    {
        request_car = "Ford";
    }
 
    http.get("http://172.31.63.164/cars/last_driven.php?car=" + request_car, function (res)
    {
        var noaaResponseString = '';
        res.on('data', function (data)
        {
            noaaResponseString += data;
        });

        res.on('end', function ()
        {
            speechText = noaaResponseString;
            repromptText = "Would you like to learn more about that car? Please say yes or no.";
 
            speechOutput =
            {
                speech: speechText,
                type: AlexaSkill.speechOutputType.PLAIN_TEXT
            };

            repromptOutput =
            {
                speech: repromptText,
                type: AlexaSkill.speechOutputType.PLAIN_TEXT
            };

            response.askWithCard(speechOutput, repromptOutput, "Hardin Home: Cars", speechText);
        });
    });
 }

There are a couple of things to note in the above function:

  1. The function receives three arguments: intent, session, and response. The intent is an object that contains all of the input from Alexa, including custom variables mapped to the custom slot types that I defined (see the next section). The session variable is an object that I can write to. This lets me preserve information across multiple Alexa calls, which is critical for maintaining state in a conversation. For example, I want to store the car being discussed so that if I ask Alexa for more information about that car, I don’t have to repeat its name in every sentence I speak. Finally, the response is an object that I call when I’m ready to return data. I can call response’s methods from within an asynchronous block, which is huge for this specific implementation, since the intent function can return before I receive data back from an HTTP request, and I want to wait to respond until I have data.
  2. The block of if statements that normalizes the input is fairly important, since we don’t know what casing we’re going to get back from Alexa. It also lets us account for things like homonyms if we’re not using a set custom slot type.
  3. Finally, I make an HTTP request to my EC2 server, and when I get data back I respond to Alexa. I call the askWithCard() method on the response object, which allows me to say a sentence (speechOutput), send a reprompt sentence (repromptOutput), and then send some text to display on a card view in the Alexa app, which will be visible from the iOS/Android app and will automatically appear on the Kindle Fire that I have paired with my Echoes.
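The if/else normalization repeated in each of the three functions could be collapsed into a single lookup table. A sketch of such a refactor (hypothetical, not the deployed code; the "porch" entry is an example homonym, not something Alexa is guaranteed to return):

```javascript
'use strict';

// Hypothetical refactor: replace the repeated if/else chains with one
// lookup table mapping lowercased slot values (including possible
// mis-hearings) to the canonical names the PHP API expects.
var CAR_NAMES = {
    "mercedes": "Mercedes",
    "porsche": "Porsche",
    "porch": "Porsche",   // example homonym Alexa might return
    "jaguar": "Jaguar",
    "ford": "Ford",
    "truck": "Ford"
};

function normalizeCar(rawValue) {
    var key = (rawValue || "").toLowerCase();
    // Fall back to "Ford" for unrecognized input, matching the original logic.
    return CAR_NAMES[key] || "Ford";
}

console.log(normalizeCar("PORSCHE")); // Porsche
```

Each intent handler would then call normalizeCar(intent.slots.Car.value) instead of carrying its own copy of the branching.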

getCarsDrive

function getCarsDrive(intent, session, response)
{
    var speechText = "",
    repromptText = "",
    speechOutput,
    repromptOutput;
 
    var car = intent.slots.Car.value;
    session.attributes['car'] = car;
 
    var request_car = "";
 
    if (car.toLowerCase() == "mercedes")
    {
        request_car = "Mercedes";
    }
    else if (car.toLowerCase() == "porsche")
    {
        request_car = "Porsche";
    }
    else if (car.toLowerCase() == "jaguar")
    {
        request_car = "Jaguar";
    }
    else
    {
        request_car = "Ford";
    }

    http.get("http://172.31.63.164/cars/drive.php?car=" + request_car, function (res)
    {
        var noaaResponseString = '';
        res.on('data', function (data)
        {
            noaaResponseString += data;
        });

        res.on('end', function ()
        {
            speechText = "Alright, I've recorded that you're driving the " + car + " today!";
 
            speechOutput =
            {
                speech: speechText,
                type: AlexaSkill.speechOutputType.PLAIN_TEXT
            };

            response.tellWithCard(speechOutput, "Hardin Home", speechText);
        });
    });
}

getCarsMoreDetail

function getCarsMoreDetail(intent, session, response)
{
    var speechText = "",
    repromptText = "",
    speechOutput,
    repromptOutput;
 
    var car = session.attributes['car'];
    if (car == undefined) car = "mercedes";
    var request_car = "";

    if (car.toLowerCase() == "mercedes")
    {
        request_car = "Mercedes";
    }
    else if (car.toLowerCase() == "porsche")
    {
        request_car = "Porsche";
    }
    else if (car.toLowerCase() == "jaguar")
    {
        request_car = "Jaguar";
    }
    else
    {
        request_car = "Ford";
    }
 
    http.get("http://172.31.63.164/cars/more_info.php?car=" + request_car, function (res)
    {
        var noaaResponseString = '';
        res.on('data', function (data)
        {
            noaaResponseString += data;
        });

        res.on('end', function ()
        {
            speechText = "Here is some more detail about the " + car + ": " + noaaResponseString;

            speechOutput =
            {
                speech: speechText,
                type: AlexaSkill.speechOutputType.PLAIN_TEXT
            };

            response.tellWithCard(speechOutput, "Hardin Home", speechText);
        });
    });
}

Lastly, I needed to define a hook to call all of the code I just wrote in response to Alexa input:

// Create the handler that responds to the Alexa Request:
exports.handler = function (event, context)
{
    var hardinHome = new HardinHome();
    hardinHome.execute(event, context);
};

Alexa SDK Implementation

After publishing the Lambda function, Amazon assigns it an ARN, a unique identifier that allows it to be called from other AWS services. A Lambda ARN looks something like this:

arn:aws:lambda:us-east-1:123456789:function:HardinHome

Note that Alexa can currently only call Lambda functions in the us-east-1 (Northern Virginia) and eu-west-1 (Ireland) regions, so my Lambda function needs to be deployed in one of them and have a corresponding ARN to be visible to Alexa. To create the Alexa app, I go to the Alexa SDK developer page and add a new skill. I set the skill information like so:

[Screenshot: skill information in the Alexa developer console]

After that, I point it at my Lambda function:

[Screenshot: skill configuration pointing at the Lambda ARN]

All that is left now is to define my interaction model, which specifies how I can talk to Alexa to activate the skill, and to test it. The skill will be automatically deployed to all of my Echoes, since my Alexa developer account is linked to my normal Amazon account that is associated with the Echo. My interaction model consists of several parts:

  • Intent Schema: This is a JSON structure that maps all of the callbacks that I defined in my Lambda function, and describes any variables that will be mined from the words that I speak to Alexa.
  • Custom Slot Types: These are custom enums that allow me to define options that Alexa can match. For example, I might define a custom slot type of “car”, with the options being the various cars that I own.
  • Sample Utterances: These are sample English phrases that are associated with intents in the intent schema, with wildcard variables that correspond to either custom or built-in slot types.

In the case of this skill, here is my intent schema (the intents should look familiar from the node.js code that I installed in Lambda):

{
 "intents": [
    {
        "intent": "CarsDriven",
        "slots": [
            {
                "name": "Car",
                "type": "LIST_OF_CARS"
            }
        ]
    },
    {
        "intent": "CarsDrive",
        "slots": [
            {
                "name": "Car",
                "type": "LIST_OF_CARS"
            }
        ]
    },
    {
        "intent": "CarsMoreDetail"
    },
    {
        "intent": "CarsNoMoreDetail"
    },
    {
        "intent": "AMAZON.HelpIntent"
    },
    {
        "intent": "AMAZON.StopIntent"
    },
    {
        "intent": "AMAZON.CancelIntent"
    }
 ]
}

The only custom slot type referenced above is LIST_OF_CARS, which is defined as:

mercedes | porsche | jaguar | ford | truck

Finally, here are my sample utterances, which reference both the custom slots and the intent schema:

CarsDriven when was {Car} last driven
CarsDriven what day was {Car} last driven
CarsDriven when did I last drive the {Car}
CarsDriven when I last drove the {Car}

CarsMoreDetail tell me more about that car
CarsMoreDetail yes
CarsMoreDetail yeah

CarsNoMoreDetail no
CarsNoMoreDetail nope

CarsDrive I drove the {Car} today
CarsDrive I'm driving the {Car} today

The sample utterances should be fairly easy to follow: they allow me to talk to Alexa and say something like, “Alexa, ask Hardin Home when I last drove the Jaguar.” Alexa will respond, “You last drove the Jaguar on Monday. Would you like to learn more about this car? Please answer yes or no.” I can respond yes and be read a little blurb about the car, or no, and Alexa will stop talking. I can also say, “Alexa, tell Hardin Home that I’m driving the truck today,” and Alexa will respond with, “Alright, I’ve recorded that you’re driving the truck today.” This conversation is exactly what I set out to achieve in my requirements above, so I’m done!

I enable the skill for testing and send it to my Echoes:

[Screenshot: enabling the skill for testing]

I can then use the handy debug console to send text snippets to my service, and examine the output:

[Screenshot: the service simulator showing a request and its response]

I can also actually use the skill on my Echo, and everything works as expected!

Conclusion

This is obviously just an initial implementation for the potential capabilities of this skill. Aside from refactoring the API to utilize a micro-framework, there are a lot of cool things that could be done. I could add reporting capabilities to allow Alexa to respond to queries like, “How many times in the last three months have I driven the Porsche?” I could also add an integration for Arlo or SmartThings and IFTTT that utilizes motion sensors to automatically log when cars are taken out, instead of me having to tell Alexa. The possibilities are, as with most home automation tasks, essentially endless.
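A reporting query like that would be a single statement against the existing cars_driven table. A sketch (assuming the schema inferred earlier):

```sql
-- Sketch: how many times the Porsche was driven in the last three months.
SELECT COUNT(*)
FROM cars_driven
WHERE car = 'Porsche'
  AND driven_timestamp >= DATE_SUB(NOW(), INTERVAL 3 MONTH);
```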

Jon Hardin

Website: http://hardinhome.wordpress.com

By day, Jon is the CEO of a software company. Outside of work, Jon is an avid home improvement enthusiast who enjoys a wide variety of renovation, landscaping, and other projects.
