Background

We use Google Assistant at home to control the lights, play the news over breakfast, get weather forecasts and sometimes play music for the kids.

The Minis are relatively cheap and, if you disregard the fact that you’re letting Google even further into your private space, they’re a great companion at home for helping with some of the mundane daily stuff.

Now to the “problem”. We live in an area where we can commute using either bus or boat and asking the assistant would only give information about the next bus, never the bus after that. The response is also a bit too chatty and takes quite a while to complete.

This caused me to look into writing a custom integration. At first I was guided to IFTTT (IF This Then That), which is a great page for when you want the assistant to do some actions with pre-integrated things, like turning on a lamp and then replying with a static sentence.

Example

  • User says: “Ok google, movie time”
  • Assistant turns off the lights.
  • Assistant closes the curtains.
  • Assistant responds: “Done, enjoy the movie.”

But I needed a response that depended on the result of an external API call, and I could not find any way to do this with IFTTT. However, it looked like Google DialogFlow could.

Google DialogFlow

DialogFlow is Google’s platform for creating applications that interact with humans using a conversation structure. You can do many advanced things with it, such as follow-up questions, changing the conversation depending on context, and machine learning. The goal in my case was quite simple:

  • Ask the assistant when the bus or boat leaves
  • Have the assistant collect information about the bus and boat table using a REST API call
  • Respond to the user with content from the REST API call response

Getting an account

In order to use Dialogflow you need an account, so start with creating one, or link your existing account. After that we’re going to create our first so-called Agent.

Configure the name of the agent, the language and a time zone and click on Create.

Configuring an intent

Next up we’re going to configure an Intent, which holds the phrases that help Google Assistant understand that you want to trigger this particular action. Let’s start, shall we?

  1. Click on Intents
  2. Delete the existing intents as we don’t need them for this scenario
  3. Click on “Create Intent” at the top right corner
  4. Click on Events and add “Google Assistant Welcome” and “Welcome”
  5. Click on Add training phrases. These are examples of what you want the Google Assistant to respond to later on. In my case the training phrases were “When is the next bus?”, “When is the bus?”, “When is the boat?” and “When is the next boat?”. The more training phrases you add, the better the machine learning behind the Google Assistant gets at understanding when users want to use your service. Neat huh?
  6. When you have added all the training phrases, click on “Enable Fulfillment”.
  7. Then click on “Enable webhook call for this intent”
  8. Click on “Save”.

Let’s recap what we have done so far. First we created a new Agent, then we told the Agent to react to a few different Training Phrases through an Intent. Finally, we told the agent that we want the intent to be fulfilled by a Web Hook.

There’s other cool stuff we could do on this page such as parsing parameters from user input, but in our case we have all we need so let’s keep it simple.
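
To get a feel for what DialogFlow sends to the Web Hook, here is a minimal sketch of picking the interesting parts out of the request body. The field names queryResult, queryText, intent.displayName and parameters follow the DialogFlow v2 webhook request format as I understand it; the sample body and the intent name “Next departure” are made up for illustration.

```python
import json

def describe_request(body):
    """Pull out the parts of a DialogFlow webhook request that matter here."""
    query_result = body.get("queryResult", {})
    return {
        "query_text": query_result.get("queryText", ""),
        "intent": query_result.get("intent", {}).get("displayName", ""),
        "parameters": query_result.get("parameters", {}),
    }

# Trimmed, hypothetical example of a request body (real ones have more fields)
sample_body = {
    "responseId": "hypothetical-id",
    "queryResult": {
        "queryText": "When is the next bus?",
        "parameters": {},
        "intent": {"displayName": "Next departure"},
    },
}

print(json.dumps(describe_request(sample_body), indent=2))
```

In a cloud function the body would typically come from request.get_json(). The script in this guide never looks at the request, which is fine as long as there is only one intent.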

Next it is time to configure the “Fulfillment” by telling DialogFlow which URL to use for the Web Hook. In my case I turned to Google’s Cloud Platform for hosting it. So let’s take a break from DialogFlow and head on over to the web service creation.

Signing up for a GCP account

The intent here is to create a web service, so you can use any platform for this, including your own private server or AWS. In my case I went with Google, which has a great startup package where you get $300 in credit when you create an account in their cloud platform. Using their Cloud Functions is free for quite a large number of calls, which makes it an excellent choice for our needs.

Creating a cloud function

Cloud functions are pieces of code that are executed on demand. The server layer is abstracted away (also called serverless) so you can focus 100% on your code. You can run Node.js, Python or Go in cloud functions. In this example I will use Python.

  1. Click on the “Hamburger” menu button and go to “Cloud Functions”
  2. Click on “Create Function”
  3. Give your function a name
  4. Assign an appropriate amount of memory. In my case 128MB was more than enough.
  5. Set the trigger to “HTTP”
  6. Check “Allow unauthenticated invocations”
  7. In the source code section, choose “Inline editor”
  8. Write/Paste your script into the text area. Look below for an example.
  9. As Runtime choose Python 3.7
  10. In Function to execute, choose main or whatever function you want to use to initiate the cloud function
  11. Click on “Environment variables, networking, timeouts and more”
  12. Choose the region and then click on “Create”
  13. Note the cloud function URL that is generated.

Example script

The script below uses the SL.se (Stockholm Commuting) API to fetch information about the coming bus and boat trips near my station. I’ve redacted the API key and station IDs for privacy purposes.

import requests
import json
import datetime
import math
import pytz

slApiKey = 'xxx'
busDepartureStationId = 'xxx'
busDestinationStationId = 'xxx'
boatDepartureStationId = 'xxx'
boatDestinationStationId = 'xxx'

def minutes_until(d):
    # Since I live in Stockholm we set the tz to Stockholm
    tz = pytz.timezone('Europe/Stockholm')
    now = datetime.datetime.now(tz)

    # Remove the time zone information since we can't do a datediff if it is still there
    # The datestamp will remain the same
    now = now.replace(tzinfo=None)

    # Return the difference in minutes and round downwards
    d = datetime.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    return math.trunc(abs((now - d).total_seconds()/60))

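def getTrips(departure, destination):
    # Let requests build the query string; time out before the assistant does
    params = {'key': slApiKey, 'originId': departure, 'destId': destination}
    response = requests.get('https://api.sl.se/api2/TravelplannerV3_1/trip.json', params=params, timeout=4)
    response.raise_for_status()
    return response.json()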

def get_trip(departures, trip_number):
    departure = departures['Trip'][trip_number]['LegList']['Leg'][0]['Origin']
    departure_date = departure['date']
    departure_time = departure['time']
    return [departure_date, departure_time]

def convert_time_stamp(ts):
  return ':'.join(ts.split(':')[:-1])

def main(request):
    
    buses = getTrips(busDepartureStationId, busDestinationStationId)
    bus1 = get_trip(buses, 0)
    bus2 = get_trip(buses, 1)

    boats = getTrips(boatDepartureStationId, boatDestinationStationId)
    boat1 = get_trip(boats, 0)
    boat2 = get_trip(boats, 1)

    if minutes_until(' '.join(boat1)) > 180:
        boatResponse = 'There are no boats departing to the city within 3 hours'
    else:
        boatResponse = f"There's also a boat leaving at {convert_time_stamp(boat1[1])} and then another at {convert_time_stamp(boat2[1])}."

    return json.dumps({
      "payload": {
        "google": {
          "expectUserResponse": False,
          "richResponse": {
            "items": [
              {
                "simpleResponse": {
                  "textToSpeech": f"The next bus leaves at {convert_time_stamp(bus1[1])}, and the one after that at {convert_time_stamp(bus2[1])}. {boatResponse}"
                }
              }
            ]
          }
        }
      }
    })
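
Two things in the script are worth a closer look: it assumes the API always returns at least two trips, and minutes_until uses abs(), so a departure that has already left looks the same as one in the future. Here is a minimal sketch of a more defensive variant. The nesting (Trip, LegList, Leg, Origin, date, time) is the one get_trip reads above; the sample values and the helper names departure_times and minutes_between are made up for illustration.

```python
import datetime

# Hypothetical, heavily trimmed response with the nesting that get_trip reads
sample_trips = {
    "Trip": [
        {"LegList": {"Leg": [{"Origin": {"date": "2020-01-15", "time": "07:42:00"}}]}},
        {"LegList": {"Leg": [{"Origin": {"date": "2020-01-15", "time": "08:02:00"}}]}},
    ]
}

def departure_times(trips, count=2):
    """Return up to `count` departures as naive datetime objects."""
    result = []
    for trip in trips.get("Trip", [])[:count]:
        origin = trip["LegList"]["Leg"][0]["Origin"]
        stamp = f"{origin['date']} {origin['time']}"
        result.append(datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return result

def minutes_between(now, departure):
    """Whole minutes from `now` until `departure`, negative if it already left."""
    return int((departure - now).total_seconds() // 60)

# Passing `now` in explicitly makes the function easy to try out
now = datetime.datetime(2020, 1, 15, 7, 30, 0)
first, second = departure_times(sample_trips)
print(minutes_between(now, first))   # 12
print(minutes_between(now, second))  # 32
```

With a list that may hold fewer than two departures, main would have to check its length before building the sentence instead of indexing trip 0 and 1 directly.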

Configuring the Web hook in DialogFlow

  1. In DialogFlow, click on “Fulfillment”
  2. Toggle Webhook to be “Enabled”
  3. Paste the URL that was generated for you when you created the Cloud Function (or the URL to your service).
  4. Click on Save.
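
Before involving the assistant, it can help to call the Web Hook directly and check that it answers. The sketch below posts a stripped-down body, similar in shape to what DialogFlow would send, and pulls out the sentence the assistant would speak. FUNCTION_URL is a placeholder and the helper names are my own; the real request from DialogFlow contains many more fields.

```python
import json
import urllib.request

# Placeholder: replace with the URL noted when the cloud function was created
FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/FUNCTION_NAME"

def build_test_request(url, query_text):
    """Build a POST request carrying a minimal DialogFlow-like JSON body."""
    body = json.dumps({"queryResult": {"queryText": query_text}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_webhook(url, query_text, timeout=5):
    """Send the test request and return the decoded JSON response."""
    request = build_test_request(url, query_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def spoken_text(response):
    """Extract the sentence from the response structure used in this guide."""
    items = response["payload"]["google"]["richResponse"]["items"]
    return items[0]["simpleResponse"]["textToSpeech"]

# Usage, once FUNCTION_URL points at a real function:
# print(spoken_text(call_webhook(FUNCTION_URL, "When is the next bus?")))
```

The 5 second timeout mirrors the limit mentioned in the troubleshooting section, so a call that fails here would most likely fail for the assistant as well.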

Testing the app

On the right side of your screen there is a text that says “See how it works in Google Assistant”.

Click on it to navigate to the Dialogflow Simulator, where you can see how the interaction will work with your test app by entering text in the input section. You can also try talking to your home assistant by saying “Ok Google, talk to my test app”. This should trigger the assistant to read out the response given by your API call. You can also try to trigger your app with the training phrases from before, but I’ve found this to be a bit hit or miss depending on the uniqueness of the training phrase.

No luck? See the troubleshooting section below.

Deploying

When using the simulator from the previous section of the guide you have the option to deploy your app by creating a release. If you aren’t going to spread it to a larger crowd you might want to create alpha or beta releases for a smaller crowd. Either way I’d start with an alpha or beta release.

Since Google has done a good job of explaining each field of the deployment forms I won’t get into the details of this part. The only thing I can say is that they do review both your code and your descriptions, so it is worth putting in some extra effort to be as verbose as you possibly can. Think about what you would like to see when looking through the actions library and populate the forms accordingly.

That’s the end of the guide. Please do leave a comment if you tried it, and tell me what you did with it. It’s always nice to be inspired!

Troubleshooting DialogFlow

Hopefully you don’t need to go through this part of the guide, but in case you run into trouble, here are a few pointers.

Inspecting the machine to machine communications

In the DialogFlow simulator you have a top menu with buttons named “Request” and “Response”. If you click them you can inspect what the call to your REST API looks like and what the REST API response was. There’s also quite a lot of information in the “Debug” section, but I found the former to be less noisy and more helpful.

Cloud function is timing out

The assistant has a fairly short timeout: if your cloud function takes more than 5 seconds, the call will fail. You can see if this happens by looking at the response.

If this happens often you can consider a few of these things:

  • Rewriting your function to be faster, perhaps by caching data
  • Moving your cloud function to a region that is closer to your API source
  • Pre-warming the cloud function, as there might be a bit of additional delay when spawning the process if the cloud function has not been used in a long time. Be aware though that pre-warming comes with a cost in terms of the number of executions, so moving the code to a dedicated public server might also be an alternative.
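
As an illustration of the caching idea, here is a minimal sketch of a small time-based cache kept in a module-level variable. It relies on the assumption that global variables survive between invocations for as long as the same function instance stays warm, which is not guaranteed. The name cached, the 30 second lifetime and the clock parameter are my own choices.

```python
import time

# Lives at module level, so it can be reused while the instance stays warm
_cache = {}

def cached(key, fetch, ttl_seconds=30, clock=time.monotonic):
    """Return the cached value for `key`, calling `fetch()` only when stale."""
    now = clock()
    entry = _cache.get(key)
    if entry is not None and now - entry[0] < ttl_seconds:
        return entry[1]
    value = fetch()
    _cache[key] = (now, value)
    return value

# Usage inside main, with the names from the example script:
# buses = cached('bus', lambda: getTrips(busDepartureStationId, busDestinationStationId))
```

A short lifetime keeps the departure times reasonably fresh while still sparing the API call when several questions arrive close together.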

Cloud function has a malformed response

You can do a lot of nice stuff such as follow-up questions, but if stuff is not working, go back to basics. Make your cloud function return a static response to rule out the response as the problem.

Here’s an example of a static response body that works:

{
  "payload": {
    "google": {
      "expectUserResponse": false,
      "richResponse": {
        "items": [
          {
            "simpleResponse": {
              "textToSpeech": "Wow, this app works with static responses!"
            }
          }
        ]
      }
    }
  }
}
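
A minimal sketch of a cloud function that returns exactly this body could look like the following. The helper name build_response is my own; main matches the entry point used earlier in the guide.

```python
import json

def build_response(text):
    """Wrap `text` in the response structure shown above."""
    return json.dumps({
        "payload": {
            "google": {
                "expectUserResponse": False,
                "richResponse": {
                    "items": [{"simpleResponse": {"textToSpeech": text}}]
                },
            }
        }
    })

def main(request):
    # Ignore the request and always answer with the same sentence
    return build_response("Wow, this app works with static responses!")
```

If the assistant reads this sentence out loud, the wiring between DialogFlow and the cloud function works and the problem is in how the real response is built.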

The action is not triggered when using the training phrases

This section covers the case where the phrase “Talk to my test app” triggers your action but the training phrases don’t. I must confess that this part has puzzled me too. Sometimes it has worked directly, sometimes it has worked after a while.

However, doing these things has worked for me:

  • Adding more training phrases. Think about the different ways you can ask the assistant to do what you need it to do. “When is the next bus” could also be phrased a bit more sloppily as “When is the bus” or “When is the bus leaving”.
  • Trying a totally different arbitrary command, e.g. “Where did all my smurfs go?”, to see if your previous phrases simply do not want to play ball with the search giant.
  • Test, test, test. Go through the simulator a few rounds with both text input and speech input.