
How we built a game for The Google Assistant

by Peter Nann | 19 December 2017
Salmat’s voice expert Peter Nann recounts the process of scripting his game for the Google Assistant, Australia Says.

It’s not just what you say, it’s how you say it. We learn to talk as babies. We learn the art of conversation through listening and repeating. The science and art behind what makes a good conversation are not something we consider on a day-to-day basis. However, they are the key to building a great voice experience.

Salmat is a marketing services business. Our team of experts has more than two decades of experience in VUI (Voice User Interface) development, from our work creating natural language Interactive Voice Response (IVR) solutions for many of our clients. 

On any given day, we could be listening to thousands of customer voice “utterances” to understand how real Australians behave and speak to voice automation; or making the technology deal with our local idiosyncrasies (we say "double two", Americans do not!); or determining just the right way to ask a question to get the best response. We have now applied what we learnt from this work to the voice device space with the creation of our memory game, Australia Says.

Australia Says is Salmat’s first app for the Google Assistant. Users are asked to repeat a string of Aussie slang words in a specific order, and the person who remembers the most gets all the glory. It was one of the first local games released on the Google Assistant and can be played either on a Google Home device or via the Google Assistant on eligible Android and iOS phones.

But how would the skills we’d learnt from decades in the natural language IVR space translate to building voice assistant apps? And could we create one in time to be amongst the first to launch on the Google Assistant platform in Australia?


Creating a persona

Before we even began the technical build, we did what we always do at the start of a new project and chose a persona for the app. Understanding and defining this persona – a true-blue, Aussie male – meant we could build out the lexicon of words and phrases to be used throughout the app. 

We then scripted the conversations we wanted to build between the Google Assistant and the user. We think of this process like writing a script for a play. What does the Google Assistant say, what do we expect the user to say in response?

In the case of Australia Says, we focused heavily on the messaging used when the player makes a mistake, which signals the end of a turn at the game. Sledging is an Aussie pastime, so we considered a broad spectrum of slightly tongue-in-cheek, entertaining phrases for when someone got something wrong to be important to the success of the game. One very entertaining brainstorming session involved explaining “see you round like a rissole” to a German member of staff.

We gathered a considerable list of true-blue Aussie words and phrases. However, we kept the tone fun and genial (in line with our persona) so that it feels like everyone is laughing together.



Building a conversation

The scripting of a conversation for a voice app is often broken into two parts. The first is the “happy path”, whereby the user does exactly what you want them to do. The likelihood of this occurring is estimated to be as low as 20%. The second part, which is an integral component of the scripting process and typically 80% of the dialogue design effort, is therefore to consider the “error paths”.

Our game itself didn't require complex error paths, but we did have the opportunity to clearly differentiate “happy” versus “unhappy” outcomes, and of course make it a bit of fun:

Happy Path:

Google Assistant: Repeat after me: crikey spewin bonza
Player: Crikey spewin bonza
Google Assistant: Hoonin'! Now repeat: crikey spewin bonza spewin
Player: Crikey spewin bonza spewin
Google Assistant: Bloody ripper! Now repeat: crikey spewin bonza spewin ...
 

Unhappy path:

Player: Crikey spewin bonza oh I can't remember the rest.
Google Assistant: You made a dog’s breakfast of that. You missed a 'spewin' in there somewhere. Better luck next time!

Or

Unhappy path:

Google Assistant: ... Now repeat: crikey spewin bonza crikey bonza crikey
Player: <No response> (a common behaviour when confused or unsure)
Google Assistant: Lights are on but no-one's home. Leave ya money on the fridge. See ya next time.
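
The three outcomes in the dialogues above (a correct repeat, a mistake, and no response) can be sketched in a few lines of Python. This is only an illustration of the game logic as described here, not the code behind Australia Says. The names `SLANG_WORDS`, `normalise`, `judge_turn` and `extend_sequence` are assumptions made for the example, and the word list contains only the three words used in the sample dialogue.

```python
import random
from typing import List, Optional

# Hypothetical word list: only the words from the sample dialogue.
SLANG_WORDS = ["crikey", "spewin", "bonza"]

# Punctuation trimmed from the ends of each recognised word.
PUNCTUATION = ".,!?'\""


def normalise(utterance: str) -> List[str]:
    """Lower-case the recognised text and split it into bare words."""
    words = (w.strip(PUNCTUATION).lower() for w in utterance.split())
    return [w for w in words if w]


def judge_turn(target: List[str], utterance: Optional[str]) -> str:
    """Classify a turn as 'correct', 'mistake' or 'no_input'."""
    if utterance is None or not utterance.strip():
        return "no_input"  # the silent "unhappy path"
    if normalise(utterance) == target:
        return "correct"   # the "happy path"
    return "mistake"       # wrong or incomplete sequence ends the turn


def extend_sequence(target: List[str], rng: random.Random) -> List[str]:
    """Return a new sequence with one more random slang word appended."""
    return target + [rng.choice(SLANG_WORDS)]
```

A dialogue handler would call `judge_turn` on each recognised utterance, then either extend the sequence and play a confirmation, or pick the matching "unhappy" prompt and end the turn.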
 

Grice’s Maxims

Another VUI design principle is to follow Grice’s Maxims for cooperative conversation when building out your app scripts. The basic theory is that humans inherently assume an undercurrent of cooperation in their conversations. The philosopher Paul Grice defined his four maxims as follows:

Grice's Maxims:

  • Quality: Only say things that are true
  • Quantity: Don’t be more or less informative than needed
  • Relevance: Only say things relevant to the topic
  • Manner: Be brief, get to the point, and avoid ambiguity and obscurity

Although these principles are not on clear display in our initial game, they can generally guide both prompt design and expected user behaviors. Error paths, for example, are critical. It's highly desirable to quickly detect that there is a problem, and get the user back on a “happy path” as quickly and painlessly as possible. This is the biggest challenge of VUI design.

With our dialogue design drafted, and a good feeling for the style of the game, the next important step in building the conversation was deciding on a name for the app, as this is what users say to invoke it.

Example

Ok Google, talk to Australia Says …

The name needs to be easy for the user to say and easy for the Google Assistant to understand. When choosing your app name, it’s worth referring to Google’s Invocation and discovery checklist, which breaks down the requirements of a good app name into four components:

4 rules to follow when picking an app name:

  • Avoid words that have multiple pronunciations
  • Make sure Google recognises your app name
  • Choose easy, but unique names
  • Adhere to the Google name policies

Out of leftfield

Consider the unexpected when running through your scripted conversations. Testing these with people unfamiliar with the app is a worthwhile practice as you can start to discover any leftfield questions users may ask that you have not accounted for in your script. The art of natural conversation is flow. So, accommodating these more unusual queries or comments into your script will only enhance the experience for the user. 

For example, consider what happens if the Google Assistant doesn’t understand the user or the user asks something the app doesn’t support.

One of the key elements of conversation scripting for us was defining the series of phrases used to indicate that the user was correct (aka confirmations). These communicate to the user that the system understands what they have said and encourage them to keep playing.

It was important for us to create a large list of these confirmation phrases to avoid boring the user with monotony. We randomise these phrases in the system, although higher scores can elicit some rarer, more exciting versions. A similar approach was taken for “welcome back” phrases when a user repeats the game quickly after their last attempt.
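
As a rough illustration of this kind of randomisation, the sketch below picks a confirmation at random and only makes the rarer phrases available once the score passes a threshold. It is not the selection logic used in the game. The two common phrases come from the sample dialogue earlier in this post, while the rare phrase, the `RARE_MIN_SCORE` threshold, the `RARE_CHANCE` probability and the rule against repeating the previous phrase are all assumptions made for the example.

```python
import random

# Confirmations taken from the sample dialogue in this post.
COMMON = ["Hoonin'!", "Bloody ripper!"]
# Hypothetical placeholder for the rarer, more exciting phrases.
RARE = ["You little beauty!"]

RARE_MIN_SCORE = 5   # assumed score at which rare phrases unlock
RARE_CHANCE = 0.3    # assumed chance of a rare phrase once unlocked


def pick_confirmation(score, last=None, rng=random):
    """Pick a confirmation phrase, avoiding an immediate repeat of `last`."""
    pool = COMMON
    if score >= RARE_MIN_SCORE and rng.random() < RARE_CHANCE:
        pool = RARE
    # Drop the previous phrase if that still leaves something to choose from.
    choices = [p for p in pool if p != last] or pool
    return rng.choice(choices)
```

Passing the previous phrase as `last` is one simple way to guard against monotony; a longer phrase list would make immediate repeats unlikely even without it.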

 


The technical build

The scripting stage should not be underestimated. Get this right and you save yourself considerable time during the technical build, as the developers know exactly what they have to build from the start. 

In our case we decided to record a professional voice talent to get the quintessentially Australian persona we desired, so the script needed to be complete and accurate before we spent the hours in a studio recording over 500 prompts. 

Our experience with this sort of process meant it was a great success, and we 'got it right the first time', meaning we got everything we needed for the game in the first studio session – an impressive feat if you know anything about this sort of work.

However, even before the prompts were recorded, selected and processed, we were able to start the technical build. The initial build used the default Australian voice available on the Google Assistant to speak all the prompts. Working this way meant we were well into the build by the time the voice artist’s recordings were added. In fact, we can still switch between the recorded prompts and any of the English voice options at any time, via some special software we built into the app.
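
One way a switch like this could work is to build each response as SSML and wrap the prompt text in an `<audio>` element only when recordings are enabled. In SSML, the text inside `<audio>` acts as the fallback that is spoken if the file cannot be played. The sketch below assumes this approach and is not a description of the software we built: `AUDIO_BASE_URL`, the `.ogg` file naming and the `render_prompt` function are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical location of the recorded prompt files.
AUDIO_BASE_URL = "https://example.com/prompts"


def render_prompt(prompt_id, text, use_recordings=True):
    """Return SSML for a prompt, as recorded audio or synthesised speech."""
    spoken = escape(text)  # escape &, < and > so the SSML stays well-formed
    if not use_recordings:
        # The platform's text-to-speech voice reads the text.
        return f"<speak>{spoken}</speak>"
    src = quoteattr(f"{AUDIO_BASE_URL}/{prompt_id}.ogg")
    # The text inside <audio> is the fallback if the file cannot be played.
    return f"<speak><audio src={src}>{spoken}</audio></speak>"
```

Keeping the prompt text alongside each recording means the same script drives both modes, so a build can start on the synthesised voice and move to studio recordings later without changing the dialogue logic.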


Fine tuning

We spent hours reviewing and selecting the recordings for the app. We constantly asked ourselves: did the artist sound Aussie enough? Was he using the correct inflections in his phrasing? Was the tone consistent? Would this sentence blend well with the preceding and following ones? Were the sentence gaps natural?

Once the final recordings were chosen, they were uploaded into the app, and we were all quite happy with the result. It's Aussie-as, ay?

In fact, it was so Aussie that we weren't sure whether the language, or the 'sledging' angle, would be understood and acceptable overseas. We therefore 'geo-locked' the application so that it is only available in Australia, and to our friends across the Tasman in New Zealand. We are still considering whether our American friends could, in fact, cope. Have a play, and tell us what you think.

Building a truly fluid conversation is a challenge. Humans are, after all, unpredictable. Therefore, testing the app is an important element in the further improvement of conversational flow. We need to continually teach the app to build its repertoire of conversation and understanding – something we will focus on in our next blog.

This blog is part of a three-part series where we share our experiences of building an app for the Google Assistant. In the final instalment, we talk about how to continually improve your app. Check out the first blog in the series here: Learn to speak Aussie with Australia Says.

About the author
Peter Nann
Tech Lead-Speech & Automation, Salmat

Peter has been living and breathing VUI (Voice UI) development for 23 years, since the earliest commercial forays in this space in Australia in the 1990s, waiting all that time for the future to finally arrive. Over the years he has worked with many organisations in Australia and abroad, realising speech recognition solutions that service multiple millions of real-world user contacts every year.
