Your one-stop guide to voice-based interfaces.

Over the last couple of years, Voice User Interfaces (VUIs) have hit critical mass. What this means is that they have reached a point at which they can continue to grow sustainably and quickly acquire a larger audience. It also means that they may soon become a basic part of our lives. You know, like mobile phones and visual interfaces.

And although the current state of VUI is far from optimal, it is quite promising. The interactions aren’t conversational yet and this poses a problem because it means that users will have to adapt to the technology.

And as Product Designers, we’ll have to account for that.

If you ask me, that is quite a task. Especially since this is still a relatively new domain.

Now, we may already design quite efficiently for tap-to-talk assistants like Google Assistant and Siri, but that’s quite different from designing for a hands-free experience. The challenges are numerous, and the more we progress with tech, the more complex and interesting the problems will get.

So to break it down, here are a couple of pointers to keep in mind while designing a VUI:

Representation Matters

“Amazon Echo dot” by Andres Urena on Unsplash

With the current state of tech, voice interfaces are super handy. But they aren’t flawless. And they certainly wouldn’t pass the Turing Test. Which is to say that they aren’t capable of sounding or appearing as intelligent as a human being.

So if you make the physical product represent a human being, or even a humanoid, you’re raising your users’ expectations. I mean, think about it: the Echo Dot looks like a dot, not a humanoid. It isn’t even anthropomorphic. It looks like a device, and so people expect it to function like a device. On a subconscious level, they know that they need to put in that extra bit of effort to communicate their needs.

So, What About Branding?

For designers, this is super important. I mean, your brand can now sound like Snoop Dogg or like Adele. And it’s not just their voices, it’s their personalities too. You don’t want a meditation app (or “skill”, as Amazon calls it) to sound like Simon Cowell or Gordon Ramsay.

I think that this opens up a wide range of opportunities for artists and celebrities. You could sell your voice to a company, which would take being a brand ambassador to a whole new level. The possibilities truly are endless.


Remember when the iPhone first came out?

I don’t. I was a happy oblivious kid back then. But, that’s not the point.

One of my professor’s favourite stories to tell is about how, when the iPhone first came out, kids would use their parents’ phones and purchase all kinds of things with the help of the ever-useful AutoFill. Parents would then receive hefty bills every now and then and shake with rage, feeling duped and frustrated.

We can’t have a repeat of that with voice control, right? I mean sure, down the line it’s gonna be super easy to purchase things with voice-based interfaces. Or even access confidential information. But that needs an extra level of security: perhaps a password, or a fingerprint scanner. Or maybe even Face ID.
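Here’s a rough sketch of what that extra check could look like in code. Everything in it is made up for illustration: the `VoiceRequest` type, the `SENSITIVE_INTENTS` set and the `handle` function are hypothetical names, not part of any real assistant platform.

```python
from dataclasses import dataclass

# Hypothetical intents that should never run on voice alone.
SENSITIVE_INTENTS = {"purchase", "read_bank_balance"}


@dataclass
class VoiceRequest:
    intent: str     # e.g. "purchase" or "play_music"
    verified: bool  # True once a password, fingerprint or face check has passed


def handle(request: VoiceRequest) -> str:
    """Return what the assistant should say for this request."""
    if request.intent in SENSITIVE_INTENTS and not request.verified:
        # Ask for a second factor instead of completing the action.
        return "Please confirm with your password or fingerprint to continue."
    return f"Okay, doing that now: {request.intent}."
```

The idea is simply that everyday requests go straight through, while anything that costs money or exposes private information waits for a second check.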

A Short Note on Lists

Y’know how on a visual interface you can quickly scroll through a list of options without getting super overwhelmed? Or maybe you never noticed, because it is such an intuitive process that you don’t really have to notice it.

Well you can’t do that with voice. You can’t mindlessly list out options to a user and ask them to choose. They’ll forget and they’ll be quite annoyed.

If you’re having trouble picturing this, think about the last time you had to navigate a voice-activated customer service phone menu. Do you remember all that time you had to spend patiently, and then furiously, hitting the buttons, only to be faced with a dead end?

Yeah, that wasn’t fun. So now, with voice interfaces, we need to make sure the interface doesn’t just slowly and mechanically list out options to a user. The process has to be conversational.

Here’s an example by Jason Amunwa of what you could do instead:

“Hey, Jason, where’s a good place to go for sushi?”

“There are several sushi restaurants in the area — would you like to walk, or drive?”

“It’s a nice day, I’m down to walk”

“Ok, Emperor Sushi is a 2-minute walk from here, but if you want something cheaper, Ninja Sushi Deli is a 5-minute drive.”

“Good to know — let’s do Emperor Sushi today.”
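If you want to see how that kind of narrowing might work under the hood, here’s a minimal sketch. The `Place` type, the `next_prompt` function and the cap of two spoken options are all my own assumptions for illustration, not something taken from Jason’s example or from any real voice platform.

```python
from dataclasses import dataclass
from typing import Optional

MAX_SPOKEN_OPTIONS = 2  # assumption: more than this is hard to remember by ear


@dataclass
class Place:
    name: str
    minutes: int
    mode: str  # "walk" or "drive"


def next_prompt(places: list, mode: Optional[str] = None) -> str:
    """Ask a narrowing question first, then speak only a couple of options."""
    if mode is None and len(places) > MAX_SPOKEN_OPTIONS:
        # Too many to read out: ask a question that cuts the list down.
        return f"There are {len(places)} places nearby. Would you like to walk, or drive?"
    matches = [p for p in places if mode is None or p.mode == mode]
    matches.sort(key=lambda p: p.minutes)  # closest first
    if not matches:
        return "I couldn't find anything like that. Want to try something else?"
    spoken = matches[:MAX_SPOKEN_OPTIONS]
    return " or ".join(f"{p.name}, a {p.minutes}-minute {p.mode}" for p in spoken) + "?"
```

Instead of reading out every result, the function either asks a question that shrinks the list or offers the closest couple of matches.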

Awkward Silences

Don’t forget to keep the user engaged.

Once a user is done delegating a task, don’t keep them hanging while the device works on it. They’re going to get confused if their words are met with silence.

You can combat this by filling in the gaps like this:

“Book a table for two this Saturday at Vinny’s at 7 P.M.”

“Hold on, I’m booking the table. Meanwhile, would you like to add this event to your calendar?”

“Yes I would.”

“Alright! I’ve booked a table at Vinny’s for this Saturday at 7 P.M. and I have added the event to your calendar.”

The opposite of this would look like:

“Book a table for two this Saturday at Vinny’s at 7 P.M.”

(silence)

“I’ve booked a table at Vinny’s for this Saturday at 7 P.M.”
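For the curious, filling the gap mostly comes down to starting the slow work in the background and talking right away. Here’s a small sketch using Python’s `asyncio`. The `book_table` function and its delay are stand-ins I made up, and `say` is a hypothetical hook for whatever actually speaks to the user.

```python
import asyncio
from typing import Callable


async def book_table(restaurant: str) -> str:
    """Stand-in for a slow booking call; the delay is made up."""
    await asyncio.sleep(0.1)
    return f"I've booked a table at {restaurant}."


async def handle_booking(restaurant: str, say: Callable[[str], None]) -> None:
    # Start the slow work first so it runs while we keep talking.
    booking = asyncio.create_task(book_table(restaurant))
    # Acknowledge immediately instead of leaving the user in silence.
    say("Hold on, I'm booking the table.")
    # Only announce success once the booking has actually finished.
    say(await booking)


spoken = []  # collects everything the assistant "says", in order
asyncio.run(handle_booking("Vinny's", spoken.append))
```

The acknowledgement goes out before the booking finishes, so the user always hears something while they wait.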

Accounting for Errors

As we’ve discussed before, with the current state of tech, these interactions are not conversational. And sometimes, the system may have some trouble understanding the user.

So we need to design keeping this in mind. We need to develop strategies to ask questions without seeming too stupid. And we need to use analytics and learn from them, either to eliminate errors or to make our error strategy as good as it can be.
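One simple strategy is to make each reprompt a little more helpful than the last, and to count where things go wrong so you can study it later. Here’s a minimal sketch of that idea. The wording of the reprompts, the `reprompt` function and the `error_log` counter are all hypothetical, just one way you might set it up.

```python
from collections import Counter

# Hypothetical reprompts, from lightest to most helpful.
REPROMPTS = [
    "Sorry, what was that?",
    "I didn't catch that. You can say something like 'book a table'.",
    "I'm still having trouble. Let's try this another way.",
]

error_log = Counter()  # counts misunderstandings per step, for later analysis


def reprompt(step: str, failed_attempts: int) -> str:
    """Pick a reprompt for the given number of failures (1-based) and log it."""
    error_log[step] += 1
    # Clamp so extra failures keep reusing the last, most helpful reprompt.
    index = max(0, min(failed_attempts, len(REPROMPTS)) - 1)
    return REPROMPTS[index]
```

Looking at `error_log` over time would tell you which steps trip users up the most, which is where the analytics part comes in.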

Task Completion

With visual UI, one of the key screens is the task completion screen. There’s no reason this should be any different for voice, right? People still need to know when they’re done completing a task. A spoken confirmation acts as feedback and reinforces the conversational aspect of VUIs.
