Voice User Interfaces (VUIs) are everywhere today, from voice assistants such as Siri and Bixby on your smartphone to Google Home and Amazon Alexa, which can control almost all the smart devices in your home. It seems that smart voice assistants such as “Jarvis” from the movie “Iron Man”, “Samantha” from the film “Her”, or any intelligent virtual assistant from the fantasy novels you might have read years ago will no longer be just a fantasy.
But is it true?
Before digging into whether that wish is viable, let’s begin by explaining some terms.
What is Information Architecture (IA)?
There’s no single definition that covers every aspect of IA. The Information Architecture Institute defines IA as “helping people understand their surroundings and find what they’re looking for, in the real world as well as online”. In other words, it’s the organization of a website, app, piece of software, or other medium that allows users to find what they want easily.
What is Voice User Interface (VUI)?
Unlike the Graphical User Interface (GUI), which currently dominates our digital world and organizes information clearly through visual components, a VUI understands and completes users’ commands through voice recognition and artificial intelligence (AI). The organization of information is just as important in a VUI, if not more so, but it works differently for the user than it does in a GUI.
That is to say, without visual aids, helping users accomplish tasks through a VUI can be harder than you might expect. Here are some problems that may drive users away from VUIs.
1. Accuracy, accuracy, accuracy
We have to admit that voice-activated systems have developed a lot recently, and their accuracy has reportedly grown to over 90%. However, simply recognizing a certain word or phrase doesn’t mean the system will work as the user expects.
I still remember the time I told Bixby to open an app called “Soul” on my phone: it kept typing “so app so sooooo app” on the screen as I yelled at the phone. Bixby does include a feature called “teach me”, where you can manually correct a misheard word to the one you actually said, but what’s next? The phone will either tell you “I didn’t find it” or show you results it found online, wasting your time on something you could easily have done yourself.
Accuracy isn’t just a voice-recognition problem; it is more about machine learning and how the AI adapts as it interacts with users. Solving it is not impossible, but the technology still needs time to mature.
2. No one knows what it can do
Unlike a GUI, where users can directly see and navigate all the functions of the device, a VUI is less tangible and therefore less understandable. According to a study from Microsoft Research U.K., most users don’t know how a voice assistant is supposed to function or what it can actually do, which leaves their expectations floating aimlessly.
A clear guide that helps users build a mental model of the voice assistant is worth considering.
3. Users are still doing what they can do with GUI
Because users lack a mental model for VUIs, and because current voice assistant technology is limited, most users still use VUIs for tasks they could already do easily with a GUI, such as setting an alarm, searching online, or listening to music.
It might be slightly more convenient to get these tasks done without going through all the clicks yourself; however, the time, money, and effort invested in voice-activated systems shouldn’t go just toward simple tasks that save five seconds of a user’s life.
*Of course, there’s one exception to this: voice assistants are a great tool for accessibility. People with disabilities may find these voice recognition functions helpful and effective, but that may not hold for all other users.
A product should have unique features that solve users’ problems in a better way. Can current voice assistants do something differently and more effectively for all their potential users? Possibly, but I don’t know yet.
The terrible experience of voice devices often doesn’t come from voice-recognition technology alone, but from how it combines with other technologies, such as machine learning, and with the design of the interface.
It is important for designers to design based on UX principles and user needs.