When we recently tested the usability of intelligent assistants like Siri, Alexa, and Google Assistant, we found that interactions with these agents are plagued with problems, ranging from poor comprehension of commands to inherent limitations in verbal output.
Yet despite these problems, voice-based assistants are becoming increasingly popular. 46% of U.S. adults reported using voice-controlled digital assistants in 2017, according to the Pew Research Center. And when we recently asked 211 daily users of an intelligent assistant to recall the last time they interacted with it, most described successful experiences, and many positively gushed with enthusiasm about how great their assistant was:
“I speak to Google Assistant on a regular basis, I use it every day, all day. And I believe it was one of the best things that was created. I just love it, I just can’t live without Google Assistant.”
“Siri is definitely very useful. I use it for everything daily multiple times. I use it probably 30 times a day. Everything I do is basically through Siri and I couldn’t be happier with it.”
“I love my Alexa.”
Clearly the usability problems we observed are not deterring people from using intelligent assistants. To understand how users can have such positive reactions despite the poor usability of these systems, we looked at high-frequency users of voice assistants and the tasks they normally perform with their help.
Current Usage of Intelligent Assistants: User Research
We asked people interested in participating in a study about intelligent assistants to answer a few questions about their use of such interfaces. Out of the 464 users who responded to our call, 211 were daily users of either Siri (72), Google Assistant (57 on their phone and 22 on their smart speaker), or Alexa (60).
In this article, we focus on our respondents’ answers to a critical-incident question: Tell us about the last time you used [your intelligent assistant]. What were you trying to do? Was it successful? Users uploaded a video of themselves answering this question. Besides answering the actual question, many offered other comments about how they generally used the assistants.
Types of Tasks
Despite the extravagant descriptions of agents as a ‘butler’ or a ‘best friend,’ most frequent users do not use intelligent assistants to complete everything a human assistant could do. Instead, they selectively assign certain types of tasks to their assistant.
The single most common use people reported was simple information retrieval — trivia, word meanings, or facts such as measurement conversions, sports statistics, and geography. The next most common uses were checking the weather and communicating with a person (by making a call, texting, or emailing).
A notable use of the assistants was for voice control of smart-home appliances or devices — light switches, TV systems, thermostats, or door locks with “Internet of Things” connectivity. About 9% of users reported one such activity (categorized under IOT control).
A few less common tasks included getting an idea for a recipe (categorized as Idea), closing an application on the phone or controlling the phone volume (categorized as Phone control), and playing games (especially with Alexa).
The reasons people gave for liking (or disliking) their assistant both support the importance of usability and explain why many users value the assistant so highly despite current usability limitations. These reasons included good (or bad) voice recognition, good (or bad) result accuracy, and efficiency (compared with typing). But the ability to interact hands-free (mostly while driving) was by far the most frequently mentioned benefit of using a voice assistant, with a total of 35% of daily users noting it. The high value of hands-free operation suggests that there is currently a very low bar for how good a voice assistant needs to be: it doesn’t actually need good usability, it just has to be less unpleasant than getting a traffic ticket or having a car accident. (Given the cognitive load imposed by the poor current usability, it’s highly doubtful that using a voice assistant actually reduces driver distraction and thus prevents accidents. But for people to like a UI, it’s enough that they believe it’s safe, even if it’s actually dangerous.)
Essentially, right now, people use intelligent assistants only for the easiest tasks, mostly when their hands are busy. They also repeat these tasks often, checking the weather every morning or using music commands several times a day, so these simple tasks make up a large share of their overall interaction with (and impressions of) the voice assistant.
Low Complexity for the Frequent Tasks
Elsewhere we identified five characteristics of intelligent agents that hold promise for this new interaction style. They were: voice input, natural language, voice output, intelligent interpretation, and agency. Our usability study indicated that voice assistants today are pretty far away from doing a decent job on those dimensions.
The most common assistant tasks all use voice input, but they largely bypass the other requirements because they consist of only a short series of highly predictable commands and steps.
For example, current voice assistants are good at telling you the weather forecast for your current location, which is what most people want to check on a daily basis. However, even slightly less predictable variations on this task, such as ‘What’s the weather in London in the fall’ or ‘What’s the weather at the Statue of Liberty on Friday’, fail on Siri and Echo. (Getting directions does include multiple steps and rich information, but this task benefits from decades of preexisting investment in optimizing navigation guidance. And getting directions is still fairly constrained: today’s assistants are not able, for example, to start directions at a given time or to account for driving in a carpool lane.)
The number of steps required for a task is a major determinant of how likely a voice assistant is to complete it successfully. Depending on their complexity, tasks can be grouped into four categories:
- Simple actions require a single step to complete. Examples include turning up the brightness and setting a timer.
- Multistep tasks are similar to an interaction flow on a website or in an app; they require going through several stages to complete a process. Examples include calling an Uber or placing an ecommerce order (if you already know what you want to buy).
- Multitask activities involve the use of several tasks and applications to achieve a goal. An example is creating a list of phone numbers for people whose emails you have not read.
- Research activities require putting together multiple sources of information and analyzing options. For example, finding the best hotel options in a city based on a set of criteria is a research activity.
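The four categories above can be sketched as a small classification rule. This is only an illustration, not the coding scheme used in the study: the `Task` fields (`steps`, `apps`, `compares_options`) and the order in which the rules are checked are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    SIMPLE_ACTION = "simple action"
    MULTISTEP_TASK = "multistep task"
    MULTITASK_ACTIVITY = "multitask activity"
    RESEARCH_ACTIVITY = "research activity"


@dataclass(frozen=True)
class Task:
    # All fields are hypothetical attributes chosen for this sketch.
    description: str
    steps: int               # stages the user must go through
    apps: int                # distinct applications involved
    compares_options: bool   # True if options must be gathered and analyzed


def classify(task: Task) -> Complexity:
    """Assign a task to the most demanding category it qualifies for."""
    if task.compares_options:
        return Complexity.RESEARCH_ACTIVITY
    if task.apps > 1:
        return Complexity.MULTITASK_ACTIVITY
    if task.steps > 1:
        return Complexity.MULTISTEP_TASK
    return Complexity.SIMPLE_ACTION


# Examples taken from the list above; the step and app counts are guesses.
examples = [
    Task("set a timer", steps=1, apps=1, compares_options=False),
    Task("call an Uber", steps=3, apps=1, compares_options=False),
    Task("list phone numbers for unread emails", steps=3, apps=2, compares_options=False),
    Task("find the best hotel in a city", steps=5, apps=3, compares_options=True),
]

for task in examples:
    print(f"{task.description}: {classify(task).value}")
```

Checking the most demanding category first means a task that spans several apps and also requires comparing options is counted once, as a research activity.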
People mostly ask their agents to do tasks with only one step; 26% of our participants used a voice assistant for tasks with multiple steps, but these were primarily getting directions. Other multistep tasks, or more complex jobs which combine several tasks or require open-ended research, were vanishingly rare. People don’t even try to use voice assistants for these needs. And there was no report of a research activity done with an intelligent assistant.
Knowledge Required for Tasks
A key feature of intelligent assistants is their ability to infer user goals and understand context. This activity requires knowledge about the world and also knowledge about users. We analyzed the tasks that our participants reported according to the types of information they required.
The majority of the tasks involved content available freely on the web. The next most common piece of information needed to complete the tasks was the user’s current location; other types of personal information (contacts, calendar) were also important. About 31% of the tasks required no information at all. Only 1% of the tasks involved more sophisticated types of knowledge such as the user’s prior interactions with the system (e.g., retrieving a parking location, ordering the usual laundry detergent, or changing an Amazon order).
The usability challenges of intelligent assistants are real and pervasive. Users are not unaffected by them — they simply avoid usability agony by limiting their use to a subset of simple features which are least impacted by poor language comprehension, lack of access to complex personalized information, or lack of true intelligence.
This situation closely parallels the state of the early web: in 2000, the success rate when using a new website was 61%, whereas in 2010 it was 78%. This was a rapid rate of improvement, compared to other areas of human endeavor. Still, in 2000, people would fail 39% of the time when trying something new on the web. As a result, users spent most of their time on familiar sites with above-average design, where they would enjoy a much higher chance of success than if they ventured out on the open web. The web as a whole was pretty bad in 2000, but the web user experience, as actually experienced by each individual user, was much better, because the percentage of any given user’s tasks that was attempted on a new site was very low. Low usability caused people to stay on known turf and rarely stray.
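The arithmetic behind this argument is a weighted average, sketched below. Only the 61% success rate on new sites comes from the figures above. The success rate on familiar sites and the share of tasks attempted on new sites are hypothetical values picked purely to illustrate the effect.

```python
def experienced_success(new_share: float, new_rate: float, familiar_rate: float) -> float:
    """Weighted average of the success rates on new and familiar sites."""
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0 and 1")
    return new_share * new_rate + (1.0 - new_share) * familiar_rate


NEW_SITE_SUCCESS_2000 = 0.61   # from the article
FAMILIAR_SITE_SUCCESS = 0.90   # assumption: not reported in the article
NEW_SITE_SHARE = 0.10          # assumption: not reported in the article

failure_on_new_sites = 1.0 - NEW_SITE_SUCCESS_2000
overall = experienced_success(NEW_SITE_SHARE, NEW_SITE_SUCCESS_2000, FAMILIAR_SITE_SUCCESS)

print(f"Failure rate on new sites: {failure_on_new_sites:.0%}")
print(f"Experienced success rate: {overall:.0%}")
```

Under these assumed values, a user who tries a new site for only one task in ten experiences a success rate well above 61%, even though the web as a whole fails often. The same reasoning applies to assistant users who stick to a few reliable commands.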
Right now, the benefit of being able to use a device ‘hands-free’ outweighs the annoyance of poor usability. Even a barely usable voice-based assistant may still be faster than pulling over while driving, or washing food off your hands in order to use a touchscreen. But, as these agents evolve, usability will increasingly become a competitive advantage — especially if the assistants become device-agnostic (you can already use Google Assistant on an iPhone). As was seen with the iPhone when it was introduced back in 2007, when given a choice, people will flock to the system that solves usability problems.
One big risk of mediocre yet popular assistants is that they shape people’s mental models and expectations. Right now, people are learning that ‘intelligent’ assistants are actually not that smart, and they will likely base their future expectations and usage on these formative experiences. If assistants do get smarter and more capable, these previous user experiences are likely to discourage people from even trying to use advanced functions.