As more and more devices around us sprout microphones and "smart" assistant software that listens for commands, various problems are emerging. Much attention is lavished on the Big Brother aspects of what amounts to always-on ambient surveillance, and that is indeed a development that is worth examining. However, today I would like to focus on another aspect of voice-controlled user interfaces: when a system has no easy way of telling you what its capabilities are – how do you know what to ask it?
The answer to this question entails discoverability, and I would like to illustrate this somewhat abstract concept with a picture of a tap. This particular tap lives in my employers’ newly refurbished London office, and I challenge you to work out how to get sparkling water from it.
The answer is that you press both taps – and now that I’ve told you, you may perhaps notice the pattern of bubbles along the bottom of the two taps. However, without the hint, I doubt you would ever have worked it out.
Siri, Alexa, Cortana, and their ilk suffer from the same problem – which is why most people tend to use them for the same scant handful of tasks: setting timers, creating reminders, and playing music. Some users are willing to experiment with asking them to do various things, but most of us have enough going on in our lives that we can’t take the time to talk to very stupid robots unless we have a reasonable certainty of our requests being understood and acted upon.
Worse, even as existing capabilities improve and new ones are added, users generally stick to their first impressions. If they tried something a couple of years ago and it didn’t work then, as far as they’re concerned it doesn’t work, even if that particular capability has been added in the meantime.
"It’s only a power user if it’s from the Puissance region of France. Otherwise, it’s just a sparkling prosumer." – Dominic 🇪🇺 (@dwellington), July 16, 2019
I generally find out about new Siri features from Apple-centric blogs or podcasts, but that’s only because I’m the sort of person who goes looking for that kind of thing. I use Siri a fair amount, especially while driving, and AirPods have made me somewhat more willing to speak commands into thin air elsewhere too, so I do actually take advantage of new features and improved recognition. For most people, though, Siri remains the butt of jokes, no matter how much effort Apple puts into it.
This is not a competitive issue, either; almost everyone I know with an Alexa just treats it as a radio, never using any other skills beyond the first week or so of ownership.
The problem is discoverability: short of Siri or Alexa interrupting you ("excuse me, have you heard the good news?"), there isn’t any way for users to know what they can do.
This is why I am extremely sceptical of the claims that voice assistants are the next frontier. Even beyond the particular issues of people in an open-plan office all shouting at their phones, and assuming perfect recognition by the AIs themselves, voice is an extremely low-bandwidth channel. If my hands and eyes are available, those are far better input and output channels than voice can ever be. Plus, graphical user interfaces are far better able to guide users to discover their capabilities, without degenerating into phone menu trees.
Otherwise, you have to rely on the sorts of power users who really want sparkling water and are willing to spend some time and effort on figuring out how to get it. Meanwhile, everyone else is going to moan and gripe, or bypass the tap entirely and head for the bottled water.