Voice UI: what is it and what does it mean to interact with it?

By Antonio Holguin, Design Director

Voice User Interfaces (VUIs) are the next generation of human-computer interaction. VUIs allow people to use their voice to interact with computers, instead of using their hands with a mouse, keyboard, or touch screen. This hands-free method of interacting with a computer has near unlimited potential. A lot of work is being done on voice recognition and interfacing, including using machine learning to learn an individual’s specific speech patterns over time. The technologies driving VUIs all involve some degree of artificial intelligence (AI). Cloud computing, machine learning, and data collection and processing are combining into powerful artificial intelligences that will feature VUIs as an indispensable (but possibly not sole) mode of communication with computers. The primary goal of AI is to create a more human-like computer or to augment human thought processing power.

Creating a more human-like computer forces us as designers to consider what it means to be human, how we interact with other humans, and thus how to build computer interactions that feel more human. The screen/mouse/keyboard set-up makes current computers feel like lifeless entities, cold and sterile. Adding voice to a computer gives a sense of life, of animation, to an otherwise inanimate machine.

Now, with voice, we must think about how verbal conversations sound, feel, and flow. Apple’s Siri, Google’s Assistant, Amazon’s Alexa, and Microsoft’s Cortana are all prime examples of consumer-level AI that can respond to a request, control some physical devices, offer options based on internet searches, and more. IBM’s Watson, a business-to-business solution, takes AI to another level by adding the ability to make predictions, assumptions, and even some reasoning about computational outcomes. As computers become “smarter” and more human-like, the way we communicate with them should reflect the way we communicate with other humans – our grammar, formality, inflection, cadence, and so forth should all be taken into account. Using a VUI should feel as natural as speaking and listening to any other human.

 

Where have we experienced VUI previously?

We’ve all seen, at the very least, HAL 9000 in 2001: A Space Odyssey, which debuted in 1968, nearly 50 years ago. VUIs have appeared in countless science fiction films, shows, and books in many forms. Captains and crews of spaceships call on VUIs to give them updates on ship status, or to manage wireless communication between people. Speaking to a Nexus-6 model android in Do Androids Dream of Electric Sheep? (Philip K. Dick, 1968), for example, is voice UI. VUI in sci-fi is so common that we often forget that characters are speaking to computers, even as we watch them do so. In sci-fi, VUIs are the primary mode of interaction with robust artificial intelligences and robots.

In the real world, VUIs have most notably been found on customer service telephone hotlines for decades. Apple’s Siri, Google Assistant, and Microsoft Cortana are on millions of smartphones in the US alone. Siri and Cortana have made their way onto desktop and laptop computers. Amazon’s Alexa service is growing rapidly as Echo and other Alexa-enabled devices boom. IBM’s Watson is expanding into B2B as a powerful digital business partner. My 2008 Ford Focus came with Sync, Ford’s Microsoft-powered system, which had a VUI component: I could plug in my phone, make calls, send texts, and change music, all with voice. Even Xfinity’s remote control accepts voice input, though without an audio response.

 

How is VUI valuable to the digital industry?

Daily use of voice-enabled devices is only going to increase. Voice interaction will soon be as ubiquitous as the mouse and keyboard. VUIs give us a whole new paradigm for interacting with computers and software, and thus a totally new way to design human-computer interactions. This opens the door to more distinctive, and potentially more impactful, experiences across a wide array of hardware implementations, increasing the reach of products that use this technology. The digital industry will need a new understanding of how to design multi-modal interactive products – multiple screens, as well as voice.

With new interaction models come new design considerations. A short list of some of the possible considerations includes:

  • What will accessibility rules be?
  • How do we design VUI for those without use of their voice?
  • What are the current limits to the available services (Google Assistant, Apple’s Siri, Amazon’s Alexa, or any other)?
  • Can we create cross-platform, cross-device, multi-session, cloud based experiences?
  • Unless agencies create their own VUI service, we will be at the mercy of the existing services’ capabilities. Support for speech patterns, extensions, user accents, speech impediments, etc., will depend on each service’s most current publicly available capabilities.
  • How does voice replace, or more appropriately, augment physical/visual interface design?
  • How does voice interaction play into larger ecosystems and industries, such as healthcare, aviation, etc.?
  • How does voice interaction work with machine learning, data collection and aggregation, neural networking, and all other elements of AI? How will these all be used separately or in tandem?

We’ll need to design more abstractly – outside the context of linear flows – yet be able to capture minute nuances of language. Conversational AI design will need to be handled with both hyper-precision and inclusive ambiguity – users will have very specific things to say, yet many different and undefined ways to say them.
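To make the "specific phrases, many ways to say them" tension concrete, here is a minimal Python sketch. It is an illustration only, not how Alexa, Siri, or any other service actually works: the intent names, sample phrasings, and the `match_intent` helper are all hypothetical, and the word-overlap score stands in for the far richer language models real services use.

```python
import re

# Hypothetical intents, each with a few sample phrasings (assumed for illustration).
INTENTS = {
    "lights_on": ["turn on the lights", "lights on", "switch the lights on"],
    "play_music": ["play some music", "put on a song", "start the music"],
}


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the intent whose sample phrasing best overlaps the utterance.

    Overlap is shared words divided by total distinct words (Jaccard
    similarity). Returns None when nothing scores at or above the threshold.
    """
    spoken = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, samples in INTENTS.items():
        for sample in samples:
            expected = tokens(sample)
            score = len(spoken & expected) / len(spoken | expected)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


print(match_intent("Could you turn the lights on please"))
print(match_intent("What's the weather"))
```

The first call resolves to `lights_on` even though the wording matches none of the samples exactly; the second returns `None` because nothing overlaps enough. Choosing that threshold is the design problem in miniature: too strict and natural phrasing is rejected, too loose and the wrong action fires.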

In order to stay competitive, our collective clients will soon be asking for a VUI component or project. Whoever begins to solve the above challenges soon, and builds their skills around VUI, will lead multi-modal UI design. Understanding language and conversation will be a key addition to our design skills. With the rise of Alexa, Google Assistant, Siri, Cortana, and Watson, integrating VUI into our projects is not a question of if, but when. We need to be ready.

 

The Future of AI?

Over the next few years, VUI will become a predominant mode of interaction with computers in our daily lives. Smart home devices (including security and HVAC systems) are already becoming commonplace. Intuitive chat bots, autonomous vehicles, and data mining software are all on the verge of mass adoption. The number of useful artificial intelligences is only going to increase. What was once science fiction will become technological reality. Getting to a fully autonomous AI, like HAL 9000, will depend on three things:

  1. The public acceptance of AI in everyday life
  2. Pushing the boundary of machine-based sentience
  3. Reducing the fear of what ramifications come from both

Before we arrive at sentient, autonomous artificial intelligence, the technology industry will need to take a few steps. First, and currently, AI is simply augmenting human brain power: collecting data, organizing and compiling information, then handing that to a human to make decisions. As humans become accustomed to computers collecting and compiling information, we will allow them to offer, and rationalize, options instead of simply handing the information over. Software that can rationalize the options most in line with human decisions will then be given the ability to make simple decisions that are verified by a human. Finally, machine dependence on human authority will dwindle, leaving machines fully capable of making rational, independent decisions: free thought.

This might sound scary. The first conclusion many people come to when thinking about autonomous AI is a Terminator or Matrix scenario, where computers attempt to eliminate their human oppressors. But I see human-AI interaction becoming much more symbiotic. Humans and artificial intelligences will need to depend on one another in order to survive. We are all limited to inhabiting this very fragile Earth. In order to ensure our survival as a species, and to care for our ailing planet, we will depend on our robotic counterparts – and they will depend on us to get them there. Maybe in the process, we’ll become friends.
