Voice-first technology: The future of user interface for controlling building systems

By Jeff Carpenter 

Every building system in today’s modern facilities has computerized technology behind it. Interacting with that computer is done through a user interface. The art and science behind creating those user interfaces is called user experience design, or UX for short. Companies spend significant amounts of R&D money on improving the usability of their products, and UX is one important focus of that effort. 

Some of the ways human beings interact with technology in buildings include: 

  • Infotainment and information display systems: digital signage, event management and communication, television content 
  • Wayfinding and transportation: elevator control, secured parking, electronic wayfinding 
  • Collaboration systems: conference room audio/video systems, huddle spaces, web conferencing, video conferencing, classrooms, conference room usage and scheduling 
  • Environmental control: HVAC, lighting control, power distribution and monitoring, utility metering 

Users are not always satisfied with their interactions with these systems, however. 

Perhaps the most complaint-riddled system in the examples above is audio/video. A/V equipment is found in all types of building spaces: personal offices, huddle spaces, classrooms, conference rooms, etc. A common source of complaints with this equipment is the interface used to control the equipment.  Touchscreen controls are viewed as complicated and confusing. Push button controls often are placed on walls to eliminate the “confusing touchscreen,” but push buttons have very limited control options. 

Another example is my recent experience with the HVAC thermostats in a newly built hotel. Technology is part of my daily life personally and professionally, yet I could not figure out this brand of thermostat. I could not get the room to the temperature that I wanted it to be. The user interface was horrible. 

What’s the solution? Do we continue to refine and improve the current user interfaces incrementally? Or is there another solution? 

Once again, the enterprise soon will be influenced by innovation in the consumer electronics industry. Voice-first interfaces, or voice user interfaces (VUI), could very well be the future. 

A new day for VUI 

VUI is self-explanatory: it uses human speech as the main method of interacting with and controlling computers. VUI is not new, however, and impressions of the technology often are negative. For example, automobiles have had voice recognition capability for many model years. But how many times have you tried to talk to your car only to hear it reply, “Did you mean (something that is nothing close to what you said)?”

Clearly, early attempts at VUI have not been impressive. However, the underlying technology has improved rapidly, and several advances point to a new day for VUI. 

Legacy voice interfaces rely on a fixed set of structured commands that require the user to know, understand, and specifically state the command. The computer is programmed to understand and do something in response to the utterance of one of these structured commands. For example, the system may be programmed to understand the spoken command, “Tune to 102.5 FM,” but will fail if the user says, “Play 102.5 on the radio.”  Thankfully, these types of VUI are largely relics of the past – and rightfully so. They often led to more frustration than value. 
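To make the limitation concrete, here is a minimal sketch of a fixed-command matcher, assuming the speech has already been converted to text. The command table and the names LEGACY_COMMANDS and legacy_parse are hypothetical and exist only for illustration; they do not come from any real product.

```python
import re

# Hypothetical fixed grammar: each command must match one exact phrase pattern.
LEGACY_COMMANDS = {
    re.compile(r"^tune to (\d+(?:\.\d+)?) fm$"): "tune_radio",
}

def legacy_parse(utterance):
    """Return (action, argument) if the utterance matches a fixed command, else None."""
    text = utterance.strip().lower().rstrip(".")
    for pattern, action in LEGACY_COMMANDS.items():
        match = pattern.match(text)
        if match:
            return action, match.group(1)
    # Anything outside the fixed grammar is rejected, even if the intent is obvious.
    return None

print(legacy_parse("Tune to 102.5 FM"))
print(legacy_parse("Play 102.5 on the radio"))
```

The first phrase matches the programmed pattern and is accepted. The second expresses the same wish in different words and is rejected, which is the kind of failure that frustrated users of these systems.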

The revolutionary improvements in VUI come from two main areas of science: natural language processing (NLP) and machine learning. 

In layman’s terms, NLP is a science that seeks to understand the intent behind what is said and turn that intent into action, instead of requiring the user to speak a predetermined command phrase. You can see NLP in action by using smartphone navigation with Google Maps. Any of the spoken commands below (and more) will result in Google Maps calculating a route to your house. 

  • “Navigate home.” 
  • “Navigate to my house.” 
  • “Travel home.” 
  • “Take me home.” 
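The idea of mapping many phrasings to one intent can be sketched in a few lines. This is a deliberately crude keyword stand-in, not how Google Maps or any real language-understanding system works; production systems rely on trained statistical models. The word lists and the function name detect_intent are assumptions made for illustration.

```python
import re

# Hypothetical cue words. A real system would learn these associations from data.
MOTION_WORDS = {"navigate", "travel", "take", "drive", "go"}
HOME_WORDS = {"home", "house"}

def detect_intent(utterance):
    """Map a free-form utterance to an intent label using simple word cues."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # The intent is recognized when the utterance mentions both movement and home,
    # regardless of the exact wording or word order.
    if words & MOTION_WORDS and words & HOME_WORDS:
        return "navigate_home"
    return "unknown"

for phrase in ["Navigate home.", "Navigate to my house.", "Travel home.", "Take me home."]:
    print(phrase, "->", detect_intent(phrase))
```

All four phrases resolve to the same navigate_home intent, even though none of them had to be programmed as an exact command.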

Learning from Big Data 

Machine learning, a broad and complicated area of computer science, is an essential component of NLP. A crude summary of machine learning is the analysis of increasingly vast amounts of data (Big Data) to “teach” the computer to predict results from that data and anticipate desired actions without being explicitly programmed for each scenario. 

Yet again, Google Maps provides a relatable example of the combination of NLP and machine learning. A frequent user of Google Maps eventually will notice proactive destination prediction notifications within the app. Rather than waiting for the user to initiate the navigation command to “travel home” at the end of the day, the application uses what it has learned to offer an anticipatory prompt: “Are you going home?” This activates the VUI and allows the user to simply say yes or no. 
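A toy version of this kind of anticipation can be built by counting past trips. The sketch below is an assumption-laden illustration, not a description of how Google Maps makes its predictions: the class DestinationPredictor, its methods, and the min_trips threshold are all hypothetical.

```python
from collections import Counter, defaultdict

class DestinationPredictor:
    """Suggest a destination based on where the user usually goes at a given hour."""

    def __init__(self, min_trips=3):
        self.min_trips = min_trips            # hypothetical confidence threshold
        self.history = defaultdict(Counter)   # hour of day -> destination counts

    def record_trip(self, hour, destination):
        self.history[hour][destination] += 1

    def suggest(self, hour):
        trips = self.history.get(hour)
        if not trips:
            return None
        destination, count = trips.most_common(1)[0]
        # Only prompt once the pattern has been seen often enough.
        if count >= self.min_trips:
            return f"Are you going {destination}?"
        return None

predictor = DestinationPredictor()
for _ in range(4):
    predictor.record_trip(17, "home")
predictor.record_trip(17, "gym")

print(predictor.suggest(17))
print(predictor.suggest(9))
```

After several 5 p.m. trips home, the predictor offers the prompt on its own. With no history for 9 a.m., it stays silent. Nobody wrote a rule saying "suggest home at 5 p.m."; the behavior comes from the recorded data.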

Consumers already can experience the rapid progression of the combined power of voice-first interfaces, natural language processing, and machine learning in two very inexpensive consumer electronics devices: Amazon Echo and Google Home. These devices, and the integrations they already offer with other systems, provide a glimpse of the future for less than $200 each. Those products will continue to gain capability over time. 

Imagine the potential in our buildings as this technology works its way into the commercial sector: 

  • “Activate my video conference preset.” 
  • “Play a DVD on the projector and all three displays.” 
  • “It’s getting warm in here.” 
  • “I’m leaving for the day.” 
  • “Tell my nurse that I’m getting hungry and my pain is returning.” 

Voice-first interfaces are the future and the future is not very far away. The possibilities are limited only by our imaginations. 
