“Speak” And Thou Shall Find: The Future Of Voice Recognition

Geetika Tripathi

It’s Monday morning, and the alarm clock starts beeping. You reach out, trying to find it. Then you gather your senses and mutter the command “Shut up.” It goes quiet. Next, you ask for your daily news capsule, weather updates, and stock information while you instruct your coffee machine to brew that perfect mocha for you.

As the fridge notifies you of its depleting supplies, you briskly command the shopping list to be updated. All this is done as you order the vacuum cleaner in the next room to spot-clean a grubby area of the carpet.

As you leave home, the lights and devices are turned off at your command. Once you sit in your car, you ask it to choose the best route possible based on the traffic update, then remember you need to send an important email before you arrive at the office. You dictate the subject and text and send it off – all in time to enjoy your favorite music for the rest of the journey.

This is not a hypothetical scene from the future. In fact, each one of these smart gadgets is already available. Examples include Ivee, Echo, Vocca light, Listnr, Smart Kitchen assistant, Buddyguard, Dragon Drive, and many more.

Voice and gesture control has indeed come a long way!

How did it all begin?

Let us go back to the late 1930s, when it all started. Bell Labs’ “Voder” was the first attempt at electronic speech synthesis. Then in 1952 came “Audrey,” a Bell Labs speech recognition system that could recognize spoken digits. This was followed by the “Shoebox,” a 1962 IBM creation that could understand 16 spoken words. Since then, a great deal of research and experimentation has gone into voice control and recognition, producing technologies and systems such as Speech Understanding Research (SUR), Harpy, the Hidden Markov Model (HMM), DARPA projects, Dragon, and VAL, to name a few.

Because its path and relevance were unclear, the development of speech technology seemed to go on hiatus. However, Google’s voice search app brought it back to the forefront. And as mobility and the cloud injected renewed vigor into this technology, Siri was born to bring enhanced accuracy, personality, and entertainment to the world of speech recognition.

Siri, Google, and dollops of artificial intelligence

Although the hotword “Okay Google” was dropped from Google’s voice search activation for security and other reasons, the concept is still alive. This new way of talking to your devices and getting a quick, useful response has caught the world’s attention.

When Siri (an acronym for Speech Interpretation and Recognition Interface) debuted in 2011, it brought along incredible features such as artificial intelligence, knowledge navigation, personalized assistance, a vast vocabulary, and natural language comprehension, blended with a fair dose of fun. You can ask this loyal friend to do anything – from dialing a number to rolling a die to booking a movie ticket to answering the strangest questions imaginable.

What does the future hold?

Have we already reached the zenith of voice control and speech recognition? Definitely not! We have only seen the tip of a giant iceberg. A lot of research is still needed in the field of language understanding to reach human-level performance.

Advances in real-time data analytics and processing are yielding increasingly accurate and personalized suggestions. Support for multiple languages and the ability to understand your command after filtering out ambient noise are within reach.

However, it is still challenging to keep up with the pace of innovation, cater to ever-rising user expectations, add more and more devices to the network, and introduce fresh features as the initial novelty fizzles out. Of course, there are no prizes for guessing that all of this software will run in the cloud, with the Internet of Things making the game a lot more interesting.

What do you think? Will voice and gesture control go beyond telling us whether we should have that last slice of pie, or whether we should command our car to park in the garage and opt for a 30-minute walk instead? How do you think it will one day change our lives?

Get deeper insight on how to differentiate yourself from your competition in our research brief on The Digital Economy: Disruption, Transformation, Opportunity.


About Geetika Tripathi

I have been associated with SAP for eight wonderful years. Beyond the world of process orchestration, cloud, and platforms, I have a keen interest in the latest technological trends and a fascination for all the digital buzz.