Voice is everywhere, integrated into devices all around us, from voice-activated alarm clocks to voice-controlled vacuum cleaners! As this market grows, cutting-edge voice devices need to be scalable, built on a proven foundation, and widely supported by a broad software ecosystem.
Chris Shore, Director of Embedded Solutions at Arm, recently sat down with one of the leaders and visionaries in voice-based engineering: Ken Sutton, President and CEO of Yobe, Inc.
Picture this: you’re in a loud room full of conversations, such as a dinner party. You want your Amazon Echo to turn the music down or change what’s playing, but the wake word isn’t picked up in a loud setting because your voice is lost in the noise. Yet when Ken demonstrated his Yobe-enabled device, he could stand across the room with music blaring next to it, and his voice was uniquely picked up.
This is one example of what Yobe is pioneering and why we are excited for our DSP software engineers to be working closely with them.
Yobe was co-founded in 2014 by Ken Sutton and S. Hamid Nawab, PhD (MIT), an internationally renowned researcher and professor of electrical and computer engineering at Boston University.
Today, Yobe is revolutionizing the field of signal processing with its “Signal Processing That Thinks” platform. VISPR (Voice Identification System for User Profile Retrieval) was Yobe’s first product launch, and in its commercial form it addresses one of the most persistent challenges of voice technology on the market today: the inability of smart, connected devices to accurately identify, track and personalize voice interactions when they take place in uncontrolled, noisy, everyday environments.
Yobe is solving this problem by enabling voice-based products to identify, track and personalize a voice from just a wake word, reducing wake-word and speech-recognition errors by up to 85% in high-noise environments.
We will consider ourselves successful when voice interface between humans and their connected devices is seamless. For this to happen, voice technologies need to work effectively in any auditory environment. Issues of poor speech recognition and speaker identification will need to be addressed before there can be true interface personalization. We see Yobe enabling multiple use cases where voice plays an important role, whether that be smart device user experience, business productivity or even mission-critical applications that rely on voice commands.
We believe that Yobe’s Artificial Intelligence (AI)-assisted signal processing will help eliminate the current challenges around identifying, tracking and separating voices, enabling improved pattern recognition, better overall sound quality, higher speech-command accuracy for far-field applications, and stronger speaker identification platforms.
Yobe has successfully configured its generic AI-assisted signal processing algorithms to run on an Arm Cortex-M7 processor, enabling wake-word recognition in acoustically challenging environments such as the one described above. These algorithms employ a blend of rule-based abductive reasoning and shallow machine-learning strategies to control advanced signal processing operations, using the CMSIS library for intensive FFT and vector multiply and add operations.
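To give a flavour of the kind of CMSIS-DSP calls involved, here is a minimal sketch of an FFT-plus-vector-multiply stage as it might run on a Cortex-M7. The frame length, buffer layout and gain mask are illustrative assumptions for this example, not Yobe’s actual implementation.

```c
/*
 * Minimal sketch of CMSIS-DSP usage: a real FFT of one audio frame,
 * an element-wise vector multiply (e.g. applying a spectral gain mask
 * produced by an AI control layer), and an inverse FFT. Frame length
 * and gain values are hypothetical, chosen only for illustration.
 */
#include "arm_math.h"

#define FRAME_LEN 512  /* power of two, as arm_rfft_fast_init_f32 requires */

static arm_rfft_fast_instance_f32 fft;
static float32_t work[FRAME_LEN];      /* scratch: the RFFT modifies its input */
static float32_t spectrum[FRAME_LEN];  /* packed complex spectrum (CMSIS layout) */
static float32_t gains[FRAME_LEN];     /* per-element gain mask (hypothetical) */

void dsp_init(void)
{
    /* one-time FFT twiddle-table setup */
    arm_rfft_fast_init_f32(&fft, FRAME_LEN);

    /* unity gain until the control layer writes a real mask */
    arm_fill_f32(1.0f, gains, FRAME_LEN);
}

void process_frame(const float32_t *samples, float32_t *out)
{
    /* copy the input frame, since arm_rfft_fast_f32 destroys its input */
    arm_copy_f32(samples, work, FRAME_LEN);

    /* forward real FFT: time domain -> packed complex spectrum */
    arm_rfft_fast_f32(&fft, work, spectrum, 0);

    /* intensive vector multiply: apply the gain mask across the buffer */
    arm_mult_f32(spectrum, gains, spectrum, FRAME_LEN);

    /* inverse FFT back to the time domain */
    arm_rfft_fast_f32(&fft, spectrum, out, 1);
}
```

On a Cortex-M7, these library calls map onto the core’s single-precision FPU and SIMD instructions, which is where CMSIS-DSP earns its keep for real-time audio workloads.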
We chose Arm because of the combination of performance and the breadth of the product offering. Also, as a software-centric company, we have seen real value in Arm’s willingness to offer internal support: Arm’s software engineering team, managed by Laurent Le Faucheur, made libraries and technical support available to us as we worked through the implementation of our software on their microcontrollers.
As Arm furthers its expansion into the voice and low-power markets, we will continue to build and strengthen our partnership. We believe there is a massive opportunity to grow our signal processing solutions across different hardware platforms. Potential front-end applications for the future include automatic speech recognition (ASR), automatic voice recognition (AVR) and wake-word recognition (WWR).
These applications could be used in noisy everyday environments, and in the healthcare industry, where the separation of different neuronal components of EMG signals is needed for prosthetic control or hearing-aid processing. Yobe’s algorithms can be configured for many different hardware platforms, including cloud-based servers, on-device solutions for OEMs, and microcontrollers.
We see voice becoming the touch-screen of the next generation, which we feel will play a big role in the growth of the IoT market. Everything is going to be connected, from your microwave to your coffee machine – and the market is going to demand that it is voice-enabled. These voice-connected systems will need to operate at a high level of accuracy in any auditory environment.
I don't see this as a challenge for our company and customers, but rather a great opportunity. Identifying and personalizing voice in uncontrolled, connected environments is going to require voice customization and personalization to improve massively beyond where current voice technology stands today. With Yobe’s ability to track voice DNA and accurately separate voices of interest from background and near-field noise, any connected environment can be managed. This is a win for the connected consumer and for Yobe.
Explore more about Yobe, and discover the Arm DSP technologies that make this possible, via the link below:
[CTAToken URL = "https://developer.arm.com/technologies/dsp/dsp-for-cortex-m/" target="_blank" text="Learn more" class ="green"]