Soshun Arai of Arm and Mark Sykes of Recognition Technologies discuss the next generation of embedded voice for the automotive cockpit.
Recognition Technologies is a partner of Arm and their RecoMadeEasy® software provides embedded natural language speech, speaker, and facial recognition for Arm’s Cortex-A application processors.
Mark Sykes, Recognition Technologies
Existing in-car voice recognition systems are constrained: they rely on restrictive grammar, limited vocabulary, and keyword and phrase spotting. In-vehicle infotainment (IVI) interfaces have greatly improved the cockpit experience, but voice to date has not delivered on its potential to bridge the gap between hands-free, in-vehicle control and safer driving.
The rate of user complaints about in-car voice recognition systems is nearly four times the rate of reported problems with transmissions. However, user demand for in-vehicle voice interfaces has never been higher. Next generation large vocabulary, continuous speech recognition systems give users the freedom of speaking to an unconstrained system and offer a significant performance increase over existing constrained systems. From a utility perspective, drivers as well as passengers want on-demand, real-time voice control that combines high accuracy with ease of use and reliability.
Soshun Arai, Arm
The automotive industry is undergoing a paradigm shift into the era of the connected car. Meanwhile, the continually improving computing power and compact size of mobile processors are turning them into an integral in-vehicle computer system that can handle real-time, mission-critical embedded services. IVI is the front line in offering a better user experience (UX) to drivers and passengers, for a more comfortable and efficient journey. The automotive industry is keen to provide a seamless UX to a customer base that already enjoys a polished mobile UX every day. With new regulations restricting the use of mobile phones while driving, gesture and voice commands will increasingly become the interface of choice in IVI.
Cloud-based recognition technologies that rely on distributed processing, such as speech, speaker and facial recognition, and natural language processing, are now commonplace. However, the limitations of these cloud-based services are a hot topic of discussion, especially when applying them to an automotive environment with its own user privacy, security and connectivity requirements.
With more than 85% of infotainment systems and many other applications, such as dashboard and body, built with Arm-based chips, today's automotive experience is founded on Arm technology. A common architecture across all electronics and support from a leading tools ecosystem enable car makers and their suppliers to innovate rapidly in both hardware and software. Real-time automotive cockpit environments have to perform reliably even in the presence of service interruption. To achieve this, IVI will require more computing performance, not only to drive better hardware, such as multiple larger displays with higher resolution, head-up displays (HUDs) and better surround speaker systems, but also to run the software that makes these systems easier to control. Natural language processing is becoming an important UX requirement. Arm will support next generation IVI with higher-performance, more power-efficient processors and power-optimization technology to provide a significant boost to these rich, context-aware experiences.
Cloud-based recognition systems are a great solution when full, uninterrupted and secure access to a cloud server is guaranteed. In real-time automotive cockpit environments, however, this is not the case, and cars have to be able to perform a task reliably even in the presence of service interruption. To limit the number of potential points of failure, localized processing, in which each vehicle collects and processes its own voice data, has advantages over centralized servers. Mission-critical tasks, such as user or vehicle security, privacy, and protection, are processed natively on the device to ensure continuous availability.
One very significant potential point of failure when anticipating the rise of the connected car is bandwidth. Cloud computing depends on bandwidth allocation and connectivity to function. On average, a second of voice audio requires more than one thousand times as much data to transmit as the equivalent text. If streaming voice audio to the cloud is the sole method of operation at scale, bandwidth limits will almost certainly be reached quickly.
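The scale of the gap between voice audio and text can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from Recognition Technologies or Arm; every figure in it (16 kHz, 16-bit mono uncompressed PCM audio, a speaking rate of 150 words per minute, roughly six bytes per word of text) is an illustrative assumption, and the function names are hypothetical.

```python
# Rough comparison of the data rate of raw voice audio vs. the equivalent text.
# All constants are illustrative assumptions, not measured values.

SAMPLE_RATE_HZ = 16_000   # assumed: wideband speech sampling rate
BYTES_PER_SAMPLE = 2      # assumed: 16-bit mono PCM, uncompressed
WORDS_PER_MINUTE = 150    # assumed: conversational speaking rate
BYTES_PER_WORD = 6        # assumed: about 5 ASCII letters plus a space


def audio_bytes_per_second(sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Data rate of uncompressed mono PCM audio."""
    return sample_rate_hz * bytes_per_sample


def text_bytes_per_second(words_per_minute: int, bytes_per_word: int) -> float:
    """Data rate of the transcribed text of the same speech."""
    return words_per_minute * bytes_per_word / 60


audio_rate = audio_bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
text_rate = text_bytes_per_second(WORDS_PER_MINUTE, BYTES_PER_WORD)
ratio = audio_rate / text_rate

print(f"audio: {audio_rate} B/s, text: {text_rate} B/s, ratio: {ratio:.0f}x")
```

Under these assumptions the audio stream is a little over two thousand times larger than the text. A compressed speech codec would narrow the gap, so the exact ratio depends heavily on the audio format actually transmitted.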
Arm Cortex-A applications processors are already widely used in today’s infotainment systems. Next generation systems will use Cortex-A53, Cortex-A57, and Cortex-A72. These mobile processors will bring very compelling single-threaded performance, which in automotive is of particular interest for real-time voice recognition. Architecturally, the Cortex-A53 is fully compatible with the Cortex-A57 and Cortex-A72 and supports the Arm big.LITTLE architecture. The latest big.LITTLE platforms can save 75% of CPU energy consumption in low to moderate performance scenarios and can increase performance by 40% in highly threaded workloads. Automotive is a heat- and power-constrained environment, so this kind of energy management is particularly relevant. Infotainment systems encompass audio, radio, video processing, navigation, and many interrupts from the vehicle’s controller area network (CAN). LITTLE processors can filter out and handle the low-intensity tasks, leaving the big processors free to devote all available resources to tasks that require high performance.
You can learn more about how Arm and Recognition Technologies are changing how drivers interact with IVI systems in the whitepaper "Reimagining voice in the driving seat".