Picture your smart assistant at home: you say a command, and it recognizes your voice, processes what you’re saying and responds. This is an example of a multi-sensor device that requires signal processing. Designers of compelling voice communication products like this – as well as the semiconductor solutions that enable these products – are confronted with the challenges of ensuring high performance, while efficiently utilizing system resources.
Without preprocessing software, intelligibility suffers: the talker may not be heard or understood by the person on the other end of the call, or by the voice-controlled speaker. The preprocessing software must preserve the voice signal while using computational resources (MIPS and memory) efficiently. Designers also need intuitive configuration and tuning tools that provide a diagnostic and development environment for rapid product development. So, where do you start, and how do you achieve all of this?
First, you need the right processing power and efficiency for your application. The Arm Cortex family of processors provides a standard architecture that addresses the broad performance spectrum and cost range required by these diverse product markets. The family includes processors based on three distinct profiles: Cortex-A (application), Cortex-R (real-time) and Cortex-M (microcontroller).
Arm Cortex family of processors
In terms of digital signal processing, this blog will cover the Cortex-A and Cortex-M processor families.
The Arm Cortex-M processor family is particularly suited for a wide range of applications that demand high performance with a low computational footprint, such as voice and audio-based devices. In particular, the Cortex-M4, Cortex-M7, Cortex-M33 and Cortex-M35P processors offer digital signal processing (DSP) extensions (to the Thumb instruction set) and an optional floating-point unit (FPU), combining DSP and high-performance generic code processing all-in-one. They are specifically designed to help improve the performance of numerical algorithms and provide the opportunity to perform signal processing operations directly on the CPU – simplifying programming, reducing power and decreasing bill-of-material (BoM) costs.
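To make the idea of CPU-side signal processing more concrete, here is a minimal sketch of the kind of arithmetic the DSP extensions are designed to speed up: a multiply-accumulate over 16-bit fixed-point (Q15) samples with a saturated result. It is written in plain Python purely to show the arithmetic; the function names are illustrative assumptions and do not correspond to any Arm or Alango API.

```python
Q15_MAX = 32767    # largest Q15 value, just under +1.0
Q15_MIN = -32768   # smallest Q15 value, exactly -1.0


def saturate_q15(value):
    """Clamp an integer to the signed 16-bit range instead of letting it wrap."""
    return max(Q15_MIN, min(Q15_MAX, value))


def float_to_q15(x):
    """Convert a float in roughly [-1.0, 1.0) to Q15 (hypothetical helper)."""
    return saturate_q15(int(round(x * 32768)))


def q15_dot(a, b):
    """Dot product of two Q15 vectors, returned as a saturated Q15 value."""
    acc = 0  # wide accumulator; Python integers never overflow
    for x, y in zip(a, b):
        acc += x * y  # Q15 * Q15 gives a Q30 product
    return saturate_q15(acc >> 15)  # scale Q30 back to Q15, then clamp


half = float_to_q15(0.5)
in_range = q15_dot([half, half], [half, half])   # 0.25 + 0.25 = 0.5 in Q15
clipped = q15_dot([Q15_MAX] * 4, [Q15_MAX] * 4)  # true sum is near 4.0, so it clamps
print(in_range, clipped)
```

On a real Cortex-M device a loop like this would normally be written in C, for example with the CMSIS-DSP library, so that the compiler or library can map it onto the multiply-accumulate and saturating instructions.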
For more information about the benefits and features of combining control and DSP in a single Arm CPU, read our blog on the signal processing capabilities of Cortex-M devices.
The Arm Cortex-A processor family offers higher performance and a richer feature set than the Cortex-M processor family and is particularly suited for applications that undertake complex compute tasks. The processor family supports an advanced Single Instruction Multiple Data (SIMD) architecture extension called Neon technology. Neon improves the multimedia user experience by accelerating audio and video encoding and decoding, user interface, 2D/3D graphics or gaming. It also accelerates signal processing algorithms and functions to speed up applications, such as audio and video processing, voice and facial recognition, computer vision and deep learning.
Alango Technologies is an Arm DSP ecosystem partner that supports Arm technology by providing DSP algorithms and software enhancements that improve the quality of voice communication and the audio experience in a variety of applications. Its technology is used in many products, including automotive hands-free systems, Bluetooth headsets and voice-controlled speakers.
Let’s explore Alango Technologies for voice and audio devices.
Preprocessing packages should be customized to suit a multitude of applications, each with its own system requirements and industry compliance standards. Examples include a True Wireless Stereo (TWS) headset that uses near-field microphones and a smart speaker that uses a far-field microphone array.
The ideal preprocessing technologies operate with low MIPS and memory to produce natural-sounding, intelligible voice on the other end of the call, or to improve the performance of the Automatic Speech Recognition (ASR) engine.
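To get a feel for what a MIPS budget means in practice, the back-of-the-envelope arithmetic below converts a MIPS figure into instructions per audio frame, instructions per sample, and a share of the CPU. All of the numbers (a 5 MIPS algorithm, 10 ms frames, 16 kHz sampling, a core delivering 100 MIPS) are hypothetical values chosen for illustration, not measurements of any product.

```python
def instructions_per_frame(mips, frame_ms):
    """Instructions executed in one audio frame at a given MIPS rate."""
    return mips * 1_000_000 * frame_ms / 1000.0


def instructions_per_sample(mips, sample_rate_hz):
    """Instructions executed per audio sample at a given MIPS rate."""
    return mips * 1_000_000 / sample_rate_hz


def cpu_share(algorithm_mips, core_mips):
    """Fraction of the core's throughput that the algorithm consumes."""
    return algorithm_mips / core_mips


# Hypothetical example values, chosen only to illustrate the arithmetic.
budget = instructions_per_frame(mips=5, frame_ms=10)
per_sample = instructions_per_sample(mips=5, sample_rate_hz=16000)
share = cpu_share(algorithm_mips=5, core_mips=100)

print(budget, per_sample, share)
```

Under these assumptions the algorithm has 50,000 instructions to spend on each frame, or about 312 per sample, and leaves 95% of the core free for the rest of the application.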
Hallmarks of best-in-class preprocessing technology include the following:
Voice intelligibility can be improved further by complementing the traditional external microphone pickup with an in-ear microphone or bone conduction sensor, since such a sensor is internal and isolated from environmental sounds. Innovative downlink channel processing should not be ignored either, as it provides another opportunity to develop value-added products.
Together, these attributes help designers develop compelling products with real potential to increase revenue for their company. All of this is possible with the Alango Voice Communication Package optimized for the versatile Cortex-M processor platform.
Voice Activity Detection (VAD) detects speech in an acoustic signal, which allows the system to remain in standby until a voice is detected. Alango’s VAD consumes less than 2 MIPS of processing power, the lowest in its class for a stand-alone voice activity detection solution available on Arm Cortex-M processors. This is a huge advantage for battery-operated devices that need to run for long periods between charges, although VAD can be used anywhere that speech needs to be detected.
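To illustrate the general idea of voice activity detection, here is a minimal sketch of a textbook energy-threshold detector. This is not Alango’s algorithm, which is not described in this blog; it is a deliberately simple stand-in, and the frame length, threshold and hangover values are assumptions chosen for the example.

```python
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)  # small offset avoids log(0)


def detect_voice(samples, frame_len=160, threshold_db=-40.0, hangover_frames=3):
    """Return one boolean per frame: True while speech is considered active.

    frame_len, threshold_db and hangover_frames are illustrative values,
    not parameters of any real product.
    """
    decisions = []
    hangover = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy_db(frame) > threshold_db:
            hangover = hangover_frames  # re-arm the hangover on every loud frame
            decisions.append(True)
        elif hangover > 0:
            hangover -= 1  # stay active briefly after the signal drops
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# Synthetic input at 16 kHz: 2 silent frames, 2 frames of a 200 Hz tone, 5 silent frames.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(160)]
signal = silence * 2 + tone * 2 + silence * 5

decisions = detect_voice(signal)
print(decisions)
```

A device could stay in standby while the decision is False and wake the rest of the voice pipeline when it turns True. The hangover keeps the detector active for a few frames after the energy drops, which avoids cutting off quiet word endings. A production-grade VAD has to cope with noise that a fixed energy threshold cannot handle.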
Alango's Voice Activity Detection demo
Alango has had excellent results porting and optimizing its software products for Cortex-M based devices using Arm Keil MDK, the most comprehensive software development environment for Arm-based microcontrollers. The µVision IDE, part of Keil MDK, includes the industry-standard Arm Compiler and provides straightforward means for debugging and profiling code.
The following Alango software products have been ported and optimized for Cortex-M4 and Cortex-M7 processors:
Demonstrations of these software products running on the STMicroelectronics STM32F769 Discovery kit are available. Please contact Alango for further details.
The Arm Cortex-A processor family is particularly suited for applications with a higher computational load, such as voice-controlled smart speakers with far-field pickup. These voice interface and control products combine many software and hardware technologies that often go beyond the skill set of developers. To help developers get started, a number of single-board computers are available. For example, Arm Leading Edge partner Seeed Studio provides two kits that offer turnkey voice-control functionality incorporating Alango’s Voice Enhancement Package. These kits allow companies to build high-performance voice-interface products efficiently without having to master every aspect of audio and other technology integration.
Let’s explore the kits from Seeed Studio:
Based on the Arm Cortex-A7 processor, the ReSpeaker Core V2 allows developers to create powerful and impactful voice and sound interfaces. The board adds many new features over Core V1, including the option to run Debian and Android.
To get started using this development platform, access documentation and FAQs here.
Based on the Arm Cortex-A53 processor, the ReSpeaker Raspberry Pi is a 4-microphone expansion board for Raspberry Pi designed for artificial intelligence and voice applications. This product is ideal for building more powerful and flexible voice products that integrate with Amazon Alexa Voice Service and Google Assistant.
To get started with Alango Technologies on Arm-based technology, please contact info@alango.com.
Learn more about Arm’s DSP solutions for Cortex-A and Cortex-M: https://developer.arm.com/architectures/instruction-sets/dsp-extensions
Did you know? Arm recently announced Arm Helium technology, an architectural extension that will provide even more possibility for future voice and sound devices. Helium will deliver up to 15x performance uplift for machine learning and up to 5x uplift to signal processing tasks running on future Cortex-M processors, expanding the potential for Alango to innovate. Learn more about this new technology and what it means for the industry and voice-based devices.