How to implement voice and audio processing on Arm with Alango Technologies

June 18, 2019


Picture your smart assistant at home: you say a command, and it recognizes your voice, processes what you’re saying and responds. This is an example of a multi-sensor device that requires signal processing. Designers of compelling voice communication products like this – as well as the semiconductor solutions that enable these products – are confronted with the challenges of ensuring high performance, while efficiently utilizing system resources.

Without preprocessing software, intelligibility suffers: the talker may not be heard or understood by the person on the other end of the call, or by the voice-controlled speaker. The preprocessing software must preserve the voice signal while using computational resources (MIPS and memory) efficiently. Additionally, designers need intuitive configuration and tuning tools that provide a diagnostic and development environment for rapid product development. So, where do you start, and how do you achieve all of this?

Choosing the best-fit compute

First, you need the right processing power and efficiency for your application. The Arm Cortex family of processors provides a standard architecture to address the broad performance spectrum and cost range required by these diverse product markets. The Arm Cortex family includes processors based on three distinct profiles:

The Cortex-A processor family for sophisticated, high-end applications running mainly complex operating systems
The Cortex-R processor family for high-performance hard real-time systems
The Cortex-M processor family optimized for low-power, deterministic, cost-sensitive microcontroller applications

Cortex processor portfolio

Arm Cortex family of processors

In terms of digital signal processing, this blog will cover the Cortex-A and Cortex-M processor families.

The Arm Cortex-M processor family is particularly suited for a wide range of applications that demand high performance with a low computational footprint, such as voice and audio-based devices. In particular, the Cortex-M4, Cortex-M7, Cortex-M33 and Cortex-M35P processors offer digital signal processing (DSP) extensions (to the Thumb instruction set) and an optional floating-point unit (FPU), combining DSP and high-performance generic code processing all-in-one. They are specifically designed to help improve the performance of numerical algorithms and provide the opportunity to perform signal processing operations directly on the CPU – simplifying programming, reducing power and decreasing bill-of-material (BoM) costs.

For more information about the benefits and features of combining control and DSP in a single Arm CPU, read our blog on the signal processing capabilities of Cortex-M devices.
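As a rough illustration of the kind of arithmetic the DSP extensions are meant to speed up, the sketch below models a Q15 fixed-point FIR filter with a wide accumulator and saturation. It is written in plain Python for readability and is not Arm or Alango code; the function names (float_to_q15, saturate_q15, fir_q15) are hypothetical, and on a real Cortex-M target the inner loop would normally be written in C and mapped by the compiler or a DSP library to multiply-accumulate and saturating instructions.

```python
# Illustrative only: models Q15 fixed-point arithmetic in plain Python.
# All names here are hypothetical helpers, not part of any Arm or Alango API.

Q15_MAX = 32767
Q15_MIN = -32768


def saturate_q15(x):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(Q15_MIN, min(Q15_MAX, x))


def float_to_q15(x):
    """Convert a float in [-1, 1) to Q15, saturating at the range limits."""
    return saturate_q15(int(round(x * 32768)))


def fir_q15(samples, coeffs):
    """Direct-form FIR filter on Q15 samples using a wide accumulator."""
    out = []
    for n in range(len(samples)):
        acc = 0  # wide accumulator, so intermediate sums cannot overflow
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # Q15 * Q15 gives a Q30 product
        out.append(saturate_q15(acc >> 15))  # scale back to Q15 and saturate
    return out


if __name__ == "__main__":
    # Two-tap moving average: both coefficients are 0.5 in Q15.
    coeffs = [float_to_q15(0.5), float_to_q15(0.5)]
    samples = [float_to_q15(v) for v in (0.0, 0.5, 0.5, -0.25)]
    print(fir_q15(samples, coeffs))  # prints [0, 8192, 16384, 4096]
```

The multiply-accumulate in the inner loop and the final saturation are the two operations that dominate filters like this, which is why having them as single instructions on the CPU matters for audio workloads.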

The Arm Cortex-A processor family offers higher performance and a richer feature set than the Cortex-M processor family and is particularly suited for applications that undertake complex compute tasks. The processor family supports an advanced Single Instruction Multiple Data (SIMD) architecture extension called Neon technology. Neon improves the multimedia user experience by accelerating audio and video encoding and decoding, user interface, 2D/3D graphics or gaming. It also accelerates signal processing algorithms and functions to speed up applications, such as audio and video processing, voice and facial recognition, computer vision and deep learning.

Alango Technologies is an Arm DSP ecosystem partner that supports Arm technology by providing DSP algorithms and software enhancements to improve the quality of voice communication and the audio experience in a variety of applications. Its technology is used in many products, including automotive hands-free systems, Bluetooth headsets and voice-controlled speakers, to name a few.

Let’s explore Alango Technologies for voice and audio devices.

Voice communication preprocessing and voice activity detection

Preprocessing packages should be customized to suit a multitude of applications with their own specific system requirements and industry compliance standards. Specific examples of products include a True Wireless Stereo “TWS” headset that utilizes near-field microphones and a smart-speaker utilizing a far-field microphone array.

Ideal preprocessing technologies run with low MIPS and memory requirements, and either produce natural-sounding, intelligible voice on the other end of the call or improve the performance of the Automatic Speech Recognition “ASR” engine.

Hallmarks of best-in-class preprocessing technology include the following:

Fast adaptation to changes in ambient noise level
Built-in wind noise reduction
Echo-free, full-duplex, communication
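To give a feel for what adapting to the ambient noise level can involve, here is a minimal sketch of one common textbook approach: asymmetric exponential smoothing of per-frame energy. It is not Alango's algorithm. The function name track_noise_floor and the rise and fall parameters are assumptions made for illustration, and their values would need tuning for a real product.

```python
# Illustrative noise-floor tracker; a generic technique, not Alango's method.
# The names and default parameter values are hypothetical.


def track_noise_floor(frame_energies, rise=0.05, fall=0.5):
    """Estimate the noise floor from a sequence of per-frame energies.

    The estimate falls quickly when energy drops (a pause exposes the
    background noise) and rises slowly, so short speech bursts do not
    inflate it. Larger rise/fall values mean faster adaptation.
    """
    if not frame_energies:
        return []
    estimate = frame_energies[0]
    history = []
    for energy in frame_energies:
        alpha = fall if energy < estimate else rise
        estimate += alpha * (energy - estimate)
        history.append(estimate)
    return history


if __name__ == "__main__":
    # Quiet background, a short loud burst, then quiet again.
    energies = [1.0] * 5 + [100.0] * 3 + [1.0] * 5
    for e, n in zip(energies, track_noise_floor(energies)):
        print(f"energy={e:6.1f}  noise_estimate={n:8.3f}")
```

The trade-off is between the two smoothing factors: a larger rise value follows a genuine increase in background noise sooner, but also lets loud speech leak into the estimate.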

Voice intelligibility can be improved further by complementing the traditional external microphone(s) with an in-ear microphone or bone conduction sensor, since such a sensor is internal and isolated from environmental sounds. Innovative downlink channel processing should not be ignored either, as it provides another opportunity to develop value-added products.

Combined, these attributes help designers develop compelling products with real potential to increase revenue for their company. All of this is possible with the Alango Voice Communication Package, optimized for the versatile Cortex-M processor platform.

Voice Activity Detection (VAD) detects speech in an acoustic signal, which allows the system to remain in standby until a voice is detected. Alango’s VAD consumes less than 2 MIPS of processing power, the lowest in its class for a stand-alone voice activity detection solution available on Arm Cortex-M processors. This is a major advantage for battery-operated devices that need to run for long periods between charges. VAD is not limited to such devices, however: it can be used anywhere speech needs to be detected.
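The basic idea of a VAD can be sketched with a simple energy threshold plus a short "hangover" that keeps the decision active for a few frames after the energy drops. This is a deliberately naive illustration and says nothing about how Alango's VAD works internally. The function names, the -40 dB threshold, the hangover length and the 160-sample frame size (10 ms at an assumed 16 kHz sample rate) are all assumptions.

```python
# Naive energy-based VAD sketch; not Alango's implementation.
# Function names, threshold and frame size are illustrative assumptions.
import math


def frame_energy_db(frame):
    """Mean-square energy of one frame in dB, floored to avoid log(0)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def simple_vad(frames, threshold_db=-40.0, hangover_frames=2):
    """Return one boolean per frame: True while speech is assumed present.

    A frame is active when its energy exceeds threshold_db. The decision
    is then held for hangover_frames more frames so that quiet word
    endings are not cut off.
    """
    decisions = []
    hold = 0
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            hold = hangover_frames
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    silence = [0.0] * 160  # 10 ms of silence at an assumed 16 kHz rate
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    frames = [silence, tone, tone, silence, silence, silence, silence]
    print(simple_vad(frames))
```

In a standby design, the rest of the signal chain would only be woken when a decision like this turns True, which is where the power saving comes from. A fixed energy threshold is easily fooled by loud non-speech noise, so production detectors rely on more robust features.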

Alango's Voice Activity Detection demo

Alango Technologies on Arm Cortex-M processors

Alango has had excellent results porting and optimizing its software products for Cortex-M based devices developed with Arm Keil MDK, the most comprehensive software development environment for Arm-based microcontrollers. The µVision IDE includes the industry-standard Arm Compiler and provides straightforward means for debugging and profiling written code.

The following Alango software products have been ported and optimized for Cortex-M4 and Cortex-M7 processors:

Voice Communication Package “VCP” - human-human communication - a universal software package of digital signal processing technologies for voice applications, enabling high-quality, full-duplex, noise-free communication in various environments.
Voice Enhancement Package “VEP” - human-machine communication - a suite of real-time software DSP technologies designed to improve speech recognition performance in voice-controlled multimedia devices.
Voice Activity Detector “VAD” - reliably detects speech in an acoustic signal.
Sound Reinforcement Package “SRP” - a set of DSP software technologies enabling sound reinforcement in real-time PA and In-Car Communication (ICC) systems.
Sound Effects Normalization “SEN” - accentuates speech dialog over aggressive special effects in television and streamed content.
MuRefiner - audio enhancement that lets users compensate for the drawbacks of their device or listening environment and enjoy the audio content to the fullest possible extent.

Demonstrations of these software products running on the STMicroelectronics STM32F769 Discovery kit are available. Please contact Alango for further details.

Alango Technologies on Arm Cortex-A processors

The Arm Cortex-A processor family is particularly suited for applications with a higher computational load, such as voice-controlled smart speakers with far-field pick-up. These voice interface and control products combine many software and hardware technologies, often more than a single development team has the skills to cover. To help, a number of single-board computers are available to get you started. For example, Arm Leading Edge partner Seeed Studios provides two kits that offer a turnkey route to voice-controlled functionality, incorporating Alango’s Voice Enhancement Package. These kits allow companies to build high-performance voice-interface products efficiently, without having to master every aspect of audio and other technology integration.

Let’s explore the kits from Seeed Studios:

ReSpeaker Core v2.0

Based on the Arm Cortex-A7 processor, the ReSpeaker Core V2 allows developers to create powerful and impactful voice and sound interfaces. The board adds many new features over Core V1, including the ability to run Debian and Android.

To get started using this development platform, access documentation and FAQs here.

ReSpeaker 4-Mic Array for Raspberry Pi

The ReSpeaker 4-Mic Array is a 4-microphone expansion board for Raspberry Pi, which is based on the Arm Cortex-A53 processor, and is designed for artificial intelligence and voice applications. This product is ideal for building more powerful and flexible voice products that integrate with Amazon Alexa Voice Service and Google Assistant.

To get started using this development platform, access documentation and FAQs here.

To get started with Alango Technologies with Arm-based technology, please contact info@alango.com

Learn more about Arm’s DSP solutions for Cortex-A and Cortex-M

Did you know?

Arm recently announced Arm Helium technology, an architectural extension that will open up even more possibilities for future voice and sound devices. Helium will deliver up to a 15x performance uplift for machine learning and up to a 5x uplift for signal processing tasks running on future Cortex-M processors, expanding the potential for Alango to innovate. Learn more about this new technology and what it means for the industry and voice-based devices.
