Signal processing capabilities of Cortex-M devices

Laurent Le Faucheur
January 28, 2019

Picture your smart assistant at home: you say a command, and it recognizes your voice, processes what you’re saying and responds. This is an example of a multi-sensor device that requires signal processing. Signal processing technology is critical in all sorts of devices around us today: wearables, audio headsets, smart speakers and cameras. We see spectacular growth in autonomous, intelligent, and connected devices like this, and the challenge is that they must operate in a low-power environment.

Signal processing technology is critical in all these devices around the home

To achieve signal processing functionality, these applications previously used a simple microcontroller (MCU) based on an Arm Cortex-M0 or Cortex-M3 processor together with a separate proprietary, dedicated Digital Signal Processor (DSP). Now, however, we are seeing an increasing number of product manufacturers (or Original Equipment Manufacturers - OEMs) switching to a single, high-performance, low-power MCU with DSP extensions, such as the Cortex-M4, Cortex-M7, Cortex-M33 or Cortex-M35P processor, to replace the two-processor design.

Using an Arm-based combination of MCU and DSP functionality in one processor has some advantages for OEMs, enabling them to:

  • Save significantly on the bill of material (BoM) costs of their products, by replacing two processors with one
  • Reduce system-level complexity by removing the need for shared memory, MCU and DSP communication, complex multiprocessor bus architectures and other custom “glue” logic between the MCU and DSP
  • More easily develop and debug an application as all the processes are in one place
  • Reduce software development costs, as the entire project can be supported using a single compiler/debugger/IDE
  • Save development time by taking advantage of Arm’s optimized DSP library functions
  • Benefit from being programmable in a high-level programming language such as C or C++, rather than the handcrafted assembler often used for a proprietary DSP

This blog covers the signal processing capabilities of the easy-to-use Arm Cortex-M processors and how to take advantage of software support from Arm's ecosystem partners. I will also cover how the architecture of our processors allows efficient implementation of the algorithms, along with details of the free DSP library from Arm and an example of noise cleaning applied to an electrocardiography signal recording.

Introduction to signal processing

Signal processing algorithms are applied to raw data from the analog-to-digital converters to shape the data and improve the decisions made by the application software. Typical algorithms control the amplitude of the signal, remove the noise or estimate the frequency of oscillation.

The key operations used for signal processing are based on a mathematical operation called discrete convolution. A convolution is computed as a sum of products, so any processor that can compute a multiply-accumulate efficiently, ideally in one cycle, can be used for signal processing.
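To make the sum-of-products structure concrete, here is a minimal sketch of a direct-form discrete convolution in C. The function name `convolve_f32` and its argument layout are illustrative assumptions, not part of any Arm library.

```c
#include <stddef.h>

/* Direct-form discrete convolution: y[n] = sum over k of h[k] * x[n - k].
 * x holds x_len samples, h holds h_len taps, and y must have room for
 * x_len + h_len - 1 output samples. Hypothetical helper for illustration. */
void convolve_f32(const float *x, size_t x_len,
                  const float *h, size_t h_len,
                  float *y)
{
    if (x_len == 0 || h_len == 0) {
        return;                             /* nothing to compute */
    }
    for (size_t n = 0; n < x_len + h_len - 1; n++) {
        float acc = 0.0f;                   /* accumulator for the sum of products */
        for (size_t k = 0; k < h_len; k++) {
            if (n >= k && (n - k) < x_len) {
                acc += h[k] * x[n - k];     /* one multiply-accumulate per tap */
            }
        }
        y[n] = acc;
    }
}
```

For example, convolving the samples {1, 2, 3} with the taps {1, 1} gives {1, 3, 5, 3}. The inner loop is a pure multiply-accumulate, which is why the cost of that single operation dominates signal processing workloads.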

Thirty years ago, data processing was limited to 10 million multiplies per second with 16-bit operands, and the address space was limited to a few tens of kBytes. Today, a small Cortex-M3 can be synthesized to run at well over 500MHz; it computes 32-bit multiplications, accumulates the results into 64 bits, and has several gigabytes of address space. While the Cortex-M3 doesn’t have DSP extensions, it can still do signal processing. There is no practical limitation to using Cortex-M devices for complex signal processing computation, and this blog will share some practical examples.
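As a rough illustration of 32-bit multiplication with 64-bit accumulation, the sketch below computes a dot product in portable C. The name `dot_product_i32` is hypothetical; on a Cortex-M3 a compiler can typically map the inner statement to a long multiply-accumulate instruction, but that depends on the toolchain and options.

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of 32-bit integer operands using a 64-bit accumulator.
 * Each 32x32 product is widened to 64 bits before it is added.
 * The caller is assumed to keep enough headroom that the running
 * sum never exceeds the int64_t range. */
int64_t dot_product_i32(const int32_t *a, const int32_t *b, size_t len)
{
    int64_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc += (int64_t)a[i] * (int64_t)b[i];   /* 32x32 -> 64-bit multiply-accumulate */
    }
    return acc;
}
```

The wide accumulator is the design point here: intermediate sums that would overflow 32 bits are kept exactly, and scaling back to the output format happens once, after the loop.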

An overview of the Arm embedded processor portfolio

First, let’s take a step back to look at the technology Arm offers and help you understand the best fit for your application. The Arm Cortex family of processors provides a standard architecture to address the broad performance spectrum and cost range required by these diverse product markets. The Arm Cortex family includes processors based on three distinct profiles:

  • The Cortex-A processor family for sophisticated, high-end applications running mainly complex operating systems
  • The Cortex-R processor family for high performance hard real-time systems
  • The Cortex-M processor family optimized for low power, deterministic, cost-sensitive microcontroller applications

Cortex-A and Cortex-R processors include the NEON SIMD (single instruction, multiple data) extensions that provide high-performance mathematical instructions for signal and data processing.

Cortex-A and Cortex-R processors are used extensively for signal processing applications. This blog focuses on the Cortex-M processor family, so let’s take a look at the range of benefits and performance points offered by Cortex-M processors. Here’s a quick guide to the highlights:

  • For lowest power and area: Cortex-M0+ and Cortex-M23 processors
  • For performance and power efficiency: Cortex-M3, Cortex-M4, and Cortex-M33 processors
  • For high performance: Cortex-M7
  • For tamper-resistant security technology: Cortex-M35P

Arm Cortex-M processor portfolio, including those with DSP extensions

Arm digital signal controllers with MCU and DSP capabilities

The Cortex-M4, Cortex-M7, Cortex-M33 and Cortex-M35P are digital signal controllers that address the need for high-performance generic code processing as well as digital signal processing applications. These processors include DSP extensions to the Thumb instruction set and can include an optional floating-point unit (FPU). The DSP instructions are designed to help improve the performance of numerical algorithms and provide the opportunity to perform signal processing operations directly on the CPU. As mentioned above, you may have used a Cortex-M3 for signal processing in the past. However, the Cortex-M processors with DSP extensions provide far better performance.

Why combine control and DSP in a single Arm CPU? Here’s a quick overview:
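One way the DSP extensions raise throughput is by operating on two packed 16-bit values at once. The sketch below is a portable C model of what a dual 16-bit multiply-accumulate such as SMLAD computes; `pack16` and `dual_mac16` are hypothetical names used only for illustration. On a target with the DSP extensions, the same operation is normally reached through a compiler or CMSIS intrinsic (for example `__SMLAD`) rather than code like this.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (hi in bits 31..16). */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Model of a dual 16-bit multiply-accumulate:
 *   acc + x.lo * y.lo + x.hi * y.hi
 * The sum is formed in unsigned arithmetic so that it wraps modulo 2^32
 * instead of triggering signed-overflow undefined behaviour in C.
 * The unsigned-to-signed casts assume two's complement conversion,
 * as provided by mainstream compilers. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);

    uint32_t sum = (uint32_t)acc;
    sum += (uint32_t)((int32_t)x_lo * (int32_t)y_lo);   /* low halves  */
    sum += (uint32_t)((int32_t)x_hi * (int32_t)y_hi);   /* high halves */
    return (int32_t)sum;
}
```

With 16-bit samples and coefficients stored in pairs, a filter loop built on this operation performs two taps per instruction, which is where much of the gain over a processor without DSP extensions comes from.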

  • Guarantee your signal processing algorithms will be portable going from one processor to the other
  • Take advantage of high processing capabilities using integer or floating-point data formats
  • Benefit from the choice and flexibility offered by a network of signal processing partners with expertise in voice, audio, motors, and machine learning
  • Deploy functional safety features
  • Take advantage of multi-source silicon supply while preserving your software investments
  • Speed up software development with a free software library of signal processing kernels available on the Arm GitHub

To get started with Arm-based chips with DSP development, check out Arm's silicon partners. NXP, ST, and Nordic Semiconductor announced Cortex-M33 based chips with DSP functionality last year. Read more on the TrustZone for Armv8-M community page.

Speed up software development with Arm’s free library of signal processing kernels

The CMSIS-DSP and CMSIS-NN libraries are suites of common signal processing, mathematical, and neural network functions that have been optimized for Cortex-M processors. They are freely available as part of the CMSIS release and include all source code. The functions in the libraries are divided into several categories:

  • Vector math (dot products, add, multiply)
  • Math functions (trigonometric, square-root)
  • Complex numbers vector math functions
  • Filtering (convolution, FIR, Biquads)
  • Matrix functions (add, inverse, transpose)
  • Transforms (FFT, DCT)
  • Motor control functions
  • Statistical functions (RMS, Variance)
  • Interpolation functions (Interpolation)
  • Neural Network convolution pooling and activation functions
  • Fully-connected layer functions

The library has separate functions for operating on 8-bit integers, 16-bit integers, 32-bit integers, and 32-bit floating-point values. You can use, modify, and distribute the CMSIS-DSP source code without any obligation to publish any detail of your own software.
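The integer variants treat their operands as fixed-point fractions; CMSIS-DSP names these formats q7, q15 and q31, alongside f32 for floating point. The sketch below shows the idea for the 16-bit case, where a value in the range [-1, 1) is stored as an integer scaled by 2^15. The helpers `float_to_q15` and `q15_mul` are hypothetical illustrations, not the library's implementation, and their rounding behaviour may differ from it.

```c
#include <stdint.h>

/* Convert a float in roughly [-1, 1) to Q15 with saturation.
 * Values are scaled by 2^15 and truncated toward zero. */
int16_t float_to_q15(float v)
{
    float scaled = v * 32768.0f;
    if (scaled >= 32767.0f) {
        return INT16_MAX;               /* saturate at +0.99997 */
    }
    if (scaled <= -32768.0f) {
        return INT16_MIN;               /* saturate at -1.0 */
    }
    return (int16_t)scaled;
}

/* Multiply two Q15 values and return a Q15 result.
 * The 32-bit product is in Q30, so it is shifted right by 15.
 * Assumes an arithmetic right shift for negative values, which is
 * what mainstream compilers provide. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    if (p > INT16_MAX) {
        p = INT16_MAX;                  /* only reachable for -1.0 * -1.0 */
    }
    return (int16_t)p;
}
```

For instance, 0.5 is stored as 16384, and multiplying 16384 by itself yields 8192, which represents 0.25. Choosing the narrowest format that meets the accuracy requirement reduces memory traffic and lets the packed 16-bit and 8-bit instructions do more work per cycle.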

Example: ECG filtering and detection

Here is an example of signal processing applied to noise removal and signal detection on an electrocardiography recording. A Cortex-M microcontroller captured the ECG physiological data using a 500Hz sampling rate. The data stream was processed through a noise removal algorithm (upper wave below) and a pulse detection was applied on the cleaned version of the data (second wave below).

Noise removal and detection of ECG data

The noise removal suppresses the low-frequency modulation and the 60Hz interference from AC power lines. The detection algorithm finds the peaks in the input stream over a sliding window and determines the start of the heart period using statistical estimation.
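The exact detection algorithm is not given here, so the following is only a simplified sketch of the sliding-window peak search: a sample counts as a peak when it exceeds a threshold and is the largest value within a window around it. The function name `find_peaks`, the window size, and the fixed threshold are all assumptions for illustration; the statistical estimation of the heart period is not covered.

```c
#include <stddef.h>

/* Find peaks in x[0..len-1].
 * A sample is reported as a peak when it is at or above `threshold` and
 * no sample within +/- half_window of it is larger (ties go to the
 * earliest sample). Peak indices are written to peaks[], up to max_peaks.
 * Returns the number of peaks found. */
size_t find_peaks(const float *x, size_t len, size_t half_window,
                  float threshold, size_t *peaks, size_t max_peaks)
{
    size_t count = 0;
    for (size_t i = 0; i < len && count < max_peaks; i++) {
        if (x[i] < threshold) {
            continue;                           /* too small to be a heartbeat */
        }
        size_t lo = (i > half_window) ? i - half_window : 0;
        size_t hi = (i + half_window < len) ? i + half_window : len - 1;
        int is_peak = 1;
        for (size_t j = lo; j <= hi; j++) {
            if (x[j] > x[i] || (x[j] == x[i] && j < i)) {
                is_peak = 0;                    /* a neighbour wins */
                break;
            }
        }
        if (is_peak) {
            peaks[count++] = i;
        }
    }
    return count;
}
```

At the 500Hz sampling rate used in this example, a half window of 100 samples would span 200ms on each side of a candidate; that figure is only a plausible starting point, not a value taken from the original implementation.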

The filter uses three poles, given as angle/magnitude: (0.05 [rad] / 0.98); (0.25 [rad] / 0.9); (0.45 [rad] / 0.97), and three zeroes, all placed on the unit circle of the z-plane, at angles 0.02 [rad], 0.65 [rad] and 1 [rad]. The filter gain is 0.02. This filter removes the close-to-DC spectral components and the noise around the power-line frequency in the 50Hz to 60Hz area.

Filter characteristics from the MathWorks filter design tool

When running on a Cortex-M3 processor, the ECG signal processing consumes less than 0.1MHz of CPU load. More precisely, processing one second of the signal through a cascade of three bi-quad filters takes 55k cycles, and the energy computation and threshold detection take 15k cycles. Adding some implementation margin and time for buffer copies, the total stays within approximately 0.1MHz.

Getting started with DSP on Cortex-M

I hope this blog demonstrated the benefits of combining DSP and control in a single Arm CPU. As markets move more towards streaming, connectivity, and interactive user interfaces, there will be an increasing demand for performance in low-power, embedded devices. Using a single microcontroller with DSP capabilities, rather than a lower performance microcontroller with a separate DSP, reduces BoM cost, system-level complexity, software development costs, and timescales.

We expect that an ever-increasing number of consumer devices will benefit from the high-performance, low-power and low-latency response of the Cortex-M4, Cortex-M7, Cortex-M33 and Cortex-M35P processors from the Cortex-M family. Combine those with Arm’s free software libraries to get a head start on DSP development.

Access the free CMSIS-DSP software library

  • Manuel Z.
    Manuel Z. over 4 years ago

    Nice example! You said that the example consumes 0.1 MHz in a Cortex M3, but why are you mentioning an M3 if they don't have DSP extensions?