How to choose the best processor for your audio DSP application?

October 27, 2014

Audio DSP developers have traditionally considered only dedicated Digital Signal Processors, but benchmark results show that the Cortex-A application processors, as well as the Cortex-M4 and Cortex-M7, can handle a surprising amount of this work. Being aware of these options, and knowing exactly how much processing power your application needs, makes it easier to find the best processor and reduce BOM costs.
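One way to make "how much processing power" concrete is to express a benchmark result as a fraction of the cycle budget available per audio block. The sketch below is illustrative only: `cpu_load_percent` is a hypothetical helper, not part of Audio Weaver, and the numbers in the usage note are made up rather than taken from the presentation.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of the core consumed by an audio chain.
 *
 * cycles_per_block : measured cycles to process one block (from a benchmark)
 * block_size       : samples per block
 * sample_rate_hz   : audio sample rate in Hz
 * core_clock_hz    : processor clock in Hz
 */
double cpu_load_percent(double cycles_per_block,
                        unsigned block_size,
                        double sample_rate_hz,
                        double core_clock_hz)
{
    assert(block_size > 0 && sample_rate_hz > 0.0 && core_clock_hz > 0.0);

    /* Cycles available before the next block of samples arrives. */
    double budget_cycles = core_clock_hz * (double)block_size / sample_rate_hz;

    return 100.0 * cycles_per_block / budget_cycles;
}
```

For example, a chain measured at 60,000 cycles per 32-sample block at 48 kHz on a 180 MHz core has a budget of 120,000 cycles per block, so `cpu_load_percent(60000.0, 32, 48000.0, 180e6)` evaluates to 50.0. The measured cycle count is the part that has to come from the real board, since a formula like this cannot predict cache behavior.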

Audio Weaver

The Audio Weaver platform can help you benchmark a complete audio chain accurately. Why is it important to benchmark a real design on a dev board? Unlike on the MCUs, cache memory makes performance on the Cortex-A processors hard to predict. It is therefore important to benchmark already-optimized DSP code on the actual board.

For this reason, Audio Weaver by DSP Concepts can cut traditional DSP development time by 90%. Prototyping and development can be done on a dev board before the hardware is ready, the design and code are production- and target-ready, and real-time tuning can be done in the form factor, so there is no need to rewrite and iterate on code to fit the processor footprint.

Below is the presentation given at the AES (Audio Engineering Society) 2014 conference in Los Angeles by pbeckmann, founder of DSP Concepts.

View presentation: https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-19-89/PD8_5F00_Beckmann.pdf

Ian Johnson over 10 years ago

The Cortex-M7 is an architecture-v7M processor and its instruction set is essentially the same as the Cortex-M4, except that it adds optional double-precision floating point support and some extra floating point instructions to bring the Cortex-M7 in line with architecture FPv5 (these are mostly features added to the IEEE standard). The main difference is in the microarchitecture: the Cortex-M7 has a six-stage, superscalar pipeline which is able to dual issue the majority of instruction pairs. It can dual issue two arithmetic instructions, a MAC or arithmetic instruction with a load, two loads, a load and a store, and so on. It also has a wide choice of memory interfaces: a default 64-bit AMBA4 AXI interface with optional instruction and data caches up to 64kB, optional tightly-coupled memory interfaces for code and data (64-bit ITCM, 2x 32-bit DTCMs), an AHB-lite interface for low latency AHB peripherals (AHBP), and an AHB slave which allows a DMA engine to DMA directly into the TCMs.
A revision of the architecture-v7M manual will be published towards the end of this year (i.e. soon). It will document the small extensions to the FP instruction set and the cache and TCM maintenance operations, which are all made via memory-mapped registers in the normal 4GB Cortex-M address space. We will also be publishing the Technical Reference Manual. These are confidential right now as we have not yet reached the milestone in the project where they can be widely distributed; again, this should be around the end of the year or very early 2015.
If you are writing application code, you will find substantial speedup even by running unchanged Cortex-M4 code on Cortex-M7, due to the pipeline improvements.
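As an illustration of the kind of code this applies to, here is a minimal sketch of a block FIR filter in plain C with no processor-specific intrinsics. `fir_block` and its buffer layout are assumptions made for this example. Each tap is a multiply-accumulate fed by loads, which is the sort of instruction mix the dual-issue pipeline described above can pair. Whether a given compiler actually schedules it that way depends on the toolchain and optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical block FIR filter.
 *
 * x         : input buffer holding (num_taps - 1) history samples followed
 *             by block_len new samples
 * h         : filter coefficients, h[0] applies to the newest sample
 * y         : output buffer, block_len samples
 */
void fir_block(const float *x, const float *h, float *y,
               size_t block_len, size_t num_taps)
{
    assert(x != NULL && h != NULL && y != NULL && num_taps > 0);

    for (size_t n = 0; n < block_len; ++n) {
        float acc = 0.0f;

        /* One multiply-accumulate per tap, each fed by a coefficient
           load and a sample load. */
        for (size_t k = 0; k < num_taps; ++k) {
            acc += h[k] * x[n + num_taps - 1 - k];
        }
        y[n] = acc;
    }
}
```

The same source can be built for Cortex-M4 or Cortex-M7 by changing only the compiler's target option, so any difference in measured cycles comes from the core and memory system rather than from code changes.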
From a debug point of view Cortex-M7 is similar to Cortex-M4, but a licensee has the option to add full data trace to the ETM (which uses the new ETMv4 protocol).
Versions of popular toolchains (Keil, IAR etc) have been updated to support Cortex-M7, so you can start writing code for it now.
Hope that helps.
