Audio DSP developers have traditionally considered only dedicated digital signal processors, but there are some surprising benchmark results showing what can be done with the Cortex-A application processors as well as the Cortex-M4 and Cortex-M7. Being aware of all these options, and knowing exactly how much processing power your application needs, makes it easier to find the best processor and reduce BOM costs.
The Audio Weaver platform can help with accurately benchmarking a complete audio chain. Why is benchmarking a real design on a dev board important? Unlike on the MCUs, cache memory plays an important and unpredictable role in performance on the Cortex-A processors, so it is important to benchmark already optimized DSP code on the actual board.
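As a rough illustration of what "how much processing power" means in practice, the sketch below turns per-block cycle counts into a CPU load figure. It is not taken from Audio Weaver or the presentation. The names (load_stats_t, load_stats_update, cycles_per_block_budget and so on) are hypothetical, and the cycle counts are assumed to come from whatever cycle counter the target provides. It tracks the peak as well as the average, because cache effects can make the worst block noticeably slower than the typical one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running statistics for one audio chain. All names are illustrative. */
typedef struct {
    uint64_t total_cycles;  /* sum of cycles over all measured blocks */
    uint32_t peak_cycles;   /* slowest single block seen so far */
    uint32_t blocks;        /* number of blocks measured */
} load_stats_t;

/* Record the cycle count measured for one processed block. */
static void load_stats_update(load_stats_t *s, uint32_t cycles)
{
    assert(s != NULL);
    s->total_cycles += cycles;
    if (cycles > s->peak_cycles) {
        s->peak_cycles = cycles;
    }
    s->blocks++;
}

/* Cycles the core can spend while one block of audio arrives. */
static double cycles_per_block_budget(double cpu_hz, double sample_rate_hz,
                                      uint32_t block_size)
{
    assert(sample_rate_hz > 0.0);
    return cpu_hz * (double)block_size / sample_rate_hz;
}

/* Load as a percentage of the per-block cycle budget. */
static double load_percent(double cycles, double budget)
{
    assert(budget > 0.0);
    return 100.0 * cycles / budget;
}

/* Average load over every block recorded so far. */
static double average_load_percent(const load_stats_t *s, double budget)
{
    assert(s != NULL && s->blocks > 0);
    return load_percent((double)s->total_cycles / s->blocks, budget);
}

/* Load of the slowest block; this is the figure that must stay below 100. */
static double peak_load_percent(const load_stats_t *s, double budget)
{
    assert(s != NULL);
    return load_percent((double)s->peak_cycles, budget);
}
```

With made-up example values of a 216 MHz core, a 48 kHz sample rate and 32-sample blocks, the budget is 144000 cycles per block, so a chain averaging 36000 cycles per block uses 25% of the core. Sizing the processor against the peak figure rather than the average is the more conservative choice when caches are involved.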
For this reason, Audio Weaver by DSP Concepts can cut traditional DSP development time by 90%. Prototyping and development can be done on a dev board before the final hardware is ready, the design and code are production and target ready, and real-time tuning can be done in the final form factor, so there is no need to rewrite and re-iterate code to fit the processor footprint.
Below is the presentation given at the AES (Audio Engineering Society) 2014 conference in Los Angeles by pbeckmann, founder of DSP Concepts.
View presentation: https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-19-89/PD8_5F00_Beckmann.pdf
The Cortex-M7 is an ARMv7-M architecture processor and its instruction set is essentially the same as that of the Cortex-M4, except that it adds optional double-precision floating point support and some extra floating point instructions to bring the Cortex-M7 in line with the FPv5 architecture (these are mostly features added to the IEEE standard).
The main difference is in the microarchitecture. The Cortex-M7 has a six-stage superscalar pipeline which can dual issue the majority of instruction pairs: two arithmetic instructions, a MAC or arithmetic instruction with a load, two loads, a load and a store, and so on.
It also has a wide choice of memory interfaces, defaulting to a 64-bit AMBA4 AXI interface with optional instruction and data caches up to 64 kB, optional tightly-coupled memory interfaces for code and data (64-bit ITCM, 2x 32-bit DTCMs), an AHB-lite interface for low latency AHB peripherals (AHBP), and an AHB slave which allows a DMA engine to DMA directly into the TCMs.
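To make the dual-issue point concrete, here is a minimal sketch of the kind of multiply-accumulate loop that sits at the heart of FIR filters and similar audio DSP kernels. The function name mac_dot is hypothetical, and the two-accumulator layout is just one common way of exposing independent MACs and loads to the compiler. Whether a given toolchain actually schedules them as dual-issue pairs on a Cortex-M7 is an assumption that needs to be checked by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x and h over n samples (the inner loop of an FIR filter).
 * Two independent accumulators are used so that consecutive MACs do not
 * depend on each other's result; this is an illustrative layout, not a
 * tuned kernel. */
static float mac_dot(const float *x, const float *h, size_t n)
{
    assert(x != NULL && h != NULL);

    float acc0 = 0.0f;
    float acc1 = 0.0f;
    size_t i = 0;

    /* Process samples in pairs: each iteration has two loads per operand
     * and two MACs that can proceed independently. */
    for (; i + 1 < n; i += 2) {
        acc0 += x[i] * h[i];
        acc1 += x[i + 1] * h[i + 1];
    }

    /* Handle the last sample when n is odd. */
    if (i < n) {
        acc0 += x[i] * h[i];
    }

    return acc0 + acc1;
}
```

Splitting the sum across two accumulators changes the order of the floating point additions, so the result can differ in the last bits from a single-accumulator loop. The same source compiles unchanged for Cortex-M4 and Cortex-M7, which makes it a convenient candidate for a side-by-side cycle count.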
A revision of the ARMv7-M architecture manual will be published towards the end of this year (i.e. soon). It will document the small extensions to the FP instruction set and the cache and TCM maintenance operations, which are all performed via memory-mapped registers in the normal 4GB Cortex-M address space. We will also be publishing the Technical Reference Manual. These documents are confidential right now because we have not yet reached the project milestone at which they can be widely distributed; again, this should be around the end of the year or very early 2015.
If you are writing application code, you will see a substantial speedup even when running unchanged Cortex-M4 code on Cortex-M7, due to the pipeline improvements.
From a debug point of view, Cortex-M7 is similar to Cortex-M4, but a licensee has the option to add full data trace to the ETM (which uses the new ETMv4 protocol).
Versions of popular toolchains (Keil, IAR, etc.) have been updated to support Cortex-M7, so you can start writing code for it now.
Hope that helps.