Traditionally, audio DSP developers consider only dedicated Digital Signal Processors, but there are some surprising benchmark results showing what can be done with the Cortex-A application processors as well as the Cortex-M4 and Cortex-M7. Being aware of these options, and knowing exactly how much processing power your application needs, helps you find the best processor and reduce BOM costs.
The Audio Weaver platform can help with accurately benchmarking a complete audio chain. Why is benchmarking a real design on a dev board important? Unlike on the MCUs, cache memory plays an important and unpredictable role in performance on the Cortex-As, so it is important to benchmark already optimized DSP code on the actual board.
For this reason, Audio Weaver by DSP Concepts can cut traditional DSP development time by 90%. Prototyping and development can be done on a dev board before the final hardware is ready, the design and code are production and target ready, and real-time tuning can be done in the form factor, so there is no need to rewrite and re-iterate code to fit the processor footprint.
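To make the cache point concrete, here is a minimal sketch of a timing harness that records the first (cold) run separately from the best and worst runs. It is not Audio Weaver code. The names `bench`, `bench_result`, `ticks_now` and `count_kernel` are hypothetical, and `ticks_now` uses the standard `clock()` so the sketch also builds on a host; on a real board you would replace it with the target's hardware cycle counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Timing summary for repeated runs of one kernel (hypothetical type). */
typedef struct {
    uint64_t first; /* ticks for the first run, when caches are likely cold */
    uint64_t best;  /* fewest ticks seen over all runs */
    uint64_t worst; /* most ticks seen over all runs */
} bench_result;

/* Assumption: portable tick source for illustration only.
 * On a target, swap this for the hardware cycle counter. */
static uint64_t ticks_now(void)
{
    return (uint64_t)clock();
}

/* Run kernel(ctx) `runs` times and record first, best and worst timings. */
bench_result bench(void (*kernel)(void *), void *ctx, size_t runs)
{
    assert(kernel != NULL && runs > 0);
    bench_result r = {0, UINT64_MAX, 0};
    for (size_t i = 0; i < runs; i++) {
        uint64_t t0 = ticks_now();
        kernel(ctx);
        uint64_t dt = ticks_now() - t0;
        if (i == 0) r.first = dt;
        if (dt < r.best) r.best = dt;
        if (dt > r.worst) r.worst = dt;
    }
    return r;
}

/* Trivial example kernel: counts how many times it was called. */
static void count_kernel(void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
}
```

On a cached processor, a large gap between `first` and `best` is one sign that cache state is affecting the measurement, which is why a single timing on a simulator or a different board can mislead.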
Below is the presentation given at the AES (Audio Engineering Society) 2014 conference in Los Angeles by pbeckmann, founder of DSP Concepts.
[CTAToken URL = "https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-19-89/PD8_5F00_Beckmann.pdf" target="_blank" text="View presentation" class ="green"]
Maybe you've heard of Arduino? Perhaps even Teensy? Teensy 3.1 (based on Freescale Kinetis) is currently pretty much where most ARM-based Arduino development is happening. Arduino is shipping an ARM-based product, but if you look at their software repositories, it's very clear that nearly all official Arduino open source development is still targeting 8-bit AVR.
I'm developing what you'd call "middleware", which the Arduino world calls "libraries", and also improving the higher level components of the toolchain (gcc is at the low level), used by many thousands of smaller companies, hobbyists, entrepreneurs, students and enthusiasts. Much of this is porting and maintaining 8-bit code on 32-bit ARM, and some involves developing brand new software that dramatically leverages the 32-bit hardware, DMA and more advanced peripherals. Examples include an LED control library called OctoWS2811 and, recently, easy-to-use audio capability.
It's great for your traditional corporate customers that you've given IAR & Keil early technical access, so their tools are ready to use. But as a small, independent shop trying to create a first-rate Arduino experience on top of Cortex-M4, and hopefully Cortex-M7, I really do depend on you guys to publish those technical documents.
The Cortex-M7 is an architecture-v7M processor and its instruction set is essentially the same as the Cortex-M4, except that it adds optional double-precision floating point support and some extra floating point instructions to bring the Cortex-M7 in line with architecture FPv5 (these are mostly features added to the IEEE standard). The main difference is in the microarchitecture: the Cortex-M7 has a six-stage, superscalar pipeline which is able to dual issue the majority of instruction pairs. It can therefore dual issue two arithmetic instructions, a MAC or arithmetic instruction with a load, two loads, a load and a store, and so on. It also has a wide choice of memory interfaces, defaulting to a 64-bit AMBA4 AXI interface with optional instruction and data caches up to 64 kB, optional tightly-coupled memory interfaces for code and data (64-bit ITCM, 2x 32-bit DTCMs), an AHB-lite interface for low latency AHB peripherals (AHBP) and an AHB slave which allows a DMA engine to DMA directly into the TCMs.
There will be a revision of the architecture-v7M manual published towards the end of this year (i.e. soon) which will document the small extensions to the FP instruction set and the cache and TCM maintenance operations, which are all made via memory-mapped registers in the normal 4GB Cortex-M address space. We will also be publishing the Technical Reference Manual. These are confidential right now as we have not yet reached the milestone in the project where they can be widely distributed. Again, this should be around the end of the year or very early 2015.
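As an illustration of the kind of code this matters for, here is a minimal sketch of a fixed-point dot product, the inner loop of an FIR filter. The function name `dot_q15` is hypothetical and this is plain portable C, not CMSIS code. Each iteration performs two loads and one multiply-accumulate, which is the load-plus-MAC mix described above. Whether a given compiler actually schedules these instructions so that they pair up on the Cortex-M7 is an assumption that would have to be checked on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit sample buffers with a 64-bit accumulator.
 * Per iteration: load x[i], load h[i], multiply, accumulate. */
int64_t dot_q15(const int16_t *x, const int16_t *h, size_t n)
{
    assert(x != NULL && h != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the 16x16 product cannot overflow. */
        acc += (int32_t)x[i] * (int32_t)h[i];
    }
    return acc;
}
```

The same source compiles unchanged for Cortex-M4 and Cortex-M7, so any difference in cycle count between the two comes from the pipeline and memory system, not from the code.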
If you are writing application code, you will find substantial speedup even by running unchanged Cortex-M4 code on Cortex-M7, due to the pipeline improvements.
From a debug point of view Cortex-M7 is similar to Cortex-M4, but a licensee has the option to add full data trace to the ETM (which uses the new ETMv4 protocol).
Versions of popular toolchains (Keil, IAR etc) have been updated to support Cortex-M7, so you can start writing code for it now.
Hope that helps.
Maybe pbeckmann can answer the specifics. But I think some device makers are making the M7 chips pin-compatible with the M4 so that you can replace it directly and simply run more efficiently. Maybe that automatically takes care of these?
Page 15 claims Cortex-M7 has "Further architecture improvements for DSP", and page 16 says the new features are "Load and store in parallel with math" and "Zero overhead loops".
Has ARM actually published an update to the v7M architecture reference, or any other detailed info about the changes to the instruction set or other details? As the author of quite a bit of code targeting the M4 chips, and eventually the M7s, I'd really like to know these details.
Hi Yasuhiko-san,
The benchmarks are results from execution on real chips. DSP Concepts provided the CMSIS library for ARM on the Cortex-M4 and the new Cortex-M7. You are right that it is most accurate to benchmark real designs on the actual chip.
If you have specific technical questions, I encourage you to contact pbeckmann. He would love to answer any DSP questions.