Today Arm announced the Cortex-M55 processor: Arm’s most AI-capable Cortex-M processor and the first to feature Arm Helium technology, bringing a significant uplift in energy-efficient ML and DSP performance to IoT devices. Arm offers several development tools and models to help partners along their path to bringing a Cortex-M55 based device to market. Arm tools and models are especially useful for understanding architecture differences and performance improvements compared to previous Cortex-M designs. In this article, we look at the Arm tools, highlight what’s new for the Cortex-M55 processor, and show how you can get a head start developing software before silicon arrives.
The Cortex-M55 processor provides increased on-device processing performance, while maintaining the ease of use of Cortex-M, including a single toolchain and a familiar software development ecosystem. More than 150 new scalar and vector instructions, low overhead loops, and half-precision floating-point all contribute to up to 15x ML performance improvement and up to 5x signal processing performance uplift compared to existing Cortex-M processors.
The Cortex-M55 processor is highly configurable, with multiple options for increased performance and security. The vector unit supports two 32-bit MAC operations per cycle for DSP applications, and the extended arithmetic support includes 8-bit fixed-point for ML workloads.
The Cortex-M55 processor brings together several benefits including increased ML and DSP performance, a simple programmer’s model, unified control and signal processing, and support for common ML frameworks such as TensorFlow Lite for Microcontrollers. ML workloads are further accelerated when the Cortex-M55 processor is combined with the Ethos-U55, the first Arm microNPU for Cortex-M. The smallest devices in the world will now participate in the AI revolution.
Arm tools enable software to be compiled, executed, and debugged before Cortex-M55 based silicon is available.
All tools are available now for early access and will become public during March and April 2020. Please contact us to find out how to obtain any of these tools prior to the public release.
Here is the overview of available tools:
* Development Studio Platinum Edition users can get early access to Armv8.1-M features in the 2019.b release.
Let’s look at how these tools can help to get the most out of the Cortex-M55 processor.
Arm Compiler 6, included in both Keil MDK and Arm Development Studio, is the most up-to-date compiler for the Arm architecture, including Armv8.1-M and the Cortex-M55 processor. Arm Compiler 6.14 adds specific support for the Cortex-M55 processor. It combines the modern LLVM compiler infrastructure with the highly optimized Arm libraries and linker to produce performance- and power-optimized embedded software.
As with any new compiler support, performance and code size will improve over time, driven by experience and feedback from real-world use cases. For the Cortex-M55 processor, Arm Compiler supports the new Armv8.1-M instructions as well as auto-vectorization.
To demonstrate the Arm tools, a small hello world program for the Cortex-M55 processor is available on GitHub.
The example includes a Makefile to compile the application. The Makefile includes makefile.inc, which isolates the Cortex-M55 specific compiler switches.
The hello.c application highlights a few things about the Cortex-M55 processor. First, it shows how to enable the Extension Processing Unit (EPU) using the CPACR register. The EPU executes both scalar floating-point operations and Helium (also known as the M-profile Vector Extension, MVE) operations. It is disabled on reset and must be enabled before any floating-point or vector instruction can run. Second, the application uses inline assembly to execute vector instructions directly. Finally, it performs a multiply-accumulate operation that Arm Compiler 6 vectorizes and compiles to low-overhead loop instructions. Here is the source code for the MLA function; look at the disassembly file to see the instructions that Arm Compiler 6 generates.
```c
/*
 * Example of a multiply-accumulate (MLA) function
 * that can be auto-vectorized by Arm Compiler 6.
 * The compiler can also generate low-overhead loop
 * instructions for the loop within this function.
 */
__attribute__((noinline)) int mla(short *a, short *b, int length)
{
    int i;
    int sum = 0;
    for (i = 0; i < length; i++)
        sum += a[i] * b[i];
    return sum;
}
```
A run script is also provided to execute the software on the Fixed Virtual Platform (FVP). This is a simple way to get familiar with software development for the Cortex-M55 processor. Running it prints the following:

```text
Hello from Cortex-M55!
Sum is: 85344
Info: /OSCI/SystemC: Simulation stopped by user.
```
The Arm Compiler migration and compatibility guide aids the evaluation process by comparing the command line options, source code differences, assembly syntax, and other topics of interest.
For example, when changing from the Cortex-M4 processor on Arm Compiler 5 to the Cortex-M55 processor on Arm Compiler 6, a few command-line option changes are required:

| Arm Compiler 5 | Arm Compiler 6 |
|---|---|
| --cpu=Cortex-M4 | --target=arm-arm-none-eabi -mcpu=cortex-m55 |
| -Ospace | -Os / -Oz |
| -Onum (default is 2) | -Onum (default is 0) |
The migration guide provides further details related to specific switches, but these are the basics to get going. Some compiler switches may need to be removed because they are specific to armcc and are not needed.
Both Development Studio 2020.0 (Bronze edition and upwards) and µVision from Keil MDK v5.30 have added Cortex-M55 support for software debugging. This includes disassembly and updated register views for new registers in Armv8.1-M.
Keil MDK is the most comprehensive software development environment for Cortex-M projects, while Development Studio can be used with any Arm IP.
The disassembly window in Development Studio shows the vector instructions from the previous example.
Similarly, the µVision debugger shows disassembly for vector instructions.
Development Studio and µVision show vector registers, including the Q registers (in whichever format best suits your needs) and the VPR (Vector Predication Status and Control Register).
µVision also has a specialized view of the Helium registers with a configurable display.
Counting instructions is a great way to get a first estimate of how an algorithm will perform. Fast Models are fast, flexible programmer's view models of Arm IP, allowing you to develop software such as drivers, firmware, operating systems and applications prior to silicon availability. They allow full control over the simulation, including profiling, debug, and trace. Fast Models can be exported to SystemC, allowing integration into the wider SoC design process. The Cortex-M55 Fast Model provides a great way to learn the details of the new instructions without the need for hardware development boards.
Fast Models typical use cases:
The Fast Model for the Cortex-M55 processor is being released in March 2020 as part of Fast Models 11.10.
Fast Models are ideal for building customized virtual platforms with memory maps and peripherals which match an SoC being designed. In addition to creating virtual platforms with Arm IP models, systems can be extended to include custom peripherals and other models using SystemC.
Refer to the Fast Models quick start on GitHub for examples of how to create custom virtual platforms. An example for the Cortex-M55 processor will be added when Fast Models 11.10 is released.
For users who don’t require a custom virtual platform, Arm offers Fixed Virtual Platforms. FVPs are complete simulations of an Arm system, including the processor, memory, and peripherals. A Cortex-M55 FVP will be available in the Keil MDK and Development Studio versions noted above.
Cycle Models are compiled directly from Arm RTL and retain complete functional accuracy. This enables confident IP selection decisions based on SystemC simulation of benchmark software. Many projects select a CPU based on application-specific software operations, such as streaming small matrices or another performance-critical part of an algorithm. Cycle Models are well suited to studying how many cycles key software functions require during the IP selection phase of a project, and to comparing the performance of Arm IP. Cycle Models run in a SystemC simulator, including the Accellera reference simulator and simulators from EDA partners.
The Cortex-M55 SystemC Cycle Model supports several features which help with performance analysis:
The Cycle Model for the Cortex-M55 processor will be available on Arm IP Exchange in March 2020.
Additional resources for learning about the Cortex-M55 processor include a new white paper and the Armv8.1-M architecture reference manual.
The CMSIS libraries for DSP and ML applications have also been updated and optimized for best performance on the Cortex-M55 processor. Read this tutorial to learn how to implement classical machine learning techniques with the CMSIS-DSP library.
Beyond tools and software, you can also explore the Cortex-M training to help you get started.
A full suite of Arm development tools is available for the Cortex-M55 processor, enabling developers to get started with the latest and most advanced Cortex-M processor for ML. The Cortex-M55 processor brings a significant performance uplift for ML and DSP applications, transforming future generations of small, low-power IoT devices. Visit developer.arm.com for more information on Arm Development Tools.