Arm Helium technology is the M-profile Vector Extension (MVE) for the Arm Cortex-M processor series. It delivers a significant performance uplift for Machine Learning (ML) and Digital Signal Processing (DSP) applications on small, embedded devices. Helium helps to overcome compute challenges in many applications, such as audio devices, sensor hubs, keyword spotting and voice command control, power electronics, communications, and still image processing. At the time of writing, three Arm processors support Helium: Cortex-M52, Cortex-M55, and Cortex-M85.
My colleagues have written many blogs about Helium and how it works in practice. If you want to learn more, I can recommend the series Making Helium, and the comprehensive overview of the available material, Getting started with Armv8.1-M based processor.
The power that a new architecture feature brings to the market is one thing; how to make use of that power is another. For many years, software developers in the Cortex-M space have had a couple of toolchain choices. They could use Arm’s compiler and other commercial toolchains, or use the open-source GNU toolchain (GCC). Often, Arm’s silicon partners ship their IDEs with GCC. Therefore, partners implementing Helium in their next-generation devices were asking: “And what about Helium support in GCC?”
As our own Arm Compiler for Embedded (ACfE) is based on the LLVM toolchain, we have focused on adding excellent auto-vectorization features and general Helium support to LLVM first. We have not kept this secret but have upstreamed our work to the LLVM project. Thus, partners and customers looking for a free toolchain should take a look at LLVM when working with Helium-enabled devices (or use our toolchain of course). We are working on enhancing the Helium support in GCC as well, but this takes time. It is also likely that GCC will never reach the same maturity for Helium that LLVM has.
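To make auto-vectorization a little more concrete, here is a minimal sketch of the kind of plain C loop a vectorizing compiler may turn into Helium instructions. The function name `dot_q15` and the command lines in the comment are illustrative assumptions, not taken from the benchmarks in this blog, and whether a given loop is actually vectorized depends on the compiler, its version, and the optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: dot product of two 16-bit sample buffers.
 * The 64-bit accumulator avoids signed overflow for long buffers.
 * "restrict" tells the compiler the buffers do not overlap, which
 * generally makes a loop easier to vectorize.
 *
 * Assumed command lines for a Helium-enabled core (adjust to your setup):
 *   clang --target=arm-none-eabi -mcpu=cortex-m55 -mfloat-abi=hard -O3 -c dot.c
 *   arm-none-eabi-gcc -mcpu=cortex-m55 -mfloat-abi=hard -O3 -c dot.c
 */
int64_t dot_q15(const int16_t *restrict a, const int16_t *restrict b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen before multiplying so the product cannot overflow. */
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

The source contains nothing Helium-specific, so the same file can be built with each toolchain and the generated code or cycle counts compared.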
We have built and run different benchmarking applications and example projects using different compilers. For example, the AudioMark benchmark shows the following result on an Arm Cortex-M55:
The AudioMark score is 15% better with ACfE 6.21 than with GCC13. LLVM17’s score is only 6% lower than that of ACfE 6.21.
Real-life examples such as a keyword spotting example show a similar result:
In this chart, smaller bars are better, as the inference consumes fewer cycles. GCC13 takes almost 16% more cycles than ACfE 6.21. LLVM17 is 5% slower than ACfE 6.21.
Finally, a similar result for object detection:
GCC13 is 22% slower than ACfE 6.21, while LLVM17 is quite close with only 3% more cycles.
When we introduced Helium, we concentrated on optimizing toolchains based on LLVM, as this is the technical foundation for our own compiler toolchain. Therefore, in summary, if you are interested in getting the best performance from an Arm core with Helium, use Arm Compiler for Embedded or the free LLVM toolchain.
We will work on improving the numbers for GCC, but this will take time, and GCC may never become as good as LLVM for Helium. For microcontrollers, LLVM is a viable and future-proof alternative to GCC.