When compute requirements grow, developers of Cortex-M microcontroller systems face a choice: optimize their software to squeeze more processing per clock cycle out of their current microcontroller, or migrate their code base to a different, higher-performing class of microprocessor. Cortex-M microcontrollers offer many benefits, such as determinism, short interrupt latencies, and advanced low-power modes. Moving to a different microprocessor class, say a Cortex-A based microprocessor, means forfeiting some of those valued Cortex-M benefits.
Recently, Cortex-M microcontroller vendors have been able to offer higher-performing Cortex-M microcontrollers. This allows system designers to migrate easily from, say, a 150MHz Cortex-M4 or Cortex-M33 device to Cortex-M7 devices clocked at over 600MHz, while keeping the benefits inherent to Cortex-M, such as support from its extensive ecosystem of tools and software. But what happens when even more processing performance is required?
Today, Arm announces the Cortex-M85, the highest-performing Cortex-M processor, delivering unprecedented performance levels across the board. It is the first Cortex-M to deliver over 6 CoreMark/MHz and more than 3 DMIPS/MHz. This level of scalar performance comes from a number of microarchitecture innovations, including optimized dual issue with selective triple issue, improved branch prediction, and an enhanced memory system with data prefetching.
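CoreMark/MHz and DMIPS/MHz normalize out clock frequency, so an absolute score is estimated by multiplying the per-MHz figure by the clock in MHz. The minimal sketch below does that arithmetic. The function name is hypothetical, the 400 MHz clock in the usage note is an illustrative value rather than a published Cortex-M85 frequency, and the estimate assumes the memory system keeps the core fed with no wait states.

```c
/* Estimate an absolute benchmark score from a per-MHz figure.
 * Linear scaling with clock only holds if memory keeps up with the core;
 * flash wait states or cache misses will pull real scores below this. */
static double aggregate_score(double per_mhz, double clock_mhz)
{
    return per_mhz * clock_mhz;
}
```

For example, aggregate_score(6.0, 400.0) gives 2400, the CoreMark lower bound implied by "over 6 CoreMark/MHz" at that hypothetical clock, and aggregate_score(3.0, 400.0) gives 1200 DMIPS.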
By integrating Arm Helium technology, Cortex-M85 delivers a 4x uplift in DSP and ML processing compared to its predecessor, the Cortex-M7. It also brings approximately 20% more vector processing performance than the other Helium-enabled processor, the Cortex-M55. Again, microarchitecture innovations are what lift Cortex-M85 to these performance levels.
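Helium (the M-Profile Vector Extension) operates on 128-bit vectors, so one vector holds eight 16-bit Q15 samples, and tail predication lets the final, partial vector be handled by masking inactive lanes instead of running a separate scalar cleanup loop. The portable C model below is a sketch of that structure for a Q15 dot product, a typical DSP and ML kernel; it is not Helium intrinsics code, and the function name and LANES constant are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* a 128-bit vector holds eight 16-bit lanes */

/* Portable model of a tail-predicated Q15 dot product.
 * Each outer iteration processes one "vector" of LANES elements. On the
 * last iteration, lanes at or beyond n are masked off (the role a
 * VCTP-generated predicate plays on Helium) rather than read. */
static int64_t dot_q15_predicated(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0; /* wide accumulator, similar in spirit to VMLALDAV */
    for (size_t i = 0; i < n; i += LANES) {
        size_t active = (n - i < LANES) ? (n - i) : LANES; /* lane predicate */
        for (size_t lane = 0; lane < LANES; lane++) {
            if (lane < active) {
                acc += (int32_t)a[i + lane] * (int32_t)b[i + lane];
            }
        }
    }
    return acc;
}
```

Because the masked lanes are never read, the loop handles any length without over-reading the input buffers. A compiler targeting a Helium-capable core may auto-vectorize a plain scalar loop like this; check your toolchain's documentation for the relevant options.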
To sustain these higher data processing rates, Cortex-M85 adopts a more advanced memory system architecture that delivers higher data and code throughput. A low-latency memory system with Tightly Coupled Memories (TCMs) ensures deterministic operation. Four 32-bit wide data TCM interfaces and one 64-bit wide instruction TCM interface, all with integrated Error Correcting Code (ECC), are available to SoC designers. An additional 32-bit AHB interface port allows an external DMA controller (such as the CoreLink-DMA-350) to access the TCMs concurrently with the processor core, enabling many common data streaming and processing use cases.
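Multiple data TCM interfaces are what let the core and a DMA controller work on TCM at the same time, as long as they do not contend for the same interface. The sketch below models that under an explicit assumption: the four 32-bit D-TCM interfaces are word-interleaved on address bits [3:2]. That mapping, the function names, and the peak-bandwidth formula are illustrative assumptions, not taken from this announcement; consult the Cortex-M85 Technical Reference Manual for the actual arrangement.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: four D-TCM interfaces, word-interleaved on address bits [3:2]. */
#define DTCM_BANKS 4u

/* Which D-TCM interface (bank) a byte address falls on under this model. */
static unsigned dtcm_bank(uint32_t addr)
{
    return (addr >> 2) & (DTCM_BANKS - 1u);
}

/* Two same-cycle accesses (e.g. a core load and a DMA write) can proceed
 * in parallel under this model only if they target different banks. */
static bool dtcm_conflict(uint32_t addr_a, uint32_t addr_b)
{
    return dtcm_bank(addr_a) == dtcm_bank(addr_b);
}

/* Theoretical upper bound on D-TCM bandwidth in bytes per second: every
 * interface moves one 32-bit word every cycle. Real throughput is lower. */
static uint64_t dtcm_peak_bytes_per_s(uint64_t clock_hz)
{
    return (uint64_t)DTCM_BANKS * 4u * clock_hz;
}
```

Under this model, a DMA stream walking a buffer word by word rotates through all four banks, so it collides with a core access on a given cycle only when both happen to hit the same bank.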
A Level 1 cache system, also protected by ECC, connects to external memories through an AMBA 5 AXI main interface and optimizes performance when accesses to slower, non-deterministic memories are required.
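When a DMA controller shares cacheable external memory with the core, software typically cleans or invalidates the data cache by address, and those operations act on whole cache lines. CMSIS-Core provides functions such as SCB_CleanDCache_by_Addr and SCB_InvalidateDCache_by_Addr for this. The host-runnable helper below is a sketch that computes the line-aligned range such an operation actually covers; the 32-byte line size, the struct, and the function name are assumptions for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: 32-byte data cache lines; confirm in the device documentation. */
#define DCACHE_LINE ((uintptr_t)32u)

typedef struct {
    uintptr_t start; /* address of the first cache line touched */
    uintptr_t size;  /* bytes covered, always a whole number of lines */
} line_range;

/* Expand the buffer [addr, addr + len) to the enclosing cache-line-aligned
 * range that a by-address clean or invalidate will actually act on. */
static line_range dcache_line_range(uintptr_t addr, uintptr_t len)
{
    line_range r;
    uintptr_t mask = DCACHE_LINE - 1u;
    r.start = addr & ~mask; /* round start down to a line boundary */
    if (len == 0) {
        r.size = 0;
        return r;
    }
    uintptr_t end = (addr + len + mask) & ~mask; /* round end up to a line */
    r.size = end - r.start;
    return r;
}
```

The rounding is why DMA buffers are usually aligned to, and padded out to, whole cache lines: invalidating a rounded-up range would otherwise discard unrelated data that shares the first or last line.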
Deterministic compute underpins the value proposition of Cortex-M85-based processors for system designers. As intelligence moves to the endpoints of IoT, systems that must sense, decide, actuate, and communicate within a predictable time span rely on Cortex-M85's ability to deliver unprecedented performance "on time". Autonomous utility robots, agricultural drones, and industrial human-machine interfaces are a few examples among many use cases.
Key to any IoT or embedded system is security against malicious or unintentional exposure of confidential data. Cortex-M85 brings TrustZone for Armv8-M to the highest performance tier of Cortex-M processors. Additionally, Cortex-M85 is the first Cortex-M processor to integrate the new Armv8.1-M pointer authentication and branch target identification extension (PACBTI), which helps developers achieve PSA Certified Level 2 security. Pointer authentication signs and verifies return addresses, while branch target identification restricts indirect branches to valid landing points; together they add protection against return-oriented and jump-oriented programming attacks.
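The idea behind pointer authentication is that a function computes a keyed tag over its return address on entry, using the stack pointer as a modifier, and verifies that tag before returning; an overwritten return address then fails the check. The C code below is a toy model of that flow only. The mixing function is a non-cryptographic stand-in for the hardware algorithm, the function names are hypothetical, and the toy keeps a 64-bit tag so the demonstration is deterministic, whereas the architectural PAC is a 32-bit value. In practice, compilers that support PACBTI insert the signing and checking instructions for you, typically controlled by an option such as -mbranch-protection; check your toolchain's documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* TOY MODEL ONLY: a 64-bit bit-mixing finalizer stands in for the keyed,
 * cryptographically strong PAC algorithm used by real hardware. */
static uint64_t toy_pac(uint32_t ret_addr, uint32_t sp, uint64_t key)
{
    uint64_t x = ((uint64_t)ret_addr << 32) ^ sp ^ key;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

/* Function entry: tag the return address, with SP as the modifier. */
static uint64_t sign_return(uint32_t lr, uint32_t sp, uint64_t key)
{
    return toy_pac(lr, sp, key);
}

/* Function exit: recompute and compare. A mismatch means the return
 * address or stack context changed; real hardware would fault here. */
static bool auth_return(uint32_t lr, uint32_t sp, uint64_t tag, uint64_t key)
{
    return toy_pac(lr, sp, key) == tag;
}
```

Using SP as the modifier means a tag captured at one stack depth cannot simply be replayed at another, which is part of what makes reusing signed pointers harder for an attacker.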
The Corstone-310 subsystem integrates Cortex-M85 and the Ethos-U55 microNPU. Corstone subsystems bring together all of the elements our hardware partners need to implement Arm technology quickly and simply and to reach tape-out sooner, so they can focus on adding their own differentiation. The Corstone-310 subsystem addresses some of the design challenges SoC designers face by providing an example of system-level security and power control. Its integration of the key CPU and microNPU IP with many system IP components, including the power control kit (PCK-600), the secure debug component (SDC-600), and the security-aware system IP components of SIE-200, provides a significant jumpstart for SoC designers.
Even before silicon based on Cortex-M85 becomes available, you can start software development with Arm Virtual Hardware (AVH). Arm Virtual Hardware delivers models of Arm-based processors, systems, and development boards, including Corstone-310, to enable fast prototyping, development, and deployment. AVH Corstone lets software move seamlessly from model to target hardware, supporting continuous integration and continuous delivery (CI/CD) environments.
The new class of high-performance microcontrollers powered by Cortex-M85 differs from today's traditional microcontrollers. They have larger on-chip SRAMs, run at higher clock frequencies, and are implemented in smaller geometry nodes. Achieving an optimal implementation of Cortex-M85 is non-trivial, especially for silicon designers working in a new technology node for the first time. To reduce design cycle time and help meet a predetermined PPA (power, performance, and area) target, Arm provides a Cortex-M85 PIK that captures a set of best practices for the TSMC 22ULL foundry process, including a user guide, shmoo plots, floorplans, and reference implementation scripts, in one package.
The unprecedented performance levels offered by the Cortex-M85 open new possibilities for microcontroller developers to develop and deploy ever more demanding use cases on Cortex-M. Developers benefit from the simple programmer's model, processing determinism, and low-power schemes, all of which are hallmark characteristics of Arm Cortex-M processors.
Hi Tim,
Thanks for the detailed explanation.
One question regarding the performance graph: which traditional and ML benchmarks (such as CoreMark/Dhrystone) were used for this comparison?
Thanks