(Editor's note: This article was originally published February 2015 in RTC Magazine and has been updated with new product information. The original article can be found here: RTC Magazine by RTC Group - issuu )
The ARM Cortex-M processor family is a range of scalable, compatible, energy-efficient and easy-to-use processors designed to help developers meet the needs of tomorrow’s smart and connected embedded applications. The Cortex-M4, unveiled in 2010, built on the Cortex-M3 foundation with a range of instruction set extensions explicitly tailored for digital signal processing, along with an optional single-precision floating-point unit, and delivers 1.25 DMIPS/MHz. Since its launch, over 10 semiconductor vendors have introduced Cortex-M4 based general-purpose MCU products, along with a wide range of sensor hub products based on Cortex-M4.
Over the past few years, the capability and processing needs of connected embedded systems have become more demanding, with even the simplest of systems expected to have multiple connectivity options as well as a graphical user interface, HMI, audio recognition or other natural ways of interaction. Processors need to become more capable and offer more local processing capabilities. Microcontrollers in automotive and industrial automation applications need to support heavier processing loads and demand a CPU performance uplift. Industrial plants require increasing precision and operate on large amounts of data in a short space of time. The demands of these future systems include delivering more features at a lower cost, increasing connectivity, better code reuse and improved energy efficiency. It is with this future in mind that ARM, along with its partners, designed the ARM Cortex-M7 processor, the highest performance member of the Cortex-M family.
Doubling the performance of the Cortex-M4 and delivering 5 CoreMark/MHz, the Cortex-M7 is designed to address more demanding applications and remove the performance barrier that previously faced Cortex-M based solutions. Cortex-M7 is designed for a wide range of embedded applications including microcontrollers, automotive controllers, industrial control systems and wireless communication controllers (e.g. Wi-Fi). For those who are familiar with the wide range of Cortex-M family CPUs available for embedded applications, Cortex-M7 is based on the ARMv7-M architecture and remains upward compatible with code written for the rest of the family, all the way from Cortex-M0.
The Cortex-M7 sports a six-stage superscalar pipeline and provides integer, floating point and DSP performance along with tightly coupled memories, caches and options to enable larger memory systems while preserving deterministic behavior. This pipeline is more advanced than that of the Cortex-M4 and allows the Cortex-M7 to execute up to two instructions per clock cycle.
A large focus of the development of the Cortex-M7 was on improving the instructions-per-clock (IPC) efficiency relative to earlier Cortex-M family processors. Cortex-M7 is the first Cortex-M processor to offer optional instruction and data caches, of up to 64KB each. The caches enable efficient operation with a larger memory system (which is typically slower than the processor). Tightly coupled memory interfaces are also integrated, each with support for a custom Error Correction Code (ECC) implementation, so that fast access to memory enables time-critical interrupt handling and real-time application tasks. This integration allows engineers to execute a large proportion of code from the internal cache, reducing the number of reads and writes to external memory and therefore saving power.
Cortex-M7 also offers application engineers the option of ECC support for each of the cache memories, enhancing the reliability of the system. If a memory location is corrupted by a single-bit error, the data can be corrected and restored. In addition to ECC, system reliability can be improved through the optional Memory Protection Unit (MPU) with 8 or 16 regions.
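The principle behind single-bit correction can be illustrated with a small Hamming(7,4) code. This is only a sketch of the general idea; it is not the ECC scheme used by the Cortex-M7 or by any particular device, and all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the bit at 1-based code position pos (position 1 is bit 0). */
static unsigned code_bit(uint8_t word, unsigned pos)
{
    return (word >> (pos - 1u)) & 1u;
}

/* Encode 4 data bits into a 7-bit codeword.
 * Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7. */
static uint8_t hamming74_encode(uint8_t data)
{
    assert(data < 16u);
    unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3;   /* covers positions 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3;   /* covers positions 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3;   /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a codeword, repairing at most one flipped bit.
 * *corrected is set to 1 if a bit had to be repaired, otherwise 0. */
static uint8_t hamming74_decode(uint8_t code, int *corrected)
{
    unsigned s1 = code_bit(code, 1) ^ code_bit(code, 3) ^ code_bit(code, 5) ^ code_bit(code, 7);
    unsigned s2 = code_bit(code, 2) ^ code_bit(code, 3) ^ code_bit(code, 6) ^ code_bit(code, 7);
    unsigned s4 = code_bit(code, 4) ^ code_bit(code, 5) ^ code_bit(code, 6) ^ code_bit(code, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);  /* position of the bad bit, 0 if none */

    if (syndrome != 0u) {
        code ^= (uint8_t)(1u << (syndrome - 1u));
    }
    *corrected = (syndrome != 0u);
    return (uint8_t)(code_bit(code, 3) | (code_bit(code, 5) << 1) |
                     (code_bit(code, 6) << 2) | (code_bit(code, 7) << 3));
}

/* Self-check: every single-bit error in every codeword is repaired. */
static int hamming74_corrects_all_single_bit_errors(void)
{
    for (unsigned d = 0u; d < 16u; d++) {
        for (unsigned b = 0u; b < 7u; b++) {
            int fixed = 0;
            uint8_t damaged = (uint8_t)(hamming74_encode((uint8_t)d) ^ (1u << b));
            if (hamming74_decode(damaged, &fixed) != d || fixed != 1) {
                return 0;
            }
        }
    }
    return 1;
}
```

The syndrome computed in the decoder is the position of the flipped bit, which is why one XOR is enough to restore the original data.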
The memory system has also been advanced to support the increased CPU capabilities with a 64-bit AXI bus interface offering greater bandwidth than the 32-bit AHB and allowing multiple outstanding transfers for maximum bus system performance. For easy integration with legacy peripherals used in previous Cortex-M designs, there is an optional low-latency AHB peripheral bus interface. To allow flexible interrupt management and low interrupt latency, the Integrated Nested Vectored Interrupt Controller (NVIC) with 1 to 240 interrupts, and with 3 to 8-bit programmable priority level registers is closely integrated with the processor. There is also support for ETM, designed for use with CoreSight, ARM’s extensible, system-wide debug and trace architecture.
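Because a device may implement anywhere from 3 to 8 priority bits, a logical priority has to be placed in the most significant bits of the 8-bit priority field. The following is a minimal sketch of that encoding; the function names are hypothetical, and on real hardware the CMSIS function NVIC_SetPriority performs the equivalent shift using the device's implemented bit count.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a logical priority into the 8-bit NVIC priority field.
 * impl_bits is the number of implemented priority bits (3 to 8).
 * Implemented bits are the most significant ones, so the value is
 * shifted left; the unimplemented low-order bits stay zero. */
static uint8_t nvic_encode_priority(uint32_t priority, unsigned impl_bits)
{
    assert(impl_bits >= 3u && impl_bits <= 8u);
    assert(priority < (1u << impl_bits));
    return (uint8_t)((priority << (8u - impl_bits)) & 0xFFu);
}

/* Number of distinct priority levels for a given number of bits. */
static unsigned nvic_priority_levels(unsigned impl_bits)
{
    assert(impl_bits >= 3u && impl_bits <= 8u);
    return 1u << impl_bits;
}
```

For example, with 4 implemented bits there are 16 levels, and logical priority 5 is stored as 0x50.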
Cortex-M7 further expands the family’s floating-point facilities to include a double-precision option; the simultaneous issue of integer and floating point instructions is also now supported when the FPU is present. Given the range of applications that Cortex-M7 based MCUs enable, the processor is fully supported with powerful debug features, with optional full instruction and data trace. These features make the processor an attractive solution for applications requiring a performance upgrade on devices already using the Cortex-M4 processor.
Given that most embedded engineers and developers are familiar with Cortex-M4, let’s look at some of the software development benefits Cortex-M7 brings. From a developer’s perspective, the Cortex-M7 supports all the instructions available on the Cortex-M4 processor, and uses the same exception model for interrupt handling. In most cases, program code written for the Cortex-M4 processor should run on the Cortex-M7 processor without any problem. However, there are a few cases where changes may be needed, and software developers must understand these to reduce the time required when migrating applications from the Cortex-M4 to the Cortex-M7 processor.
In order to get the best performance out of the Cortex-M7 processor, a number of C compilers and their runtime libraries have been optimized and updated (Figure 2). In addition, a number of changes in the debug system for the Cortex-M7 processor compared to Cortex-M4 mean that software developers must update their tool chains to newer versions in order to debug applications on Cortex-M7 based microcontroller products. In a few cases the firmware on the debug adapter might also need an update. As a result, updating to the latest development tool chain is strongly recommended.
Typically, the following changes should be made when migrating software from the Cortex-M4 to the Cortex-M7 processor:
In addition, all code should be recompiled in order to allow the compiler to optimize the instruction sequencing better for the Cortex-M7 processor pipeline. In some cases, additional cache maintenance operations might be needed during runtime. This applies, for example, when a cacheable memory location is shared between the processor and a separate bus master such as a DMA controller.
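Cache maintenance by address works on whole cache lines, which are 32 bytes on the Cortex-M7, so a shared buffer is best described as a line-aligned range. The sketch below computes that range; the type and function names are hypothetical. On a target, the resulting range would typically be passed to the CMSIS routines SCB_CleanDCache_by_Addr (before a DMA controller reads data the processor wrote) or SCB_InvalidateDCache_by_Addr (before the processor reads data a DMA controller wrote).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 data cache line size in bytes */

/* A line-aligned address range covering a buffer (hypothetical helper type). */
typedef struct {
    uintptr_t start;  /* rounded down to a cache line boundary */
    size_t    size;   /* multiple of the cache line size */
} cache_range_t;

/* Expand [addr, addr + len) to the smallest range of whole cache lines. */
static cache_range_t cache_line_range(uintptr_t addr, size_t len)
{
    assert(len > 0u);
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    cache_range_t range = { start, (size_t)(end - start) };
    return range;
}
```

A 40-byte buffer at 0x20000010 expands to 64 bytes starting at 0x20000000. Because invalidating a line discards everything in it, DMA buffers are usually aligned and padded to the line size so that unrelated variables never share a line with them.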
The Cortex-M7 processor offers three floating point configurations: no FPU, a single-precision FPU, or an FPU supporting both single and double precision. If the application can benefit from double-precision hardware support, it should be updated and recompiled to make use of the double-precision FPU. Even if the application uses only single-precision floating point operations, recompiling the code for the Cortex-M7 processor can still be beneficial because the FPU in the Cortex-M7 is based on FPv5, whereas the FPU in the Cortex-M4 processor is FPv4. FPv5 has additional floating point processing instructions, which might help speed up floating point data processing in the target application.
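One way to confirm which floating point configuration a build is actually targeting is to inspect the ACLE predefined macro __ARM_FP, in which bit 1 indicates half-precision, bit 2 single-precision and bit 3 double-precision hardware support. The following is a minimal sketch; the type, function and BUILD_ARM_FP names are hypothetical, and the macro is simply treated as 0 when the compiler does not define it.

```c
#include <assert.h>
#include <stdbool.h>

/* Use the compiler's __ARM_FP value if present, otherwise assume no FP hardware. */
#if defined(__ARM_FP)
#define BUILD_ARM_FP ((unsigned)__ARM_FP)
#else
#define BUILD_ARM_FP 0u
#endif

/* Hardware floating point precisions reported by the toolchain. */
typedef struct {
    bool half_precision;
    bool single_precision;
    bool double_precision;
} fp_hw_support_t;

/* Decode an __ARM_FP style bitmap (only bits 1 to 3 are meaningful). */
static fp_hw_support_t decode_arm_fp(unsigned arm_fp)
{
    assert((arm_fp & ~0xEu) == 0u);
    fp_hw_support_t support = {
        (arm_fp & 0x2u) != 0u,
        (arm_fp & 0x4u) != 0u,
        (arm_fp & 0x8u) != 0u
    };
    return support;
}
```

Calling decode_arm_fp(BUILD_ARM_FP) at start-up, or checking the macro in a preprocessor condition, makes it obvious when code intended for the double-precision FPU has been built with single-precision or software floating point settings.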
There are a number of areas in the program code that might need changes. Because the Cortex-M7 executes code faster, any code whose timing depends on execution speed might need adjusting. This is most common in applications that use hard-coded timing delay loops.
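A delay that is expressed in clock cycles and measured against a free-running counter keeps its duration when the processor changes, unlike a counted loop. The sketch below is illustrative only: all names are hypothetical, and the counter is passed in as a function so that the example also runs on a host. On a target it might read the DWT cycle counter, assuming the device implements one and it has been enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delay in microseconds to core clock cycles. */
static uint32_t us_to_cycles(uint32_t us, uint32_t core_hz)
{
    assert(core_hz >= 1000000u);
    uint64_t cycles = (uint64_t)us * core_hz / 1000000u;
    assert(cycles <= UINT32_MAX);
    return (uint32_t)cycles;
}

/* Reads a free-running 32-bit cycle counter. */
typedef uint32_t (*cycle_counter_fn)(void);

/* Busy-wait until at least 'cycles' counter ticks have elapsed.
 * The unsigned subtraction stays correct when the counter wraps. */
static void delay_cycles(cycle_counter_fn now, uint32_t cycles)
{
    uint32_t start = now();
    while ((uint32_t)(now() - start) < cycles) {
        /* spin */
    }
}

/* Host stand-in for a hardware counter: advances 7 ticks per read. */
static uint32_t sim_counter;
static uint32_t sim_cycle_counter(void)
{
    sim_counter += 7u;
    return sim_counter;
}
```

With this approach a 10 microsecond delay is 2160 cycles at 216 MHz, for instance, and only the clock frequency passed to us_to_cycles needs to change when the code moves to a different device.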
System memory maps often change when migrating from one microcontroller device to another. Also, in the Cortex-M7 processor the initial vector table does not necessarily start at address 0x00000000. If application code assumes that the initial vector table is at address 0, users might need to update the code so that it determines the vector table location from the Vector Table Offset Register.
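Code that needs a vector can compute its address from the value read out of the Vector Table Offset Register (VTOR) instead of assuming a base of 0. The following is a minimal sketch with a hypothetical function name; the VTOR value is passed in as a parameter so the arithmetic can be checked on a host, whereas on a target it would be read from the register, for example through the CMSIS expression SCB->VTOR.

```c
#include <assert.h>
#include <stdint.h>

#define SCB_VTOR_ADDRESS 0xE000ED08u  /* address of VTOR in the System Control Block */

/* Address of the vector table entry for a given exception number.
 * Entry 0 holds the initial main stack pointer, entry 1 the reset
 * handler, and each entry is one 32-bit word. */
static uint32_t vector_entry_address(uint32_t vtor, uint32_t exception_number)
{
    assert((vtor & 0x7Fu) == 0u);         /* low 7 bits of VTOR are reserved */
    assert(exception_number <= 255u);     /* 16 system exceptions + up to 240 interrupts */
    return vtor + 4u * exception_number;
}
```

For example, with a table based at 0x08000000 the reset vector is read from 0x08000004, and the SysTick vector (exception 15) from 0x0800003C.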
Due to the multiple bus interfaces and more capable write buffers in the Cortex-M7 processor, users might find it necessary to insert additional memory barrier instructions in the program code. The guideline for memory barrier usage is documented in ARM application note AN321 – ARM Cortex-M Programming Guide to Memory Barrier Instructions. In the Cortex-M4 processor, due to the simple nature of the pipeline, omitting the memory barriers does not usually cause any issue. In the Cortex-M7 processor the memory barrier requirements are stricter.
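As an illustration of where a barrier goes, consider publishing a data word and then a ready flag that another bus master or an interrupt handler polls. The sketch below is a hedged example rather than a recommendation from AN321: the mailbox type and function are hypothetical, a DSB is used as a conservative choice, and a C11 fence stands in for the instruction when the code is built for anything other than an ARMv7E-M target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* On an ARMv7E-M target emit a DSB; elsewhere fall back to a C11 fence
 * so that the example still builds and runs on a host. */
#if defined(__ARM_ARCH_7EM__)
#define DATA_SYNC_BARRIER() __asm volatile ("dsb 0xF" ::: "memory")
#else
#define DATA_SYNC_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical mailbox shared with another observer. */
typedef struct {
    volatile uint32_t payload;
    volatile uint32_t ready;
} mailbox_t;

/* Write the payload, then the flag, with a barrier in between so the
 * payload write completes before the flag becomes visible. */
static void mailbox_publish(mailbox_t *mb, uint32_t value)
{
    assert(mb != 0);
    mb->payload = value;
    DATA_SYNC_BARRIER();
    mb->ready = 1u;
}
```

Without the barrier, the write buffer could allow the flag to be observed before the payload. If the mailbox lives in cacheable memory and the observer is a separate bus master, cache maintenance is needed in addition to the barrier.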
Not only does Cortex-M7 inherit the characteristics from the Cortex-M processor series, such as energy efficiency, high performance, ease of use and smaller code, but it is also designed with exceptional memory and connectivity options for design flexibility making it especially suited for the automotive, IoT and industrial connectivity markets. Announcements of Cortex-M7 based MCUs have followed soon after launch of the processor itself with the following:
Given the many architectural similarities between the ARM Cortex-M4 and Cortex-M7 processors, the majority of application code is directly ready for migration, and software developers can get started now to ensure their applications are suited for the next generation of embedded connected intelligence. Migration does require some adaptation and changes to be made by the user. Developers can follow the migration process in further detail in the white paper titled “Migrating Applications from an ARM Cortex-M4 Processor to a Cortex-M7 Processor - A Software Developer’s Guide” on the ARM Connected Community, where they can also access in-depth technical discussions.
Did you see the email exchanges between me and RTC? (I used your email from your profile).
I'm interested in translating other publications. Do they all have the same restriction? If not, what should I do?
I tried to send you a private message, but we are not connected.
Thank you for the response.
Thanks for your interest in this article. Since this article was published with RTC and we reposted it here with permission from RTC, I will need to double check whether you will need permission. I will drop them an email and cc you.
I would like to translate this article into Portuguese and post it on my blog, with clear references and notes identifying the original text and the author.
I have done some translations in order to study and to give access to my colleagues and students who are native Portuguese speakers.