System-on-chip (SoC) solutions based on Arm Cortex processors address diverse embedded market segments, including: Internet of Things, motor control, healthcare, automotive, home automation, and many more, as you can see in this blog by Thomas Ensergueix. The various processors provide a standard architecture to address the broad performance spectrum and cost range required by these diverse product markets. The Cortex family is based on three distinct profiles: the A profile, for sophisticated, high-end applications running mainly complex operating systems; the R profile, for high performance hard real-time systems; and the M profile, optimised for low-power, deterministic, cost-sensitive microcontroller applications.
The first two processors implemented using the Armv8-M architecture are the Cortex-M23 and the Cortex-M33. For details on the Cortex-M23, please refer to this blog by Tim Menasveta. The Cortex-M33 is the first full-feature implementation of Armv8-M with TrustZone security technology and digital signal processing capability. The processor supports a large number of flexible configuration options to facilitate deployment in a wide range of applications, and offers a dedicated co-processor interface for accelerating frequently used, compute intensive operations. The Cortex-M33 delivers an optimal balance between performance, power, security and productivity.
The Cortex-M33 processor has an in-order 3-stage pipeline, which dramatically reduces system power consumption. Most instructions complete in two stages, while more complex instructions require three. Some 16-bit instructions are dual-issued to boost performance. The core has two AMBA 5 AHB5 interfaces, C-AHB and S-AHB, which are symmetric and offer identical performance for instruction and data fetches.
Designers can quickly create powerful systems by including the most suitable combination of the optional MPU, DSP, FPU, TrustZone, ETM, MTB, ITM, BPU, DWT and co-processor interface features. In minimal control systems, the NVIC can be configured to have just one external interrupt, while in peripheral-rich systems, the NVIC can be configured to support up to 480 external interrupts with up to 256 priority levels. In systems that demand more reliable operation of many active processes and threads, the MPU can be included to enforce process separation using privileged and unprivileged access control. For the next level of code, data and resource protection, TrustZone would be used.
The increasing complexity of applications makes on-chip debug and trace invaluable to delivering products on schedule. The integrated debug capabilities of the Cortex-M33 processor allow for faster software verification. The system can be viewed through either a JTAG port or a 2-pin Serial Wire Debug port. The optional ETM and MTB provide excellent instruction trace capabilities, while the BPU and DWT provide breakpoints and hardware watchpoints for debug.
Now, on to the five key features of the Cortex-M33:
The Cortex-M33 processor with TrustZone has two security states and a number of associated features:
Two new orthogonal states
The presence of two full states opens the door for many new opportunities and applications. High value proprietary firmware used by the system may be delivered in the secure state. Supervisor code placed in the secure state can be used to recover a system after an attack or unreliable operation, while the non-secure side remains available as before to the millions of developers currently developing software for Cortex-M.
For certain applications, special-purpose compute can make a difference. It is essential that this is done in a way that maintains all of the benefits of the world's #1 ecosystem: the widest choice of development tools, compilers, debuggers, operating systems, and middleware. The Arm ecosystem saves developers time and cost, and increases productivity.
The Cortex-M33 processor includes an optional dedicated bus-like interface for the integration of tightly-coupled accelerator hardware. For frequently used, compute-intensive operations, this interface gives a mechanism to augment the general-purpose compute capability with custom-defined processing hardware. Crucially, it does this without fragmenting the ecosystem. The interface includes control and data channels for up to eight co-processors, with signals to provide information about the privilege and security state of the processor along with the instruction type, associated register and operation fields. Co-processor operations are typically expected either to complete in a reasonably small number of cycles or to run in the background and raise an interrupt on completion. The operation details and data can be transferred via the interface at the same time with a single instruction, and wait states can be inserted if needed.
The optional MPU is programmable and provides up to 16 regions for each of the secure and non-secure states. In multi-tasking environments, the OS can reprogram the MPU during task context switching to define the memory access permissions for each task. For example, a task of an application may be granted access to only some application data and specific peripherals. In this way, the MPU protects all other memories and peripherals from corruption or unauthorised access to dramatically improve system reliability.
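The per-task reprogramming described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the type and function names (`task_region`, `task_mpu_config`, `mpu_model_switch_to`, `mpu_model_access_ok`) and the example addresses are hypothetical, the model covers unprivileged accesses only, and a real OS would write the hardware MPU registers during the context switch rather than swap a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one MPU region (inclusive address range). */
typedef struct {
    uint32_t base;
    uint32_t limit;
    bool     writable;
} task_region;

/* Hypothetical per-task region table; the article states up to 16 regions
 * per security state. */
typedef struct {
    const task_region *regions;
    size_t             count;
} task_mpu_config;

/* Example tables with made-up addresses: task A may touch its own data and
 * one peripheral, task B only its own data. */
const task_region task_a_regions[] = {
    { 0x20000000u, 0x20000FFFu, true },   /* task A data           */
    { 0x40010000u, 0x400103FFu, true },   /* peripheral for task A */
};
const task_mpu_config task_a_config = { task_a_regions, 2 };

const task_region task_b_regions[] = {
    { 0x20001000u, 0x20001FFFu, true },   /* task B data */
};
const task_mpu_config task_b_config = { task_b_regions, 1 };

/* The configuration the model currently enforces. */
static const task_mpu_config *active_config = NULL;

/* Models what the OS does on a context switch: install the next task's table. */
void mpu_model_switch_to(const task_mpu_config *cfg)
{
    assert(cfg != NULL && cfg->count <= 16);
    active_config = cfg;
}

/* Returns true if the active task may perform the access. Anything that
 * matches no region is rejected, which is how the other memories and
 * peripherals stay protected. */
bool mpu_model_access_ok(uint32_t addr, bool is_write)
{
    if (active_config == NULL) {
        return false;
    }
    for (size_t i = 0; i < active_config->count; i++) {
        const task_region *r = &active_config->regions[i];
        if (addr >= r->base && addr <= r->limit) {
            return !is_write || r->writable;
        }
    }
    return false;
}
```

After `mpu_model_switch_to(&task_a_config)`, a write to `0x40010000` is allowed in this model, while any access to task B's data at `0x20001000` is rejected; switching to `task_b_config` reverses that.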
The Cortex-M33 memory protection architecture is based on the Protected Memory System Architecture, PMSAv8. This version adopts base and limit style comparators for regions, as opposed to the previous power-of-two size, size-aligned scheme. Each region has a base (starting) address, a limit (ending) address, and settings for access permissions and memory attributes. As a result, an MPU region can cover an arbitrary range directly, without having to join a number of regions together. This enhancement simplifies software development, encourages usage and reduces programming steps, which in turn reduces context switch times.
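To make the difference concrete, here is a minimal host-side sketch. The helper names are hypothetical. The bit positions used for the base and limit registers follow the Armv8-M MPU_RBAR and MPU_RLAR layout as I understand it (address in bits [31:5]; shareability, access permission and execute-never in the low bits of RBAR; attribute index and enable in the low bits of RLAR) and should be checked against the architecture manual before use. The power-of-two count is a simple greedy estimate that ignores the sub-region disable feature of the older scheme.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_GRANULE 32u   /* smallest region size and alignment, in bytes */

/* Pack a base-address register value (assumed Armv8-M MPU_RBAR layout). */
uint32_t v8m_rbar(uint32_t base, uint32_t sh, uint32_t ap, uint32_t xn)
{
    assert((base % MPU_GRANULE) == 0u);
    return (base & 0xFFFFFFE0u) | ((sh & 0x3u) << 3) | ((ap & 0x3u) << 1) | (xn & 0x1u);
}

/* Pack a limit-address register value (assumed Armv8-M MPU_RLAR layout).
 * `limit` is the last byte of the region; the enable bit is set. */
uint32_t v8m_rlar(uint32_t limit, uint32_t attr_index)
{
    assert((limit % MPU_GRANULE) == MPU_GRANULE - 1u);
    return (limit & 0xFFFFFFE0u) | ((attr_index & 0x7u) << 1) | 0x1u;
}

/* Greedy count of power-of-two sized, size-aligned regions needed to cover
 * [base, base + size) exactly, as the previous scheme would require without
 * sub-regions. */
unsigned pow2_regions_needed(uint32_t base, uint32_t size)
{
    unsigned count = 0;

    assert((base % MPU_GRANULE) == 0u && (size % MPU_GRANULE) == 0u);
    while (size > 0u) {
        uint32_t chunk = MPU_GRANULE;
        /* Grow the chunk while it still fits and stays size-aligned. */
        while (chunk <= size / 2u && (base % (chunk * 2u)) == 0u) {
            chunk *= 2u;
        }
        base += chunk;
        size -= chunk;
        count++;
    }
    return count;
}
```

For a 24 KB buffer at `0x20000000`, `pow2_regions_needed(0x20000000u, 0x6000u)` evaluates to 2 (a 16 KB region plus an 8 KB region), whereas the base and limit scheme describes the same range with the single pair `v8m_rbar(0x20000000u, 0, 1, 1)` and `v8m_rlar(0x20005FFFu, 0)`.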
The optional integer DSP extension adds 85 instructions. In most cases, using the DSP instructions increases performance by an average of three times, which benefits all applications centred on digital signal control.
To accelerate software development, Arm also delivers a free DSP library in the CMSIS project. The library contains a range of filter, transform and maths functions (e.g. matrix operations), and supports a range of data types. The CMSIS project is now open source and its development is published on GitHub.
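As an illustration of where such gains can come from, the sketch below is a portable C reference model of what a dual 16-bit multiply-accumulate instruction such as SMLAD computes in a single operation. The function names are hypothetical and this is not the CMSIS implementation; on a real Cortex-M33 one would normally rely on the compiler, the CMSIS `__SMLAD` intrinsic, or the CMSIS DSP library. The model ignores the overflow (Q) flag and assumes the usual two's-complement conversion behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (lo in bits [15:0]). */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Reference model of a dual signed 16x16 multiply with accumulate:
 * result = ra + lo(rn)*lo(rm) + hi(rn)*hi(rm), wrapping modulo 2^32. */
uint32_t smlad_model(uint32_t rn, uint32_t rm, uint32_t ra)
{
    int32_t p_lo = (int32_t)(int16_t)(uint16_t)(rn & 0xFFFFu) *
                   (int32_t)(int16_t)(uint16_t)(rm & 0xFFFFu);
    int32_t p_hi = (int32_t)(int16_t)(uint16_t)(rn >> 16) *
                   (int32_t)(int16_t)(uint16_t)(rm >> 16);
    /* Accumulate in unsigned arithmetic so wrap-around is well defined. */
    return ra + (uint32_t)p_lo + (uint32_t)p_hi;
}

/* Dot product of two 16-bit vectors, consuming two samples per step.
 * Assumes n is even; a plain C loop would need one multiply per sample. */
int32_t dot_q15_pairs(const int16_t *a, const int16_t *b, size_t n)
{
    uint32_t acc = 0;

    assert((n % 2u) == 0u);
    for (size_t i = 0; i < n; i += 2) {
        acc = smlad_model(pack16(a[i], a[i + 1]), pack16(b[i], b[i + 1]), acc);
    }
    return (int32_t)acc;
}
```

For example, `smlad_model(pack16(2, 3), pack16(4, 5), 10)` evaluates to 33, that is 10 + 2×4 + 3×5, with both multiplies and the accumulate folded into one step.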
The optional single-precision floating-point extension, based on FPv5, includes an additional 16-entry 64-bit register file. The option adds 45 IEEE 754-2008 compatible single-precision floating-point instructions. Using floating-point instructions usually yields an average tenfold increase in performance over the equivalent software libraries.
The FPU is contained in a separate power domain allowing the unit to be powered-down when not enabled or in use.
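The 16-entry 64-bit register file can be viewed either as thirty-two 32-bit single-word registers (S0-S31) or as sixteen 64-bit doubleword registers (D0-D15). The sketch below models that overlay on a host machine with a C union, assuming the usual mapping in which Dn shares storage with S(2n) and S(2n+1). The type and function names are hypothetical, and the model only illustrates the aliasing of the two views; it says nothing about which instructions operate on them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "model assumes 32-bit float");

/* Hypothetical model of the extension register file: two views of the same
 * 128 bytes of storage. */
typedef union {
    uint32_t s[32];   /* S0-S31: single-word view */
    uint64_t d[16];   /* D0-D15: doubleword view  */
} fp_regfile_model;

/* Write a single-precision value to Sn. */
void fp_write_s(fp_regfile_model *rf, unsigned n, float value)
{
    assert(n < 32u);
    memcpy(&rf->s[n], &value, sizeof value);
}

/* Read Sn as a single-precision value. */
float fp_read_s(const fp_regfile_model *rf, unsigned n)
{
    float value;

    assert(n < 32u);
    memcpy(&value, &rf->s[n], sizeof value);
    return value;
}

/* Write raw 64-bit contents to Dn. */
void fp_write_d_bits(fp_regfile_model *rf, unsigned n, uint64_t bits)
{
    assert(n < 16u);
    rf->d[n] = bits;
}

/* Read the raw 64-bit contents of Dn. */
uint64_t fp_read_d_bits(const fp_regfile_model *rf, unsigned n)
{
    assert(n < 16u);
    return rf->d[n];
}
```

Writing the bit pattern `0x3F8000003F800000` to D1 in this model makes both S2 and S3 read back as `1.0f`, which shows that the doubleword view is simply two adjacent single-word registers seen together.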
The complexity of embedded solutions is rising dramatically and so is their value. Designers are faced with the task of finding the right balance between opposing design factors. The amount of software included in an SoC is also rising dramatically, while project schedules are shrinking. In order to deliver the right product, at the right time, with the right performance and cost, we need to start with the correct seed.
The Cortex-M33 processor was designed to be the seed of such designs, leveraging previous experience and the existing Cortex-M ecosystem to reduce development cost. System power is reduced by a new design with multiple low-power technologies. TrustZone sets the foundation to protect user applications and valuable IP for building secure solutions. The enhanced MPU and TrustZone combine to form the base for reliable and protected systems. Finally, we get to the endless pursuit of better productivity. TrustZone is designed such that all existing users may continue to develop in the non-secure zone, just as before. Debug and trace are enhanced in the Cortex-M33 to simplify working with complex code. All programming, including all exception handlers, may be done in C, as is the case for all Cortex-M processors. Together, these features increase developer productivity and allow developers to deliver more complex solutions in a shorter period.
Many silicon partners joined Arm in defining and developing these new processors, and are actively designing chips taking advantage of the TrustZone security technology. The Arm ecosystem is also focused on porting tools and software to the Cortex-M33. While the Cortex-M33 delivers an optimal balance between performance, power, security and productivity, it is more important to state that the Arm partnership is working hard to deliver great ingredients to the developers and makers whose creativity and vision will fuel the fast transition to a more connected, more intelligent, and more protected world.
For further details, please check out the white paper below:
In the article above, you mention that the optional ETM and MTB provide excellent trace capabilities.
But on page 284 of the arm_v8M technical reference manual, I read:
The following optional debug components are not part of the Armv8-M architecture:
• The Cross-Trigger Interface (CTI).
• The CoreSight basic trace router (MTB).
• The Embedded Trace Macrocell (ETM).
So, can you please confirm which one is right?
Hello Diya,
thank you!
Best regards,
Yasuhiko Koumoto.
Hello,
The FPU provides an extension register file containing 32 single-precision registers.
The registers can be viewed as:
• Thirty-two 32-bit single-word registers, S0-S31.
• Sixteen 64-bit doubleword registers, D0-D15.
• A combination of registers from these views.
Which one to use depends on the needs of the algorithm being coded.
regards
Diya
Hi Diya Soubra,
it is very interesting.
By the way, I have a question.
Although the Cortex-M33 supports only single-precision floating-point operations, why is the register file 64 bits wide? Can it be used as 2-way SIMD?
Yasuhiko Koumoto,