Five things you may not know about Cortex-R Series processors

March 14, 2014

5 minute read time.

1) Cortex-R processors are widely used across many embedded applications

Cortex-R Series processors are often used in devices such as storage controllers, LTE modems, and industrial and automotive systems, where the following key attributes are needed:

- Fast: High processing performance at high clock frequencies
- Real-time: Deterministic processing always meets real-time constraints
- Reliable: Dependable with safety features and high error resistance

Cortex-R Series processors are not always as visible as the Cortex-A Series application processors or the Cortex-M microcontrollers, where the ARM brand adds value to our partners’ products and demonstrates that there is a wide ecosystem of engineers skilled in programming them.

The safety features are especially important in automotive and industrial embedded control systems, where memory protection, error-correcting codes, and lock-step operation (a redundant copy of the processor used to detect errors) deliver high error resistance.

Many LTE modems use Cortex-R processors, and Cortex-R processors are also very popular in storage. To date (3Q13), more than 900 million devices incorporating Cortex-R processors have shipped, which shows that the processors are mature and reliable.

2) Tightly Coupled Memory (TCM) for performance and determinism

TCM is memory connected closely to the processor core, which makes it very fast for the processor to access. Typically it holds interrupt service routines and data tables that need to be accessed quickly. As soon as an interrupt arrives, the Cortex-R processor can switch to interrupt privilege mode and quickly start executing the interrupt code held there.

Without TCM, if the interrupt service routine code or any data it needs is not already in the cache, it must be fetched from main memory. This may take many clock cycles, during which the processor must wait until the code and data are available. With TCM, the worst-case number of cycles needed to start running the interrupt code is known, and hence the Cortex-R processors are deterministic.

Memory access above the dotted line in the Cortex-R processor is always fast and deterministic

In a system with a Memory Management Unit (MMU), if the code or data is not available in the cache, a page table walk may also be required, and this could take hundreds of cycles. TCM enables a fast, deterministic response to interrupts, which makes the Cortex-R Series ideal for real-time systems.
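As a rough illustration of how software might place an interrupt handler and its data table in TCM, here is a minimal sketch assuming a GNU-style toolchain. The section names `.itcm_text` and `.dtcm_data`, the macros `IN_ITCM` and `IN_DTCM`, and the handler `timer_isr` are all hypothetical. A linker script (not shown) would have to map those sections onto the TCM address ranges of the actual device, and the vector table and interrupt entry code are omitted.

```c
#include <stdint.h>

/* Hypothetical section names: a linker script (not shown) must map them
 * to the instruction and data TCM regions of the target device. */
#if defined(__GNUC__) && defined(__ELF__)
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM /* no-op on toolchains without ELF section attributes */
#define IN_DTCM
#endif

/* Data the handler touches lives in data TCM, so no cache miss can delay it. */
IN_DTCM static volatile uint32_t tick_count = 0;
IN_DTCM static volatile uint16_t last_gain = 0;
IN_DTCM static uint16_t gain_table[4] = {1, 2, 4, 8};

/* The handler itself lives in instruction TCM. */
IN_ITCM void timer_isr(void)
{
    last_gain = gain_table[tick_count & 3u]; /* table lookup from TCM */
    tick_count++;
}

uint32_t ticks_elapsed(void) { return tick_count; }
uint16_t current_gain(void)  { return last_gain; }
```

On a host build the attributes only create extra sections. The determinism benefit appears only when the linker places those sections in real TCM.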

3) SIMD instructions and CMSIS-DSP Library functions add DSP capabilities

The Cortex-R Series provides native Single Instruction Multiple Data (SIMD) and Multiply and Accumulate (MAC) instructions. These allow multiple operations to be performed in a single clock cycle and include saturating maths, which clips results that are too large rather than letting them overflow.
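To make "clips rather than overflows" concrete, here is a small portable C model of saturating addition. The function names `sat_add_q31` and `qadd16_model` are hypothetical. They only imitate in software what a saturating add (such as the ARM `QADD` instruction) and a dual 16-bit SIMD saturating add (such as `QADD16`) compute, and on the processor each would be a single instruction.

```c
#include <stdint.h>

/* 32-bit saturating add: clamp to the int32_t range instead of wrapping. */
int32_t sat_add_q31(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Clamp a 32-bit intermediate result to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Model of a dual 16-bit SIMD saturating add: two independent signed
 * 16-bit lanes packed into one 32-bit word are added in one operation.
 * Assumes the usual two's-complement narrowing conversion. */
uint32_t qadd16_model(uint32_t x, uint32_t y)
{
    int16_t lo = sat16((int32_t)(int16_t)(x & 0xFFFFu) + (int16_t)(y & 0xFFFFu));
    int16_t hi = sat16((int32_t)(int16_t)(x >> 16) + (int16_t)(y >> 16));
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

With wrapping arithmetic, adding 1 to the largest positive value would produce a large negative number. Here the result stays at the maximum, which is usually the less harmful outcome in signal processing and control loops.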

The CMSIS-DSP library is a collection of 61 algorithms that utilise the SIMD capabilities and include:

- Basic maths: vector multiply, add, subtract, scale, shift, negate...
- Statistics: root mean square, mean, standard deviation...
- Fast maths: sine, cosine, square root...
- Complex maths: conjugate, dot product, magnitude, multiply by real...
- Filters: FIR, IIR, convolution, correlation...
- Matrix algebra: addition, multiplication, scale...
- Transforms: Fast Fourier, discrete cosine...
- Controller: PID motor control, (inverse) Park transform, (inverse) Clarke transform...
- Interpolation: linear and bilinear...
- Support functions: type conversion, copy, fill...

By including these capabilities in the processor, a much simpler, more cost-effective and easier-to-debug system can be created than with a separate DSP. The performance and SIMD data width are not as advanced as those of some very high-end standalone DSPs, but in many applications these capabilities can make the system more efficient and lower its power consumption.

Example motor control application where Park and Clarke transforms are handled by the SIMD/DSP capabilities through the CMSIS-DSP library
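For readers unfamiliar with the two transforms used in motor control, the following is a minimal plain C sketch of the maths. CMSIS-DSP provides `arm_clarke_f32` and `arm_park_f32` for this. The types `ab_frame` and `dq_frame` and the functions `clarke` and `park` below are hypothetical stand-ins written so the example compiles without the library. The Clarke step assumes balanced three-phase currents (ia + ib + ic = 0), and the caller supplies the sine and cosine of the rotor angle.

```c
/* Two-axis stationary frame (alpha, beta) and rotating frame (d, q). */
typedef struct { float alpha; float beta; } ab_frame;
typedef struct { float d; float q; } dq_frame;

/* Clarke transform: two measured phase currents -> stationary two-axis
 * frame. Assumes ia + ib + ic = 0, so ic does not need to be measured. */
ab_frame clarke(float ia, float ib)
{
    ab_frame out;
    out.alpha = ia;
    out.beta  = 0.57735026919f * ia + 1.15470053838f * ib; /* (ia + 2*ib)/sqrt(3) */
    return out;
}

/* Park transform: rotate the stationary frame by the rotor angle theta,
 * given sin(theta) and cos(theta), to get the rotor-aligned d/q frame. */
dq_frame park(ab_frame in, float sin_theta, float cos_theta)
{
    dq_frame out;
    out.d =  in.alpha * cos_theta + in.beta * sin_theta;
    out.q = -in.alpha * sin_theta + in.beta * cos_theta;
    return out;
}
```

In a control loop, the d and q values would then typically feed two PID controllers, which is where the library's controller functions come in.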

4) Branch shadow and branch prediction

The Cortex-R Series enhances performance through advanced branch prediction techniques. In a pipelined processor, multiple actions happen in each clock cycle. In Cortex-R, both instruction fetch and data read/write accesses are extended to two cycles. This allows a longer memory access time, which permits either larger memories or slower memories that can be denser or lower power, and so removes memory system limitations on the processor clock frequency. The pipeline also has an additional decode stage that accommodates branch prediction (for conditionals, loops and function returns) and an instruction queue that keeps the data processing unit fed with instructions.

If a branch is taken without prediction, the processor must stall until the pipeline has been refilled with instructions from the new address and those instructions reach the data processing unit. Branch prediction determines the most likely outcome of a branch instruction. If it predicts that the branch will not be taken, the pipeline continues as normal; otherwise it starts loading the pipeline with instructions from the branch target address so that the data processing unit is not stalled. Branch prediction can significantly improve processor performance. The Cortex-R7 approaches 100% branch prediction accuracy, compared to ~80% for Cortex-R4/R5.
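To give a feel for how a predictor can guess "the most likely outcome", here is a sketch of a textbook two-bit saturating counter. This is a generic teaching example, not a description of the predictor actually implemented in any Cortex-R processor, and the names `predictor2b`, `predictor_predict`, `predictor_update` and `count_mispredictions` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* State 0 or 1: predict not taken. State 2 or 3: predict taken. */
typedef struct { uint8_t state; } predictor2b;

bool predictor_predict(const predictor2b *p)
{
    return p->state >= 2;
}

/* Record the real outcome; returns true if the prediction was correct. */
bool predictor_update(predictor2b *p, bool taken)
{
    bool correct = (predictor_predict(p) == taken);
    if (taken) {
        if (p->state < 3) p->state++;   /* saturate at strongly taken */
    } else {
        if (p->state > 0) p->state--;   /* saturate at strongly not taken */
    }
    return correct;
}

/* Feed a sequence of branch outcomes and count the wrong guesses. */
unsigned count_mispredictions(predictor2b *p, const bool *outcomes, unsigned n)
{
    unsigned misses = 0;
    for (unsigned i = 0; i < n; i++) {
        if (!predictor_update(p, outcomes[i])) misses++;
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through, this counter mispredicts three times on the first pass when starting from state 0, and only once (the loop exit) on each later pass, because a single not-taken outcome does not flip its prediction.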

5) Error Correcting Code (ECC) generation/checking is built into the processor pipeline

ECC is a method of checking that the data in a memory location is correct and has not been corrupted. The memory has additional bits, and whenever information is written to memory a code is generated and stored in these additional bits. When the memory is read back, the code is checked to ensure that the data and code still match. A mismatch can occur if there has been a Single Event Upset (SEU), such as radiation hitting the memory location and flipping a bit, or if there is a physical fault in the memory. If a single-bit error is detected, it can be automatically corrected and written back to the memory location. In the Cortex-R Series, ECC generation and checking is done automatically and has no performance impact unless, of course, an error is detected. ECC is an optional feature on all of the Cortex-R Series.
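The generate-on-write, check-on-read idea can be sketched with a classic Hamming(7,4) code, which protects 4 data bits with 3 check bits and corrects any single flipped bit. This is a toy illustration only: the code and word widths used by the Cortex-R hardware are different and are not described here, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```c
#include <stdint.h>

/* Read the bit at 1-based position pos of a codeword. */
static unsigned cw_bit(uint8_t w, unsigned pos)
{
    return (w >> (pos - 1)) & 1u;
}

/* "Write" side: compute 3 check bits for 4 data bits.
 * Codeword layout by position 1..7: p1 p2 d0 p4 d1 d2 d3. */
uint8_t hamming74_encode(uint8_t data)
{
    unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3;   /* covers positions 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3;   /* covers positions 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3;   /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* "Read" side: recompute the checks. A non-zero syndrome is the 1-based
 * position of the flipped bit, which is corrected before the data is
 * returned. *corrected is set to 1 if a correction was made, else 0. */
uint8_t hamming74_decode(uint8_t code, int *corrected)
{
    unsigned s1 = cw_bit(code, 1) ^ cw_bit(code, 3) ^ cw_bit(code, 5) ^ cw_bit(code, 7);
    unsigned s2 = cw_bit(code, 2) ^ cw_bit(code, 3) ^ cw_bit(code, 6) ^ cw_bit(code, 7);
    unsigned s4 = cw_bit(code, 4) ^ cw_bit(code, 5) ^ cw_bit(code, 6) ^ cw_bit(code, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);

    if (syndrome != 0) {
        code ^= (uint8_t)(1u << (syndrome - 1)); /* flip the bad bit back */
    }
    *corrected = (syndrome != 0);
    return (uint8_t)(cw_bit(code, 3) | (cw_bit(code, 5) << 1) |
                     (cw_bit(code, 6) << 2) | (cw_bit(code, 7) << 3));
}
```

A plain Hamming(7,4) code cannot tell a double-bit error from a single-bit one. Practical memory ECC schemes usually add a further parity bit so that double-bit errors are at least detected.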

Example of ECC on TCM as part of the Cortex-R Series pipeline
