In recent articles, I’ve overviewed the ARMv7 architecture and then looked in more detail at ARMv7-A (ARMv7-A - Power to the People) and ARMv6-M/ARMv7-M (ARMv6-M vs ARMv7-M - Unpacking the Microcontrollers). You will recall that there are three profiles in ARMv7. That covers two of them. But what of the third – ARMv7-R?
ARMv7-A, ARMv7-M (and ARMv6-M) grab the headlines because we see them everywhere – the former in mobile consumer electronics such as smartphones and tablets, the latter in microcontrollers. ARMv7-R, on the other hand, often seems to lurk in the shadows.
It occupies a highly specialised niche in the electronics industry, one where the key requirements are deterministic real-time scheduling, guaranteed low interrupt latency and safety-critical operation. In this niche, such properties are not nice-to-haves but absolute requirements, often imposed by regulatory regimes and the need for certified end-products. We’re talking here about applications like vehicle chassis systems and hard disk drive controllers.
ARM has long recognised the need for versions of its processors which are optimised for high-performance deterministic real-time. The key requirements here are:
ARM introduced the ARM946E-S and ARM966E-S processors in the late 1990s to address these requirements. Both were derived from the same ARM9E core as the popular ARM926EJ-S. Crucially, neither supported virtual memory, and the ARM966E-S omitted caches as well. These two components are the main offenders when it comes to indeterminacy in instruction execution. Virtual memory (implemented on ARM cores via a Memory Management Unit, or MMU) permits the translation of virtual addresses, issued by the processor, to physical addresses which are issued to the memory system. While an MMU is a vital component in systems which are capable of running a rich OS, such as Linux, it introduces non-deterministic and unpredictable memory access latencies which are not acceptable in hard real-time systems. Likewise, caches, which are very helpful in accelerating both instruction fetches and data memory accesses, introduce an unpredictable variability in memory access times which is not compatible with hard real-time requirements.
The ARM966E-S and ARM946E-S introduced the concept of Tightly Coupled Memories (TCM) – regions of fast RAM coupled to the core by dedicated interfaces and designed for fast (typically single-cycle) memory accesses. Unlike cache, TCM requires explicit management in software to load specific data and instruction segments (such as critical data and interrupt service routines) into TCM from where they can be accessed swiftly and reliably when required.
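To illustrate what "explicit management" means in practice, here is a minimal C sketch of a start-up routine that copies selected sections into TCM. Everything in it is hypothetical: the tcm_section_t descriptor, the tcm_load function and the capacity check are my own illustration, not an ARM API. On a real target the addresses would normally come from linker-script symbols rather than ordinary arrays.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one section that should live in TCM. */
typedef struct {
    const void *load_addr; /* where the section is stored (e.g. flash)  */
    void       *run_addr;  /* where it must run from (inside the TCM)   */
    size_t      size;      /* section length in bytes                   */
} tcm_section_t;

/*
 * Copy every section into TCM. Returns the total number of bytes
 * copied, or 0 (copying nothing) if the sections would not fit in
 * tcm_capacity bytes.
 */
static size_t tcm_load(const tcm_section_t *sections, size_t count,
                       size_t tcm_capacity)
{
    size_t total = 0;

    /* First pass: check that everything fits before touching the TCM. */
    for (size_t i = 0; i < count; i++) {
        if (sections[i].size > tcm_capacity - total) {
            return 0;
        }
        total += sections[i].size;
    }

    /* Second pass: perform the copies. */
    for (size_t i = 0; i < count; i++) {
        memcpy(sections[i].run_addr, sections[i].load_addr,
               sections[i].size);
    }
    return total;
}
```

The point of the two-pass structure is that nothing is half-loaded if the budget is exceeded. Unlike a cache, the TCM will not silently evict anything to make room, so the software has to decide up front what belongs there.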
The ARM946E-S implemented a Memory Protection Unit (MPU) which allows software to partition code and data into memory regions with configurable access permissions. This is a basic requirement for safety-critical systems.
The ARM1156T2(F)-S, as well as being the first processor to implement the Thumb-2 instruction set extension technology, was the next processor to target this area. It featured dedicated TCM interfaces (three separate ports, compared with the two offered by the ARM9E variants), a dedicated DMA access port and hardware division. The microarchitecture was based around a highly efficient 9-stage dual-issue instruction pipeline. Like the earlier processors, it incorporated an MPU. The optional Floating Point Unit (FPU) provided floating point acceleration for the most demanding applications.
The advent of the ARMv7 architecture, and its architecture profiles, included the definition of the ARMv7-R architecture, targeted specifically at these hard real-time applications. The first processor to implement this was the Cortex-R4, released in 2005. The Cortex-R4 was a natural evolution of the ARM1156T2(F)-S but, at the same time, was a huge leap forward in capability.
The Cortex-R4 found a ready market in automotive systems, industrial control, wireless baseband, hard disk drive controllers and many more real-time applications.
ARMv7-R is very closely related to ARMv7-A. The key requirements addressed by ARMv7-R are:
Low Latency Interrupt mode is a software-configurable option which reduces interrupt latency. Individual implementations are free to use a variety of strategies to achieve this. Among the most common is to allow certain lengthy instructions to be interrupted. The most common instructions involved are the LDM/STM instructions which can take many cycles to load or store multiple registers. Other implementations may, for example, disable Hit-Under-Miss support in caches. Specific microarchitectural optimizations also allow Cortex-R processors to take interrupts into the pipeline more quickly.
ARMv7-R does not support virtual memory, so does not include an MMU. Instead, it supports what is called the Protected Memory System Architecture (PMSA). This is typically implemented via an MPU. This allows memory to be partitioned (separately on the code and data sides) into regions which have configurable protection attributes. These attributes are policed by the MPU. The number of regions supported was increased in the Cortex-R5.
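To make the idea of MPU regions a little more concrete, here is a small C sketch of how a region's size might be encoded. It assumes the PMSAv7 convention that a region is a power of two between 32 bytes and 4GB, is aligned to its own size, and is described by a size field holding log2(size) - 1 next to an enable bit in bit 0. The function name is my own, and the result should always be checked against the Technical Reference Manual for the core in question.

```c
#include <stdint.h>

/*
 * Build the size/enable portion of a region size register value
 * (size field in bits [5:1], enable in bit 0), assuming the PMSAv7
 * encoding described above. Returns 0 if the base/size pair is not
 * a legal region, which also reads as "region disabled".
 */
static uint32_t mpu_region_size_enable(uint32_t base, uint64_t size_bytes)
{
    /* Legal sizes run from 32 bytes up to 4GB. */
    if (size_bytes < 32u || size_bytes > (1ull << 32)) {
        return 0;
    }
    /* Size must be a power of two. */
    if ((size_bytes & (size_bytes - 1)) != 0) {
        return 0;
    }
    /* Base address must be aligned to the region size. */
    if (((uint64_t)base & (size_bytes - 1)) != 0) {
        return 0;
    }

    /* Find log2(size_bytes). */
    uint32_t log2_size = 0;
    while ((1ull << log2_size) < size_bytes) {
        log2_size++;
    }

    uint32_t size_field = log2_size - 1; /* e.g. 4KB -> 11 */
    return (size_field << 1) | 1u;       /* bit 0 enables the region */
}
```

For example, a 4KB region at address 0x20000000 gives a size field of 11 and a return value of 0x17, while a 48-byte region or a misaligned base is rejected. Rejecting bad inputs in software matters because the hardware does not police them for you.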
ARMv7-R supports the full ARM and Thumb instruction sets, including the Thumb-2 extensions. It also includes provision for hardware division support. Hardware floating point is an optional extension (see below).
The endian support mechanism used by ARMv7 processors generally affects only the data memory interface, with the instruction interface being fixed as little-endian. To support legacy software which may have been built using big-endian instructions, the ARMv7-R profile offers the ability to set the endianness of the instruction interface at reset-time. This is a hardware-controlled option which cannot be changed after reset. Note that not all Cortex-R processors implement this.
ARMv7-R processors can be configured to generate an Undefined Instruction exception on an attempt to divide by zero (on ARMv7-A cores, the operation always returns zero without an exception).
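The difference between the two behaviours is easy to model in plain C. The sketch below is only a software model of the signed divide, not code that touches any system register: the trap_on_zero flag stands in for the hardware configuration option, and the type and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Outcome of the modelled divide. */
typedef struct {
    int32_t value;               /* quotient (0 if dividing by zero)    */
    bool    undefined_exception; /* true if the core would take a trap  */
} sdiv_result_t;

/*
 * Model a signed hardware divide. With trap_on_zero clear, dividing
 * by zero quietly returns 0. With it set, the model reports that an
 * Undefined Instruction exception would be taken instead.
 */
static sdiv_result_t model_sdiv(int32_t dividend, int32_t divisor,
                                bool trap_on_zero)
{
    sdiv_result_t r = { 0, false };

    if (divisor == 0) {
        r.undefined_exception = trap_on_zero;
        return r;
    }
    /* INT32_MIN / -1 overflows in C; the ARM result is 0x80000000. */
    if (dividend == INT32_MIN && divisor == -1) {
        r.value = INT32_MIN;
        return r;
    }
    r.value = dividend / divisor; /* truncates towards zero */
    return r;
}
```

In a safety-critical system the trapping behaviour is usually the more attractive one, since a silent zero result can propagate through a control loop long before anybody notices it.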
Processors supporting ARMv7-R may optionally include the Floating-point (VFP) extension. This provides an extended register set and a dedicated set of floating point instructions, supporting single-precision and double-precision operations. Half-precision support is optional.
The Advanced SIMD Extension can be implemented either stand-alone or in combination with the Floating-point Extension mentioned above. The Advanced SIMD (NEON) instruction set provides a very powerful vector processing capability which shares the same register set as the FPU. No current Cortex-R processors support the Advanced SIMD extension.
The ARMv7 Multiprocessing Extensions provide enhanced support for multiprocessor implementations, including extending and modifying memory system maintenance operations to multiple memory levels and multiple processors. An additional system register (the Multiprocessor Affinity Register) allows software to identify individual processors in a multiprocessor system. Processors which support multiprocessing include a Snoop Control Unit (SCU) which maintains data coherency in the L1 memory system. Cortex-R5 and Cortex-R7 also include an Accelerator Coherency Port (ACP) which permits external intelligent peripherals (e.g. DMA controllers) to participate in the coherent system.
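As a rough illustration of how software uses the Multiprocessor Affinity Register, the C sketch below decodes a raw register value into its fields. It assumes the ARMv7 layout as I understand it: three 8-bit affinity levels in bits [23:0], a uniprocessor flag in bit 30, and bit 31 set when the multiprocessing format is in use. The struct and function names are my own, and on real hardware the value would be read with a coprocessor access rather than passed in as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a Multiprocessor Affinity Register value. */
typedef struct {
    uint8_t aff0;         /* bits [7:0]:   typically the CPU number    */
    uint8_t aff1;         /* bits [15:8]:  typically the cluster       */
    uint8_t aff2;         /* bits [23:16]: next level up               */
    bool    uniprocessor; /* bit 30: set on a uniprocessor system      */
    bool    mp_format;    /* bit 31: set when MP Extensions format     */
} mpidr_fields_t;

/* Split a raw 32-bit register value into its fields. */
static mpidr_fields_t mpidr_decode(uint32_t mpidr)
{
    mpidr_fields_t f;

    f.aff0         = (uint8_t)(mpidr & 0xFFu);
    f.aff1         = (uint8_t)((mpidr >> 8) & 0xFFu);
    f.aff2         = (uint8_t)((mpidr >> 16) & 0xFFu);
    f.uniprocessor = ((mpidr >> 30) & 1u) != 0;
    f.mp_format    = ((mpidr >> 31) & 1u) != 0;
    return f;
}
```

A typical use is in start-up code, where every core runs the same image and branches on aff0 to decide, for example, which core initialises shared memory and which cores wait.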
The Performance Monitors Extension (PMU) offers a set of configurable event counters which are of immense use when debugging and benchmarking the performance of a system. While officially an optional extension to the ARMv7-R architecture, ARM strongly recommends that all implementations include the Performance Monitors Extension. To date, all ARM implementations have included this and it is an extremely valuable feature when investigating real-time behavior.
The following cores in the Cortex-R series support the ARMv7-R architecture. Together, they provide a scalable range of power-efficient performance points for their target applications.
For more detail on the Cortex-R series, please look at the product pages on ARM's website.