Cortex-R Series processors are often used in devices such as storage controllers, LTE modems, and industrial and automotive applications, where deterministic real-time response, high performance and error resistance are the key attributes.
Cortex-R Series processors are not always as visible as the Cortex-A Series application processors or the Cortex-M microcontrollers, where the ARM brand adds value to our partners' products and demonstrates that there is a wide ecosystem of engineers who have the skills to program them.
The safety features are especially important when implementing automotive and industrial embedded control systems, where features such as memory protection, error-correcting codes and lock-step (using a redundant copy of the processor to detect errors) deliver high error resistance.
Many LTE modems use Cortex-R processors, and in storage the Cortex-R Series is very popular. As of 3Q13, more than 900 million devices incorporating Cortex-R processors have shipped, demonstrating that the processors are mature and reliable.
Tightly Coupled Memory (TCM) is memory connected closely to the processor core, which makes it very fast for the processor to access. Typically it holds interrupt service routines and data tables that need to be accessed quickly. As soon as an interrupt arrives, the Cortex-R processor can switch to a privileged interrupt mode and quickly start working on the interrupt code that is held there. Without TCM, if the interrupt service routine code, or any data it needs to access, is not held locally in the cache, it has to be fetched from main memory. This may take many clock cycles, during which the processor must wait until the code and data are available. With TCM, the worst-case number of cycles needed to start running the interrupt code is known, and hence the Cortex-R processors are deterministic.
Memory accesses above the dotted line are always fast and deterministic for the Cortex-R processor
In a system with a Memory Management Unit, if the code or data is not available in the cache then a page table walk may be required, and this could take hundreds of cycles. TCM enables a fast, deterministic response to interrupts, which makes the Cortex-R Series ideal for real-time systems.
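As a rough illustration of how code and data end up in TCM, the sketch below uses GCC-style section attributes. The section names `.itcm_text` and `.dtcm_data`, the handler name and the gain table are all hypothetical; a real project would need a linker script that maps matching sections onto the TCM address ranges of the particular device.

```c
#include <stdint.h>

/* Hypothetical section names: they must match output sections that the
 * project's linker script maps onto the instruction and data TCM regions. */
#if defined(__GNUC__) && defined(__ELF__)
#define IN_ITCM __attribute__((section(".itcm_text")))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM /* no section placement on this toolchain */
#define IN_DTCM
#endif

/* Lookup table read by the handler, kept in data TCM so that reads
 * never depend on the cache state. Values are gain * 256 (Q8). */
IN_DTCM static uint16_t gain_table[4] = { 256, 512, 768, 1024 };

/* Result written by the handler, also kept in data TCM. */
IN_DTCM static volatile uint32_t last_output = 0;

/* Hypothetical interrupt handler body, placed in instruction TCM so
 * that its first instruction fetch has a known worst-case latency. */
IN_ITCM void sensor_irq_handler(uint32_t raw_sample, uint32_t gain_index)
{
    /* Scale the sample by the selected Q8 gain. */
    last_output = (raw_sample * gain_table[gain_index & 3u]) >> 8;
}

/* Accessor used by non-interrupt code. */
uint32_t sensor_last_output(void)
{
    return last_output;
}
```

On a host build the attributes simply create extra sections, so the logic can be tried out before the linker script exists.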
The Cortex-R Series provides native Single Instruction Multiple Data (SIMD) and Multiply and Accumulate (MAC) instructions. These enable multiple operations to be performed in a single clock cycle and include saturating maths, which clips results that are too large rather than letting them overflow.
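To make saturation and dual multiply-accumulate concrete, here is a portable C sketch of what such operations compute. It is only a model of the arithmetic: the function names are made up for this example, and on the processor each of these would be a single instruction (comparable to the ARM `QADD` and `SMLAD` instructions) rather than a function call.

```c
#include <stdint.h>

/* Saturating 32-bit add: clips to INT32_MAX / INT32_MIN instead of
 * wrapping around when the true sum does not fit. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15). */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Dual 16-bit multiply-accumulate: x and y each hold two signed 16-bit
 * lanes. Returns acc + x.lo * y.lo + x.hi * y.hi.
 * Note: the final accumulate is not saturated in this sketch, so the
 * caller must keep the result within int32_t range. */
int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}
```

For example, `sat_add32(INT32_MAX, 1)` stays at `INT32_MAX`, whereas a plain wrapping add would flip to a large negative number, which is exactly the kind of glitch saturation avoids in audio or control loops.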
The CMSIS-DSP library is a collection of 61 algorithms that utilise the SIMD capabilities.
By including these capabilities in the processor, a much simpler, more cost-effective and easier-to-debug system can be created than by having a separate DSP. The performance and the width of the SIMD data processed are not as advanced as in some of the very high-end standalone DSPs, but in many applications the use of these capabilities can make the system more efficient and reduce its power consumption.
Example motor control application where Park and Clarke transforms are handled by the SIMD/DSP capabilities through the CMSIS-DSP library
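For readers who have not met these transforms, the sketch below shows the arithmetic in plain single-precision C. It is not the CMSIS-DSP implementation; the type and function names are invented for this example, it assumes a balanced three-phase system (so only two phase currents are needed), and the caller supplies the sine and cosine of the rotor angle.

```c
/* Two-axis stationary frame (alpha, beta) and rotating frame (d, q). */
typedef struct { float alpha; float beta; } ab_frame_t;
typedef struct { float d; float q; } dq_frame_t;

/* Clarke transform: converts two phase currents of a balanced
 * three-phase system into the stationary alpha/beta frame. */
ab_frame_t clarke_transform(float ia, float ib)
{
    ab_frame_t out;
    out.alpha = ia;
    out.beta  = (ia + 2.0f * ib) * 0.57735026919f; /* 1 / sqrt(3) */
    return out;
}

/* Park transform: rotates the alpha/beta vector into the d/q frame
 * that turns with the rotor. sin_theta and cos_theta are the sine and
 * cosine of the rotor angle, computed by the caller. */
dq_frame_t park_transform(ab_frame_t in, float sin_theta, float cos_theta)
{
    dq_frame_t out;
    out.d =  in.alpha * cos_theta + in.beta * sin_theta;
    out.q = -in.alpha * sin_theta + in.beta * cos_theta;
    return out;
}
```

With `ia = 1.0f` and `ib = -0.5f` at a rotor angle of zero, the current lies entirely on the d axis, which is a handy sanity check when wiring up a control loop.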
The Cortex-R Series enhances performance through advanced branch prediction techniques. In a pipelined processor, multiple actions happen in each clock cycle. In Cortex-R, both instruction fetch and data read/write accesses are extended to two cycles. This allows a longer memory access time, enabling either larger memories or slower memories that can be denser or lower power, and removes memory system limitations on the processor clock frequency. The pipeline also has an additional decode stage that accommodates branch prediction (for conditionals, loops and function returns) and an instruction queue that keeps the data processing unit fed with instructions.

If a branch is taken without prediction, the processor must stall and wait until the pipeline has been refilled with instructions from the new address and those instructions reach the data processing unit. Branch prediction determines the most likely outcome of any branch instruction. If it predicts that the branch will not be taken, the processor continues as normal; otherwise it starts loading the pipeline with the instructions from the branch address so that the data processing unit will not stall. Branch prediction can significantly improve the performance of processors. The Cortex-R7 approaches 100% branch prediction accuracy, compared to ~80% for Cortex-R4/R5.
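To give a feel for how a predictor "determines the most likely outcome", here is a sketch of a textbook two-bit saturating counter. This is a generic teaching model and is not claimed to be the scheme used inside any Cortex-R processor; all names are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Two-bit saturating counter for one branch:
 * 0 and 1 predict "not taken", 2 and 3 predict "taken". */
typedef struct { unsigned counter; } predictor_t;

/* Current prediction for the branch. */
bool predictor_predict(const predictor_t *p)
{
    return p->counter >= 2u;
}

/* Move the counter towards the actual outcome, saturating at 0 and 3. */
void predictor_update(predictor_t *p, bool taken)
{
    if (taken) {
        if (p->counter < 3u) p->counter++;
    } else {
        if (p->counter > 0u) p->counter--;
    }
}

/* Replay a recorded branch history and count correct predictions. */
size_t predictor_run(predictor_t *p, const bool *outcomes, size_t n)
{
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        if (predictor_predict(p) == outcomes[i]) correct++;
        predictor_update(p, outcomes[i]);
    }
    return correct;
}
```

Feeding it a loop branch that is taken nine times and then falls through shows the typical behaviour: a couple of mispredictions while the counter warms up, one at loop exit, and correct predictions in between.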
ECC (Error Correcting Code) is a method of checking that the data in a memory location is correct and has not been corrupted. The memory has additional bits added, and a code is generated and stored in these additional bits whenever information is written to memory. When the memory is read back, the code is checked to ensure that the data and code still match. A mismatch can occur if there has been a Single Event Upset (SEU), such as radiation hitting the memory location and flipping a bit, or if there is a physical fault in the memory. If a single-bit error is detected, it can be automatically corrected and written back to the memory location. In the Cortex-R Series, ECC code generation and checking is done automatically and does not cause any performance impact unless, of course, an error is detected. ECC is an optional feature on all of the Cortex-R Series.
Example of ECC on TCM as part of the Cortex-R Series pipeline
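The principle of storing extra check bits and using them to locate a flipped bit can be shown with a small Hamming(7,4) code. This is a classroom-sized sketch only: it protects 4 data bits with 3 check bits, and it is not the code width or scheme used by the Cortex-R memory interfaces. The function names are invented for the example.

```c
#include <stdint.h>

/* Read the bit at 1-based position pos of a 7-bit codeword. */
static unsigned bit(uint8_t word, unsigned pos)
{
    return (word >> (pos - 1u)) & 1u;
}

/* Encode 4 data bits into a 7-bit codeword. Check bits sit at
 * positions 1, 2 and 4; data bits sit at positions 3, 5, 6 and 7. */
uint8_t hamming74_encode(uint8_t data)
{
    unsigned d0 = data & 1u, d1 = (data >> 1) & 1u;
    unsigned d2 = (data >> 2) & 1u, d3 = (data >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3; /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3; /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3; /* covers positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Check a codeword, correct a single flipped bit if there is one, and
 * return the 4 data bits. *error_pos receives the 1-based position of
 * the corrected bit, or 0 if the codeword was clean. */
uint8_t hamming74_decode(uint8_t code, unsigned *error_pos)
{
    unsigned s1 = bit(code, 1) ^ bit(code, 3) ^ bit(code, 5) ^ bit(code, 7);
    unsigned s2 = bit(code, 2) ^ bit(code, 3) ^ bit(code, 6) ^ bit(code, 7);
    unsigned s4 = bit(code, 4) ^ bit(code, 5) ^ bit(code, 6) ^ bit(code, 7);
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);

    if (syndrome != 0u) {
        /* The syndrome is the position of the flipped bit: flip it back. */
        code = (uint8_t)(code ^ (1u << (syndrome - 1u)));
    }
    if (error_pos) *error_pos = syndrome;

    return (uint8_t)(bit(code, 3) | (bit(code, 5) << 1) |
                     (bit(code, 6) << 2) | (bit(code, 7) << 3));
}
```

Encoding on write and decoding on read mirrors the flow described above: a clean read passes straight through, while a single flipped bit is located by the mismatch pattern and repaired before the data is used.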
As far as I can see, the CMSIS-DSP library is written in single precision. Is there any way to calculate in double precision?