The flexible approach to adding Functional Safety to a CPU

November 8, 2022

Functional safety has become both increasingly important and prevalent across various markets, none more so than the automotive and industrial sectors.

Previously, within the automotive industry, functional safety was reserved for a number of critical functions implemented in a few Electronic Control Units (ECUs) in a vehicle. This approach has increasingly changed as the number of automotive applications with safety requirements has grown, driven by the continued deployment of driver assistance innovations and autonomy. To learn more about functional safety in the automotive sector, read our Arm Community blog here.

The industrial sector’s evolving transformation towards smart manufacturing has seen the rise of automation and the growing adoption of collaborative robots (co-bots) working alongside humans on the production line. As humans and machines continue to work in such close proximity, control systems must constantly monitor the integrity of the robot and react appropriately under error conditions. This in turn has driven the need for higher levels of functional safety.


Methods for incorporating functional safety into SoC designs

Implementing safety requirements in a structured and consistent way has been made possible by internationally agreed standards, such as ISO 26262 and IEC 61508. These standards have been put in place to develop a common language across the industry and to provide a reference for best practices. Satisfying an application’s safety requirements calls for both safety mechanisms and rigorous development processes.

One approach to mitigating random hardware faults is to use redundant hardware. For a CPU, this could mean a processor with dual core lock-step. Such hardware duplication enables rapid fault detection with high coverage, but it increases area and power, which may not be merited for applications with lower safety integrity levels such as ASIL B or SIL 2.

A combination of safety mechanisms, such as Error Correcting Codes (ECC), Built In Self Tests (BIST) and others, can be used to help achieve these lower safety targets. For BISTs, there are various forms including memory BIST, logic BIST and software BIST (Software Test Libraries).

Logic BIST (LBIST) is used in production testing to provide a path to otherwise unreachable parts of the design, but can also be deployed in the field. Its use in the field can be complex to implement and is normally limited to power-on testing, because the processors must be taken offline to run the tests and then restarted as if a reset had occurred. This makes it a challenge to deploy at runtime where there are tight requirements for fault detection. The available performance is also reduced, as time is required both to test the processor and then to reset and restart the application.

Like LBIST, memory BIST is normally used as part of the production flow, but it may also be feasible to deploy it in the field. Some Arm processors enable memory BIST to be executed at run-time. As this BIST approach tests only the memory, it would often be implemented alongside other checking mechanisms that address the logic of the processor.

A Software Test Library (STL) is a natural choice for covering the processor logic when used alongside memory protection. It also provides a more flexible approach to testing at runtime compared with many other methods.

Software Test Libraries

STLs are a group of software functions that run on the processor under test to detect the presence of a fault. They test the hardware for permanent faults within the processor functional logic, such as stuck-at-one and stuck-at-zero faults.
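As a rough illustration of the stuck-at detection idea, the sketch below drives every output bit of a simple operation to both 1 and 0 and compares the results against known-good values. It is written in C for readability only. The function names, the XOR "unit under test" and the injected fault models are all hypothetical and are not taken from any Arm STL; production STLs are generally written in assembly and target real processor logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "unit under test": any 32-bit, two-operand operation. */
typedef uint32_t (*unit_fn)(uint32_t a, uint32_t b);

/* Fault-free model of the unit. */
uint32_t xor_healthy(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/* Illustrative fault models: output bit 3 permanently stuck at 0 or at 1. */
uint32_t xor_bit3_stuck_at_zero(uint32_t a, uint32_t b)
{
    return (a ^ b) & ~(UINT32_C(1) << 3);
}

uint32_t xor_bit3_stuck_at_one(uint32_t a, uint32_t b)
{
    return (a ^ b) | (UINT32_C(1) << 3);
}

/* Returns true if the unit produced the expected result for every vector.
 * Each output bit is driven to 1 (exposes stuck-at-zero) and to 0
 * (exposes stuck-at-one) at least once. */
bool stl_check_xor(unit_fn unit)
{
    static const struct { uint32_t a, b, expected; } vectors[] = {
        { 0x00000000u, 0xFFFFFFFFu, 0xFFFFFFFFu }, /* all output bits high */
        { 0xFFFFFFFFu, 0xFFFFFFFFu, 0x00000000u }, /* all output bits low  */
        { 0x55555555u, 0xAAAAAAAAu, 0xFFFFFFFFu }, /* alternating inputs   */
        { 0xAAAAAAAAu, 0xAAAAAAAAu, 0x00000000u },
    };

    for (size_t i = 0; i < sizeof vectors / sizeof vectors[0]; ++i) {
        if (unit(vectors[i].a, vectors[i].b) != vectors[i].expected) {
            return false; /* mismatch: report a suspected permanent fault */
        }
    }
    return true;
}
```

Calling `stl_check_xor(xor_healthy)` reports no fault, while either of the two stuck-at models is caught by one of the first two vectors. A real test would exercise the hardware directly rather than a software model of it.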

STLs can be called by the software stack, which selects the tests to be run based on the time available for execution, without the requirement to take the core offline or reset the processor after testing. As an STL does not depend on hardware redundancy, it can be implemented with less die area and lower power than a dual core lock-step core. Power usage is equivalent to that of an application running on the CPU, and there is no additional power impact as there is with LBIST. This also enables an STL to be deployed on devices that are already available in silicon, unlike lock-step or LBIST, which need to be planned at design time.

STLs can achieve ‘medium’ fault detection rates, supporting automotive applications up to ASIL B for the Single Point Fault Metric and industrial applications up to SIL 2 for the Safe Failure Fraction. The code is generally optimized to keep its memory footprint minimal. Tests are generally written in assembly, which improves deterministic execution and fault coverage compared with compiling a high-level language like C.

Enabling functional safety on an existing design using STLs

To address a new market or applications with an existing chip, there may be a need to increase hardware metrics, such as the Single Point Fault Metric (SPFM), as part of the functional safety requirements. When targeting an established chip at applications up to ASIL B or SIL 2, an STL is a natural choice to consider. It can be deployed without the need for additional hardware changes to the device. In addition, the STL allows users to define the individual tests that need to be executed and so limits the test time and memory resources used.
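To make the effect on the hardware metrics concrete, here is a minimal sketch of how an SPFM estimate might be computed. In ISO 26262, SPFM is one minus the ratio of the single-point and residual fault rates to the total failure rate of the safety-related hardware, and the commonly cited target for ASIL B is 90% or more. The sketch makes strong simplifying assumptions: every fault is treated as able to violate the safety goal, latent faults are ignored, and each block's diagnostic coverage is a single number. The type and function names, and all the failure rates and coverage figures used with them, are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical description of one hardware block in the safety analysis. */
typedef struct {
    double failure_rate_fit;    /* failure rate of the block, in FIT        */
    double diagnostic_coverage; /* fraction of its faults detected, 0 to 1  */
} hw_block;

/* Simplified SPFM estimate: 1 - (undetected failure rate / total rate).
 * Assumes every fault could violate the safety goal (no "safe" faults). */
double spfm_estimate(const hw_block *blocks, size_t count)
{
    double total = 0.0;
    double undetected = 0.0;

    for (size_t i = 0; i < count; ++i) {
        total += blocks[i].failure_rate_fit;
        undetected += blocks[i].failure_rate_fit *
                      (1.0 - blocks[i].diagnostic_coverage);
    }

    /* With no failure rate at all there is nothing left undetected. */
    return (total > 0.0) ? 1.0 - undetected / total : 1.0;
}
```

With illustrative figures of 60 FIT of CPU logic covered by an STL at 90% and 40 FIT of memory covered by ECC at 99%, the estimate is 1 - 6.4/100 = 0.936. With the same memory protection but no coverage of the logic, it falls to 0.396, which is why a logic test is needed on top of memory protection.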

Each device has a functional safety concept which includes the safety measures and safety mechanisms implemented within the processor. When looking to deploy an STL, it is important to define the areas of the safety concept which will be addressed by the test library.

STLs can be integrated into the software stack with a standard API and run periodically, with the flexibility to define how testing is performed. The licensee can either execute the complete STL with a single call or divide the testing into multiple discrete blocks. Dividing the tests makes it possible to select the depth of testing by targeting specific tests, or to fit test execution into the available time windows. This minimizes the STL’s effect on the main application’s workload and helps meet the desired Fault Tolerant Time Interval (FTTI). The operation and integration of the STL are described in the Arm STL user guide.
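The idea of spreading tests across available time windows can be sketched as a small scheduler. This is not the Arm STL API, which is described in the Arm STL user guide. Every type and function name below is hypothetical, and the per-test cost is assumed to be a worst-case figure supplied by the integrator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test entry: a test function and its worst-case run time. */
typedef bool (*stl_test_fn)(void);

typedef struct {
    stl_test_fn run;
    uint32_t cost_us;
} stl_test_entry;

/* Scheduler state: remembers where the previous time window stopped. */
typedef struct {
    const stl_test_entry *tests;
    size_t count;
    size_t next;          /* index of the next test to run     */
    uint32_t passes_done; /* completed passes over all tests   */
} stl_scheduler;

typedef enum { STL_SLICE_OK, STL_PASS_COMPLETE, STL_FAULT } stl_status;

/* Placeholder tests so the sketch is self-contained. */
bool example_test_pass(void) { return true; }
bool example_test_fail(void) { return false; }

/* Runs as many pending tests as fit into budget_us, then returns.
 * Note: a test whose cost exceeds every budget offered never runs,
 * so the integrator must size the windows accordingly. */
stl_status stl_run_slice(stl_scheduler *s, uint32_t budget_us)
{
    uint32_t remaining = budget_us;

    if (s->count == 0) {
        return STL_SLICE_OK; /* nothing to schedule */
    }

    while (s->tests[s->next].cost_us <= remaining) {
        if (!s->tests[s->next].run()) {
            return STL_FAULT; /* hand over to the system's fault reaction */
        }
        remaining -= s->tests[s->next].cost_us;
        s->next++;
        if (s->next == s->count) {
            s->next = 0;
            s->passes_done++;
            return STL_PASS_COMPLETE; /* every test has run once */
        }
    }
    return STL_SLICE_OK; /* window used up; resume here next time */
}
```

Passing a very large budget, for example `UINT32_MAX`, runs whatever remains of the current pass in a single call, while smaller budgets spread one pass over several windows. The time taken to complete a full pass is what the integrator would compare against the required test interval.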


Find out more

For existing SoC designs requiring the ability to enable functional safety, STLs provide an ideal asset. They offer key components to enable the safe execution of systems throughout a range of applications, particularly in the automotive and industrial segments. STLs also offer a flexible and efficient solution to help address the critical functional safety requirements that are only set to increase as we move to a more automated future.

Arm STLs are currently available across a range of CPUs, including Cortex-A, Cortex-R, and Cortex-M products. If you have any questions about STLs or functional safety for Arm-based devices, click here to talk to an Arm expert.

To discover more information about Arm Software Test Libraries, visit our Arm.com webpage.
