Evolving safety systems: Comparing Lock-Step, redundant execution and Split-Lock technologies

September 26, 2018

According to ISO 26262, functional safety is the absence of unreasonable risk due to hazards caused by malfunctioning behaviour of electrical and electronic systems. This statement alone drives very specific requirements onto any safety-related system, regardless of market vertical. The various safety standards also define different levels of safety integrity, i.e. how ‘safe’ a particular system needs to be. For example, the system controlling the brakes in a vehicle would be expected to have the highest level of safety, as failure of such a system could be catastrophic, whereas a system controlling the motors in the driver’s seat, whilst still having a safety requirement, would be expected to have a lower rating. In ISO 26262, this is defined as the “Automotive Safety Integrity Level” or “ASIL”. ASIL is currently defined as four levels ranging from “A” (the lowest) to “D” (the highest). These levels correlate directly with the diagnostic coverage a system must attain or, in other words, the proportion of faults a given system is expected to detect.

The fundamental challenge

As the automotive industry marches towards fully autonomous implementations, the expectation is that this revolution will introduce a much safer world. In excess of 90% of vehicle accidents are caused by human error, and this new generation of vehicle should ultimately reduce the number of fatalities by tens of thousands. However, there are still several fundamental challenges to be solved to enable pervasive deployment of such vehicles. Autonomous systems require a huge amount of compute performance and, because they are capable of controlling the vehicle’s direction and speed, they also require the highest levels of safety integrity.

So what are the technical options to achieve this?

1. Lock-Step

Configuring two CPU cores in ‘Lock-Step’ is a traditional way of achieving high levels of diagnostic coverage, that is, the ability to detect the occurrence of an error condition. The principle is very straightforward. Each core executes exactly the same code and feeds its outputs into a block of comparator logic. The comparator logic compares the outputs on a cycle-by-cycle basis and, as long as the results are equal, all is well. If there are discrepancies between the results, this could be an indication of a fault condition that should be investigated or acted upon. The resulting action is defined by the system developer and is dependent upon the system in question. It could be as simple as rebooting, or rechecking whether the error condition still exists after a given period of time. This lock-stepping is fixed in the silicon by design and therefore has no flexibility, so the application is effectively using two cores but only achieving the performance of a single core. This approach is ‘proven’ and has worked well for microcontrollers and less complex, deterministic microprocessors for many years.

Dual Core Lock Step Arm CPUs
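The comparator behaviour described above can be illustrated with a small software model. This is only a sketch under stated assumptions: real lock-step comparison is done in hardware on every clock cycle, and the names used here (`run_lockstep`, `healthy_core`, `make_faulty_core`, `LockStepResult`) are hypothetical and do not correspond to any Arm interface. Each "core" is modelled as a function that produces one output per cycle, and a fault is modelled as a single flipped output bit.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class LockStepResult:
    cycles_run: int                # cycles executed before stopping
    mismatch_cycle: Optional[int]  # first cycle whose outputs differed, else None


def run_lockstep(core_a: Callable[[int], int],
                 core_b: Callable[[int], int],
                 inputs: Iterable[int]) -> LockStepResult:
    """Feed the same input to both cores each cycle and compare their outputs."""
    cycles_run = 0
    for cycle, value in enumerate(inputs):
        out_a = core_a(value)
        out_b = core_b(value)
        cycles_run += 1
        if out_a != out_b:
            # Comparator mismatch: stop and report. What happens next
            # (reboot, recheck, ...) is up to the system developer.
            return LockStepResult(cycles_run, cycle)
    return LockStepResult(cycles_run, None)


def healthy_core(value: int) -> int:
    """Hypothetical stand-in for one cycle of deterministic core output."""
    return (value * 3 + 1) & 0xFFFF


def make_faulty_core(fault_input: int, flipped_bit: int) -> Callable[[int], int]:
    """Build a core that flips one output bit when it sees `fault_input`."""
    def core(value: int) -> int:
        out = healthy_core(value)
        if value == fault_input:
            out ^= 1 << flipped_bit  # model a transient single-bit fault
        return out
    return core


if __name__ == "__main__":
    print(run_lockstep(healthy_core, healthy_core, range(8)))
    print(run_lockstep(healthy_core, make_faulty_core(5, 2), range(8)))
```

Note that the model also shows the cost mentioned above: two cores run, but only one core's worth of useful work is produced.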

2. Redundant execution

CPUs that offer higher performance capabilities are often a lot more complex and less deterministic, and therefore much more challenging to Lock-Step. This has led to more ‘exotic’ approaches to solving the aforementioned challenge. Software redundancy, or redundant execution, is certainly one alternative.

This approach assumes that two independent applications are being executed, potentially on different CPU cores or even within different virtual machines if virtualization is being implemented. As the outputs of the applications become available, they are compared for correctness by one or more additional high-safety-integrity cores, commonly referred to as a “safety island” because of their independent clock and power supplies. This safety island would be responsible for the final “decide and actuate” phase. This approach can reduce the diagnostic coverage requirements on the high-compute cluster and can also introduce a greater degree of flexibility into the implementation, coupled with improved efficiency. However, it also dramatically increases the complexity of the system and cross-checks at a coarser granularity. Because of the benefit of software flexibility, this approach may become more widely deployed in the coming years for certain applications requiring safety and high compute performance.

Redundant Execution Arm CPUs
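As a rough illustration of the redundant execution pattern described above, the sketch below runs two independently written calculations of the same quantity and lets a separate checker decide whether to act on the result. Everything here is an assumption made for illustration: the stopping-distance calculation, the function names (`channel_a`, `channel_b`, `safety_island`) and the tolerance value are hypothetical, and a real safety island is separate hardware rather than a Python function.

```python
from typing import Optional


def channel_a(speed_mps: float, decel_mps2: float) -> float:
    """First implementation: stopping distance from d = v^2 / (2a)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


def channel_b(speed_mps: float, decel_mps2: float) -> float:
    """Diverse implementation: average speed multiplied by time to stop."""
    stop_time_s = speed_mps / decel_mps2
    return 0.5 * speed_mps * stop_time_s


def safety_island(result_a: float, result_b: float,
                  tolerance: float = 1e-6) -> Optional[float]:
    """Cross-check the two channel outputs before anything is actuated.

    Returns the agreed value, or None to signal that the system should
    fall back to a safe state because the channels disagree.
    """
    if abs(result_a - result_b) <= tolerance:
        return result_a
    return None


if __name__ == "__main__":
    agreed = safety_island(channel_a(20.0, 8.0), channel_b(20.0, 8.0))
    print("actuate with", agreed)
    # Simulate one channel producing a corrupted result.
    print("after corruption:", safety_island(channel_a(20.0, 8.0), 30.0))
```

The comparison happens only once per completed result rather than on every cycle, which is the coarser granularity of cross-checking noted above.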

3. Split-Lock: The best of both worlds

The ultimate solution must be the one that brings together the benefits of both approaches: flexibility, performance, simplicity and a proven track record. With the introduction of the ‘Split-Lock’ capability on the Cortex-A76AE, Arm has done exactly that, coupling high compute performance with high safety integrity support. How does Split-Lock differ from Lock-Step? In essence, it adds the flexibility that wasn’t available in lock-stepped CPU implementations. It allows the system to be configured at boot-up either in ‘split mode’ (two independent CPUs that can be used for diverse tasks and applications) or in ‘lock mode’ (the CPUs are lock-stepped for high safety integrity applications). This flexibility could even be extended to support potential fail-operational modes, meaning the ability to continue to operate in a degraded mode rather than completely shutting the system down. For example, when running in lock mode, if one core starts to exhibit a failure condition, the system could be quiesced and the faulty core taken off-line (split), allowing operation to continue in a degraded mode. This ‘split available’ capability is critical for any autonomous system.
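The mode selection and fail-operational behaviour described above can be sketched as a small state model. This is a conceptual illustration only: the `Mode` and `CorePair` names are hypothetical, the model is not the Cortex-A76AE programming interface, and it assumes that some other mechanism has already identified which core is faulty.

```python
from enum import Enum


class Mode(Enum):
    SPLIT = "split"        # two independent CPUs
    LOCK = "lock"          # two cores lock-stepped, seen as one CPU
    DEGRADED = "degraded"  # one core taken off-line after a fault


class CorePair:
    """Hypothetical model of a core pair whose mode is chosen at boot."""

    def __init__(self, boot_mode: Mode) -> None:
        if boot_mode not in (Mode.SPLIT, Mode.LOCK):
            raise ValueError("boot mode must be SPLIT or LOCK")
        self.mode = boot_mode
        self.online_cores = {0, 1}

    def usable_cpus(self) -> int:
        """Number of logical CPUs available to software in the current mode."""
        if self.mode is Mode.LOCK:
            return 1  # both cores run the same code and are compared
        return len(self.online_cores)

    def report_fault(self, core_id: int) -> None:
        """Take the faulty core off-line and continue in degraded mode."""
        if core_id not in self.online_cores:
            raise ValueError(f"core {core_id} is not online")
        self.online_cores.discard(core_id)
        self.mode = Mode.DEGRADED


if __name__ == "__main__":
    pair = CorePair(Mode.LOCK)
    print(pair.mode, pair.usable_cpus())
    pair.report_fault(1)  # assumed: diagnostics identified core 1 as faulty
    print(pair.mode, pair.usable_cpus(), pair.online_cores)
```

In degraded mode the remaining core keeps running, but without a lock-step partner it no longer has cycle-by-cycle checking, which is why the text above describes this as a degraded rather than a normal mode of operation.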

The Split-Lock capability implemented within the Cortex-A76AE autonomous-class processor also allows the same base design to be used across multiple applications, with or without safety requirements, such as in-vehicle infotainment systems as well as autonomous vehicle systems. This enables huge design efficiencies throughout the supply chain.

Summary

The Cortex-A76AE is the latest addition to Arm’s new Safety Ready program and augments a rich heritage of functionally safe IP. It’s the first autonomous-class processor with integrated safety. Its compute performance, coupled with the Split-Lock functionality, enables new levels of innovation and scalability in the automotive domain. This product is complemented by the industry’s broadest portfolio of safety IP, which also encompasses software elements, tools and comprehensive documentation.

Learn more about the Split-Lock capability of the Arm Cortex-A76AE: The first autonomous-class processor and the Arm Safety Ready program.
