It was a little less than two years ago that we announced the Cortex-A76AE processor as Arm’s first dedicated high-end CPU design for safety applications. In the time since, both the industry and Arm have come a long way. Autonomous-class technologies are making rapid inroads into the automotive sector while demanding that security and functional safety requirements continue to be met. Equally, the application space has expanded rapidly, introducing pit stops on the way to the promised land of fully autonomous machines. Self valet parking and on-ramp-to-off-ramp automated driving are realities (or close to being so), but of particular significance is the application of the underlying technologies in segments such as industrial warehousing and autonomous manufacturing. Clearly, the twin demands of high-performance compute and demonstrable safety are of great interest to a variety of market segments.
Our newest member of the AE family of CPUs, the Cortex-A78AE, comes just in time to service our partners’ ever-growing need for safe compute. Functional safety is evolving into an era of mixed-safety criticality underwritten by the move to domain controllers in automotive E/E architectures. The co-location of multiple application threads on a common software entity poses some interesting challenges in terms of thread management, responsiveness, and switching times between applications. On the industrial side, the deployment of commercial off-the-shelf infrastructure for IT and the connection of the OT (operational technologies) domain to the network raise concerns about security and guaranteed cycle times. A common thread running through all these themes is a desire for ever-increasing single-thread compute performance. Equally, both the automotive and industrial segments are increasingly running into the thermal wall, where deployable solutions are limited by the power dissipation limits of the system. Lastly, the dynamics of the industry dictate that reuse is maximized wherever feasible, particularly given the costs of chip design in the newer process geometries. This is especially true for partners who service multiple market segments. The bottom line is that the industry needs an uplift in safe, secure, single-thread compute performance that comes with improved power efficiency.
The Cortex-A78AE is Arm’s definitive answer for the automotive and industrial sector’s compute needs for the next generation. The micro-architecture is revamped on a number of fronts: additional fetch bandwidth, improved branch prediction, a lower mis-predict penalty, wider integer issue, and a memory subsystem with 50% higher bandwidth than the previous generation. Of particular significance is the introduction of the Macro-operation Cache, a structure designed to hold decoded instructions that decouples the fetch engines from execution, thereby enabling dynamic code sequence optimization. Together, these innovations result in over 30% performance improvement on the SPEC2006 synthetic benchmark suite, across both integer and floating-point routines. The Cortex-A78AE achieves this impressive generational boost while actually improving performance per watt. As a matter of fact, it manages to achieve the Cortex-A76AE’s targeted performance at 60% lower power on a 7nm implementation. At the same power envelope, the Cortex-A78AE offers a 25% performance boost. Whichever way they choose to trade power for performance, our partners should find that this new CPU is able to handle their workloads.
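To put the two efficiency figures above side by side, the quoted percentages can be turned into performance-per-watt ratios. The sketch below is plain arithmetic on the numbers in the preceding paragraph, with the Cortex-A76AE normalised to 1.0 for both performance and power. The function name and the normalisation are illustrative assumptions, not Arm's measurement methodology.

```python
def perf_per_watt_gain(relative_perf: float, relative_power: float) -> float:
    """Performance per watt relative to a baseline normalised to 1.0 perf at 1.0 power."""
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_perf / relative_power


# Iso-performance point: baseline performance at 60% lower power.
iso_performance = perf_per_watt_gain(relative_perf=1.0, relative_power=1.0 - 0.60)

# Iso-power point: 25% more performance in the same power envelope.
iso_power = perf_per_watt_gain(relative_perf=1.25, relative_power=1.0)

print(f"iso-performance: {iso_performance:.2f}x performance per watt")
print(f"iso-power:       {iso_power:.2f}x performance per watt")
```

The two results differ because they describe two different operating points, so neither should be read as a single efficiency figure for the CPU.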
For the demanding workloads that we are seeing in robotics and autonomous driving, multi-threaded performance is just as critical. Just like its predecessor, the Cortex-A78AE can be scaled in CPU clusters of up to 4 cores. Multiple clusters can be grouped together with the capable CMN-600AE to offer a many-core implementation. For truly performance-oriented applications, a multi-chip or chiplet extension is an option using Arm’s CCIX chip-to-chip extensions.
And as both cars and the factory floor become increasingly connected to the rest of the infrastructure, cybersecurity concerns are taking center stage. Arm has a robust pipeline of features for partners to build their security solutions on top of, and the first of these makes its appearance in the Cortex-A78AE: Pointer Authentication (PAC). Targeted at shoring up defenses against Return-Oriented Programming (ROP), statistically the most common form of software exploit, PAC and its enhanced cousin PAC2 provide the IP with a cryptographic check of stack addresses before they are loaded into the Program Counter.
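The idea behind the cryptographic check described above can be illustrated with a small software model: a keyed code is stored in the unused upper bits of a return address when it is saved, and it is recomputed and compared before the address is used. This is a toy sketch only. The 48-bit address width, the 16-bit code width, the use of HMAC-SHA-256 as a stand-in for the hardware cipher, and all function names are assumptions made for illustration, not a description of the Cortex-A78AE implementation.

```python
import hashlib
import hmac

ADDRESS_BITS = 48  # assumed virtual-address width; upper bits are free for the code
PAC_BITS = 16      # assumed authentication-code width for this toy model
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def _pac(address: int, context: int, key: bytes) -> int:
    """Toy keyed code over (address, context), truncated to PAC_BITS."""
    message = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[: PAC_BITS // 8], "little")


def sign_pointer(address: int, context: int, key: bytes) -> int:
    """Place the authentication code in the upper bits of the pointer."""
    address &= ADDRESS_MASK
    return (_pac(address, context, key) << ADDRESS_BITS) | address


def authenticate_pointer(signed: int, context: int, key: bytes) -> int:
    """Return the plain address if the code matches, otherwise raise ValueError."""
    address = signed & ADDRESS_MASK
    if (signed >> ADDRESS_BITS) != _pac(address, context, key):
        raise ValueError("pointer authentication failed")
    return address


key = b"hypothetical-per-process-key"
stack_pointer = 0x7FFF_FFFF_E000   # used as the signing context
return_address = 0x7FFF_1234_5678

signed = sign_pointer(return_address, stack_pointer, key)
print(hex(authenticate_pointer(signed, stack_pointer, key)))

# An attacker who overwrites the address bits cannot produce a matching code
# without the key, so the check is expected to fail (a 16-bit code can still
# collide by chance, roughly once in 65,536 attempts).
forged = ((signed >> ADDRESS_BITS) << ADDRESS_BITS) | 0x7FFF_DEAD_BEE0
try:
    authenticate_pointer(forged, stack_pointer, key)
    print("forged pointer accepted (chance collision)")
except ValueError as error:
    print(f"forged pointer rejected: {error}")
```

Using the stack pointer as context ties the signature to one particular stack frame, so a validly signed address copied from elsewhere is unlikely to authenticate in a different frame.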
When Cortex-A76AE introduced the Split-Lock architecture, it was widely seen as the birth of a new era of safe compute. The timely detection of faults in the logic goes a long way toward addressing functional safety concerns as dictated by industry standards such as ISO 26262 and IEC 61508. But new architectures raise new challenges: availability, ASIL B support and system-wide functional safety. Cortex-A78AE addresses these challenges head-on with a range of safety features. First, Arm enhances the original lock-step capability with temporal diversity to guard against common cause failures, a small but vital addition. Second, in addition to Split mode operation, we introduce Hybrid mode, which allows the shared DSU-AE logic to continue operating in Lock mode while the CPUs remain independent (Split). The gain is two-fold: (1) the additional coverage of the DSU-AE counts towards diagnostic coverage in our FMEDA, and (2) the CPUs can be individually taken offline for testing while the cluster itself remains available for compute, albeit at reduced capacity. This addresses the pressing concern of availability for our automotive and industrial partners, who cannot afford downtime in mission-critical applications like industrial warehousing robots. Standard safety measures such as cache protection logic continue to be mandatory in the Cortex-A78AE, and availability is further enhanced by line lockout support, which avoids hitting bad locations in the cache structures. Finally, Cortex-A78AE comes with AMBA parity protection features, which are architected to work alongside the rest of our AE IP portfolio. This is an easy and validated way of extending the functional safety umbrella across the rest of the SoC, thereby achieving the goal of End-to-End (E2E) protection capability.
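Why temporal diversity helps against common cause failures can be shown with a toy lock-step model. Two redundant cores run the same work and a comparator checks their results; a single glitch that hits both cores in the same cycle corrupts both results identically when they run in step, but corrupts results for different inputs when one core runs a few cycles behind. Everything in this sketch (the cycle model, the single-bit glitch, the delay of two cycles, the function names) is an illustrative assumption rather than a description of the actual hardware.

```python
from typing import Callable, Dict, List, Optional, Sequence


def run_lockstep(
    inputs: Sequence[int],
    compute: Callable[[int], int],
    delay: int,
    glitch_cycle: Optional[int] = None,
) -> List[int]:
    """Return the input indices at which the comparator saw a mismatch.

    The shadow core processes each input `delay` cycles after the primary.
    A glitch at `glitch_cycle` flips the low bit of whatever result each
    core produces in that cycle (a simple common-cause fault model).
    """
    pending: Dict[int, int] = {}  # primary results waiting for comparison
    mismatches: List[int] = []
    for cycle in range(len(inputs) + delay):
        flip = 1 if cycle == glitch_cycle else 0
        if cycle < len(inputs):
            pending[cycle] = compute(inputs[cycle]) ^ flip
        shadow_index = cycle - delay
        if 0 <= shadow_index < len(inputs):
            shadow_result = compute(inputs[shadow_index]) ^ flip
            if shadow_result != pending.pop(shadow_index):
                mismatches.append(shadow_index)
    return mismatches


def square_plus_one(x: int) -> int:
    return x * x + 1


work = list(range(8))

# No delay: both cores are corrupted identically, so the fault goes unnoticed.
print(run_lockstep(work, square_plus_one, delay=0, glitch_cycle=3))  # -> []

# Two-cycle delay: the same glitch now hits different inputs on each core.
print(run_lockstep(work, square_plus_one, delay=2, glitch_cycle=3))  # -> [1, 3]
```

In the delayed run the comparator flags two inputs: the one the shadow core was processing when the glitch occurred, and the one the primary core was processing at that same moment.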
In addition to the safety features included in the CPU, physical IP for autonomous applications must achieve a higher bar in terms of reliability and testability. Arm Artisan Physical IP includes Safety Ready products optimized for industrial and automotive markets and is built on certified manufacturing processes for low-risk adoption.
Impressive as the performance of the Cortex-A78AE is, the compute platforms in automotive and industrial demand a complex blend of power efficiency, algorithmic intensity and straight compute throughput. Right-sized compute is the mantra of the day. Put simply, no one micro-architecture satisfies the application needs of these market segments. As an example, an autonomous drive platform needs to sense data, perceive obstacles and decide on the right path vector before engaging the vehicular controls. Just the middle two tasks require an enormous variety of algorithmic execution. To this end, the CPU can be configured with a variety of cache sizes across L1, L2, and L3, as well as different memory interfaces and types. The Cortex-A78AE can be paired in heterogeneous compute clusters alongside the Cortex-A65AE and can be coupled with accelerators over the Accelerator Coherency Port. A low-latency peripheral port is of use for dedicated system interface controllers, while the CMN-600AE and MMU-600AE IPs support CHI-protocol-based NPUs and general-purpose GPU blocks within the coherence domain of the CPU cluster. These products give the system designer the ability to right-size the platform to the task at hand.
The extended dynamic range of the Cortex-A78AE, coupled with the Split-Lock capability, provides another dimension of heterogeneity, particularly to customers seeking to redeploy architectures across market segments. An automotive partner can reuse the same underlying compute architecture across IVI and autonomous drive systems by varying configurations, operating and implementation points and, of course, the safety mode of operation to get drastically different PPA and performance points. This capability is one of the strong points of the AE portfolio for our demanding multi-market partners.
It is no secret that the future generation of hardware platforms enabled by Cortex-A78AE will be defined by software. The path to autonomy relies on millions of lines of code yet to be written and validated. For this reason, Arm and its ecosystem partners have been creating products and technologies aimed at significantly accelerating software programs across the supply chain. One great example is Arm Fast Models, used to build functionally accurate virtual platforms that enable large scale software development and cloud-based validation well ahead of hardware availability. Combined with Arm Development Studio, which includes the Arm Compiler for Safety qualified by TÜV SÜD for use at the highest safety integrity levels, users can benefit from a development environment ready to explore all capabilities of the Cortex-A78AE right now.
We are well on the way to a future of autonomous machines based on the Arm architecture. The Cortex-A78AE is an important milestone on this journey, one that will rightfully be seen as an enabler of many key technologies that are still required to make this future a reality. The ecosystem is already excited by the possibilities that this key IP enables and awaits partner platforms based on it. Join us on this exciting journey and explore our product page for more details on the Cortex-A78AE CPU.
Learn more about Cortex-A78AE