Introducing the Arm Cortex-X Custom Program

Stefan Rosinger
May 26, 2020
4 minute read time.

We have just announced our 2020 Arm Mobile IP, including the Arm Cortex-A78 CPU as the next step up in sustained performance on smartphones. But this year we do not stop there. We are mindful that Arm’s ever-expanding ecosystem is demanding more solutions and products built around partners' own specific needs.

Therefore, we are delighted to announce the Cortex-X Custom (CXC) program. In close collaboration with Arm engineering teams, program partners can shape a final CPU product to meet their specific market demands. This allows program partners to define their own performance points outside of the usual Cortex-A design envelope of performance, power, and area (PPA). This final custom CPU, designed and built by Arm, will then be delivered under the Arm Cortex-X brand. The very first CPU as part of the CXC program is the Arm Cortex-X1 CPU.

Cortex-X1: the most powerful Cortex CPU

Introducing the Arm Cortex-X1 CPU

Cortex-X1 is the most powerful Cortex CPU to date, delivering a 30 percent peak performance improvement over the current Arm Cortex-A77 CPU. It is designed to bring ultimate performance for next-generation custom solutions. This is in response to partners who wanted to maximize performance in line with their own specific use-cases.

Cortex-X1 also provides performance uplifts when compared to the Cortex-A78, offering 22 percent integer (single-thread) performance improvements¹. This short high-performance burst is best for reactivity and responsiveness when using devices, enabling the highest performance ever for smartphones and large screen devices. Furthermore, Cortex-X1 offers 2x machine learning (ML) performance improvements over Cortex-A77¹. The big improvement has been made despite the previous generation bringing a significant step-up for on-device intelligence. This is part of our wider push for more local compute performance.

Cortex-X1: designed for ultimate performance

As described in the Cortex-A78 blog, the DynamIQ cluster of 4x Cortex-A78 and 4x Cortex-A55 provides 20 percent sustained performance improvements over the 4x Cortex-A77 and 4x Cortex-A55 cluster². However, introducing Cortex-X1 enables even greater scalability by boosting peak performance. With 1x Cortex-X1 added to the DynamIQ cluster alongside 3x Cortex-A78 and 4x Cortex-A55, peak performance is 30 percent higher than the previous generation². Combined with the premium efficiency of Cortex-A78, this cluster delivers the best sustained and peak performance. It therefore fits the ever-expanding need for performance on mobile devices.

The Cortex-A78 and Cortex-X1 DynamIQ clusters compared to the previous generation

Advanced and faster digital immersion on smartphones

The key markets for solutions with Cortex-X1 are smartphones and new form factors. The performance uplift supports the move towards new foldable designs and bigger, multiple screens. Cortex-X1 provides quicker, more seamless user experiences, with faster app loading times and improved webpage scrolling responsiveness. The big ML uplift enables more advanced AI and ML-based experiences.

Similar to Cortex-A78, Cortex-X1 enables improvements to multiple digital immersion use-cases and experiences on mobile. These range from common productivity, communication, security, and camera-based use-cases right through to advanced gaming and XR (augmented reality and virtual reality) experiences.

The Cortex-X1 microarchitecture upgrades for maximum performance

How we maximized performance through microarchitecture

As you can see from the image above, Cortex-X1 has various microarchitecture upgrades that enable ultimate performance. Compared to Cortex-A78, the decode bandwidth has been increased by 25 percent to 5 instructions decoded per cycle. Moreover, the MOP cache throughput has been increased by 33 percent to 8 MOPs per cycle. On Cortex-X1, the Neon engine gets two additional pipes, doubling its compute capacity over Cortex-A78. Finally, on cache sizes, Cortex-X1 supports 64kB L1 and up to 1MB L2 cache. The DynamIQ cluster has also been upgraded to now support 8MB of L3 for ultimate performance. This larger L3 can also be used by Cortex-A78 when used in conjunction with Cortex-X1.
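The percentage figures above can be checked with a little arithmetic. The sketch below is illustrative only: the helper name `uplift_percent` is made up for this example, and the Cortex-A78 baselines of 4 instructions and 6 MOPs per cycle are taken from the author's reply in the comments rather than from the paragraph itself.

```python
def uplift_percent(old: float, new: float) -> float:
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Per-cycle widths. The Cortex-A78 baselines (4 instructions, 6 MOPs) are an
# assumption drawn from the author's comment reply, not from the post body.
decode = uplift_percent(4, 5)     # instructions decoded per cycle
mop_cache = uplift_percent(6, 8)  # MOPs delivered per cycle

print(f"decode bandwidth: +{decode:.0f}%")
print(f"MOP cache throughput: +{mop_cache:.0f}%")
```

Under those assumed baselines, the results round to the 25 percent and 33 percent quoted in the text.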

Maximum performance and differentiation for partners

Cortex-X1 is the very first example of a Cortex CPU that the CXC program can produce. It extends the digital immersion capabilities of smartphones through new levels of performance, making Cortex-X1 Arm’s most powerful CPU to date.

As part of the CXC program, subscribed partners collaborate with Arm to define custom CPUs that push performance beyond the usual Cortex-A PPA envelope. As a result, partners get a CPU that is specific to their market needs and differentiated from roadmap Cortex-A CPUs. Through the CXC program, we are meeting the needs of the ever-expanding ecosystem, taking the best of Arm to the next level.

Learn more about the Cortex-X Custom program
Visit the Arm newsroom blog

¹ Comparing Arm single core peak performance at 3.0GHz. Cortex-X1: 1MB priv-L2, 8MB L3 cache vs Cortex-A78 (32kB) / Cortex-A77 512KB priv-L2, 4MB L3 cache. Machine learning performance based on Matrix multiplication theoretical throughput. Measured estimates on SPECint*_base2006 (SPECspeed* Integer component of SPEC CPU* 2006) Arm single-core performance estimated for mobile platform. Results are measured estimates using specific computer systems, software, components, operations, and functions and changes to any of these factors will cause the results to vary.  

² Comparing Arm single core performance at 1 watt on Cortex-A78 and Cortex-A77, comparing Arm single core peak performance on Cortex-X1 to Cortex-A78 and comparing cluster area on Cortex-X1/Cortex-A78/Cortex-A55 1+3+4 topology and Cortex-A78/Cortex-A55 4+4 topology to Cortex-A77/Cortex-A55 4+4 topology, including architectural and process improvements (compared to 2019 devices).

Top Comments

  • Carlos Delfino over 5 years ago

    Does Cortex-X1 act as a coprocessor? and applies only to cell phones?

  • Stefan Rosinger over 5 years ago in reply to JasonM

    Frequency entitlement of the Cortex-X1 is not limited to EUV process, and has been measured at 3GHz on non-EUV process (7nm) as well.

  • JasonM over 5 years ago in reply to Stefan Rosinger

    Any approximations on the Cortex-X1 speed with a non-EUV machine thickness?

  • Stefan Rosinger over 5 years ago in reply to Nile.EdenAgs

    Frequency entitlement of the Cortex-X1 is similar to Cortex-A78, measured around 3GHz on 5nm process nodes. As for instruction throughput, Cortex-A78 is able to process 4 instructions / 6 macro-ops, and Cortex-X1 5 instructions / 8 macro-ops for maximum performance. 

  • Nile.EdenAgs over 5 years ago

    Curious about frequency, and instructions or operations per clock.
