Squaring the circle - Optimizing power efficiency in a Cortex-A15 processor

September 11, 2013

It is entirely appropriate that ARM will announce technical details of its latest hard macro product, the Cortex™-A15 MP4 Hard Macro for the TSMC 28HPM node, at COOL Chips XV, the IEEE Symposium on Low-Power and High-Speed Chips, being held this week in Yokohama, Japan (18-20 April 2012). This exciting new hard macro not only perfectly encapsulates the theme of the symposium, but also pulls together the contemporary and divergent design challenges of offering extremely high-performance compute engines within a conservative power budget.

The Cortex-A15 MP4 Hard Macro is a high-performance, power-optimized quad-core hard macro implementation of our flagship Cortex-A15 processor on a leading 28nm process. It delivers three significant firsts for the ARM hard macro portfolio: it is the first quad-core hard macro, the first hard macro based on the Cortex-A15 processor (the highest-performance ARMv7 architecture-based processor), and the first hard macro on a 28nm process.

In terms of configuration, the Cortex-A15 MP4 Hard Macro includes:

NEON™ and Floating Point Unit (FPU) technology
ECC for L1 and L2 RAMs (L1-I cache has single bit parity)
2x32KB L1 and 2MB L2 caches
224 interrupts, 6 power domains
AMBA® Protocol Domain Bridge, CoreSight™, AMBA APB™, ATB, Funnel

The hard macro has been developed using ARM Artisan® 12-track libraries and Processor Optimization Pack™ (POP) solutions for the Cortex-A15 processor on the TSMC 28nm HPM process.

In my earlier blog I outlined the three main challenges in modern SoC design: those arising from the rapid evolution of processor technology, the jumps in process implementation technology, and the ever-present commercial challenges, which have sharpened in the recent global economic climate. I went on to demonstrate why ARM hard macros are a very exciting and credible solution for silicon vendors.

I will resist the urge to repeat the message here, but it is worth noting that with every jump in complexity for the processor and the process node, there is a significant rise in the challenges, costs and risks associated with getting the SoC implementation right and on time. Today, the SoC development challenge is perhaps highest when designing with the latest high-performance multicore processors, such as the Cortex-A15 processor, on leading geometries.

One of the biggest challenges in designing high-performance systems on the latest nodes is keeping power consumption and leakage low. It is here that the Cortex-A15 hard macro really excels, delivering a blistering performance of more than 2GHz and in excess of 20,000 DMIPS, while maintaining the power efficiency of the Cortex-A9 hard macro. This makes the latest macro offering from ARM a real and timely boon to SoC designers venturing into what is, for many, uncharted territory.

To achieve this low-leakage, high-performance implementation, some of the best brains at ARM pitted their expertise against a series of design challenges and decision points across all stages of the implementation flow.

Consider the challenge of picking the right base library combination on 28nm: there are various foundry process offerings, several threshold voltage (Vt) options, channel length variations and literally thousands of cell choices. Picking the combination that would deliver the desired Performance, Power and Area (PPA) targets was a crucial first step on the way to success.

Then there was the challenge of managing the diverging needs of silicon vendors, who wish to use the full entitlement of the process geometry to build highly complex SoCs, and product developers, who focus on providing consumers with best-in-class battery life. It was clear that the Cortex-A15 hard macro would need some sophisticated power management schemes to ensure both needs were met adequately. The power grid for the macro was designed to support typical frequency at worst-case process and operating conditions. The Cortex-A15 hard macro supports multiple power domains, and also supports Dynamic Voltage and Frequency Scaling (DVFS) across its two voltage domains, VSOC and VCORE.
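To see why DVFS matters for battery life, here is a minimal sketch based on the textbook CMOS switching-power approximation, P = a * C * V^2 * f. The operating points, the capacitance value and the function names are hypothetical and chosen only for illustration; they are not figures for the Cortex-A15 hard macro, and leakage power is ignored.

```python
def dynamic_power(c_eff_farads, voltage, freq_hz, activity=1.0):
    """Textbook CMOS switching-power approximation: P = a * C * V^2 * f."""
    return activity * c_eff_farads * voltage ** 2 * freq_hz


# Hypothetical operating points: (supply voltage in volts, clock in Hz).
OPERATING_POINTS = {
    "high": (1.0, 2.0e9),
    "low": (0.8, 1.0e9),
}

# Hypothetical effective switched capacitance; it cancels out in the ratio.
C_EFF = 1.0e-9


def relative_power(point, reference="high"):
    """Dynamic power at `point` as a fraction of the reference point."""
    v, f = OPERATING_POINTS[point]
    v_ref, f_ref = OPERATING_POINTS[reference]
    return dynamic_power(C_EFF, v, f) / dynamic_power(C_EFF, v_ref, f_ref)


if __name__ == "__main__":
    print(f"low vs high: {relative_power('low'):.2f}")
```

Under these assumed numbers, halving the clock and dropping the supply from 1.0 V to 0.8 V cuts dynamic power to roughly a third of the reference, because voltage enters the formula squared. That is the basic reason scaling voltage along with frequency saves more than scaling frequency alone.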

An interesting timing closure challenge for the design team was to overcome the limitations of traditional fixed OCV (On-Chip Variation) and fixed margins, which are now running out of steam. For example, a 15ps increase in the margin can add 200% more hold buffers. The Cortex-A15 hard macro uses Advanced OCV (AOCV) techniques, which provide more flexible margins, but the lack of full EDA support for AOCV made things interesting.
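The difference between a fixed margin and a more flexible one can be sketched as follows. This is an illustrative toy model, not the flow used for the hard macro: real AOCV derates come from characterized tables indexed by path depth and distance, whereas here the margin is simply assumed to shrink as 1/sqrt(depth). The function names and the 10% base margin are hypothetical.

```python
import math


def flat_ocv_delay(stage_delays_ps, derate=1.10):
    """Fixed OCV: the same late derate is applied regardless of path depth."""
    return sum(stage_delays_ps) * derate


def depth_aware_delay(stage_delays_ps, base_margin=0.10):
    """Depth-aware derate (toy model): margin shrinks as 1/sqrt(depth).

    The idea is that random stage-to-stage variation partly averages out
    along a deep path, so a long path needs less margin per stage.
    """
    depth = len(stage_delays_ps)
    derate = 1.0 + base_margin / math.sqrt(depth)
    return sum(stage_delays_ps) * derate


if __name__ == "__main__":
    path = [50.0] * 16  # hypothetical path: 16 stages of 50 ps each
    print(f"nominal:     {sum(path):.1f} ps")
    print(f"fixed OCV:   {flat_ocv_delay(path):.1f} ps")
    print(f"depth-aware: {depth_aware_delay(path):.1f} ps")
```

With these assumed values the 800 ps path is timed at about 880 ps under the fixed derate and about 820 ps under the depth-aware one. For a single-stage path the two models agree, so the pessimism removed is concentrated on deep paths.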

I would love to go on further about the creativity of the design but I'm aware that my editors asked for a blog, not a whitepaper.

It is fair to conclude that the unmatched power efficiency in this high-performance Cortex-A15 MP4 implementation was achieved by capitalizing on the vast implementation expertise available in ARM, and by leveraging the tight synergy that exists between ARM CPU, Physical and Fabric IP teams.
