It is entirely appropriate that ARM will announce technical details of its latest hard macro product, the Cortex™-A15 MP4 Hard Macro for the TSMC 28HPM node, at COOL Chips XV, the IEEE Symposium on Low-Power and High-Speed Chips, being held this week in Yokohama, Japan (18-20 April 2012). This new hard macro not only encapsulates the theme of the symposium, but also pulls together the contemporary and divergent design challenges of offering extremely high-performance compute engines within a conservative power budget.
The Cortex-A15 MP4 Hard Macro is a high-performance, power-optimized quad-core hard macro implementation of our flagship Cortex-A15 processor on a leading 28nm process. It delivers three significant firsts for the ARM hard macro portfolio: it is the first quad-core hard macro, the first hard macro based on the Cortex-A15 processor (the highest-performance ARMv7 architecture-based processor), and the first hard macro on a 28nm process. In terms of configuration, the Cortex-A15 MP4 Hard Macro includes:
The hard macro has been developed using ARM Artisan® 12-track libraries and Processor Optimization Pack™ (POP) solutions for the Cortex-A15 processor on the TSMC 28nm HPM process.
In my earlier blog I outlined the three main challenges in modern SoC design: those arising from the rapid evolution of processor technology, the jumps in process implementation technology, and the ever-present commercial challenges, which have sharpened in the recent global economic climate. I went on to demonstrate why ARM hard macros are a very exciting and credible solution for silicon vendors.
I will resist the urge to repeat the message here, but it is worth noting that with every jump in complexity for the processor and the process node, there is a significant rise in the challenges, costs and risks associated with getting the SoC implementation right and on time. Today, the SoC development challenge is perhaps highest when designing with the latest high-performance multicore processors, such as the Cortex-A15 processor, on leading geometries.
One of the biggest challenges in designing high-performance systems on the latest nodes is keeping power consumption and leakage low. It is here that the Cortex-A15 hard macro really excels, delivering performance of more than 2GHz and in excess of 20,000 DMIPS while maintaining the power efficiency of the Cortex-A9 hard macro. This makes the latest macro offering from ARM a real and timely boon to SoC designers venturing into what is, for many, uncharted territory.
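To put those headline numbers in per-core terms, here is a rough back-of-the-envelope sketch. It assumes (my assumption, not stated above) that the 20,000 DMIPS figure is the aggregate across all four cores, and it treats both figures as nominal values even though they are quoted as lower bounds, so the result is indicative only. The function name is hypothetical.

```python
def dmips_per_mhz_per_core(total_dmips, freq_mhz, cores):
    """Normalise an aggregate DMIPS figure by clock frequency and core count."""
    return total_dmips / freq_mhz / cores

# Nominal figures taken from the text: 20,000 DMIPS, 2 GHz (2,000 MHz), 4 cores.
# Assumption: the DMIPS figure is the total for the quad-core macro.
ratio = dmips_per_mhz_per_core(20_000, 2_000, 4)
print(f"{ratio} DMIPS/MHz per core")
```

Because both inputs are "more than" figures, the ratio should be read as a ballpark rather than a specification.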
To achieve this low-leakage, high-performance implementation, some of the best brains at ARM pitted their expertise against a series of design challenges and decision points across all stages of the implementation flow.
Consider the challenge of picking the right base library combinations from the various foundry process offerings on 28nm, with several Vt options, channel-length variations and literally thousands of cell choices. Picking the combinations that would deliver the desired Performance, Power and Area (PPA) targets was a crucial first step on the way to success.
Then there was the challenge of managing the diverging needs of silicon vendors, who wish to use the full entitlement of the process geometry to build highly complex SoCs, and product developers, who focus on providing consumers with best-in-class battery life. It was clear that the Cortex-A15 hard macro would need sophisticated power management schemes to ensure both needs were met adequately. The power grid for the macro was designed to support typical frequency at worst-case process and operating conditions. The Cortex-A15 hard macro supports multiple power domains, and also supports dynamic voltage and frequency scaling (DVFS) across its two voltage domains, VSOC and VCORE.
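For readers less familiar with why DVFS matters, the sketch below uses the standard first-order model for dynamic power, P = C·V²·f. The operating points and the effective capacitance are hypothetical values chosen purely for illustration; they are not figures for the Cortex-A15 hard macro, and leakage power is ignored.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """First-order dynamic power model: P = C * V^2 * f (leakage ignored)."""
    return c_eff * voltage ** 2 * freq_hz

# Hypothetical operating points as (volts, hertz); not taken from the macro.
OPERATING_POINTS = {
    "low": (0.8, 1.0e9),
    "high": (1.0, 2.0e9),
}
C_EFF = 1.0e-9  # hypothetical effective switched capacitance in farads

def relative_power(point, reference="high"):
    """Dynamic power at `point` as a fraction of power at `reference`."""
    v, f = OPERATING_POINTS[point]
    v_ref, f_ref = OPERATING_POINTS[reference]
    return dynamic_power(C_EFF, v, f) / dynamic_power(C_EFF, v_ref, f_ref)

print(f"low point uses {relative_power('low'):.0%} of high-point dynamic power")
```

In this model, halving the frequency alone would halve dynamic power; lowering the voltage at the same time cuts it further because of the squared voltage term, which is why scaling both together is attractive for battery life.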
An interesting timing closure challenge for the design team was to overcome the limitations of traditional fixed OCV (On-Chip Variation) derating and fixed margins, which are now running out of steam. For example, a 15ps increase in the margin can add 200% more hold buffers. The Cortex-A15 hard macro uses Advanced OCV (AOCV) techniques, which provide more flexible margins, but the lack of full EDA support for AOCV made things interesting.
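The difference between the two approaches can be sketched in a few lines. Fixed OCV applies one derate factor to every path, whereas AOCV looks the derate up from a table, typically indexed by logic depth (and, in real flows, by distance as well), so that deep paths, where random variation tends to average out, carry less pessimism. All numbers and names below are hypothetical and are not the values used for the hard macro; distance-based derating is omitted for brevity.

```python
import bisect

FIXED_DERATE = 1.10  # hypothetical flat late derate applied to every path

# Hypothetical AOCV table: derate to use from each logic depth upward.
AOCV_DEPTHS = [1, 2, 4, 8, 16]
AOCV_DERATES = [1.12, 1.09, 1.07, 1.05, 1.04]

def aocv_derate(depth):
    """Return the derate for the largest table depth not exceeding `depth`."""
    i = bisect.bisect_right(AOCV_DEPTHS, depth) - 1
    return AOCV_DERATES[max(i, 0)]

def path_delay(stage_delays_ps, derate):
    """Late-derated path delay: sum of stage delays times the derate."""
    return sum(stage_delays_ps) * derate

def compare(stage_delays_ps):
    """Return (fixed-OCV delay, AOCV delay) for one path."""
    depth = len(stage_delays_ps)
    return (path_delay(stage_delays_ps, FIXED_DERATE),
            path_delay(stage_delays_ps, aocv_derate(depth)))

fixed, aocv = compare([50.0] * 10)  # a ten-stage path, 50 ps per stage
print(f"fixed OCV: {fixed:.1f} ps, AOCV: {aocv:.1f} ps")
```

With this made-up table, the ten-stage path is derated less under AOCV than under the flat factor, while a single-stage path is derated more. That depth sensitivity is the kind of flexibility in margins the paragraph above refers to.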
I would love to go on further about the creativity of the design but I'm aware that my editors asked for a blog, not a whitepaper.
It is fair to conclude that the unmatched power efficiency in this high-performance Cortex-A15 MP4 implementation was achieved by capitalizing on the vast implementation expertise available in ARM, and by leveraging the tight synergy that exists between ARM CPU, Physical and Fabric IP teams.