The innovations that lie ahead on the road to a trillion connected devices are nothing short of astonishing, as we push compute, artificial intelligence (AI), and machine learning (ML) capabilities from the cloud to the edge. To get there, engineers are rethinking how to push the boundaries of design around device size, reliability, and efficiency. In the process, they are looking to new technology nodes and optimized IP to wring the most out of their designs and ensure success at the edge and endpoints.
Nowhere is this transformation more vital than in edge computing, where most devices are small, ultra-low-power, and cost-constrained. The demand for specialized, hyper-efficient AI compute led to the Arm Ethos-N78 NPU, a highly task-specific processor designed for fast execution of AI applications. Specifically, the N78 is lightning quick at the vector and matrix computations typical of ML workloads. Whether the Ethos-N78 is paired with the Cortex-A75 or the newly introduced Cortex-A78 to handle general compute functions, the legacy and familiarity of these CPUs enable designers to reduce time to market and improve efficiency.
The Arm Ethos-N78 enables new immersive applications with a 2.5x increase in single-core performance, scaling from 1 to 10 TOP/s for design flexibility. It is a versatile NPU with more than 100 unique configuration options that allow customers to tailor the number of MACs, the SRAM size, and the vector engines to their workload. The Ethos-N78 can be integrated with Arm A-class host CPUs and memory, such as the Cortex-A75 and Cortex-A78 CPUs, which are designed for high-end performance at best efficiency. The Ethos-N78 has also been designed to reduce DRAM bandwidth (up to 40% less DRAM data per inference), thereby improving efficiency. The reduction in DRAM usage, together with roughly 30% better area efficiency than the previous generation, allows partners to achieve more in less silicon area and reduce system power.
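To see why moving less DRAM data per inference translates into system power savings, here is a back-of-envelope sketch. The energy-per-byte and traffic figures are illustrative assumptions, not Arm-published numbers; only the 40% reduction comes from the text.

```python
# Back-of-envelope estimate of per-inference DRAM energy savings.
# The energy-per-byte and traffic values are assumptions for illustration.

DRAM_ENERGY_PJ_PER_BYTE = 100.0   # assumed DRAM access energy (~100 pJ/byte)
BASELINE_TRAFFIC_MB = 50.0        # assumed DRAM traffic per inference, previous gen
REDUCTION = 0.40                  # "up to 40% less DRAM data per inference" (from the text)

def dram_energy_mj(traffic_mb: float) -> float:
    """DRAM access energy per inference, in millijoules."""
    return traffic_mb * 1e6 * DRAM_ENERGY_PJ_PER_BYTE * 1e-12 * 1e3

baseline = dram_energy_mj(BASELINE_TRAFFIC_MB)
improved = dram_energy_mj(BASELINE_TRAFFIC_MB * (1 - REDUCTION))
print(f"baseline: {baseline:.2f} mJ, improved: {improved:.2f} mJ")
```

Because DRAM access energy scales directly with bytes moved, a 40% traffic cut yields a 40% cut in that portion of the power budget, whatever the absolute numbers turn out to be.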
Arm’s physical design group is optimizing the scalable Ethos-N78 on the GLOBALFOUNDRIES® (GF®) 12LP+ solution to address this efficiency challenge. GF’s most advanced FinFET solution, 12LP+, builds on the success of GF’s 14nm/12LP platform and is mature for manufacturing. The feature-rich 12LP+ solution is bolstered by power, performance, and area (PPA) optimized Arm Artisan Physical IP that is well suited to AI and ML applications.
The GF 12LP+ solution offers a 20% increase in performance or a 40% improvement in power compared to the 12LP platform. The technology also competes very well on performance, dynamic power, and area against other industry-standard nodes. The Ethos-N78 NPU, with its unmatched flexibility and advances in performance and power efficiency, combined with the 12LP+ solution, enables our partners to unleash the potential of on-device ML.
Making this specialty semiconductor offering more powerful, Arm has developed a comprehensive set of foundation IP and PPA-optimized Cortex-A75 POP implementations for GF’s 12LP+ solution, providing efficient performance for the Arm CPU. Leveraging Arm’s expertise in RTL co-optimization, the fast cache memory instances are tuned for the PPA of the Cortex-A75 CPU. These process and RTL innovations are supported by industry-standard EDA tools and backed by the POP IP support team, helping partners achieve reliable results with an accelerated time to market.
“GLOBALFOUNDRIES and Arm have collaborated closely to enable differentiated IP solutions,” said Mark Ireland, vice president of ecosystem and design solutions at GF. “Arm has developed a specialized high-efficiency Arm Cortex-A75 POP IP solution for our GF 12LP+ solution. This enables mutual customers to recognize their full silicon potential, reduce time to market and deliver differentiated power-efficient ML and AI applications. We are also excited about the ongoing development of the Ethos-N78 NPU and its potential to provide benefits to AI enabled edge computing devices.”
Arm Artisan Physical IP offers a comprehensive platform with two standard cell architectures: a high-performance 9-track library and a 7.5-track library for high-density, low-power applications. The libraries contain enhanced cell sets of more than 2,500 cells, including application-specific cells such as single-fin cells for the lowest power and multi-height cells for the highest performance.
AI and ML applications have enormous mathematical computation requirements, which means designers need to leverage multi-input cells such as adders, multiplexers, compressors, and sequential cells. Arm libraries offer multiple drive strengths, from fractional low-drive to high-drive cells, giving EDA tools a wide range of options from which to select for optimal PPA and placement.
The enhanced single-fin kit (SFK) library contains single-fin versions of both combinational and sequential cells for various functions, without an area penalty. This is a critical advantage for low-power designs: the low-drive-strength single-fin cells reduce dynamic and leakage power, achieving an over 40% replacement rate in typical designs and saving up to 10% power.
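The two figures above are consistent under a simple assumption: if roughly 40% of cells are swapped for single-fin variants, each drawing about a quarter less power (an assumed figure, not from the text), the total saving comes out near 10%. A minimal sketch:

```python
# Illustrative arithmetic linking the replacement rate to the total saving.
# per_cell_saving is an assumption; only the 40% rate comes from the text.

replacement_rate = 0.40   # "over 40% replacement rate in typical designs"
per_cell_saving = 0.25    # assumed relative power saving per replaced cell

total_saving = replacement_rate * per_cell_saving
print(f"estimated total power saving: {total_saving:.0%}")
```

Real savings depend on which cells are replaced and their activity factors, but the first-order estimate matches the "up to 10%" claim.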
The highest-performance cells are in a new product, the Area Efficiency Booster (AEB) Kit, which provides double- and triple-height cells. In the AEB cells, dummy fins below the power rail are converted into active fins. Single-, double-, and triple-height cells for critical drive functions offer placement flexibility, helping to decrease area. A PPA comparison between single-height-only cells and single- plus double-height cells shows a performance boost of close to 50 MHz and a significant improvement in total negative slack (TNS), which also reduces leakage. Significant area reduction and a better aspect ratio compared to single-height cells enable EDA tools to select these cells across design types.
The GF 12LP+ solution includes nine memory compilers specially designed for AI applications, which require fast, power-efficient shuttling of data between processors and memory. Arm offers multiple periphery options for compiled instances to target either high-performance or low-power requirements. Progressive power-gating modes save power by shutting off the core or periphery power supplies, and all compilers offer multiple features and range optimizations to improve efficiency.
A key innovation in the 12LP+ memories is the specially optimized single-rail 0.55V low-voltage compiler: both the memory bitcell and the periphery operate in the 0.55V domain, making it extremely power efficient and easing implementation challenges compared to a dual-rail configuration. This low-voltage compiler also delivers up to 1GHz at 0.55V for selected instances, making it an ideal choice for AI applications.
To complete the Arm offering for the GF 12LP+ solution, there are two fail-safe 1.8V and 3.3V programmable I/O libraries with auto-detection of the I/O supply modes, supporting 2kV HBM and 6A CDM requirements. The Power Grid Architect (PGA) utility is included to enable rapid power-grid creation for Artisan standard cells. PGA improves PPA by creating optimal power grids for a variety of power-density designs, and it supports the Arm standard cell offerings, including the single-fin kit, the Area Efficiency Booster cells, and the power management kit (PMK).
The journey to a trillion connected devices has just accelerated, thanks to the introduction of new Arm Artisan Physical IP optimized for the competitive GF 12LP+ specialty solution.
You can sign up for the Arm DesignStart program to browse, investigate, and download Artisan IP for evaluation. Or simply contact us and let your innovations begin.