First launched in 2021, Arm’s Total Compute Solutions deliver a complete package of IP designed and optimized to work together seamlessly. This makes it easier for System on Chip (SoC) designers to tackle the many challenges of building and configuring their own compute subsystems, which include developing or sourcing third-party system IP for the interconnect, System Level Caches (SLCs) and Memory Management Units (MMUs), and then integrating everything with the CPU and GPU clusters. Arm’s Total Compute Solutions significantly reduce the complexity of SoC design, cutting engineering cost and resources and accelerating time-to-market. This allows device manufacturers to focus on where they deliver true commercial value: hardware and software differentiation.
As with previous generations, the new Arm Total Compute Solutions (TCS23) address these core SoC engineering challenges alongside wider mobile computing trends, including demands for more complex user experiences, new software capabilities and the continuous push for more performance and efficiency. These engineering challenges are particularly acute in the premium mobile market, where building SoCs is becoming ever more complex for silicon vendors. TCS23 is built on the foundation of the new Armv9.2 architecture and gives partners the latest techniques needed to push power efficiency and performance boundaries, so they can build the very best premium mobile SoCs. Partners can also adopt TCS23 in a variety of configurations and scalable computing solutions to bring its capabilities to a broad range of consumer market segments.
TCS23 integrates the latest Arm IP products across CPU, GPU, and System IP to deliver a wide range of computing capabilities and use cases for next-generation mobile devices. These include the Cortex-X4, Cortex-A720 and Cortex-A520 CPUs, the DSU-120, the Immortalis-G720 and scalable Mali GPUs built on the new 5th Gen GPU architecture, and the CoreLink CI-700 interconnect.
All the new IP delivers system-level optimizations for scalability and efficiency across the TCS23 platform.
Alongside the latest IP, TCS23 provides development tools, designs, and optimizations tailored for the latest Android operating system, plus physical implementation support to accelerate SoC designs.
We continue to develop our library software, such as Arm NN and the Arm Compute Library, so developers can optimize the execution of their machine learning (ML) workloads on the Armv9 architecture. Since the beginning of this year, Arm NN and the Arm Compute Library have been used by Google apps on Android, already reaching 100 million active users. We are also working to seamlessly enable our IP and new features in the upstream Android kernel.
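To give a flavour of what this looks like for developers, the sketch below configures and runs a single convolution layer directly through the Arm Compute Library’s NEON backend. It is a minimal illustration rather than Arm reference code: the tensor shapes, the 5x5 kernel and the 56 output filters are assumptions chosen to resemble the feature-extraction layer of a small super-resolution network, and it assumes the Arm Compute Library headers and runtime library are available for an AArch64 target.

```cpp
// Minimal sketch: one convolution layer run through the Arm Compute Library's
// NEON backend. Shapes and filter counts are illustrative assumptions.
#include "arm_compute/core/TensorInfo.h"
#include "arm_compute/core/TensorShape.h"
#include "arm_compute/core/Types.h"
#include "arm_compute/runtime/NEON/functions/NEConvolutionLayer.h"
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    // Single-channel 640x360 input, 56 output feature maps, 5x5 kernels.
    // TensorShape is ordered (width, height, channels[, batches]) for NCHW.
    Tensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(640U, 360U, 1U), 1, DataType::F32));
    weights.allocator()->init(TensorInfo(TensorShape(5U, 5U, 1U, 56U), 1, DataType::F32));
    biases.allocator()->init(TensorInfo(TensorShape(56U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(640U, 360U, 56U), 1, DataType::F32));

    // Configure a 'same' convolution: stride 1, 2-pixel padding on each side.
    NEConvolutionLayer conv;
    conv.configure(&input, &weights, &biases, &output, PadStrideInfo(1, 1, 2, 2));

    // Back the tensors with memory, then run. A real application would fill the
    // input with image data and the weights/biases with trained parameters first.
    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();
    conv.run();

    return 0;
}
```

In practice, many Android developers reach these optimizations indirectly, for example through Arm NN’s TensorFlow Lite delegate, rather than calling the library APIs themselves.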
Through TCS23, we also provide a wide range of free tools and resources, so developers can optimize their applications for Arm-based mobile devices. For the nearly 9 million mobile developers worldwide, we pride ourselves on offering the flexibility and commonality needed to write simpler, more secure, and faster software on Arm, for Arm. Focusing on gaming, we have deep partnerships with leading game engines to ensure our graphics tools provide highly scalable gaming optimizations, while our detailed resources help developers create their own gaming content.
Finally, our optimized physical IP enables leading implementations of the Arm IP on the latest, most advanced process nodes.
Broadly, there are three different types of TCS23 configurations – premium, performance, and efficiency – for different devices, use cases and compute requirements.
The premium TCS23 is designed for ultimate performance and compute-intensive experiences that are commonly required for premium and flagship smartphones and laptops. It pushes system-wide performance and efficiency improvements for the very best visual experiences, such as immersive, smooth AAA mobile gaming experiences, advanced AI use cases like image and video enhancement, and device multi-tasking. The premium TCS23 balances this performance with high levels of power efficiency for multiple days of use.
The performance TCS23 is designed to address a wide range of compute requirements across multiple consumer device segments, including premium DTVs and set-top boxes (STBs) and mid-tier smartphones. The aim is to deliver high graphics and compute performance with maximum scalability for outstanding user experiences. The powerful graphics and compute performance is key to enabling multi-tasking on these devices, as well as a super smooth UX, especially when launching and switching between applications. For example, for DTVs, this could mean multi-view capabilities, such as video calling while having video streaming and AI applications overlaid on the screen. The increased performance also provides advanced ML capabilities that enhance the user experience for camera and video use cases.
The efficiency TCS23 covers ultra-scalable solutions for the very best power, cost, and area efficiency. It is targeted at devices where these efficiency considerations are vital, such as entry-level DTVs and set-top boxes (STBs) and wearables, like smartwatches. The enhanced power efficiency of our IP, at both the individual IP and system level, enables our partners to design next-generation products with outstanding battery life. In addition, TCS23 offers a wide range of configuration options to address these cost-sensitive markets. For example, we have a scalable cluster of LITTLE CPU cores powered by the new Cortex-A520 and scalable Mali GPUs.
For every generation of Arm Total Compute Solutions, we build a complete compute subsystem on an FPGA platform. The aim is to go beyond the performance of individual standalone IP products and analyze complete solution-level performance when running complex compute workloads and a full operating system, such as Android 13.
For TCS23, the reference platform was a premium solution comprising Cortex-X4, Cortex-A720, and Cortex-A520 LITTLE CPU cores alongside our new DSU-120 with 8MB of L3 cache. The CPU cluster is partnered with Arm’s second-generation Immortalis-G720 GPU, with the CoreLink CI-700 providing the interconnect and SLC, which is available to all IP. It is worth noting that this is just an example configuration for benchmarking purposes; our partners can choose alternative TCS23 configurations based on their own requirements. However, as we show below, the platform delivers impressive results.
TCS23 is optimized to reduce latency and DRAM bandwidth for real-world workloads. The platform delivers an average 30 percent reduction in DRAM bandwidth per frame compared to the previous generation TCS22¹. For some content, particularly games, the reduction is even higher: when we analyze scenes from the popular AAA game Fortnite, there is up to a 44 percent reduction in DRAM bandwidth at a system level. Less bandwidth means less power is needed in the system, providing power efficiency savings of 20 percent on average across the GPU and DRAM power contributions². The DRAM bandwidth reduction is largely down to the new Immortalis-G720, which introduces deferred vertex shading (DVS) as part of the brand-new 5th Gen GPU architecture, alongside a variety of efficiency improvements and optimizations to the SLC allocation policies.
We measured the TCS23 platform across several compute and graphics performance benchmarks. For general compute, we saw a 27 percent peak performance improvement when moving to a 1+5+2 TCS23 CPU configuration from a 1+3+4 TCS22 CPU configuration³. Focusing on the browsing experience alone, there is a 33 percent performance uplift with the TCS23 hardware using the same cluster configuration as the previous generation TCS22⁴, and a 64 percent uplift when combining TCS23 hardware with optimized software⁵. Meanwhile, there is up to a 21 percent performance improvement on the Manhattan 3.0 graphics benchmark⁶.
For TCS23, we have optimized both the hardware and software to run ML workloads faster. Combining the new CPUs with hardware and software improvements for the TCS23 platform, we see an average increase in ML performance of 12 percent for Cortex-X4, 9 percent for Cortex-A720 and 13 percent for Cortex-A520⁷. On the GPU, we followed up last year’s hardware improvements with further software optimizations to Arm NN and Arm Compute Library. These provide a 4x ML performance boost on a super resolution FSRCNN network⁸.
Through TCS23, Arm remains committed to evolving platform security with new advanced technologies and techniques that increase security assurance. One of TCS23’s key security features is support for the Android Virtualization Framework (AVF), which was introduced with Android 13. AVF, which is only supported on ARM64-based devices, provides secure and private execution environments for executing code. This is ideal for advanced use cases that require stronger security and privacy assurance for user data.
Pointer Authentication (PAC) and Branch Target Identification (BTI) work together to improve control-flow integrity by eliminating almost all ROP and JOP attacks. We have reduced the performance cost associated with both security features so that it is negligible on the new Cortex-X4 and Cortex-A720 CPU cores. Moreover, through PAC enhancements, including the new QARMA3 algorithm, the performance impact of PAC and BTI is now reduced to less than one percent on Cortex-A520 CPU cores.
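Application and platform code opts into these protections at build time, for example with the AArch64 -mbranch-protection=standard compiler option in GCC and Clang, while the operating system advertises the underlying hardware capabilities to user space. As an illustrative sketch, assuming an AArch64 Linux or Android environment, the following checks at runtime whether the CPU exposes PAC and BTI:

```cpp
// Minimal sketch: detecting PAC and BTI hardware support at runtime on an
// AArch64 Linux/Android system via the auxiliary vector. The code itself opts
// into the protections at build time, e.g. with -mbranch-protection=standard.
#include <cstdio>
#include <sys/auxv.h>

// Fallback definitions for older headers (bit positions from the Linux arm64 ABI).
#ifndef HWCAP_PACA
#define HWCAP_PACA (1UL << 30)
#endif
#ifndef HWCAP2_BTI
#define HWCAP2_BTI (1UL << 17)
#endif

int main()
{
    const unsigned long hwcap  = getauxval(AT_HWCAP);
    const unsigned long hwcap2 = getauxval(AT_HWCAP2);

    std::printf("Pointer Authentication (PAC): %s\n",
                (hwcap & HWCAP_PACA) ? "supported" : "not supported");
    std::printf("Branch Target Identification (BTI): %s\n",
                (hwcap2 & HWCAP2_BTI) ? "supported" : "not supported");
    return 0;
}
```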
Finally, we have updated Trusted Firmware-A (TF-A) with a new mbedTLS v3.3 library that provides new features and bug fixes for enhanced data protection.
Our partners can leverage the power of TCS23 across all tiers of mobile devices, enabling them to create life-changing products, services, and experiences. Regardless of the TCS23 configuration that partners choose, they benefit from a reduced time-to-market and lower costs during SoC development. Every TCS23 configuration – whether it is premium, performance or efficiency – features IP with the same hardware interfaces and software enablement designed to work better together.
The end-to-end system optimization of TCS23 unlocks the best overall SoC performance and efficiency for mobile computing use cases now and in the future. TCS23 also provides more security and software features to ensure developers can access and unlock their creative potential and deliver innovative, immersive experiences. The multiple system level improvements and additional new features make TCS23 the complete platform for the future of mobile computing.
1. Power consumption for TCS23 GPU and DRAM+PHY vs TCS22 GPU and DRAM+PHY, measured on Arm FPGA platforms.
2. Power consumption for TCS23 GPU and DRAM+PHY vs TCS22 GPU and DRAM+PHY, measured on Arm FPGA platforms.
3. Based on ‘Geekbench 6 MT’ benchmark for general compute performance. Measured on FPGA at system level, Android 13, iso-frequency, iso L3/SLC cache size.
4. Based on ‘Speedometer 2.1’ benchmark for browsing experience. Measured on FPGA at system level, Android 13 with 1+3+4 cluster config and iso-frequency.
5. Based on ‘Speedometer 2.1’ benchmark for browsing experience. Measured on FPGA at system level, Android 13 with 1+3+4 cluster config and iso-frequency. Using publicly available optimized Chromium r114 with PAC/BTI enabled; comparison against r99 baseline.
6. Measured on TCS23 at system level, Android 13, iso-process, iso-core count, iso-voltage vs TCS22 Arm reference system; TCS22 using r35p0 and TCS23 using r40p0 DDK.
7. Average performance uplift (inference time) across a range of ML workloads, comparing Arm Compute Library v22.05 with v23.02 and vs TCS22 Arm reference system; comparisons with TCS22-generation equivalent cores, iso-frequency.
8. Average performance uplift (inference time) across a range of ML workloads, comparing Arm Compute Library v22.05 with v23.02 and vs TCS22 Arm reference system; comparisons with TCS22-generation equivalent cores, iso-frequency.