big.LITTLE in 64-bit

September 11, 2013


With the ARM Cortex-A50 series processors, ARM has introduced a "big" and "LITTLE" processor pair that is 64-bit capable. So what does this second generation of the big.LITTLE platform mean for big.LITTLE software, which is currently being readied for deployment on ARMv7 32-bit processors? And how will big.LITTLE processing technology be used in applications outside mobile, such as low-power servers, where 64-bit processing is a growing requirement?

Preparing for 64-bit Operating Systems

To start with, I should highlight that big.LITTLE software operates at the level of the operating system, in kernel space. This means it is completely transparent to all apps and middleware. In both major modes of operation, CPU migration and big.LITTLE MP (discussed in more detail elsewhere), the software consists of a relatively small patch set to the OS kernel. Today, these patches are written for ARMv7 and are available as open source or from Linaro. The Cortex-A50 series processors support the AArch32 execution state, which is 100% backward compatible with ARMv7, so a Cortex-A50 series big.LITTLE processor can run existing 32-bit kernels without any major changes, including kernels that have been patched to support big.LITTLE. Some cache maintenance routines will change, but otherwise the big.LITTLE software is effectively the same.

This is important because we are continuously improving the ARMv7 big.LITTLE code base. The first generation of devices based on big.LITTLE processors is expected in the market in 2013.

ARMv8 allows both 64-bit and 32-bit operation. AArch64 is the execution state that describes the 64-bit mode of operation, and AArch32 describes the 32-bit mode of operation. AArch64 also delivers other architectural benefits such as enhanced SIMD, larger register files, enhanced cache management, tagged pointers, and more flexible addressing modes. For a big.LITTLE processor to deliver the architectural benefits of AArch64, it must run a 64-bit OS built for AArch64.
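On Linux, a quick way to see which execution state the running kernel presents is the machine string reported by `uname -m`, which Python exposes as `platform.machine()`. The sketch below assumes the common Linux values (`aarch64` for a 64-bit ARM kernel, `armv7l` or `armv8l` for a 32-bit view) and treats the result as a hint rather than proof; `execution_state` is a hypothetical helper, not part of any ARM tooling.

```python
import platform


def execution_state(machine=None):
    """Map a `uname -m` style machine string to an ARM execution state.

    Hypothetical helper: the string reflects what the kernel reports to the
    calling process, so it is a hint about the OS, not a hardware probe.
    """
    if machine is None:
        machine = platform.machine()
    if machine in ("aarch64", "arm64"):
        return "AArch64"  # 64-bit OS built for AArch64
    if machine.startswith("armv"):
        return "AArch32"  # e.g. armv7l, armv8l: 32-bit view
    return "not ARM"


print(execution_state())
```

A big.LITTLE system only gets the AArch64 benefits listed above when this reports `AArch64`; an `AArch32` result means the same cores are running in their ARMv7-compatible state.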

ARM 64-bit Linux has already been upstreamed, and ARM has demonstrated unmodified 32-bit Android code running on top of the 64-bit Linux kernel. The next step in providing big.LITTLE support in the 64-bit kernel is to modify the big.LITTLE MP and CPU migration patch sets to work cleanly in the AArch64 environment. Fortunately, the code is not strongly affected by register width, so the vast majority should port cleanly and with little effort from ARMv7 to 64-bit. We plan to do this work at ARM and release 64-bit capable patch sets in mid-2013. This lines up well with Cortex-A50 based SoCs, which are expected to sample at the end of 2013 and be deployed in products in 2014.

Although we don't expect 64-bit mobile OSes to become prevalent that early, the AArch32 state of the Cortex-A50 series processors will handle today's ARMv7 32-bit OS, and the processors will be ready for the transition to 64-bit when it does occur.

big.LITTLE in the Enterprise?

Originally conceived as an energy-saving technique for mobile phones, big.LITTLE can also be viewed as an interesting disruptive technology for applications such as ARM processor based low-power servers. For server and networking applications, which are generally memory bound, having a large number of efficient processors tuned to the workload makes a lot of sense. Often such a workload lends itself to multiple cores at different performance levels that are nevertheless software identical.

As performance scales to higher core counts and system power budgets shrink, the power budget left for the CPUs, even in enterprise, is very similar to that of mobile. Consider a fanless 20-25 W chip with 16 CPUs, I/O devices, a large L3 cache, and other on-board accelerators. Once you strip out the budgets for the non-CPU portions and split the remainder among the 16 CPUs, the per-CPU budget is very similar to a mobile phone power budget. big.LITTLE lets system designers have their cake and eat it too, delivering enterprise performance from processors with a mobile pedigree in a low-cost, fanless device.
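The budget arithmetic in the paragraph above can be written out explicitly. The 10 W set aside for I/O, L3 cache, and accelerators below is a placeholder assumption, not a figure from this post; substitute the real split for a given SoC.

```python
def per_cpu_budget_w(chip_w, non_cpu_w, n_cpus):
    """Power left for each CPU once the non-CPU blocks are budgeted."""
    if non_cpu_w >= chip_w:
        raise ValueError("non-CPU budget leaves nothing for the CPUs")
    return (chip_w - non_cpu_w) / n_cpus


# Assumed split: 10 W of the chip goes to I/O, L3 cache and accelerators.
print(per_cpu_budget_w(20.0, 10.0, 16))  # 0.625 (W per CPU, 20 W chip)
print(per_cpu_budget_w(25.0, 10.0, 16))  # 0.9375 (W per CPU, 25 W chip)
```

Under that assumption each of the 16 CPUs gets well under 1 W, which is the regime mobile cores are designed for.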

The other attractive aspect of big.LITTLE technology is its ability to support a dynamically varying level of required performance more efficiently. Infrastructure equipment is typically designed for peak operating capacity, for example to support the call volume on Mother's Day or the mobile internet traffic during the Super Bowl. On most days, traffic is at most half of that peak. An architecture that mixes big and LITTLE cores in the same system, or even on the same die, can adapt dynamically and more efficiently to the performance needs of the network. This leads to lower overall power consumption and reduced TCO.
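As a toy illustration of adapting a peak-provisioned system to lighter load, the sketch below decides how many cores of each type to keep online. It assumes a simple "LITTLE first, then big" heuristic and made-up per-core capacities (0.5 and 1.0 arbitrary throughput units); `cores_to_enable` is hypothetical and not how any ARM or Linaro software makes this decision.

```python
import math


def cores_to_enable(demand, n_little, n_big, little_cap=0.5, big_cap=1.0):
    """Choose how many LITTLE and big cores to keep online for a demand.

    Hypothetical policy: fill LITTLE cores first (assumed more efficient at
    light load), then add big cores for whatever demand remains.
    Capacities are arbitrary throughput units, not measurements.
    """
    little_on = min(n_little, math.ceil(demand / little_cap))
    remaining = demand - little_on * little_cap
    big_on = math.ceil(remaining / big_cap) if remaining > 0 else 0
    if big_on > n_big:
        raise ValueError("demand exceeds installed capacity")
    return little_on, big_on


# Provisioned for peak: 8 LITTLE + 8 big = 12 units of capacity.
print(cores_to_enable(12, 8, 8))  # (8, 8) at peak
print(cores_to_enable(6, 8, 8))   # (8, 2) at half of peak
print(cores_to_enable(2, 8, 8))   # (4, 0) on a quiet day
```

At half of peak, most of the big cores can stay powered down, which is where the power and TCO savings described above come from.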

big.LITTLE MP software, which gives the OS a full view of all the big and LITTLE processors in the system, can automatically handle work allocation in such a system. This mode of scheduling is more appropriate to the enterprise use case than CPU migration. CPU migration uses dynamic voltage and frequency scaling (DVFS) to trigger moves between big and LITTLE cores. This works well in mobile devices, which typically employ DVFS, but is less suitable for enterprise systems, which typically do not. Now that big.LITTLE MP has been effectively demonstrated on real silicon, enterprise partners are evaluating how big.LITTLE can help them achieve their performance goals without blowing the power budget.
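The difference between the two triggers can be shown in a toy model. This is not the ARM or Linaro patch code: the frequency limit, the up/down thresholds, and the decay factor are all assumed values chosen for illustration. CPU migration picks a cluster from the DVFS frequency request, while big.LITTLE MP places each task from its own recent load, with a gap between the two thresholds so tasks do not bounce between clusters.

```python
# Hypothetical tunables; names and values are illustrative only.
LITTLE_MAX_KHZ = 1_000_000  # assumed highest LITTLE operating point
UP_THRESHOLD = 0.8          # tracked load above which a task moves to big
DOWN_THRESHOLD = 0.3        # tracked load below which a task moves back


def migration_target(requested_khz):
    """CPU migration: the DVFS (cpufreq) request decides the cluster."""
    return "big" if requested_khz > LITTLE_MAX_KHZ else "LITTLE"


def update_load(prev_load, running_fraction, decay=0.9):
    """Exponentially decaying average of how busy a task has been."""
    return prev_load * decay + running_fraction * (1 - decay)


def mp_target(tracked_load, on_big):
    """big.LITTLE MP: per-task load decides placement, with hysteresis."""
    if not on_big and tracked_load > UP_THRESHOLD:
        return True
    if on_big and tracked_load < DOWN_THRESHOLD:
        return False
    return on_big


# A task that becomes fully busy ramps up its tracked load tick by tick.
load, on_big = 0.0, False
for tick in range(20):
    load = update_load(load, 1.0)
    on_big = mp_target(load, on_big)
print(round(load, 3), on_big)  # 0.878 True
```

Note that `mp_target` never consults a frequency, which is why this style of placement still works on systems that do not use DVFS.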

In servers, the benefits of big.LITTLE are still under investigation. There is tremendous interest in ARM based low-power servers, where even our "big" Cortex-A57 CPU will consume significantly less power than incumbent solutions. With increasing pressure on OEMs to create power-efficient servers, it is clear that high peak-performance CPUs do not always equate to the best solution. One CPU size does not fit all. For many classes of server solutions, aggregate throughput is more important than peak performance. In these applications, a many-core approach with lots of LITTLE Cortex-A53 processors delivers the highest level of aggregate performance under a reduced power budget. It is likely that a range of power-efficient server products will be built around Cortex-A57 or Cortex-A53, but probably not with both on the same chip. The OS software will be ready to cope with either case, big.LITTLE or homogeneous multi-core, as the market evolves.
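The throughput argument comes down to performance per watt multiplied by the power budget. The sketch below uses invented per-core numbers, not measured Cortex-A57 or Cortex-A53 figures; it assumes the LITTLE core delivers twice the performance per watt of the big core, and the result follows directly from that assumption.

```python
import math


def aggregate_throughput(budget_w, core_w, core_perf):
    """Total throughput from as many identical cores as fit in the budget."""
    n_cores = math.floor(budget_w / core_w)
    return n_cores * core_perf


# Illustrative numbers only: big = 2.0 perf at 2.0 W, LITTLE = 1.0 perf at 0.5 W.
big = aggregate_throughput(8.0, core_w=2.0, core_perf=2.0)     # 4 cores
little = aggregate_throughput(8.0, core_w=0.5, core_perf=1.0)  # 16 cores
print(big, little)  # 8.0 16.0
```

Peak single-thread performance favors the big core, but for a throughput-bound workload under a fixed budget the many-LITTLE-core design wins whenever its performance per watt is higher.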
