big.LITTLE and AMBA 4 ACE keep your cache warm and avoid flushes

Updated 29th October 2013

High performance and power efficiency are critical to the latest mobile devices, and AMBA® 4 ACE™ is a fundamental technology supporting ARM's big.LITTLE processing. In case you missed the announcements, big.LITTLE technology offers an innovative way to run "always on" tasks on the highly efficient Cortex™-A7 processor, while high-performance, responsive applications are predominantly executed on the Cortex™-A15 processor. So what does this have to do with AMBA 4? Well, AMBA 4 ACE and the CoreLink™ CCI-400 Cache Coherent Interconnect provide the critical glue that joins these processors together into a big.LITTLE multi-processing (MP) system. Let me explain...

In 2011 ARM announced the public release of the AMBA 4 phase 2 specification, including ACE (AXI Coherency Extensions). This specification, supported by the Cortex-A15, Cortex-A7 and Mali™-T600 series, allows hardware-managed coherency and cache sharing between these processor cores. Sharing a workload across multiple cores offers greater performance and power efficiency. Without hardware coherency, software is responsible for cache maintenance, including cleaning, flushing and invalidating caches. This takes significant processing cycles and energy as data is cleaned out from caches to external memory. All system architects know that external memory accesses nearly always have higher latency and consume more power than on-chip memory accesses; that's why we have caches. The hardware coherency introduced with AMBA 4 ACE allows the different processing engines to view each other's caches, and removes or reduces the need for cache maintenance operations.

big.LITTLE MP allows a single chip to contain two processors of very different sizes that are fully code-compatible and share the processing workload. In this case the Cortex-A7 can run the less demanding always-on activities, and the Cortex-A15 is called in to service more demanding tasks such as rendering a web page. AMBA 4 ACE gives these processors the same view of memory, including any shared workloads. Further, a process running on the small core can migrate quickly to the large core as demand requires. Hardware coherency ensures that any cached data in the small core can be passed seamlessly to the large core without having to access external memory. Without it, the system would need to stop, clean the small core's caches to main memory and, only once that completes, start the big core. This would cost both time and energy.
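To make the migration argument concrete, here is a minimal toy model in Python. It is only a sketch: the `Cache` class and the functions `clean_and_invalidate` and `coherent_read` are hypothetical names invented for illustration, and moving a line between caches on a snoop is a simplification, not a description of the ACE protocol. It contrasts a software-managed handover, where every dirty line is written to external memory first, with a hardware-coherent one, where the big core picks lines up from the small core on demand.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy cache: address -> (value, dirty). Not a model of real hardware."""
    lines: dict = field(default_factory=dict)

    def write(self, addr, value):
        self.lines[addr] = (value, True)  # written data is dirty until cleaned


def clean_and_invalidate(cache, memory):
    """Software-managed handover: write every dirty line back, then empty the cache.

    Returns the number of external-memory writes performed.
    """
    writes = 0
    for addr, (value, dirty) in cache.lines.items():
        if dirty:
            memory[addr] = value
            writes += 1
    cache.lines.clear()
    return writes


def coherent_read(big, little, memory, addr):
    """Hardware-managed handover: on a miss, snoop the other cache before memory.

    Returns (value, source), where source is "hit", "snoop" or "memory".
    """
    if addr in big.lines:
        return big.lines[addr][0], "hit"
    if addr in little.lines:
        # Simplification: the line moves across with its dirty state intact.
        big.lines[addr] = little.lines.pop(addr)
        return big.lines[addr][0], "snoop"
    big.lines[addr] = (memory[addr], False)
    return memory[addr], "memory"


if __name__ == "__main__":
    memory = {0x100: 0, 0x140: 0}
    little, big = Cache(), Cache()
    little.write(0x100, 7)
    little.write(0x140, 9)
    print(coherent_read(big, little, memory, 0x100))  # (7, 'snoop')
    print(memory[0x100])                              # 0: external memory untouched
    print(clean_and_invalidate(little, memory))       # 1: only 0x140 is left to clean
```

In this model the coherent path serves the big core's read without a single external-memory write, whereas the software path pays one write per dirty line before the big core can even start.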

CoreLink CCI-400 Cache Coherent Interconnect is ARM's first implementation of AMBA 4 ACE and has been designed from the start to support big.LITTLE. It has two full ACE ports for the processor clusters, each supporting up to a quad-core Cortex-A15 or Cortex-A7, and three ACE-Lite™ ports for I/O coherent devices like the Mali-T604 and Mali-T658. This I/O coherency allows the GPU to read shared data from either the big or the LITTLE core, depending on which is running (or even both!). This could benefit a range of real-world use cases, from the User Interface (UI) running on the LITTLE core with acceleration from the GPU, through to high-performance gaming with the big core and GPU sharing the game engine, physics and rendering responsibilities. In either case a seamless, efficient view of shared memory offers the high performance and power efficiency demanded by future mobile devices and GPU computing. Hardware coherency reduces the need for cache cleaning and invalidating when sharing data with these I/O devices, which can improve I/O performance and simplify software.
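The port arrangement described above can be sketched as a small Python data structure. This is an illustrative model only: `Master`, `validate` and `snoop_targets` are hypothetical names, not part of any CCI-400 programming interface, and the only figures used are the port counts and cluster size given in the text. The sketch also captures the difference between the two port types: a full ACE master can have its caches snooped, while an ACE-Lite master issues coherent requests but is not itself a snoop target.

```python
from dataclasses import dataclass

# Port counts and cluster size as stated in the text above.
ACE_PORTS = 2
ACE_LITE_PORTS = 3
MAX_CORES_PER_CLUSTER = 4


@dataclass(frozen=True)
class Master:
    name: str
    port: str      # "ACE" or "ACE-Lite"
    cores: int = 1


def validate(masters):
    """Raise ValueError if the masters do not fit the port budget."""
    ace = [m for m in masters if m.port == "ACE"]
    lite = [m for m in masters if m.port == "ACE-Lite"]
    if len(ace) + len(lite) != len(masters):
        raise ValueError("unknown port type")
    if len(ace) > ACE_PORTS or len(lite) > ACE_LITE_PORTS:
        raise ValueError("too many masters for the available ports")
    if any(m.cores > MAX_CORES_PER_CLUSTER for m in ace):
        raise ValueError("cluster exceeds four cores")


def snoop_targets(requester, masters):
    """Names of masters whose caches can be snooped for `requester`.

    Only full ACE masters are snooped; ACE-Lite masters issue coherent
    requests but are never snoop targets themselves.
    """
    return [m.name for m in masters
            if m.port == "ACE" and m.name != requester.name]


big = Master("Cortex-A15", "ACE", cores=4)
little = Master("Cortex-A7", "ACE", cores=4)
gpu = Master("Mali-T604", "ACE-Lite")
system = [big, little, gpu]

validate(system)
print(snoop_targets(gpu, system))     # ['Cortex-A15', 'Cortex-A7']
print(snoop_targets(little, system))  # ['Cortex-A15']
```

A read from the GPU can therefore be served from either cluster's cache, which is the "big or LITTLE, or even both" case described above.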

At a system level, CCI-400 is designed to integrate seamlessly with other CoreLink 400 IP, including DMC-400 for dual-channel LPDDR2/DDR2/DDR3, MMU-400 for system virtualisation, and NIC-400 for connecting the rest of the system peripherals and controllers with minimal routing and cost. Throughout the CCI-400 design process, ARM's interconnect team has been working closely with the processor and graphics teams to ensure that CCI-400 offers the right balance of system performance and power efficiency for big.LITTLE, to make the best of ARM's joined-up story. More recently, ARM has announced the Cortex-A57 and Cortex-A53 processors, which offer a new big.LITTLE pairing with support for the ARMv8-A architecture. These processors are fully supported by CoreLink CCI-400 and offer new opportunities for mobile devices.

  For more information on CoreLink CCI-400, AMBA 4 ACE and big.LITTLE check out the following: