I thought I'd write a short blog post answering some commonly asked questions about AMBA 4 ACE and system coherency.
What does ACE mean?
ACE stands for "AXI Coherency Extensions", introduced with the AMBA 4 specification released in 2011. For those of you thinking "What's AXI?", it's an on-chip bus standard that defines the interface and signalling used to connect processors, interconnect and memory controllers to build an SoC.
Why?
ACE allows processors and other system components to share memory more efficiently through "hardware coherency". Before hardware coherency, systems had to rely on software coherency, which means that the application, drivers and operating system must carefully manage the sharing of any data between the processor and other system hardware such as DMA, graphics and I/O interfaces. Software coherency consists of carefully timed cache cleaning, maintenance and invalidation. This cache maintenance takes time and effort (dirty cache contents must be written out to main memory, i.e. DDR), and any mistakes can be very difficult to debug (sometimes data is simply in the wrong place and it's not obvious why).
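To make the cleaning and invalidation steps concrete, here is a minimal Python sketch of a toy system, assuming a single CPU with a write-back cache and a non-coherent DMA engine that only sees main memory. The class and function names are hypothetical, and the model ignores cache line sizes and timing; it only shows where software has to step in.

```python
# Toy model of SOFTWARE coherency (illustrative only; not an ARM API).
# A write-back CPU cache sits in front of DDR; a DMA engine reads DDR directly.

class Memory:
    """Main memory (e.g. DDR), modelled as address -> value."""
    def __init__(self):
        self.data = {}

class WriteBackCache:
    """CPU-side write-back cache: writes stay here until cleaned."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)

    def read(self, addr):
        if addr not in self.lines:          # miss: fill from main memory
            self.lines[addr] = self.memory.data.get(addr, 0)
        return self.lines[addr]

    def clean(self, addr):
        """Write a dirty line back to memory (the line stays valid in the cache)."""
        if addr in self.dirty:
            self.memory.data[addr] = self.lines[addr]
            self.dirty.discard(addr)

    def invalidate(self, addr):
        """Discard the cached copy so the next read refetches from memory."""
        self.lines.pop(addr, None)
        self.dirty.discard(addr)

def dma_read(memory, addr):
    """A non-coherent DMA engine only sees main memory."""
    return memory.data.get(addr, 0)

def dma_write(memory, addr, value):
    memory.data[addr] = value

def software_coherency_demo():
    ddr = Memory()
    cpu = WriteBackCache(ddr)
    cpu.write(0x100, 42)                  # CPU fills a buffer for the device
    before_clean = dma_read(ddr, 0x100)   # 0: data is still only in the cache
    cpu.clean(0x100)                      # software must clean before starting DMA
    after_clean = dma_read(ddr, 0x100)    # 42
    dma_write(ddr, 0x100, 7)              # device writes its result to DDR
    stale = cpu.read(0x100)               # 42: CPU still hits its old cached copy
    cpu.invalidate(0x100)                 # software must invalidate after DMA
    fresh = cpu.read(0x100)               # 7: refetched from DDR
    return before_clean, after_clean, stale, fresh
```

`software_coherency_demo()` returns `(0, 42, 42, 7)`: the DMA engine only sees the CPU's data after the clean, and the CPU only sees the device's data after the invalidate. Forgetting either call produces exactly the "data in the wrong place" kind of bug.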
Hardware coherency removes these software challenges and makes sharing transparent to the application. Hardware coherency is a critical component of big.LITTLE processing: it allows the big and LITTLE processor clusters to see the same view of memory and run the same operating system, so processes and applications can switch between the big and LITTLE cores as demand requires.
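For contrast, here is the same toy scenario with a coherent interconnect. Again this is a hypothetical, heavily simplified Python sketch rather than the ACE protocol itself: the interconnect snoops the CPU cache on behalf of an I/O-coherent agent (roughly the role an ACE-Lite I/O coherent port plays), so no clean or invalidate calls appear anywhere.

```python
# Toy model of HARDWARE coherency (illustrative only; not the ACE protocol).
# The interconnect snoops the CPU cache for an I/O-coherent agent, so
# software never has to clean or invalidate.

class CoherentInterconnect:
    def __init__(self):
        self.memory = {}      # main memory: address -> value
        self.cpu_cache = {}   # CPU write-back cache: address -> value

    # --- CPU side ---
    def cpu_write(self, addr, value):
        self.cpu_cache[addr] = value        # write-back: only the cache is updated

    def cpu_read(self, addr):
        if addr not in self.cpu_cache:      # miss: fill from main memory
            self.cpu_cache[addr] = self.memory.get(addr, 0)
        return self.cpu_cache[addr]

    # --- I/O-coherent agent side (e.g. a DMA engine) ---
    def io_read(self, addr):
        # Snoop the CPU cache first; a hit supplies the most recent data.
        if addr in self.cpu_cache:
            return self.cpu_cache[addr]
        return self.memory.get(addr, 0)

    def io_write(self, addr, value):
        # Invalidate any cached copy so the CPU cannot read stale data.
        self.cpu_cache.pop(addr, None)
        self.memory[addr] = value

def hardware_coherency_demo():
    ic = CoherentInterconnect()
    ic.cpu_write(0x100, 42)             # data stays in the CPU cache
    seen_by_io = ic.io_read(0x100)      # 42: the snoop finds it, no clean needed
    ic.io_write(0x100, 7)               # the CPU's copy is invalidated by hardware
    seen_by_cpu = ic.cpu_read(0x100)    # 7: miss, refilled from memory
    return seen_by_io, seen_by_cpu
```

`hardware_coherency_demo()` returns `(42, 7)` with no maintenance calls in the driver path, which is what "transparent to the application" means in practice.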
3rd Party Support?
AMBA 4 ACE is an open standard, which means it's freely available to download from the ARM website. There's also an ecosystem of EDA companies supporting the new standard, including Cadence, Jasper, Mentor and Synopsys.
Which Processors?
The latest Cortex processors all support AMBA 4 ACE. These include the big.LITTLE pairs: the ARMv7 Cortex-A15 & Cortex-A7, and the ARMv8 Cortex-A57 & Cortex-A53. While these processors will be used in big.LITTLE applications, we'll also see them in enterprise applications like networking and servers, where hardware coherency is a must-have for high-performance interfaces like PCI Express, Ethernet and USB.
How do I connect 'ACE' components?
The ARM CoreLink CCI-400 Cache Coherent Interconnect is the first product to market to support AMBA 4 ACE. First released in 2011, CCI-400 has been licensed by over 20 ARM partners and you will see many big.LITTLE products announced during 2013. For those not familiar with SoC architecture, the 'interconnect' is the glue that connects all the building blocks that make up an SoC like Cortex processors, Mali graphics and CoreLink memory controllers.
Any more questions?
Please ask!
Thanks a lot!
Hi Wangyong, great question. We made a change in the most recent release, r1p4 (March 2014), which allows the logic for unused ports to be removed. The following new parameters let you configure the number of ACE-Lite slave ports and the number of ACE-Lite master ports (see table below). With this additional configuration you can reduce the area and power that CCI-400 consumes if some ports are not required.
Hope this helps,
Neil.
| Ports | Parameter | Supported values |
|---|---|---|
| ACE slave ports | (fixed) | 2 |
| ACE-Lite slave ports | NUM_ACE_LITE_SI | 0-3 |
| ACE-Lite master ports | NUM_MI | 2-3 |
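As a quick illustration of how those ranges constrain a build, here is a hypothetical Python helper that checks a proposed configuration against the table. Only the parameter names NUM_ACE_LITE_SI and NUM_MI and their ranges come from the table; the function is an illustrative sketch, not an ARM configuration tool.

```python
# Hypothetical helper: validates a CCI-400 r1p4 port configuration against
# the supported ranges (NUM_ACE_LITE_SI 0-3, NUM_MI 2-3, ACE slave ports fixed at 2).

ACE_SLAVE_PORTS = 2                   # fixed: always two full ACE slave ports
NUM_ACE_LITE_SI_RANGE = range(0, 4)   # 0-3 ACE-Lite slave ports
NUM_MI_RANGE = range(2, 4)            # 2-3 ACE-Lite master ports

def check_cci400_config(num_ace_lite_si: int, num_mi: int) -> dict:
    """Raise ValueError for an unsupported setting, else summarise the ports."""
    if num_ace_lite_si not in NUM_ACE_LITE_SI_RANGE:
        raise ValueError(f"NUM_ACE_LITE_SI={num_ace_lite_si} is outside 0-3")
    if num_mi not in NUM_MI_RANGE:
        raise ValueError(f"NUM_MI={num_mi} is outside 2-3")
    return {
        "ace_slave_ports": ACE_SLAVE_PORTS,
        "ace_lite_slave_ports": num_ace_lite_si,
        "master_ports": num_mi,
        "total_slave_ports": ACE_SLAVE_PORTS + num_ace_lite_si,
    }
```

For the case in the question (2 full ACE ports plus 1 ACE-Lite port), `check_cci400_config(1, 3)` is accepted and reports 3 slave ports in total, so the logic for the two unused ACE-Lite slave ports need not be built.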
Hi Neil,
CCI-400 has a fixed configuration: 2 full ACE slave interfaces, 3 ACE-Lite I/O coherent slave interfaces and 3 master interfaces. So if I only use the 2 full ACE slave interfaces and 1 ACE-Lite slave interface, are the other 2 ACE-Lite slave interfaces wasted?
Thanks!
Hi Andy - that's a great question.
At the simplest level it's about performance and complexity. The bigger Cortex-A cores deliver much higher performance, and to reach it they issue many parallel requests into the memory system at the same time. This is partly because everything runs at a higher frequency, which means more pipelining and, in turn, more latency. To hide that latency we need more transactions in flight, and this requires a more advanced bus like AXI.
A smaller microcontroller based on Cortex-M can get its work done with just one request into the system at a time. The frequencies and latencies in the system are lower, and the workload on the processor is much lighter than on, say, a Cortex-A57. Many Cortex-M cores have multiple AHB buses so they can run a few transactions in parallel, e.g. data accesses to peripherals in parallel with an instruction fetch on a different port.
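A back-of-envelope way to see why transactions in flight matter is Little's law: sustained bandwidth is at most (transactions in flight × bytes per transaction) / latency. The Python sketch below applies it with assumed example numbers (64-byte cache lines, 100 ns round-trip latency); these figures are illustrative and not measurements of any particular Cortex core or system.

```python
import math

# Little's law applied to a bus:
#   bandwidth <= transactions_in_flight * bytes_per_transaction / latency
# The 64-byte line size and 100 ns latency used below are assumed examples.

def max_bandwidth_gbps(outstanding: int, bytes_per_txn: int, latency_ns: float) -> float:
    """Upper bound on sustained bandwidth in GB/s (1 byte/ns == 1 GB/s)."""
    return outstanding * bytes_per_txn / latency_ns

def required_outstanding(target_gbps: float, bytes_per_txn: int, latency_ns: float) -> int:
    """Transactions that must be in flight to sustain target_gbps."""
    return math.ceil(target_gbps * latency_ns / bytes_per_txn)

if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"{n:2d} in flight -> {max_bandwidth_gbps(n, 64, 100.0):.2f} GB/s")
    print("needed for 10 GB/s:", required_outstanding(10.0, 64, 100.0))
```

With one request at a time the bound is 0.64 GB/s however fast the bus clock is; sustaining 10 GB/s at the same latency needs 16 transactions in flight. Support for multiple outstanding transactions is one of the things AXI was designed for, which is why the high-performance cores use it.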
I will expand on this in a follow-up blog post summarizing the different AMBA standards. Thanks!
This is a great introduction to ACE and its use in big.LITTLE systems.
Can you tell me a little more about the key top-level differences between an AXI system and an AHB system, and why different processors use different bus standards?
Thanks
Andy