AMBA 4 ACE and Hardware Cache Coherency - Top 5 Questions

Neil Parris
October 14, 2013

I thought I'd post a short blog post about commonly asked questions on AMBA 4 ACE and system coherency.

What does ACE mean?

ACE is the "AXI Coherency Extensions" introduced with the AMBA 4 specification released in 2011. For those of you thinking "What's AXI?", it's an on-chip bus standard that defines the interface and signalling used to connect processors, interconnects and memory controllers to make an SoC.

Why?

ACE allows processors and systems to share memory in a more efficient way called "hardware coherency". Before hardware coherency, systems had to rely on software coherency, which means that the application, drivers and operating system must carefully manage the sharing of any data between the processor and other system hardware like DMA, graphics and IO interfaces. Software coherency consists of carefully timed cache cleaning, maintenance and invalidation. Cache cleaning takes time and effort (cache contents need to be written out to main memory, DDR), and any mistakes can be very difficult to debug (sometimes data is just in the wrong place and it's not obvious why).
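To make the software coherency burden concrete, here is a minimal toy model in Python. It is only a sketch: the `Memory` and `WriteBackCache` classes, the `dma_read`/`dma_write` helpers and the addresses are all hypothetical, and real cache maintenance is done with processor instructions or driver APIs, not Python calls. It shows the two classic mistakes: a device reading stale memory because the CPU forgot to clean, and the CPU reading a stale cached copy because it forgot to invalidate.

```python
# Toy model: a write-back CPU cache in front of main memory, plus a DMA
# master that accesses main memory directly and cannot see the CPU cache.
# All names here are hypothetical and exist only for illustration.

class Memory:
    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}     # addr -> cached value
        self.dirty = set()  # addrs modified but not yet written back

    def cpu_write(self, addr, value):
        # The write stays in the cache; main memory is not updated yet.
        self.lines[addr] = value
        self.dirty.add(addr)

    def cpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory.read(addr)
        return self.lines[addr]

    def clean(self, addr):
        # Write dirty data out to main memory so other masters can see it.
        if addr in self.dirty:
            self.memory.write(addr, self.lines[addr])
            self.dirty.discard(addr)

    def invalidate(self, addr):
        # Drop the cached copy so the next CPU read goes to main memory.
        self.lines.pop(addr, None)
        self.dirty.discard(addr)


def dma_read(memory, addr):
    return memory.read(addr)


def dma_write(memory, addr, value):
    memory.write(addr, value)


memory = Memory()
cache = WriteBackCache(memory)

# CPU -> device: without a clean, the device sees stale memory.
cache.cpu_write(0x100, 42)
stale = dma_read(memory, 0x100)
cache.clean(0x100)
fresh = dma_read(memory, 0x100)

# Device -> CPU: without an invalidate, the CPU sees its stale cached copy.
cache.cpu_read(0x200)
dma_write(memory, 0x200, 7)
cpu_stale = cache.cpu_read(0x200)
cache.invalidate(0x200)
cpu_fresh = cache.cpu_read(0x200)

print(stale, fresh, cpu_stale, cpu_fresh)  # prints: 0 42 0 7
```

In this model the software has to know exactly when to call `clean` and `invalidate`; with hardware coherency that bookkeeping is handled by the interconnect instead.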

Hardware coherency removes these software challenges and makes sharing transparent to the application. It is a critical component of big.LITTLE processing: it allows the big and LITTLE processor clusters to see the same view of memory and run the same operating system, so processes and applications can switch between the big and LITTLE cores as demand requires.

3rd Party Support?

AMBA 4 ACE is an open standard, which means it's freely available to download from the ARM website. There is also an ecosystem of EDA companies supporting the standard, including Cadence, Jasper, Mentor and Synopsys.

Which Processors?

The latest Cortex processors all support AMBA 4 ACE. These include the big.LITTLE pairs: the ARMv7 Cortex-A15 and Cortex-A7, and the ARMv8 Cortex-A57 and Cortex-A53. While these processors will be used in big.LITTLE applications, we'll also see them used in enterprise applications like networking and servers, where hardware coherency is a must-have for high-performance interfaces like PCI Express, Ethernet and USB.

How do I connect 'ACE' components?

The ARM CoreLink CCI-400 Cache Coherent Interconnect is the first product to market to support AMBA 4 ACE. First released in 2011, CCI-400 has been licensed by over 20 ARM partners and you will see many big.LITTLE products announced during 2013. For those not familiar with SoC architecture, the 'interconnect' is the glue that connects all the building blocks that make up an SoC like Cortex processors, Mali graphics and CoreLink memory controllers.

Any more questions?

Please ask!

  • wangyong, over 6 years ago

    Thanks a lot!

  • Neil Parris, over 6 years ago

    Hi Wangyong, great question. We made a change in the most recent release, r1p4 (March 2014), which allows the logic for unused ports to be removed. New parameters let you configure the number of ACE-Lite slave ports and the number of ACE-Lite master ports (see table below). With this additional configuration you can reduce the area and power that CCI-400 consumes if some ports are not required.

    Hope this helps,

    Neil.

    Ports                  Parameter        Supported values
    ACE slave ports        (fixed)          2
    ACE-Lite slave ports   NUM_ACE_LITE_SI  0-3
    ACE-Lite master ports  NUM_MI           2-3
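
As an illustration of the ranges in the table above, the following Python sketch checks a proposed port configuration. Only the parameter names `NUM_ACE_LITE_SI` and `NUM_MI` and their ranges come from the table; the `check_config` helper and the dictionary format are assumptions made for this example and are not part of any CCI-400 deliverable.

```python
# Hypothetical helper: validates a proposed CCI-400 port configuration
# against the ranges quoted in the table. Only the parameter names and
# ranges are taken from the post; everything else is assumed.

ACE_SLAVE_PORTS = 2  # fixed, not configurable

ALLOWED = {
    "NUM_ACE_LITE_SI": range(0, 4),  # 0-3 ACE-Lite slave ports
    "NUM_MI": range(2, 4),           # 2-3 ACE-Lite master ports
}


def check_config(config):
    """Return a list of error strings; an empty list means all values are in range."""
    errors = []
    for name, allowed in ALLOWED.items():
        if name not in config:
            errors.append(f"{name} is missing")
        elif config[name] not in allowed:
            errors.append(
                f"{name}={config[name]} is outside {allowed.start}-{allowed.stop - 1}"
            )
    return errors


# The case asked about in this thread: 2 ACE slaves (fixed), 1 ACE-Lite slave.
print(check_config({"NUM_ACE_LITE_SI": 1, "NUM_MI": 3}))
# An out-of-range request produces one message per offending parameter.
print(check_config({"NUM_ACE_LITE_SI": 4, "NUM_MI": 1}))
```

A check like this could sit in a build script so that an invalid port count is caught before the interconnect is configured.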

  • wangyong, over 6 years ago

    Hi Neil,

    CCI-400 has a fixed configuration: 2 full ACE slave interfaces, 3 ACE-Lite I/O coherent slave interfaces and 3 master interfaces. So if I only use the 2 full ACE slave interfaces and 1 ACE-Lite slave interface, are the other 2 ACE-Lite slave interfaces wasted?

    Thanks!

  • Neil Parris, over 6 years ago

    Hi Andy - that's a great question.

    At the simplest level it's about performance and complexity. The performance of the bigger Cortex-A cores is much higher, and to reach it they issue many parallel requests into the memory system at the same time. This is partly because everything is running at a higher frequency, which means more pipelining and, in turn, more latency. To combat latency we need more transactions in flight, and this requires a more advanced bus like AXI.

    A smaller microcontroller based on Cortex-M can get its work done with just 1 request into the system at a time. The frequencies and latencies in the system are lower, and the workload on the processor is much lighter than, say, a Cortex-A57. Many of the Cortex-M cores will have multiple AHB buses to allow them to run a few transactions in parallel, e.g. data accesses to peripherals in parallel with an instruction fetch on a different port.

    I will expand on this in a follow-on blog post summarizing the different AMBA standards.
    Thanks!

    Neil.
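
The latency argument in the reply above can be put into rough numbers with Little's law (throughput equals work in flight divided by latency). The sketch below is only an illustration: the 64-byte transaction size, the 100 ns latency and the function names are assumptions chosen for the example, not figures for any particular Cortex core or bus.

```python
import math


def sustained_bandwidth_gb_s(outstanding, bytes_per_transaction, latency_ns):
    """Little's law: throughput = data in flight / latency.

    Bytes per nanosecond is numerically equal to GB/s (10^9 bytes per second).
    """
    return outstanding * bytes_per_transaction / latency_ns


def outstanding_needed(target_gb_s, bytes_per_transaction, latency_ns):
    """Smallest number of transactions in flight that reaches the target bandwidth."""
    return math.ceil(target_gb_s * latency_ns / bytes_per_transaction)


# Assumed numbers: 64-byte transactions and 100 ns memory latency.
one_at_a_time = sustained_bandwidth_gb_s(1, 64, 100)    # one request at a time
many_in_flight = sustained_bandwidth_gb_s(16, 64, 100)  # 16 requests in flight

print(f"1 outstanding:  {one_at_a_time:.2f} GB/s")
print(f"16 outstanding: {many_in_flight:.2f} GB/s")
print("needed for 10 GB/s:", outstanding_needed(10, 64, 100))
```

Under these assumed numbers, a master limited to one request at a time is capped well below 1 GB/s no matter how fast the memory is, which is why a bus that supports many outstanding transactions matters for the larger cores.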

  • Andy Frame, over 6 years ago

    Hi Neil,

    This is a great introduction to ACE and its use in big.LITTLE systems.

    Can you tell me a little bit more about the key top-level differences between an AXI system and an AHB system, and why different processors use different bus standards?

    Thanks

    Andy
