Exploring the ARM CoreLink CCI-500 performance envelope - Part 1

Nick
February 9, 2015

Introduction

You may have noticed last week's ARM announcement of a group of Premium Mobile products (if not, you can find it here: "ARM Sets New Standard for the Premium Mobile Experience"), covering a new processor core IP, a new GPU IP and a new interconnect IP. While the headlines may belong to the core and GPU announcements, I want to focus on the new Cache Coherent Interconnect, the ARM CoreLink CCI-500. The real-world system performance of any mobile SoC is largely determined by the choice of DDR technology and by how effectively the SoC architecture can squeeze every last drop of performance out of that DDR.

In this multi-part blog I want to explore how early performance exploration gives users of the CCI-500 valuable insight into the configurability that CCI-500 brings and into how the IP behaves under different loading conditions. That insight helps architects, implementers and verification engineers to be better prepared for projects that plan to include CCI-500 in their SoCs.

Configurability

One of the key differences between the previous generation of CCI and the latest is in the configurability of the IP. CCI-500 allows users to tune the interconnect more effectively to match the needs of a broad range of mobile SoCs. For example, the number of coherent clusters supported through the AMBA ACE protocol has increased to a maximum of 4, but can also be reduced to 1 for smaller applications. The following table provides a few example CCI-500 configurations that might be commonly used.

Example      Address Width (bits)  Coherent Clusters (ACE)  I/O Coherent (ACE-Lite)  Memory Ports  System Ports
Small        34                    2                        1                        1             1
Large        40                    4                        3                        4             2
Smart Phone  34                    2                        3                        2             1
Tablet       34                    2                        5                        4             2
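
To make the table easier to reason about, the four example configurations can be written down as data. The following is only a minimal Python sketch: the class and field names are made up for illustration and are not part of any ARM or Cadence tool, and the only rule it checks is the one stated in the text above (between 1 and 4 ACE clusters).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CciConfig:
    """One row of the example table (field names are illustrative only)."""
    name: str
    address_width_bits: int
    ace_ports: int        # fully coherent clusters (ACE)
    ace_lite_ports: int   # I/O coherent masters (ACE-Lite)
    memory_ports: int
    system_ports: int

    def validate(self) -> None:
        # The blog text states that CCI-500 supports between 1 and 4 ACE clusters.
        if not 1 <= self.ace_ports <= 4:
            raise ValueError(f"{self.name}: ACE ports must be in the range 1..4")


# Values copied from the example table.
EXAMPLES = [
    CciConfig("Small", 34, 2, 1, 1, 1),
    CciConfig("Large", 40, 4, 3, 4, 2),
    CciConfig("Smart Phone", 34, 2, 3, 2, 1),
    CciConfig("Tablet", 34, 2, 5, 4, 2),
]

for cfg in EXAMPLES:
    cfg.validate()
    print(f"{cfg.name}: {cfg.ace_ports} ACE + {cfg.ace_lite_ports} ACE-Lite, "
          f"{cfg.memory_ports} memory port(s), {cfg.system_ports} system port(s)")
```

Holding the configurations as data like this makes it straightforward to sweep over them later when comparing results.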

This configurability obviously allows better matching of the CCI configuration with target SoC requirements; however, it poses a number of questions. For example, what memory bandwidth can a given configuration support? Will adding more ports get me the performance I need? What impact does changing the configuration have?
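
A back-of-envelope calculation shows why these questions cannot be answered from the port count alone. The sketch below computes only a theoretical ceiling, assuming every memory port moves one full-width beat on every clock cycle. The bus width and clock frequency are hypothetical placeholders, not CCI-500 specifications; real throughput depends on the DDR, the traffic mix and the interconnect itself.

```python
def peak_bandwidth_gb_s(ports: int, bus_width_bits: int, clock_mhz: float) -> float:
    """Upper bound in GB/s: each port transfers one full-width beat per cycle."""
    bytes_per_beat = bus_width_bits / 8
    return ports * bytes_per_beat * clock_mhz * 1e6 / 1e9


# Hypothetical values chosen purely for illustration.
BUS_WIDTH_BITS = 128
CLOCK_MHZ = 500.0

for memory_ports in (1, 2, 4):
    ceiling = peak_bandwidth_gb_s(memory_ports, BUS_WIDTH_BITS, CLOCK_MHZ)
    print(f"{memory_ports} memory port(s): at most {ceiling:.1f} GB/s")
```

The ceiling scales linearly with the number of ports, but whether a real system gets anywhere near it is exactly what has to be measured.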

Exploring these configuration options in a meaningful way requires accurate measurements of the RTL performance of the IP. This is exactly the kind of challenge that the Cadence Interconnect Workbench was architected to address.

Creating a UVM Testbench

Taking the Large example from the previous table, the following diagram illustrates the UVM testbench features required to start cycle-accurate performance exploration.

[Figure: UVM testbench for the Large CCI-500 configuration]

As can be seen, the testbench comprises a number of instances of AMBA VIP to drive each of the 14 interfaces, as well as a system scoreboard called Interconnect Validator, which tracks transactions through their life-cycle. In addition, test sequences are required to define the shape of the AMBA traffic to be injected into the CCI-500 configuration.
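
The idea of tracking transactions through their life-cycle can be illustrated with a toy model. This is not how Interconnect Validator is implemented; it is a minimal Python sketch with made-up class and method names, showing only that recording the issue and completion time of each transaction is enough to derive per-transaction latency and to spot requests that never complete.

```python
from dataclasses import dataclass, field


@dataclass
class LatencyTracker:
    """Toy scoreboard: records when each transaction is issued and completed."""
    _start: dict = field(default_factory=dict)
    latencies: dict = field(default_factory=dict)

    def issued(self, txn_id: int, cycle: int) -> None:
        self._start[txn_id] = cycle

    def completed(self, txn_id: int, cycle: int) -> None:
        # A KeyError here flags a response that has no matching request.
        start = self._start.pop(txn_id)
        self.latencies[txn_id] = cycle - start

    def outstanding(self) -> list:
        """IDs of transactions that were issued but have not yet completed."""
        return sorted(self._start)


tracker = LatencyTracker()
tracker.issued(1, cycle=10)
tracker.issued(2, cycle=12)
tracker.completed(1, cycle=42)

print(tracker.latencies)      # {1: 32}
print(tracker.outstanding())  # [2]
```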

The power of Interconnect Workbench is that this potentially tedious, time-consuming and error-prone testbench creation task can be completely automated through a simple spreadsheet.

The spreadsheet below describes a simple configuration (the "Small" example in the table), from which a fully working, automatically generated UVM testbench can be created in a matter of minutes. This example was chosen to make viewing it in this blog less of an eye test. We have created and tested templates for all the examples listed in the table.

[Figure: spreadsheet describing the Small configuration]

As can be seen, creating this spreadsheet is a much simpler task than writing tens of thousands of lines of SystemVerilog code by hand.

In the next part of the blog I will present some of the performance results that can be easily extracted using the automated testbench, and show how different CCI setups can be readily compared to ensure the correct configuration is identified early in your project.

Exploring the ARM CoreLink CCI-500 performance envelope – Part 2

