Exploring the ARM CoreLink CCI-500 performance envelope - Part 1

February 9, 2015

Introduction

You may have noticed ARM's announcement last week of a group of Premium Mobile products (see "ARM Sets New Standard for the Premium Mobile Experience" on the ARM website), covering new processor core IP, new GPU IP and new interconnect IP. While the headlines may belong to the core and GPU announcements, I want to focus on the new cache coherent interconnect, the ARM CoreLink CCI-500. In practice, the real-world system performance of any mobile SoC is determined by the choice of DDR technology and by how effectively the SoC architecture can squeeze every last drop of performance out of that DDR.

In this multi-part blog I want to explore how early performance exploration gives users of the CCI-500 valuable insight into the configurability it offers and into how the IP behaves under different loading conditions, so that architects, implementers and verification engineers are better prepared for projects that plan to include CCI-500 in their SoCs.

Configurability

One of the key differences between the previous generation of CCI and the latest is the configurability of the IP. CCI-500 allows users to tune the interconnect more effectively to the needs of a broad range of mobile SoCs. For example, the maximum number of fully coherent clusters, each connected through an AMBA ACE interface, has increased to 4, but it can also be reduced to 1 for smaller applications. The following table lists a few example CCI-500 configurations that might be commonly used.

Example	Address width (bits)	Coherent clusters (ACE)	I/O coherent interfaces (ACE-Lite)	Memory ports	System ports
Small	34	2	1	1	1
Large	40	4	3	4	2
Smart Phone	34	2	3	2	1
Tablet	34	2	5	4	2
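To make the table concrete, the following Python sketch captures each example configuration as a record, checks the 1-to-4 coherent-cluster range stated above, and converts the address width into addressable memory (a 34-bit address covers 16 GiB, a 40-bit address covers 1 TiB). The class, field and method names are hypothetical and are not part of any ARM or Cadence tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CciConfig:
    """One row of the example configuration table (hypothetical model)."""
    name: str
    address_width: int    # physical address width in bits
    ace_clusters: int     # fully coherent clusters (ACE)
    acelite_ports: int    # I/O coherent interfaces (ACE-Lite)
    memory_ports: int
    system_ports: int

    def __post_init__(self):
        # The text states CCI-500 supports between 1 and 4 coherent clusters.
        if not 1 <= self.ace_clusters <= 4:
            raise ValueError(f"{self.name}: ace_clusters must be in 1..4")

    def addressable_gib(self) -> int:
        # 2**address_width bytes expressed in GiB (2**30 bytes).
        return 2 ** (self.address_width - 30)


EXAMPLES = [
    CciConfig("Small", 34, 2, 1, 1, 1),
    CciConfig("Large", 40, 4, 3, 4, 2),
    CciConfig("Smart Phone", 34, 2, 3, 2, 1),
    CciConfig("Tablet", 34, 2, 5, 4, 2),
]

for cfg in EXAMPLES:
    print(f"{cfg.name:12s} {cfg.addressable_gib():5d} GiB addressable")
```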

This configurability allows the CCI configuration to be matched more closely to the target SoC's requirements; however, it raises a number of questions. For example, what memory bandwidth can a given configuration support? Will adding more ports deliver the performance I need? What impact does changing the configuration have?
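A first-order answer to the bandwidth question is a theoretical ceiling: memory ports × bytes per data beat × clock frequency. The sketch below assumes a 128-bit data path and an 800 MHz clock purely for illustration; these are not figures from the CCI-500 specification, and the function name is hypothetical. It also counts only one data direction and ignores stalls, arbitration and DDR overheads, so the gap between this ceiling and what the RTL actually sustains is what cycle-accurate measurement is needed to reveal.

```python
def peak_bandwidth_gbs(memory_ports: int, data_width_bits: int, clock_mhz: float) -> float:
    """Theoretical ceiling in GB/s (1 GB = 10**9 bytes).

    Assumes every memory port moves one full-width data beat per clock
    cycle in one direction, with no stalls, arbitration losses or DDR
    overheads, so real measured bandwidth will be lower.
    """
    bytes_per_beat = data_width_bits // 8
    return memory_ports * bytes_per_beat * clock_mhz * 1e6 / 1e9


# Illustrative values only (assumed, not taken from the CCI-500 specification).
for name, ports in [("Small", 1), ("Smart Phone", 2), ("Large", 4)]:
    print(name, peak_bandwidth_gbs(ports, data_width_bits=128, clock_mhz=800), "GB/s")
```

Doubling the memory ports doubles this ceiling, but whether the achieved bandwidth scales the same way depends on traffic and DDR behaviour, which is precisely the "will adding more ports help?" question.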

Exploring these configuration options in a meaningful way requires accurate measurements of the RTL performance of the IP. This is exactly the kind of challenge that the Cadence Interconnect Workbench was architected to address.

Creating a UVM Testbench

Taking the Large example from the previous table, the following diagram illustrates the UVM testbench features required to start cycle-accurate performance exploration.

As can be seen, the testbench comprises a number of AMBA VIP instances, one to drive each of the 14 interfaces, as well as a system scoreboard called Interconnect Validator, which tracks transactions through their life cycle. In addition, test sequences are required to define the shape of the AMBA traffic injected into the CCI-500 configuration.
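What "the shape of the traffic" means can be sketched in a few parameters: burst length, read/write mix, address pattern and inter-transaction delay. The Python model below is a minimal illustration under those assumptions; the `Txn` record and `make_traffic` function are hypothetical and do not represent the Cadence VIP sequence API, which in a real testbench would be written as SystemVerilog UVM sequences.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    kind: str    # "READ" or "WRITE"
    addr: int    # start address, aligned to the burst size
    beats: int   # burst length in data beats
    delay: int   # idle cycles before this transaction is issued


def make_traffic(count, base, burst_beats, beat_bytes, read_ratio,
                 pattern="sequential", window=1 << 20, max_delay=0, seed=0):
    """Build a hypothetical traffic profile as a list of transactions."""
    if pattern not in ("sequential", "random"):
        raise ValueError(f"unknown pattern: {pattern}")
    rng = random.Random(seed)  # seeded so runs are reproducible
    burst_bytes = burst_beats * beat_bytes
    txns = []
    for i in range(count):
        if pattern == "sequential":
            # Back-to-back bursts that walk linearly through memory.
            addr = base + i * burst_bytes
        else:
            # Burst-aligned addresses scattered across a fixed window.
            addr = base + rng.randrange(0, window, burst_bytes)
        kind = "READ" if rng.random() < read_ratio else "WRITE"
        txns.append(Txn(kind, addr, burst_beats, rng.randint(0, max_delay)))
    return txns


# Example: a streaming, read-only profile of 64-byte bursts (4 beats x 16 bytes).
stream = make_traffic(4, 0x8000_0000, burst_beats=4, beat_bytes=16, read_ratio=1.0)
for t in stream:
    print(t)
```

Sweeping these parameters (for example, sequential versus random addressing) is one way to stress the interconnect under the different loading conditions mentioned in the introduction.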

The power of Interconnect Workbench is that this potentially tedious, time-consuming and error-prone testbench creation task can be completely automated from a simple spreadsheet.

Below is a spreadsheet for a simple configuration (the Small example in the table), from which a fully working UVM testbench can be generated automatically in a matter of minutes. This example was chosen only to keep the screenshot readable in this blog; we have created and tested templates for all the examples listed in the table.

As can be seen, creating this spreadsheet is a much simpler task than writing tens of thousands of lines of SystemVerilog code by hand.
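The principle behind spreadsheet-driven generation can be illustrated with a toy generator: one row per interface in, one agent declaration out. This is a minimal sketch under stated assumptions; the CSV columns and the generated SystemVerilog class names (`ace_master_agent` and so on) are hypothetical placeholders, not the Interconnect Workbench spreadsheet format or real Cadence VIP classes. The rows mirror the Small configuration from the table.

```python
import csv
import io

# Hypothetical spreadsheet exported as CSV; the columns are invented for illustration.
SPREADSHEET = """interface,protocol,role
cpu_cluster0,ACE,master
cpu_cluster1,ACE,master
io_master0,ACE-Lite,master
memory0,AXI4,slave
system0,AXI4,slave
"""


def generate_agent_declarations(sheet: str) -> list[str]:
    """Emit one SystemVerilog agent declaration per spreadsheet row."""
    lines = []
    for row in csv.DictReader(io.StringIO(sheet)):
        proto = row["protocol"].lower().replace("-", "")
        cls = f"{proto}_{row['role']}_agent"  # hypothetical VIP class name
        lines.append(f"{cls} {row['interface']}_agent;")
    return lines


for decl in generate_agent_declarations(SPREADSHEET):
    print(decl)
```

A production generator would also have to emit the environment, configuration objects, scoreboard connections and sequences for every interface, which is why writing the equivalent by hand is so laborious.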

In the next part of the blog I will present some of the performance results that can easily be extracted using the automated testbench, and show how different CCI setups can be readily compared so that the correct configuration is identified early in your project.
