Interconnects and open standards have been a hot topic lately. A couple of weeks ago, Arm announced the CoreLink CMN-600 Coherent Mesh Network and CoreLink DMC-620 Dynamic Memory Controller IP, which support AMBA 5 CHI, the open standard for high-performance coherent on-chip communication. Today, Arm contributed to press announcements from not one, but two new open multi-chip interconnect consortia: CCIX and Gen-Z. Each consortium addresses needs within the data center, which leads to the obvious question: how do these standards relate to one another?
Fundamentally, they are complementary open standards that foster innovation, collaboration and ultimately enable new and emerging use cases (such as video analytics, machine learning and search acceleration). Before I dive into the standards and the problems they address, I’d like to highlight the three basic system categories within a data center (SoC, server node and rack), as each has very different interconnect properties and needs.
A multiprocessor general-purpose compute SoC is the first thing that comes to mind when thinking of the data center. However, there are a number of other data center accelerator and IO SoCs that are used for intelligent networking, storage or specialized off-load tasks and algorithms (e.g. FPGAs, GPUs, DSPs). A SoC interconnect provides connectivity for the on-chip processor, accelerator, IO and memory elements.
The server node is typically contained within a chassis or blade and connects a small number of compute, accelerator and IO SoCs. Today, these SoCs are connected with a simple multi-chip interconnect (typically PCIe) topology on a PCB motherboard, which may also carry small switches and expansion connectors for add-on cards.
Racks not only house a number of server chassis, but also have top-of-rack switches and a large amount of shared storage. At the rack scale, the interconnect requires scale-out capabilities with complex topologies, which connect thousands of server nodes and storage elements. Ethernet, InfiniBand, Fibre Channel and RapidIO are examples of scale-out interconnects.
AMBA has now been around for over 20 years and has fostered a rich ecosystem built upon open protocol standards. These standards have enabled IP portability, creation and re-use between different design groups and different vendors. The AMBA 5 CHI (Coherent Hub Interface) protocol specification was announced in 2013 to enable high-performance, multi-processor heterogeneous SoCs, and it has since been used in numerous server, networking and storage SoCs. AMBA 5 CHI separated the coherency protocol from the transport, which enabled free-flowing, high-frequency data transfers over flexible on-chip network topologies, making it well suited for scalable data center SoCs.
The following images illustrate how the on-chip CoreLink CMN-600 interconnect can be used to create custom data center SoCs by connecting various compute, accelerator, IO and memory IP elements.
If you would like to find out more about CHI or other AMBA specifications, please visit the AMBA Protocol developer page.
While AMBA addresses the needs of on-chip communication, a multi-chip standard has a much broader set of problems to address, ranging from electrical PHYs to mechanical connectors and cables to common software discovery and management. As noted above, PCIe is the most prevalent server node interconnect and will continue to be widely used, but its lack of coherency is a major drawback.
There are a number of emerging or evolving accelerated use cases, such as intelligent network/storage, deep/machine learning, image/video processing and search, that are creating demand for more sharing, more bandwidth and lower latency between processors and accelerators. Hardware cache coherency becomes critical to improving system performance because it eliminates the software overhead of copying data back and forth and managing DMA transfers. With cache coherency, processors and accelerators can simply make a memory request and the hardware takes care of the rest.
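To make the contrast concrete, here is a minimal sketch in C of the two programming models. The acc_* functions are purely hypothetical stand-ins for an accelerator driver interface, not a real API:

    /* A minimal sketch, assuming a hypothetical accelerator API (acc_*). */
    #include <stddef.h>

    void acc_copy_in(void *dev, const void *src, size_t n);   /* DMA host -> device */
    void acc_run(void *dev, size_t n);                        /* launch the offload */
    void acc_copy_out(void *dst, const void *dev, size_t n);  /* DMA device -> host */
    void acc_run_coherent(float *shared, size_t n);           /* coherent offload   */

    /* Without hardware coherency: software stages every buffer explicitly. */
    void offload_noncoherent(float *data, void *dev_buf, size_t n)
    {
        acc_copy_in(dev_buf, data, n * sizeof *data);
        acc_run(dev_buf, n);
        acc_copy_out(data, dev_buf, n * sizeof *data);
    }

    /* With hardware cache coherency: the accelerator loads and stores the
     * same memory the CPU caches, and the interconnect keeps both views
     * consistent -- no staging copies or DMA bookkeeping in software. */
    void offload_coherent(float *data, size_t n)
    {
        acc_run_coherent(data, n);
    }

The second form is the model that extending cache coherency across chip boundaries enables: the accelerator sees the same coherent view of memory as the host processor.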
CCIX (pronounced “C6”) provides an open multi-chip coherency standard that allows processors from different vendors, with different instruction set architectures and different protocols, to extend their cache coherency to remote accelerators. Now the free-flowing, high-frequency AMBA 5 CHI transactions can be converted to CCIX and transferred over flexible multi-chip topologies. To solve the issues introduced by multi-chip connectivity, CCIX has selected PCIe as its first transport. Leveraging PCIe will dramatically accelerate CCIX deployment and time to market, since it builds on a well-established ecosystem that has already solved the electrical, mechanical, switching and software problems. It will also simplify the SoC design process by reusing existing IP and by allowing dual-purpose pins/ports, which can be configured as CCIX or PCIe depending on the system they are attached to.
See more information about CCIX.
While CCIX allows processors to extend their cache coherency to off-chip accelerators, Gen-Z addresses the need for higher-performance data access with an interconnect based on memory operations that spans both the server node and the rack. Today, storage requires block-based accesses with complex, code-intensive software stacks. Memory operations such as loads and stores allow processors to access both volatile (i.e. DRAM) and non-volatile storage in the same efficient manner. Emerging Storage Class Memory (SCM) and rack-level disaggregated memory pools are example use cases that benefit from a memory-operation interconnect.
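As a rough illustration, the C sketch below contrasts a block-based read through the storage stack with a plain CPU load from memory-mapped persistent memory. The /mnt/pmem/pool path is hypothetical (it assumes a file backed by persistent memory), and error handling is trimmed for brevity:

    /* A minimal sketch: block-based access vs. load/store access. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Storage today: read a whole buffer through the kernel block stack
     * just to fetch one value. */
    uint64_t read_block_based(int fd, off_t offset)
    {
        uint64_t value = 0;
        if (pread(fd, &value, sizeof value, offset) != sizeof value)
            return 0;  /* sketch only: no real error handling */
        return value;
    }

    /* Memory-semantic access: map the persistent region once, then use
     * ordinary loads and stores, exactly as with DRAM. Assumes the offset
     * falls within the mapped length. */
    uint64_t read_load_store(off_t offset)
    {
        int fd = open("/mnt/pmem/pool", O_RDWR);   /* hypothetical path */
        size_t len = 1 << 20;
        uint64_t *pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
        return pool[offset / sizeof *pool];        /* plain CPU load */
    }

Once the region is mapped, the second form needs nothing but ordinary loads and stores, which is the memory-semantic access pattern that an interconnect like Gen-Z targets for SCM and disaggregated memory.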
There are a number of new, emerging non-volatile memory technologies that provide latencies much closer to traditional DDR than today’s SSD devices. This allows server nodes not only to have a local, persistent memory pool, but also to address much more memory per node at a lower cost per byte than DDR.
Big Data analytics demands are not only increasing the amount of memory/storage, but also increasing the demand for real-time processing of larger data sets. Disaggregated memory brings a large pool of low-latency, volatile and non-volatile memory to the rack scale. Disaggregated memory also significantly helps the TCO (total cost of ownership) of data centers, as it allows for better dynamic utilization and allocation of these resources based on application demands.
See more information about Gen-Z.
Open standards foster innovation and collaboration, and ultimately give businesses more flexibility, performance, efficiency and choice in their technology investments. Hopefully I’ve been able to help answer the question of how these three different open interconnect standards complement each other within the data center.
In short, you need all three to address the very complex world of data center architectures, especially as they evolve to meet the challenges of new and emerging workloads.