How do AMBA, CCIX and GenZ address the needs of the data center?

Interconnects and open standards have been a hot topic lately. A couple of weeks ago, ARM announced the CoreLink CMN-600 Coherent Mesh Network and CoreLink DMC-620 Dynamic Memory Controller IP, which support AMBA 5 CHI, the open standard for high-performance coherent on-chip communication. Today, ARM contributed to press announcements from not one, but two new open multi-chip interconnect consortia: CCIX and GenZ. Each consortium addresses needs within the data center, which leads to the obvious question:

How do the three different open interconnect standards fit in the data center?

Fundamentally, they are complementary open standards that foster innovation and collaboration and ultimately enable new and emerging use cases (such as video analytics, machine learning and search acceleration). Before I dive into the standards and the problems they address, I’d like to highlight the three basic system categories within a data center (SoC, server node and rack), as each has very different interconnect properties and needs.


System on a Chip (SoC)

A multi-processor, general-purpose compute SoC is the first thing that comes to mind when thinking of the data center. However, the data center also contains a number of accelerator and IO SoCs that are used for intelligent networking, storage, or specialized offload tasks and algorithms (e.g., FPGAs, GPUs and DSPs). An SoC interconnect provides connectivity between the on-chip processor, accelerator, IO and memory elements.


Server Node

The server node is typically contained within a chassis or blade and connects a small number of compute, accelerator and IO SoCs. Today, these SoCs are connected with a simple multi-chip interconnect topology (typically PCIe) on a PCB motherboard, which may also carry small switches and expansion connectors for add-on cards.

Rack(s)

Racks house not only a number of server chassis, but also top-of-rack switches and a large amount of shared storage. At rack scale, the interconnect requires scale-out capabilities with complex topologies that connect thousands of server nodes and storage elements. Ethernet, InfiniBand, Fibre Channel and RapidIO are examples of scale-out interconnects.

AMBA – the standard for on-chip communication

AMBA has been around for over 20 years and has fostered a rich ecosystem built upon open protocol standards. These standards have enabled IP creation, portability and re-use across different design groups and vendors. The AMBA 5 CHI (Coherent Hub Interface) protocol specification was announced in 2013 to enable high-performance, multi-processor heterogeneous SoCs. It has since been used in numerous server, networking and storage SoCs. AMBA 5 CHI separates the coherency protocol from the transport. This enables free-flowing, high-frequency data transfers over flexible on-chip network topologies and makes CHI well suited to scalable data center SoCs.
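The separation of protocol and transport can be illustrated with a toy Python model. This is not CHI itself. The class names, node names and routing schemes are hypothetical, and "ReadShared" is used only as an example opcode name. The request object describes what is being asked, and the transport decides how the request travels. As a result, the same request can cross a ring or a mesh unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class CoherenceRequest:
    """Protocol layer: what is being asked, independent of how it travels."""
    opcode: str       # illustrative opcode name, e.g. "ReadShared"
    address: int
    requester: str


class Transport(Protocol):
    """Transport layer: only knows how to get from one node to another."""
    def route(self, src: str, dst: str) -> List[str]: ...


class RingTransport:
    """Unidirectional ring: hop to the next node until the destination is reached."""
    def __init__(self, nodes: List[str]):
        self.nodes = nodes

    def route(self, src: str, dst: str) -> List[str]:
        if dst not in self.nodes:
            raise ValueError(f"unknown node {dst}")
        i = self.nodes.index(src)
        path = [src]
        while path[-1] != dst:
            i = (i + 1) % len(self.nodes)
            path.append(self.nodes[i])
        return path


class MeshTransport:
    """2-D mesh with dimension-ordered (X first, then Y) routing."""
    def __init__(self, coords: Dict[str, Tuple[int, int]]):
        self.coords = coords
        self.by_coord = {xy: name for name, xy in coords.items()}

    def route(self, src: str, dst: str) -> List[str]:
        x, y = self.coords[src]
        tx, ty = self.coords[dst]
        path = [src]
        while x != tx:
            x += 1 if tx > x else -1
            path.append(self.by_coord[(x, y)])
        while y != ty:
            y += 1 if ty > y else -1
            path.append(self.by_coord[(x, y)])
        return path


def send(req: CoherenceRequest, home: str, transport: Transport) -> Tuple[CoherenceRequest, List[str]]:
    """Deliver a request to its home node; the request itself is never modified."""
    return req, transport.route(req.requester, home)


if __name__ == "__main__":
    req = CoherenceRequest("ReadShared", 0x1000, "cpu0")
    ring = RingTransport(["cpu0", "acc0", "cpu1", "hn0"])
    mesh = MeshTransport({"cpu0": (0, 0), "cpu1": (1, 0), "acc0": (0, 1), "hn0": (1, 1)})
    print(send(req, "hn0", ring)[1])   # ['cpu0', 'acc0', 'cpu1', 'hn0']
    print(send(req, "hn0", mesh)[1])   # ['cpu0', 'cpu1', 'hn0']
```

Swapping the ring for a mesh changes only the path. The CoherenceRequest delivered to the home node is identical in both cases.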

The following image illustrates how the on-chip CoreLink CMN-600 interconnect can be used to create custom data center SoCs by connecting various compute, accelerator, IO and memory IP elements.

[Figure: CoreLink CMN-600 data center SoC example]

If you would like to find out more about CHI or other AMBA specifications, please visit the AMBA Protocol developer page.

CCIX - Cache Coherent Interconnect for Accelerators

While AMBA addresses the needs of on-chip communication, a multi-chip standard has very different problems to solve. These range from electrical PHYs, to mechanical connectors and cables, to common software discovery and management. As noted above, PCIe is the most prevalent server node interconnect and will continue to be widely used, but its lack of cache coherency is a major drawback.


A number of emerging or evolving accelerated use cases, such as intelligent networking and storage, deep/machine learning, image/video processing and search, are creating demand for more sharing, more bandwidth and lower latency between processors and accelerators. Hardware cache coherency becomes critical to improving system performance. It eliminates the software overhead of copying data back and forth and of setting up DMA transfers. With cache coherency, processors and accelerators can simply make a memory request, and the hardware takes care of the rest.
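To make the copy overhead concrete, here is a deliberately simplified Python model. The function names are hypothetical, and the model does not simulate caches, snooping or DMA engines. In the non-coherent path, software must stage data into a separate accelerator buffer and copy the result back. In the coherent path, both sides work on the same memory, and the hardware is assumed to keep caches consistent.

```python
def increment_all(buf: bytearray) -> None:
    """Stand-in accelerator kernel: add 1 (mod 256) to every byte in place."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


def offload_with_copies(host_buf: bytearray, kernel) -> int:
    """Non-coherent offload: software stages data into a private device buffer and back.
    Returns the number of bytes explicitly copied."""
    device_buf = bytearray(host_buf)   # copy host -> device (in real systems, often a DMA transfer)
    kernel(device_buf)                 # accelerator works on its private copy
    host_buf[:] = device_buf           # copy device -> host so the CPU sees the result
    return 2 * len(host_buf)


def offload_coherent(shared_buf: bytearray, kernel) -> int:
    """Coherent offload: the accelerator works on the same buffer the CPU uses.
    Coherency is assumed to be maintained by hardware, so no explicit copies are made."""
    kernel(shared_buf)
    return 0


if __name__ == "__main__":
    a = bytearray(1024)
    b = bytearray(1024)
    print(offload_with_copies(a, increment_all), a[:4])
    print(offload_coherent(b, increment_all), b[:4])
```

Both paths produce the same result. Only the non-coherent one moves the buffer twice in software, and that cost grows with the size and frequency of the exchanges between processor and accelerator.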

CCIX (pronounced “C6”) provides an open multi-chip coherency standard. It allows processors from different vendors, with different instruction set architectures and different protocols, to extend their cache coherency to remote accelerators. Free-flowing, high-frequency AMBA 5 CHI transactions can now be converted to CCIX and transferred over flexible multi-chip topologies. To solve the issues introduced by multi-chip connectivity, CCIX has selected PCIe as its first transport. This choice will dramatically accelerate CCIX deployment and time to market, because PCIe is a well-established ecosystem that has already solved the electrical, mechanical, switching and software problems. It will also simplify the SoC design process by leveraging existing IP and by allowing dual-purpose pins/ports. These can be configured as CCIX or PCIe depending on the system to which they are attached.

For more information about CCIX go to ccixconsortium.com.

GenZ – A new approach to data access


While CCIX allows processors to extend their cache coherency to off-chip accelerators, GenZ addresses the need for higher-performance data access. It does so with an interconnect based on memory operations that spans both server node and rack scale. Today, storage requires block-based accesses with complex, code-intensive software stacks. Memory operations such as loads and stores allow processors to access both volatile (i.e., DRAM) and non-volatile storage in the same efficient manner. Emerging Storage Class Memory (SCM) and rack-level disaggregated memory pools are example use cases that benefit from a memory-operation interconnect.
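The difference between the two access models can be sketched in Python with the standard mmap module. This is only an analogy under stated assumptions. On a conventional file system, the operating system still moves whole pages behind the mapping, and the 4 KiB block size is an assumed value. What the sketch contrasts is the programming model. Block access fetches an entire block to obtain 8 bytes, while a mapping lets the program load and store individual words by address. Byte-addressable SCM or a memory-semantic fabric could, in principle, serve those word accesses directly in hardware.

```python
import mmap
import os
import struct
import tempfile

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def read_u64_block(path: str, offset: int):
    """Block-style access: fetch the whole block containing `offset`, then extract 8 bytes.
    Returns (value, bytes_transferred)."""
    block_start = (offset // BLOCK_SIZE) * BLOCK_SIZE
    with open(path, "rb") as f:
        f.seek(block_start)
        block = f.read(BLOCK_SIZE)
    value = struct.unpack_from("<Q", block, offset - block_start)[0]
    return value, len(block)


def read_u64_mapped(mm: mmap.mmap, offset: int) -> int:
    """Load-style access: read an 8-byte little-endian word directly by address."""
    return struct.unpack_from("<Q", mm, offset)[0]


def write_u64_mapped(mm: mmap.mmap, offset: int, value: int) -> None:
    """Store-style access: write an 8-byte little-endian word directly by address."""
    struct.pack_into("<Q", mm, offset, value)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4 * BLOCK_SIZE))   # 16 KiB file of zeros
        path = tmp.name
    try:
        with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
            write_u64_mapped(mm, 5000, 42)
            mm.flush()
            print(read_u64_mapped(mm, 5000))
        print(read_u64_block(path, 5000))
    finally:
        os.remove(path)
```

The mapped path touches exactly the 8 bytes it names. The block path must transfer 4096 bytes and then pick out the word it wanted.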

Storage Class Memory

There are a number of new, emerging non-volatile memory technologies that provide latencies much closer to traditional DDR than today’s SSD devices. This allows server nodes not only to have a local, persistent memory pool, but also to have much larger addressable memory per node at a lower cost per byte than DDR.

Rack-level disaggregated pooled memory

Big Data analytics is increasing not only the amount of memory and storage required, but also the demand for real-time processing of larger data sets. Disaggregated memory brings a large pool of low-latency, volatile and non-volatile memory to the rack scale. It also significantly improves TCO (total cost of ownership) for data centers. It does this by allowing these resources to be utilized and allocated dynamically, based on application demands.
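A small worked example shows why pooling can improve utilization. The demand figures below are invented purely for illustration. When memory is provisioned locally, each node must be sized for its own peak. A shared pool only has to cover the peak of the combined demand, which is lower whenever nodes peak at different times. The model ignores latency, headroom and fragmentation.

```python
from typing import Dict, List

# Hypothetical memory demand (GiB) for four nodes, sampled over four time slots.
DEMAND: Dict[str, List[int]] = {
    "node0": [64, 200, 80, 40],
    "node1": [180, 60, 50, 90],
    "node2": [40, 50, 220, 70],
    "node3": [90, 70, 60, 210],
}


def local_capacity(demand: Dict[str, List[int]]) -> int:
    """Each node is provisioned for its own peak demand."""
    return sum(max(series) for series in demand.values())


def pooled_capacity(demand: Dict[str, List[int]]) -> int:
    """A shared pool is provisioned for the peak of the combined demand in any time slot."""
    return max(sum(slot) for slot in zip(*demand.values()))


if __name__ == "__main__":
    local = local_capacity(DEMAND)
    pooled = pooled_capacity(DEMAND)
    print(f"local: {local} GiB, pooled: {pooled} GiB, saved: {local - pooled} GiB")
```

With these made-up numbers, local provisioning needs 810 GiB in total, while a shared pool needs 410 GiB, because no two nodes reach their peaks in the same time slot.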

For more information about GenZ go to genzconsortium.org.

Meeting the challenges of new workloads through open standards

Open standards foster innovation and collaboration and ultimately give businesses more flexibility, performance, efficiency and choice in their technology investments. Hopefully I’ve been able to help answer the question of how these three open interconnect standards complement each other within the data center.

  • AMBA – the standard for on-chip communication enabling IP portability, creation and re-use
  • CCIX – extends the benefits of cache coherency to the multi-chip server node for evolving acceleration and IO use cases
  • GenZ – enables a new, data-centric computing approach to big data problems with scalable memory pools and resources at both the server node and rack level

In short, you need all three to address the very complex world of data center architectures, especially as they evolve to meet the challenges of new and emerging workloads.

I’ll be discussing more about these technologies at my upcoming technology talk during ARM TechCon Oct 25-27, 2016.
