PC meets Arm: Integrating PCIExpress into the Arm Server Architecture

Earlier this month you may have noticed some press coverage regarding a collaboration between Xilinx, Arm, Cadence and TSMC to deliver a 7nm test chip.

Assembling server SoCs for the infrastructure market with the latest PCIExpress gen4 capabilities presents some significant challenges. This blog provides a brief overview of those challenges, which will be covered in more detail at the Arm Techcon event in California in October.

A typical Arm Server SoC would look something like the block diagram below.

Typical Arm Server SoC infrastructure

Notice the two key external connections via PCIExpress gen4 and CCIX (pronounced “cee-six”). CCIX is a new protocol announced in 2016 by the CCIX Consortium to address the challenge of connecting multiple integrated circuits together coherently. Notice also that CCIX uses the PCIe PHY. This is because CCIX is defined as a new protocol layer which operates over the top of the PCIe protocol. Building on PCIe in this way has enabled the CCIX protocol to be defined and implemented in a very short period of time. For the rationale behind CCIX, take a look at this blog from Jeff Defillipi: 'How do AMBA, CCIX and GenZ address the needs of the data center?'.

The key challenges in verifying the integration of gen4 PCIExpress IP into an Arm SoC arise because gen4 has new system-level aspects which affect several other complex system IP blocks, such as the System Memory Management Unit (SMMU) and the Generic Interrupt Controller (GIC).

One example of such complex system interaction is the gen4 capability for supporting Address Translation Services (ATS). ATS provides the opportunity to build intelligent End Point (EP) devices which have their own Memory Management Unit, kept synchronized with the SMMU on the infrastructure host. Translation Lookaside Buffers (TLBs), which cache entries from the address translation tables, are loaded across the PCIe interface to enhance performance. Traffic can then be sent using “ready translated” addresses, which removes the need for further translation by the SMMU. The diagram below shows the kind of testbench needed to verify ATS.

Example of a typical testbench to verify ATS

The packets initiated from the End Point (EP) are tagged in such a way that the SMMU just passes them straight through. Software on the host side manages the page tables, so Distributed Virtual Memory (DVM) messages are also passed from the host back to the ATS EP to keep its TLBs up to date. As you can probably imagine, verifying that this capability functions correctly requires considerable system setup, a Verification IP which supports ATS, and software to drive traffic that exercises ATS in all its various modes.
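The ATS flow described above (translate once, cache the result in the End Point, send pre-translated traffic, invalidate when the host changes its page tables) can be illustrated with a small behavioural model. This is only a minimal sketch in Python, not Arm's or any vendor's implementation: the class names `HostSmmu` and `AtsEndpoint`, the `remap` helper, the 4 KiB page size and the dictionary-based page table are all assumptions made for illustration, and the model ignores everything a real PCIe or SMMU implementation has to deal with (packet formats, permissions, ordering, faults).

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class HostSmmu:
    """Hypothetical stand-in for the host SMMU, which owns the page table."""

    def __init__(self, page_table):
        self.page_table = dict(page_table)  # virtual page -> physical page
        self.walks = 0                      # translations performed by the SMMU

    def translate(self, vaddr):
        """Translate an address on behalf of the End Point."""
        self.walks += 1
        ppage = self.page_table[vaddr >> PAGE_SHIFT]
        return (ppage << PAGE_SHIFT) | (vaddr & PAGE_MASK)

    def handle_request(self, addr, translated):
        """Traffic tagged as translated passes straight through."""
        return addr if translated else self.translate(addr)


class AtsEndpoint:
    """Hypothetical End Point with its own translation cache."""

    def __init__(self, smmu):
        self.smmu = smmu
        self.cache = {}                     # virtual page -> physical page

    def access(self, vaddr):
        vpage = vaddr >> PAGE_SHIFT
        if vpage not in self.cache:
            # Cache miss: ask the host for the translation once.
            paddr = self.smmu.translate(vaddr)
            self.cache[vpage] = paddr >> PAGE_SHIFT
        paddr = (self.cache[vpage] << PAGE_SHIFT) | (vaddr & PAGE_MASK)
        # Send "ready translated" traffic; the SMMU does no further work.
        return self.smmu.handle_request(paddr, translated=True)

    def invalidate(self, vpage):
        """Drop a cached translation when the host asks for it."""
        self.cache.pop(vpage, None)


def remap(smmu, endpoint, vpage, new_ppage):
    """Host software changes a mapping, then invalidates the End Point copy."""
    smmu.page_table[vpage] = new_ppage
    endpoint.invalidate(vpage)


if __name__ == "__main__":
    smmu = HostSmmu({0x10: 0x80})
    ep = AtsEndpoint(smmu)
    print(hex(ep.access(0x10123)), smmu.walks)  # first access needs a translation
    print(hex(ep.access(0x10456)), smmu.walks)  # second access hits the cache
    remap(smmu, ep, 0x10, 0x90)
    print(hex(ep.access(0x10123)), smmu.walks)  # translation is fetched again
```

Even in this toy form, the model hints at why verification is hard: a test has to observe not just that the data arrives, but that the SMMU was bypassed when it should be and that no stale translation survives a page table update.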

One of the most significant challenges for ATS, as well as for dozens of other integration aspects, is the need for complex use-cases. At Arm Techcon we will be introducing a new Portable Stimulus (PSS) library for PCIExpress which provides significant out-of-the-box capabilities for creating complex PCIExpress use-cases.

I will be joined by Sujil Kottekkat from Arm to present how we addressed the ATS and other challenges on Tuesday 24th October at Arm Techcon.