It’s (Mostly) in the PHY

Ashwin Matta
March 20, 2015
4 minute read time.

The memory sub-system is one of the most complex systems in an SoC and is critical to the overall performance of the chip. Recent years have witnessed explosive growth in the memory market, with high-speed parts (DDR4/3 with or without DIMM support, LPDDR4/3) gaining momentum in mobile, consumer, and enterprise systems. This has resulted not only in increasingly complex memory controllers (MCs) but also in more complex PHYs connecting the memory sub-system to the external DRAM. Because data transfers between the SoC and DRAM run at such high speeds, complex training of the memory interface signals is necessary for reliable operation.

Traditionally, MC and PHY integration was considered a significant challenge, especially if the two IP blocks originated from different vendors. The key reasons were the rapid evolution of memory protocols and the fact that the DFI interface boundary between controller and PHY was incompletely specified, or in some cases ambiguous, with respect to the requirements for MC-PHY training.



Why is MC-PHY integration NOT such a big issue now?


I’ll try to shed some light on this topic. Recently, with the release of the DFI 4.0 draft specification for the MC-PHY interface, things certainly seem to be heading in the right direction. For folks unfamiliar with DFI, it is an industry standard that defines the boundary signals and protocol between any generic MC and PHY. Since the inception of DFI 1.0 back in 2006, the specification has steadily advanced to cover all aspects of MC-PHY operation, encompassing all relevant DRAM technology requirements. The DFI 4.0 specification is more mature than previous releases and focuses specifically on backwards compatibility and MC-PHY interoperability.

But that’s not the only reason why MC-PHY integration has gotten easier. To understand this better, we need to examine how the MC and PHY interact during training. There are two fundamental ways that training of memory signals can happen:

  1. PHY evaluation mode or DFI training mode – This training mode can be initiated by either the PHY or the MC. Regardless of which side initiates the training, the MC sets up the DRAM for gate/read-data-eye/write/CA training and periodically issues training commands such as reads or writes. The PHY is responsible for determining the correct delay programming for each operation, but the MC has to assist by enabling and disabling the leveling logic in the DRAMs and the PHY, and by generating the necessary read, mode-register-read, or write-strobe commands. The DFI training mode thus imposes significant effort on the MC, and it was mandatory in the earlier DFI 3.1 specification. In DFI 4.0, however, this training mode has become optional for the MC.
  2. PHY independent mode – This is a mode in which the PHY performs DRAM training with little involvement from the MC. The PHY generates all the read/write commands and programs the delays for each operation while the MC waits patiently for ‘done’ status. In DFI 3.1, PHY independent mode could be initiated through the DFI update request protocol or through a special non-DFI training mode. In DFI 4.0, the non-DFI training mode has been replaced with a more generalized PHY master interface that enables the PHY to train the memory signals, along with other low-power operations.
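To make the PHY independent mode concrete, here is a minimal sketch of the request/grant/release sequence between MC and PHY. The signal names follow the DFI 4.0 PHY master interface convention (dfi_phymstr_req/ack), but the handshake logic below is a simplified software model for illustration, not a spec-accurate or cycle-accurate description.

```python
# Simplified model of a PHY independent-mode training handshake over the
# DFI 4.0 PHY master interface. The signal names follow DFI convention;
# the MC-side behavior here is an illustrative assumption, not a real design.

class PhyMasterHandshake:
    def __init__(self):
        self.dfi_phymstr_req = False   # driven by the PHY: "I need the memory"
        self.dfi_phymstr_ack = False   # driven by the MC: "memory quiesced, go"
        self.training_done = False     # PHY raises this when training completes

    def phy_request_training(self):
        # PHY independent mode: the PHY initiates; the MC only acknowledges.
        self.dfi_phymstr_req = True

    def mc_grant(self):
        # MC drains in-flight traffic, puts the DRAM into the agreed state,
        # and then asserts the acknowledge.
        if self.dfi_phymstr_req:
            self.dfi_phymstr_ack = True

    def phy_train_and_release(self):
        # PHY issues its own reads/writes, programs its delays, then releases
        # the interface by deasserting its request.
        if self.dfi_phymstr_ack:
            self.training_done = True
            self.dfi_phymstr_req = False

    def mc_resume(self):
        # MC sees the request deassert, drops its ack, and resumes traffic.
        if self.training_done and not self.dfi_phymstr_req:
            self.dfi_phymstr_ack = False
            return "resumed"
        return "waiting"
```

A full sequence would run phy_request_training, mc_grant, phy_train_and_release, and then mc_resume, which returns "resumed" once the PHY has handed the interface back.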

Interestingly, PHY IP providers have decided to take ownership of training by implementing support for PHY independent mode in their IP, thereby keeping the reins on the training algorithms and the freedom to optimize them for their PHY architecture. With PHY complexity growing and the challenge of closing timing at high DDR speeds, support for PHY independent-mode training is a valuable differentiator for PHY IP providers.

What is the memory controller’s role during PHY-independent mode training?


With the PHY doing most of the heavy lifting during the training, the MC only needs to focus on two questions:

  1. When does a request for training happen?
  2. In what state does memory need to be when handing control to the PHY for training, and in what state will memory be when PHY hands the control back to the MC?

The MC thus treats the PHY’s request for independent-mode training as an interrupt: something it needs to schedule along with the multitude of other things it does for best memory operation. Training then becomes a Quality-of-Service (QoS) exercise for the controller, with a different set of parameters to optimize. The upside is that QoS is precisely what a good MC does well.
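As a rough illustration of this QoS view, the sketch below treats the PHY’s training request as just another request class that the controller arbitrates by priority. The class names and priority ordering are assumptions made for the example; a real controller’s arbitration policy is far more nuanced.

```python
# Illustrative-only priority arbiter: the MC schedules the PHY's training
# request alongside refresh and normal traffic. The request classes and
# priority values are assumptions for this sketch, not real DMC behavior.

import heapq

PRIORITY = {"refresh": 0, "realtime_read": 1, "phy_training": 2, "bulk_write": 3}

def schedule(requests):
    """Return the requests in the order a priority-based MC might issue them.

    Ties within a class are broken by arrival order (the index), so the
    arbiter is stable as well as priority-ordered.
    """
    heap = [(PRIORITY[kind], i, kind) for i, kind in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, schedule(["bulk_write", "phy_training", "refresh"]) issues the refresh first, then the training request, then the bulk write: training is serviced promptly but never at the expense of correctness-critical operations.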

But what about proof of working silicon for MC-PHY integration?


With clarity at the DFI interface, silicon proof is really a burden on the PHY, because it is the PHY that has to train correctly at high speeds and provide a good data eye. The risk of critical bugs in the MC that can only be found through silicon proof is low, and a strong architecture and design/verification methodology can help eliminate it. The demands on the MC have therefore shifted away from MC-PHY interoperability and toward performance (memory bandwidth and latency).

Does your MC operate with the best performance in a realistic system scenario with guaranteed QoS, shortest path to memory, guaranteed response to real-time traffic, etc.?


I am leaving that as the topic of my next blog.

ARM is building state-of-the-art memory controllers with an emphasis on CPU-to-memory performance, supporting the DFI-based PHY solutions available in the market today. We have set up partnerships with third-party PHY providers to ensure that integration at the DFI boundary is seamless for the end customer. ARM’s controllers support all the different training modes used by different PHYs, giving customers flexibility in choosing the best overall solution for their memory sub-system deployment.

Thanks for reading my blog, I welcome your feedback.
