How to get the most out of the memory interface
The choice of DDR memory is a key factor in determining the cost and power of the smartphone. Low cost smartphones typically use a single 32-bit memory interface to keep the cost of the product as low as possible. LPDDR2 operating at 400 MHz or 533 MHz is a common choice. This provides a raw bandwidth of 1.6-2.1 Gbyte/s, and it is vital to obtain the maximum utilization of it in order to deliver sufficient performance to all the processors in the system. There is no spare bandwidth capacity to waste.
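As a quick sanity check, those raw figures follow directly from the bus width and the transfer rate. The sketch below is purely illustrative; treating the 400 and 533 figures as data rates in MT/s is an assumption made here because it reproduces the 1.6-2.1 Gbyte/s quoted above.

```python
# Illustrative back-of-envelope calculation of raw DRAM bandwidth.
# Assumption: the 400/533 figures are treated as data rates in MT/s,
# which is what reproduces the 1.6-2.1 Gbyte/s quoted in the text.

BUS_WIDTH_BITS = 32  # single 32-bit LPDDR2 interface

def raw_bandwidth_gbytes(data_rate_mts):
    """Raw bandwidth in Gbyte/s = bus width in bytes * transfers per second."""
    return (BUS_WIDTH_BITS / 8) * data_rate_mts / 1000.0

for rate in (400, 533):
    print(f"{rate} MT/s -> {raw_bandwidth_gbytes(rate):.2f} Gbyte/s")
# 400 MT/s -> 1.60 Gbyte/s
# 533 MT/s -> 2.13 Gbyte/s
```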
The dynamic memory controller plays a key role in scheduling DRAM accesses so that none of this precious bandwidth is wasted. The CoreLink DMC-400 Dynamic Memory Controller has an advanced memory scheduler with separate re-ordering read and write buffers that minimize time-wasting read-write turnarounds and maximize open row and bank hits to reduce activation times. The DMC-400 supports AMBA 4 AXI4 QoS values to prioritize transactions to satisfy the latency and bandwidth needs of each master. Overall the DMC-400 is capable of achieving an average of 90% of the maximum theoretical memory utilization across the full range of traffic types.
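To make the scheduling benefit concrete, here is a deliberately simplified model of why re-ordering pending requests pays off. This is not the DMC-400's actual algorithm; the cycle costs and the sort-based re-ordering are illustrative assumptions only.

```python
# A toy model of out-of-order DRAM scheduling: not the DMC-400 algorithm,
# just an illustration of why grouping same-direction, same-row accesses helps.
# The cycle costs below are arbitrary example values, not LPDDR2 timings.

ROW_MISS_COST = 10   # precharge + activate a new row (illustrative)
ROW_HIT_COST = 2     # column access to an already-open row (illustrative)
TURNAROUND_COST = 4  # bus turnaround when switching read <-> write (illustrative)

def cost(requests):
    """Total cycles to service (direction, bank, row) requests in the given order."""
    cycles, open_rows, last_dir = 0, {}, None
    for direction, bank, row in requests:
        if last_dir is not None and direction != last_dir:
            cycles += TURNAROUND_COST
        cycles += ROW_HIT_COST if open_rows.get(bank) == row else ROW_MISS_COST
        open_rows[bank] = row
        last_dir = direction
    return cycles

pending = [("R", 0, 1), ("W", 1, 7), ("R", 0, 1), ("W", 1, 7), ("R", 0, 1)]
reordered = sorted(pending)  # group by direction, then bank, then row

print("in-order  :", cost(pending), "cycles")
print("reordered :", cost(reordered), "cycles")
```

Re-ordering the same five requests turns every access after the first into an open-row hit and leaves a single read-write turnaround, which is the effect the DMC-400's re-ordering buffers are exploiting at full scale.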
But keeping each and every master in the system performing well is not just a question of memory bandwidth utilization, because different masters have different needs.
End-to-end Quality of Service (QoS)
In a smartphone SoC there are many high performance masters competing for limited access to main memory. End-to-end Quality of Service (QoS) is a system for managing traffic flows through the interconnect and memory controller. The QoS mechanism allocates system capacity appropriately to each IP block to meet its latency and bandwidth needs, and then allocates any excess capacity to where it can offer the greatest performance improvement.
Fortunately different masters have different QoS needs. Some, like a CPU, have performance that depends directly on latency, so they are seeking a minimum latency contract. Some need to get a job done by a deadline, so they require a maximum latency contract. Others, like multimedia processors, can process data a long way in advance with many outstanding transactions and are relatively latency tolerant, provided they are getting a minimum bandwidth contract.
Fig 2: Different masters with different QoS needs
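One way to picture these three contract types is as a simple per-master record. The masters and figures below are hypothetical examples, not requirements taken from a real SoC.

```python
# Hypothetical examples of the three QoS contract types described above.
# The master names and figures are illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class QosContract:
    master: str
    min_latency_sensitive: bool = False      # CPU-style: performance tracks latency
    max_latency_us: Optional[float] = None   # deadline-style: work must finish in time
    min_bandwidth_mbs: Optional[int] = None  # multimedia-style: steady bandwidth, latency tolerant

contracts = [
    QosContract("cpu_cluster", min_latency_sensitive=True),
    QosContract("display_controller", max_latency_us=50.0),  # illustrative deadline
    QosContract("video_decoder", min_bandwidth_mbs=300),     # illustrative bandwidth floor
]

for c in contracts:
    print(c)
```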
In a system without traffic management there will be a tendency for the masters that issue the most outstanding transactions to “win”. Their transactions will flood the memory controller queue and back up through the arbitration points in the interconnect, leaving the other masters blocked. The ARM CoreLink IP offers two mechanisms to resolve this.
The first is the addition of regulators at the ingress to the interconnect, so that greedy masters can be limited by bandwidth, by the number of outstanding transactions or by the period between transactions. Regulators can be dynamic, reacting to the average bandwidth or latency that the master is achieving and raising its priority when insufficient bandwidth or excessive latency occurs. This does help, but it has the disadvantage that over-regulation may be needed to prevent any blocking, which may leave the memory bandwidth under-utilized and so fail to achieve the best possible performance.
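The feedback idea behind a dynamic regulator can be sketched as follows. This is not the CoreLink regulator algorithm, just an assumed illustration: the priority is raised while the observed average latency sits above a target, and relaxed again once the target is met.

```python
# A minimal sketch of a dynamic latency regulator, in the spirit of the ingress
# regulators described above. NOT the CoreLink implementation; the smoothing,
# QoS range and thresholds are all illustrative assumptions.

class LatencyRegulator:
    def __init__(self, target_latency, min_qos=0, max_qos=15, alpha=0.3):
        self.target = target_latency
        self.min_qos, self.max_qos = min_qos, max_qos
        self.alpha = alpha                    # smoothing factor for the running average
        self.avg_latency = float(target_latency)
        self.qos = min_qos

    def observe(self, latency):
        """Feed in the latency of a completed transaction; returns the QoS to use next."""
        self.avg_latency += self.alpha * (latency - self.avg_latency)
        if self.avg_latency > self.target:
            self.qos = min(self.qos + 1, self.max_qos)   # starved: escalate priority
        else:
            self.qos = max(self.qos - 1, self.min_qos)   # back off once target is met
        return self.qos

reg = LatencyRegulator(target_latency=100)
for lat in [80, 90, 150, 200, 220, 180, 120, 90, 60, 60, 60, 60]:
    print(f"latency={lat:4d}  qos={reg.observe(lat)}")
```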
The second mechanism, called QoS Virtual Networks, allows the regulation and priority mechanisms to work independently for the different master types: minimum latency, maximum latency and minimum bandwidth. In effect the greedy masters are allowed to use all of the bandwidth up to a high tide mark, with the last portion reserved for the latency critical masters. Within the greedy masters' virtual network, regulators can be used to share the bandwidth, while still allowing the CPU and real time masters an unblocked route through the interconnect to the memory controller queue. Once in the queue, a higher priority QoS value can be used to jump the queue and meet the minimum or maximum latency contract as appropriate.
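Here is one way to model that high tide mark and queue-jumping behaviour. The queue depth, threshold and QoS values are assumptions chosen for illustration; this is not how the CoreLink hardware is implemented.

```python
# An illustrative model (not the CoreLink implementation) of the "high tide mark"
# idea: greedy, bandwidth-hungry masters may occupy queue entries only up to a
# threshold, while the last few entries are reserved for latency-critical
# traffic, which is also served first via a higher QoS priority value.

import heapq

QUEUE_DEPTH = 8
HIGH_TIDE_MARK = 6   # greedy masters may not occupy the last 2 entries

class MemoryControllerQueue:
    def __init__(self):
        self.entries = []   # heap of (negated QoS priority, sequence no, request)
        self.seq = 0

    def try_enqueue(self, request, qos, latency_critical):
        limit = QUEUE_DEPTH if latency_critical else HIGH_TIDE_MARK
        if len(self.entries) >= limit:
            return False    # greedy master is stalled at the high tide mark
        heapq.heappush(self.entries, (-qos, self.seq, request))
        self.seq += 1
        return True

    def service_next(self):
        """Highest QoS value first, so latency-critical requests jump the queue."""
        if not self.entries:
            return None
        _, _, request = heapq.heappop(self.entries)
        return request

q = MemoryControllerQueue()
for i in range(8):
    ok = q.try_enqueue(f"gpu_read_{i}", qos=2, latency_critical=False)
    print(f"gpu_read_{i}: {'accepted' if ok else 'blocked at high tide mark'}")
ok = q.try_enqueue("cpu_fetch", qos=12, latency_critical=True)
print(f"cpu_fetch: {'accepted' if ok else 'blocked'}")
print("served first:", q.service_next())
```

In this sketch the GPU fills the queue only as far as the high tide mark, the CPU fetch still finds a reserved entry, and its higher QoS value means it is the first request serviced.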
So we see that by getting the most out of your DDR memory interface you can keep cost and power down while still delighting the consumer with the rich user experience they expect from today's best selling smartphones.
In the next and final installment I'll be looking at reducing software costs and the options for CPU-GPU coherency in a low cost smartphone.