How to get the most out of the memory interface
The choice of DDR memory is a key factor in determining the cost and power of the smartphone. Low cost smartphones typically use a single 32-bit memory interface to keep the cost of the product as low as possible. LPDDR2 operating at 400 MHz or 533 MHz is a common choice. This provides a raw bandwidth of 1.6-2.1 Gbyte/s, and it is vital to obtain the maximum utilization of it in order to deliver sufficient performance to all the processors in the system. There is no spare bandwidth capacity to waste.
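As a quick sanity check, those raw figures follow directly from the bus width and the transfer rate. The sketch below is purely illustrative; treating the 400 and 533 figures as data rates in MT/s is an assumption made here because it reproduces the 1.6-2.1 Gbyte/s quoted above.

```python
# Illustrative back-of-envelope calculation of raw DRAM bandwidth.
# Assumption: the 400/533 figures are treated as data rates in MT/s,
# which is what reproduces the 1.6-2.1 Gbyte/s quoted in the text.

BUS_WIDTH_BITS = 32  # single 32-bit LPDDR2 interface

def raw_bandwidth_gbytes(data_rate_mts):
    """Raw bandwidth in Gbyte/s = bus width in bytes * transfers per second."""
    return (BUS_WIDTH_BITS / 8) * data_rate_mts / 1000.0

for rate in (400, 533):
    print(f"{rate} MT/s -> {raw_bandwidth_gbytes(rate):.2f} Gbyte/s")
# 400 MT/s -> 1.60 Gbyte/s
# 533 MT/s -> 2.13 Gbyte/s
```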
The dynamic memory controller plays a key role in scheduling DRAM accesses so that none of this precious bandwidth is wasted. The CoreLink DMC-400 Dynamic Memory Controller has an advanced memory scheduler with separate re-ordering read and write buffers that minimize time-wasting read-write turnarounds and maximize open row and bank hits to reduce activation times. The DMC-400 supports AMBA 4 AXI4 QoS values to prioritize transactions to satisfy the latency and bandwidth needs of each master. Overall the DMC-400 is capable of achieving an average of 90% of the maximum theoretical memory utilization across the full range of traffic types.
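To make the scheduling benefit concrete, here is a deliberately simplified model of why re-ordering pending requests pays off. This is not the DMC-400's actual algorithm; the cycle costs and the sort-based re-ordering are illustrative assumptions only.

```python
# A toy model of out-of-order DRAM scheduling: not the DMC-400 algorithm,
# just an illustration of why grouping same-direction, same-row accesses helps.
# The cycle costs below are arbitrary example values, not LPDDR2 timings.

ROW_MISS_COST = 10   # precharge + activate a new row (illustrative)
ROW_HIT_COST = 2     # column access to an already-open row (illustrative)
TURNAROUND_COST = 4  # bus turnaround when switching read <-> write (illustrative)

def cost(requests):
    """Total cycles to service (direction, bank, row) requests in the given order."""
    cycles, open_rows, last_dir = 0, {}, None
    for direction, bank, row in requests:
        if last_dir is not None and direction != last_dir:
            cycles += TURNAROUND_COST
        cycles += ROW_HIT_COST if open_rows.get(bank) == row else ROW_MISS_COST
        open_rows[bank] = row
        last_dir = direction
    return cycles

pending = [("R", 0, 1), ("W", 1, 7), ("R", 0, 1), ("W", 1, 7), ("R", 0, 1)]
reordered = sorted(pending)  # group by direction, then bank, then row

print("in-order  :", cost(pending), "cycles")
print("reordered :", cost(reordered), "cycles")
```

Re-ordering the same five requests turns every access after the first into an open-row hit and leaves a single read-write turnaround, which is the effect the DMC-400's re-ordering buffers are exploiting at full scale.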
But keeping each and every master in the system performing well is not just a question of memory bandwidth utilization, because different masters have different needs.
End-to-end Quality of Service (QoS)
In a smartphone SoC there are many high performance masters competing for limited access to main memory. End-to-end Quality of Service (QoS) is a system for managing traffic flows through the interconnect and memory controller. The QoS mechanism allocates system capacity appropriately to each IP block to meet its latency and bandwidth needs, and then allocates any excess capacity to where it can offer the greatest performance improvement.
Fortunately different masters have different QoS needs. Some, like a CPU, have performance that depends directly on latency, so they are seeking a minimum latency contract. Some need to get a job done by a deadline, so they require a maximum latency contract. Others, like multimedia processors, can process data a long way in advance with many outstanding transactions and are relatively latency tolerant, provided they are getting a minimum bandwidth contract.
Fig 2: Different masters with different QoS needs
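One way to picture these three contract types is as a simple per-master record. The masters and figures below are hypothetical examples, not requirements taken from a real SoC.

```python
# Hypothetical examples of the three QoS contract types described above.
# The master names and figures are illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class QosContract:
    master: str
    min_latency_sensitive: bool = False      # CPU-style: performance tracks latency
    max_latency_us: Optional[float] = None   # deadline-style: work must finish in time
    min_bandwidth_mbs: Optional[int] = None  # multimedia-style: steady bandwidth, latency tolerant

contracts = [
    QosContract("cpu_cluster", min_latency_sensitive=True),
    QosContract("display_controller", max_latency_us=50.0),  # illustrative deadline
    QosContract("video_decoder", min_bandwidth_mbs=300),     # illustrative bandwidth floor
]

for c in contracts:
    print(c)
```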
In a system without traffic management there will be a tendency for the masters that issue the most outstanding transactions to “win”. Their transactions will flood the memory controller queue and back up through the arbitration points in the interconnect, leaving the other masters blocked. The ARM CoreLink IP offers two mechanisms to resolve this.
The first is the addition of regulators at the ingress to the interconnect, so that greedy masters can be limited by bandwidth, by the number of outstanding transactions or by the period between transactions. Regulators can be dynamic, reacting to the average bandwidth or latency that the master is achieving and raising its priority when insufficient bandwidth or excessive latency occurs. This does help, but it has the disadvantage that over-regulation may be needed to prevent any blocking, which may leave the memory bandwidth under-utilized and so fail to achieve the best possible performance.
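The feedback idea behind a dynamic regulator can be sketched as follows. This is not the CoreLink regulator algorithm, just an assumed illustration: the priority is raised while the observed average latency sits above a target, and relaxed again once the target is met.

```python
# A minimal sketch of a dynamic latency regulator, in the spirit of the ingress
# regulators described above. NOT the CoreLink implementation; the smoothing,
# QoS range and thresholds are all illustrative assumptions.

class LatencyRegulator:
    def __init__(self, target_latency, min_qos=0, max_qos=15, alpha=0.3):
        self.target = target_latency
        self.min_qos, self.max_qos = min_qos, max_qos
        self.alpha = alpha                    # smoothing factor for the running average
        self.avg_latency = float(target_latency)
        self.qos = min_qos

    def observe(self, latency):
        """Feed in the latency of a completed transaction; returns the QoS to use next."""
        self.avg_latency += self.alpha * (latency - self.avg_latency)
        if self.avg_latency > self.target:
            self.qos = min(self.qos + 1, self.max_qos)   # starved: escalate priority
        else:
            self.qos = max(self.qos - 1, self.min_qos)   # back off once target is met
        return self.qos

reg = LatencyRegulator(target_latency=100)
for lat in [80, 90, 150, 200, 220, 180, 120, 90, 60, 60, 60, 60]:
    print(f"latency={lat:4d}  qos={reg.observe(lat)}")
```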
The second mechanism, called QoS Virtual Networks, allows the regulation and priority mechanisms to work independently for the different master types: minimum latency, maximum latency and minimum bandwidth. In effect the greedy masters are allowed to use all of the bandwidth up to a high tide mark, with the last portion reserved for the latency critical masters. Within the greedy masters' virtual network, regulators can be used to share the bandwidth, while still allowing the CPU and real time masters an unblocked route through the interconnect to the memory controller queue. Once in the queue, a higher priority QoS value can be used to jump the queue and meet the minimum or maximum latency contract as appropriate.
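Here is one way to model that high tide mark and queue-jumping behaviour. The queue depth, threshold and QoS values are assumptions chosen for illustration; this is not how the CoreLink hardware is implemented.

```python
# An illustrative model (not the CoreLink implementation) of the "high tide mark"
# idea: greedy, bandwidth-hungry masters may occupy queue entries only up to a
# threshold, while the last few entries are reserved for latency-critical
# traffic, which is also served first via a higher QoS priority value.

import heapq

QUEUE_DEPTH = 8
HIGH_TIDE_MARK = 6   # greedy masters may not occupy the last 2 entries

class MemoryControllerQueue:
    def __init__(self):
        self.entries = []   # heap of (negated QoS priority, sequence no, request)
        self.seq = 0

    def try_enqueue(self, request, qos, latency_critical):
        limit = QUEUE_DEPTH if latency_critical else HIGH_TIDE_MARK
        if len(self.entries) >= limit:
            return False    # greedy master is stalled at the high tide mark
        heapq.heappush(self.entries, (-qos, self.seq, request))
        self.seq += 1
        return True

    def service_next(self):
        """Highest QoS value first, so latency-critical requests jump the queue."""
        if not self.entries:
            return None
        _, _, request = heapq.heappop(self.entries)
        return request

q = MemoryControllerQueue()
for i in range(8):
    ok = q.try_enqueue(f"gpu_read_{i}", qos=2, latency_critical=False)
    print(f"gpu_read_{i}: {'accepted' if ok else 'blocked at high tide mark'}")
ok = q.try_enqueue("cpu_fetch", qos=12, latency_critical=True)
print(f"cpu_fetch: {'accepted' if ok else 'blocked'}")
print("served first:", q.service_next())
```

In this sketch the GPU fills the queue only as far as the high tide mark, the CPU fetch still finds a reserved entry, and its higher QoS value means it is the first request serviced.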
So we see that by getting the most out of your DDR memory interface you can keep cost and power down while still delighting the consumer with the rich user experience they expect from today's best selling smartphones.
In the next and final installment I'll be looking at reducing software costs and the options for CPU-GPU coherency in a low cost smartphone.