Contemporary mobile platform SoCs place intense traffic management demands on the memory subsystem. An intelligent memory controller design accounts for the fundamental memory streaming requirements of a mobile SoC and provides the capabilities needed for optimal Quality of Service (QoS) while making the best use of the available memory bandwidth. This paper describes some of the performance challenges for memory subsystems in an ARM-based mobile SoC*. The memory controller features necessary for optimizing the performance of mobile traffic are described, along with their effects, using benchmarking data. Finally, the combined benefit of closely integrating the memory controller with the interconnect fabric to optimize memory subsystem performance is demonstrated.
Figure 1 shows a contemporary example of an ARM-based mobile subsystem. Typically, there are one or two clusters of Cortex-A processors in a big.LITTLE™ configuration, with the ‘big’ CPUs handling the heavy computational work and the ‘LITTLE’ CPUs running the lighter threads for power efficiency. The CPUs exchange data with each other over a Cache Coherent Interconnect, CoreLink CCI-550, which provides a snoop filter that stores a directory of cached data, thereby reducing the number of snoops required across CPU clusters. In addition to the CPUs, graphics, video and display computations are performed by the fully coherent Mali Mimir GPU and the non-coherent V550 Video and DP650 Display processors in the system.
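To illustrate why a directory reduces snoop traffic, the following is a minimal sketch of a snoop filter as a toy software model. It is not a description of how the CCI-550 is implemented: the class name `SnoopFilter`, its methods, the 64-byte line size and the snoop counters are all assumptions made for illustration. The directory records which clusters may hold each cache line, so a read only snoops those clusters instead of broadcasting to every other cluster.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size for this toy model


class SnoopFilter:
    """Hypothetical directory: cache-line index -> clusters that may hold it."""

    def __init__(self, num_clusters):
        self.num_clusters = num_clusters
        self.directory = defaultdict(set)
        self.snoops_sent = 0       # snoops issued when the directory is used
        self.snoops_broadcast = 0  # snoops a pure broadcast scheme would issue

    def read(self, cluster, address):
        """Model a read from `cluster`; return the clusters that get snooped."""
        line = address // LINE_BYTES
        # Only clusters recorded as possible holders need to be snooped.
        targets = self.directory.get(line, set()) - {cluster}
        self.snoops_sent += len(targets)
        # Without a filter, every other cluster would be snooped.
        self.snoops_broadcast += self.num_clusters - 1
        # The requester may now hold the line, so record it.
        self.directory[line].add(cluster)
        return targets

    def evict(self, cluster, address):
        """Model `cluster` dropping a line, so it no longer needs snooping."""
        line = address // LINE_BYTES
        holders = self.directory.get(line)
        if holders is not None:
            holders.discard(cluster)
            if not holders:
                del self.directory[line]
```

In this model, a read to a line that no other cluster holds sends no snoops at all, whereas a broadcast scheme would snoop every other cluster on each such read. Comparing `snoops_sent` with `snoops_broadcast` after a sequence of reads gives a rough feel for the saving under these assumptions.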