White paper: Optimizing Performance for an ARM Mobile Memory Subsystem

Ashwin Matta
February 16, 2016

Introduction

Contemporary mobile platform SoCs place intense traffic management demands on the memory subsystem. An intelligent memory controller design accounts for the fundamental memory streaming requirements of a mobile SoC and provides the capabilities needed for optimal Quality of Service (QoS) while making the best use of available memory bandwidth. This paper describes some of the performance challenges for memory subsystems in an ARM-based mobile SoC. It describes the memory controller features needed to optimize performance for mobile traffic and uses benchmarking data to show their effects. It also demonstrates the combined benefit of closely integrating the memory controller with the interconnect fabric.

Figure 1: Contemporary example of an ARM-based mobile subsystem

Figure 1 shows a contemporary example of an ARM-based mobile subsystem. Typically, there are one or two clusters of Cortex-A processors in a big.LITTLE™ configuration, with the ‘big’ CPUs handling the raw computational needs while the ‘LITTLE’ CPUs run the lighter threads for power efficiency. The CPUs exchange data with each other over a Cache Coherent Interconnect, CoreLink CCI-550, which provides a snoop filter that stores a directory of cached data, thereby reducing the number of snoops required across CPU clusters. In addition to the CPUs, graphics, video and display computations are performed by the fully coherent Mali Mimir GPU and the non-coherent V550 Video and DP650 Display processors in the system.
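To make the snoop filter idea concrete, here is a minimal sketch of a directory that records which clusters may hold a given cache line, so that a request only snoops those clusters instead of broadcasting to all of them. It is a toy model for illustration only and does not represent how CoreLink CCI-550 is implemented; the class name `SnoopFilter`, its methods, the helper `broadcast_targets`, and the cluster labels are all hypothetical.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy directory mapping a cache-line address to the clusters that may hold it.

    Hypothetical illustration only; not the CCI-550 design.
    """

    def __init__(self, clusters):
        self.clusters = set(clusters)
        self.directory = defaultdict(set)  # line address -> set of holder clusters
        self.snoops_sent = 0

    def record_fill(self, cluster, line_addr):
        # A cluster has brought the line into its cache.
        self.directory[line_addr].add(cluster)

    def record_evict(self, cluster, line_addr):
        # A cluster has dropped the line; forget empty entries.
        self.directory[line_addr].discard(cluster)
        if not self.directory[line_addr]:
            del self.directory[line_addr]

    def snoop_targets(self, requester, line_addr):
        # Snoop only the recorded holders, never the requester itself.
        targets = self.directory.get(line_addr, set()) - {requester}
        self.snoops_sent += len(targets)
        return targets


def broadcast_targets(clusters, requester):
    # Without a directory, every other cluster would have to be snooped.
    return set(clusters) - {requester}
```

With clusters labelled "big", "LITTLE" and "GPU", and a line filled only by "big", a request from "LITTLE" for that line snoops just "big", and a request for a line nobody holds snoops no one. A broadcast scheme would snoop both other clusters in either case.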

 
DMC Performance Optimization for Mobile Memory Subsystem.pdf