System IP for 2016 Premium Mobile Systems

Andy Nightingale
February 12, 2015
9 minute read time.

First announced in November 2011, the ARMv8-A architecture and Cortex-A57 and Cortex-A53 processors have cumulatively amassed over 50 licensing agreements. This momentum is particularly strong among manufacturers of application processors targeted at smartphones, with all top-10 players having adopted ARMv8-A. This adoption is set to continue throughout 2015 as premium mobile makers seek to harness the increased potential that the upgrade in system architecture offers. What that means for consumers is devices that are fluid and responsive when handling all of the complex tasks demanded of modern smartphones and tablets.

In this blog I will go through some aspects of the system that contribute significantly to the performance increase associated with the shift from 32-bit to 64-bit.

I don't think I am alone in believing that the main drivers in the premium mobile device market are human experiences and expectations. The demand for better user experiences on higher-resolution displays, fluid responsiveness, and more device-to-device connectivity means that consumers are looking for the next great thing every year.

Recent history has shown that mobile devices have become the compute devices of choice, and as we move forward this is not going to change.

[Figure: innovation drivers in premium mobile]

So why 64-bit in mobile? For the marketing folks it makes perfect sense: 64 is double 32, so it must be twice as good, right? However, there are also a number of technical merits supporting 64-bit designs going forward. The main reason is that it is the architecture and instruction set architecture (ISA) that make the difference: the ISA allows compilers to work smarter and the microarchitecture implementation to be more efficient. Here are a few more benefits of 64-bit, off the top of my head:

  • You always have hardware floating point with the 64-bit architecture, so there is no need to carry around software emulation for floating-point operations. More registers to play with means more opportunity for optimisations like loop unrolling and less stack spillage to main memory. Function calls are cheaper in terms of memory usage and can pass twice as many 64-bit values in registers.
  • The ability to handle small bursts of crypto also saves power, as there is no need to keep external crypto accelerators powered up for longer than necessary. A 64-bit CPU can also process extremely large numbers, allowing users to better encrypt data against unauthorized access.
  • And finally, where the 64-bit part really comes in is that larger memory devices for complex and large datasets are becoming a reality. While 32-bit CPUs can only address 4GB of RAM, 64-bit is virtually limitless (it can address up to 16 exabytes, or more than 16 billion GB), as the quick sketch after this list illustrates.
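
To put those address-space numbers into perspective, here is a quick back-of-the-envelope sketch in C. The program is purely illustrative; the arithmetic is the point:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* A 32-bit address selects one of 2^32 bytes: 4 GiB. */
        uint64_t bytes_32 = 1ULL << 32;

        /* 2^64 bytes does not fit in a 64-bit integer, so approximate it
         * with a double for printing. */
        double bytes_64 = ldexp(1.0, 64);

        printf("32-bit address space: %llu bytes (%llu GiB)\n",
               (unsigned long long)bytes_32,
               (unsigned long long)(bytes_32 >> 30));
        printf("64-bit address space: %.0f EiB (about %.1f billion GiB)\n",
               bytes_64 / ldexp(1.0, 60), bytes_64 / ldexp(1.0, 30) / 1e9);
        return 0;
    }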

In short, there are more reasons than ever before for designers to move to 64-bit now. If you think I've missed any of the important benefits that 64-bit brings, please mention them in the comments below.

Bandwidth requirements bring design challenges

Bandwidth requirements for premium mobile devices are expected to soar over the next few years and there are several key use cases supporting this trend.

Screen sizes and resolutions have increased across a wide range of devices, and frame rates have increased - not only for consuming content but also for capturing it. As more people capture content via their mobile's camera, there is greater demand for higher resolution in stills and video capture:

[Figure: bandwidth requirements for premium mobile use cases]

Some of the largest consumers of memory bandwidth in a SoC are the media subsystem components: GPU, video and display. Nobody wants the annoyance of their screen freezing when capturing that crucial moment on camera, so it is vital that bandwidth efficiency is optimal here.

Whilst we are making advances in frame-buffer compression technology such as AFBC, peak bandwidth requirements continue to grow.
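
To get a feel for where those numbers come from, raw scan-out bandwidth is simply width x height x bytes per pixel x frame rate. A minimal sketch, using an illustrative WQHD panel (the exact figures will vary by device):

    #include <stdint.h>
    #include <stdio.h>

    /* Raw, uncompressed frame-buffer bandwidth in GB/s for one layer. */
    static double frame_buffer_gbps(uint32_t w, uint32_t h,
                                    uint32_t bytes_per_pixel, uint32_t fps)
    {
        return (double)w * h * bytes_per_pixel * fps / 1e9;
    }

    int main(void)
    {
        /* Example: a 2560x1440 panel, 32-bit RGBA, 60 frames per second. */
        double gbps = frame_buffer_gbps(2560, 1440, 4, 60);
        printf("Raw scan-out bandwidth: %.2f GB/s per layer\n", gbps);
        /* The GPU typically writes and re-reads each frame as well, and
         * composition uses several layers, which is why frame-buffer
         * compression such as AFBC matters. */
        return 0;
    }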

As our mobile devices become central to our digital lives, those capabilities must be paired with the power efficiency required to work through a full day of heavy use on a single charge. Modern mobile design requires a commitment to getting the most out of every milliwatt and every millimetre of silicon.

[Figure: mobile SoC power envelope]

As engineers and technologists in this market, we have the challenge of delivering this mobile experience within tight energy and thermal constraints.

Thankfully, the market has gravitated towards a handful of form factors, which gives us enough stability to define SoC power budgets and therefore a clearer target to hit.

At ARM we develop our cores, GPUs and system IP with the aim of delivering the maximum performance within an energy or power consumption envelope.

Shifting workloads for premium mobile devices

The shift from traditional mobile phones to smartphones and tablets has resulted in a change in user behaviour. The phone in our pocket is now the primary computing device, and we make increasingly complex demands of it. According to a survey of the amount of time mobile users spend on mobile applications, more than 85% of that time is spent on three types of application.

A high percentage of time is spent on web-based applications such as web browsing and Facebook, closely followed by gaming, with a good part spent on audio and video playback and utility apps such as cloud storage, notes and calendars.

[Figures: typical power profiles for web browsing, gaming, and audio playback]

The graphs above show typical power profiles for these use cases across a range of mobile devices. You can find out more about similar power profiles in Govind Wathan's blog on big.LITTLE MP.

It’s interesting to note that the three most common tasks all consume power in vastly different ways. Clearly we have to bear these different power profiles in mind when designing a SoC that can deliver optimal performance for all use cases.

big.LITTLE™ technology is ARM's silicon-proven energy optimization solution. It consists of two or more clusters of CPUs that are architecturally identical but differ in capability:

[Figure: second-generation big.LITTLE system with CoreLink CCI-500]

The big processors (in blue) are designed for high performance and the LITTLE processors (in green) for maximum efficiency. Each CPU cluster has its own L2 cache, sized for high performance in the case of the big cluster and for high efficiency in the case of the LITTLE cluster.


big.LITTLE supports ARMv7 processors (Cortex-A7, Cortex-A15 and Cortex-A17) as well as ARMv8-A processors (Cortex-A53, Cortex-A57 and the recently announced Cortex-A72). big.LITTLE uses heterogeneous computing to bring you 40-60% additional energy savings, measured across common mobile use cases on ARMv8-based devices.

Combined with the hardware benefits of moving to the 64-bit architecture on Cortex-A72 and Cortex-A53, the big.LITTLE software model allows multi-processing across all cores.
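
Under this software model the OS scheduler normally decides where each task runs, but for experiments and benchmarking it can be useful to pin a thread to one cluster. A minimal Linux sketch, assuming a hypothetical numbering in which CPUs 4-7 are the big cores (the real numbering is platform-specific):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t big_cluster;
        CPU_ZERO(&big_cluster);
        for (int cpu = 4; cpu <= 7; cpu++)   /* assumed big-core CPU numbers */
            CPU_SET(cpu, &big_cluster);

        /* Restrict the calling thread (pid 0) to the assumed big cluster. */
        if (sched_setaffinity(0, sizeof(big_cluster), &big_cluster) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("Thread pinned to the (assumed) big cluster\n");
        /* In normal operation the scheduler migrates tasks between clusters
         * automatically; explicit affinity is only for experiments. */
        return 0;
    }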

The first System IP component I get to introduce at this point is our recently announced CoreLink™ CCI-500 Cache Coherent Interconnect that makes big.LITTLE compute possible. Neil Parris wrote an excellent in-depth blog on how CoreLink CCI-500’s snoop filter improves system performance.

CoreLink CCI-500 allows both clusters to see the same memory, which enables flexible, seamless and fast migration of data between the big and LITTLE clusters. It also allows each cluster to snoop into the caches of the other, reducing the time CPUs spend stalling and hence improving performance and saving power. CCI-500 also doubles peak system bandwidth over CCI-400, which is the semiconductor equivalent of upgrading a highway from two lanes to four: easing congestion when traffic gets busy and saving people time.
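
From a software point of view, hardware coherency means data can be handed between clusters with ordinary loads and stores and no explicit cache cleaning or invalidation. A minimal sketch of such a hand-over using C11 acquire/release atomics, with the two threads free to run on different clusters:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int payload;                 /* shared data, plain loads/stores   */
    static atomic_int ready = 0;        /* hand-over flag                    */

    static void *producer(void *arg)
    {
        (void)arg;
        payload = 42;                                        /* write data   */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                                /* wait for flag */
        printf("consumer read %d\n", payload);               /* sees 42       */
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Without a coherent interconnect, the consuming cluster could hold a stale copy of the cache line and software would have to clean and invalidate caches around the hand-over.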

System IP is central to ARM systems

Given that we have CCI-500 at the core of our system, we can now look at the other System IP components that work in concert with the CCI to help ARM partners build 64-bit systems. When you look at this example representation of a Premium Mobile SoC you can see there is a significant amount of System IP performing multiple tasks.

[Figure: example premium mobile SoC system diagram]

The CoreLink GIC-500 Generic Interrupt Controller manages migration of interrupts between CPUs and allows interrupts to be virtualized in a hypervisor-controlled system. Compared with the previous-generation GIC-400, GIC-500 supports more than eight CPUs and message-based interrupts, and connects directly to the system register interfaces of ARMv8 Cortex-A72 and Cortex-A53 rather than driving ARMv7-style IRQ and FIQ inputs.

The CoreLink MMU-500 System Memory Management Unit gives I/O devices a common view of memory with the CPUs by sharing the same page tables.

The CoreLink TZC-400 TrustZone Address Space Controller and CoreLink DMC-400 Dynamic Memory Controller provide efficient DRAM access with TrustZone memory protection and end-to-end QoS.

The rest of the SoC connectivity is serviced by CoreLink NIC-400, which provides a fully configurable interconnect solution for connecting sub-systems such as video, display and peripherals. NIC-400's configurability enables partners to build hierarchical, low-latency, low-power connectivity for AMBA® 4 AXI4™, AMBA 3 AXI3™, AHB™-Lite and APB™ components.

The fact that all of these System IP components are designed, implemented and validated alongside ARM Cortex processors and Mali media IP reduces overall system latency. These enhancements play a key role in the performance uplift that 64-bit computing brings to mobile.

Debug & trace for 64-bit

The increased processing throughput in a 64-bit system affects debugging solutions as well, particularly through the increase in output bandwidth from the trace macrocells. Debug and trace System IP is also critical in helping ARM partners to debug and optimise software for 64-bit systems, and comprises:

[Figure: debug and trace system components]

  • ETM and PMU for real-time trace and software performance analysis (see the PMU counting sketch after this list)
  • System Trace Macrocell for unobtrusive tracing of systems and software
  • Trace Memory Controller for directing trace data for self-hosted trace
  • Trace Port Interface Unit for directing trace data off-chip
  • Cross Trigger Matrix for cross communication of system events whilst debugging
  • And finally, timestamping for event correlation
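
As an example of the software performance analysis the PMU enables, here is a minimal Linux sketch that counts CPU cycles around a small workload via perf_event_open (assuming perf access is permitted on the device; the loop is just a stand-in workload):

    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;   /* PMU cycle counter */
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        /* No glibc wrapper for perf_event_open, so call via syscall(). */
        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        volatile uint64_t sum = 0;                /* toy workload to measure */
        for (uint64_t i = 0; i < 1000000; i++)
            sum += i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t cycles = 0;
        read(fd, &cycles, sizeof(cycles));
        printf("cycles: %llu\n", (unsigned long long)cycles);
        close(fd);
        return 0;
    }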

CoreSight SoC-400 currently provides the most complete on-chip debug and real-time trace solution for the entire system-on-chip (SoC), making ARM processor-based SoCs the easiest to debug and optimize. Mayank Sharma has explained how to build customised debug and trace solutions for multi-core SoCs using CoreSight SoC-400, showing the value that a well-thought-out debug and trace system can offer to all stages of SoC development.

64-bit in 2015

We've discussed how, for 64-bit mobile devices, consumers expect something new every year with better and better performance. What I've done in this blog is introduce some of the key IP components that contribute to premium devices becoming faster and more power-efficient each year. 2015 will be the year we see 64-bit mobile devices reach a wide audience thanks to the outstanding work of our ARM partners! Building a 64-bit SoC has never been easier, owing to all of the IP that has been designed and optimized for the purpose.

As system performance increases, so does the need to tightly control the thermal and energy envelope of the system. Whether the processors demand the lowest latency or the highest bandwidth, ARM System IP delivers outstanding efficiency, achieving the required performance with the lowest power and smallest area.

For more information on the System IP portfolio please visit: System IP - ARM
