First announced in November 2011, the ARMv8-A architecture and the Cortex-A57 and Cortex-A53 processors have cumulatively amassed over 50 licensing agreements. This momentum is particularly strong among manufacturers of application processors for smartphones, with all of the top ten players having adopted ARMv8-A. Adoption is set to continue throughout 2015 as premium mobile makers seek to harness the increased potential that the upgrade in system architecture offers. For consumers, that means devices that stay fluid and responsive when handling all of the complex tasks demanded of modern smartphones and tablets.
In this blog I will walk through some aspects of the system that contribute significantly to the performance increase associated with the shift from 32-bit to 64-bit.
I don’t think I am alone in believing that the main drivers in the premium mobile device market are human experiences and expectations. The demand for better user experiences on higher-resolution displays, for fluid responsiveness, and for more device-to-device connectivity means that consumers are looking for the next great thing every year.
Recent history has shown that mobile devices have become the compute devices of choice, and as we move forward that is not going to change.
So why 64-bit in mobile? For the marketing folks it makes perfect sense: 64 is double 32, so it must be twice as good, right? There are, however, solid technical merits supporting 64-bit designs going forward. The main one is that the architecture and instruction set architecture (ISA) make the difference: the ISA allows compilers to work smarter and the microarchitecture implementation to be more efficient. Here are a few more benefits of 64-bit, off the top of my head:

- A larger virtual address space, removing the 4GB ceiling of 32-bit addressing
- 31 general-purpose 64-bit registers in AArch64, reducing costly register spills to memory
- A larger NEON register file (32 x 128-bit registers) for SIMD workloads
- Optional cryptography extensions that accelerate algorithms such as AES and SHA
- Native 64-bit arithmetic for the applications that need it
In short, there are more reasons than ever for designers to move to 64-bit. If you think I’ve missed any of the important benefits that 64-bit brings, please mention them in the comments below.
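To put the address-space benefit in concrete terms, here is a quick back-of-the-envelope calculation (an illustrative sketch, not ARM code) comparing 32-bit addressing with the 48-bit virtual addresses that ARMv8-A translation supports:

```python
# Illustrative arithmetic only: compare the memory addressable with
# 32 address bits against the 48-bit virtual address space that
# ARMv8-A translation supports.

def addressable_gib(address_bits):
    """Bytes addressable with the given number of address bits, in GiB."""
    return 2 ** address_bits / 2 ** 30

print(addressable_gib(32))   # 4.0 GiB: the 32-bit ceiling
print(addressable_gib(48))   # 262144.0 GiB (256 TiB) with 48-bit VAs
```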
Bandwidth requirements for premium mobile devices are expected to soar over the next few years and there are several key use cases supporting this trend.
Screen sizes and resolutions have increased across a wide range of devices, and frame rates have increased too - not only for consuming content but also for capturing it. As more people capture content with their mobile’s camera, there is greater demand for higher resolutions in both still and video capture.
One of the largest consumers of memory bandwidth in a SoC is the media subsystem – GPU, video and display. Nobody wants the annoyance of their screen freezing while capturing that crucial moment on camera, so optimal bandwidth efficiency is vital here.
Whilst we are making advances in frame-buffer compression technology such as AFBC (ARM Frame Buffer Compression), peak bandwidth requirements continue to grow.
As our mobile devices become central to our digital lives, those capabilities must be paired with the power efficiency required to work through a full day of heavy use on a single charge. Modern mobile design requires a commitment to getting the most out of every milliwatt and every millimetre of silicon.
As engineers and technologists in this market, we have the challenge of delivering this mobile experience within tight energy and thermal constraints.
Thankfully, the market has gravitated around a few form factors, which gives us enough stability to define SoC power budgets and therefore a clearer target to hit.
At ARM we develop our cores, GPUs and system IP with the aim of delivering the maximum performance within an energy or power consumption envelope.
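The idea of "maximum performance within a power envelope" can be sketched with a toy dynamic voltage and frequency scaling (DVFS) model. The operating points and power figures below are invented for illustration, not real ARM data:

```python
# Toy DVFS model: choose the highest-performance operating point
# that still fits within a given power budget.
# Frequencies and power figures are made up for illustration.
OPERATING_POINTS = [
    # (frequency_mhz, power_mw)
    (600, 150),
    (1000, 300),
    (1500, 600),
    (2000, 1100),
]

def pick_operating_point(budget_mw):
    """Return the fastest (freq, power) pair within budget, or None."""
    feasible = [op for op in OPERATING_POINTS if op[1] <= budget_mw]
    return max(feasible, key=lambda op: op[0]) if feasible else None

print(pick_operating_point(750))   # -> (1500, 600)
print(pick_operating_point(100))   # -> None: budget too tight
```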
The shift from traditional mobile phones to smartphones and tablets has resulted in a change in user behaviour. The phone in our pocket is now the primary computing device, and we make increasingly complex demands of it. According to a survey of how mobile users spend their time in mobile applications, more than 85% of that time is spent on three types of applications.
A high percentage of time is spent on web-based applications such as web browsing and Facebook, closely followed by gaming, with a good part of the remainder spent on audio and video playback and utility apps such as cloud storage, notes and calendars.
The graphs above show typical power profiles of examples across a range of mobile devices. You can find out more about similar power profiles in Govind Wathan’s blog on big.LITTLE MP.
It’s interesting to note that the three most common tasks all consume power in vastly different ways. Clearly we have to bear these different power profiles in mind when designing a SoC that can deliver optimal performance for all use cases.
big.LITTLE™ Technology is ARM’s silicon-proven energy optimization solution. It consists of two or more clusters of CPUs that are architecturally identical but differ in capability:
The big processors (in BLUE) are designed for high performance and the LITTLE processors (in GREEN) for maximum efficiency. Each CPU cluster has its own L2 cache, sized for high performance in the case of the big cluster and for high efficiency in the case of the LITTLE cluster.
big.LITTLE supports ARMv7 processors (Cortex-A7, Cortex-A15 and Cortex-A17) as well as ARMv8-A processors (Cortex-A53, Cortex-A57 and the recently announced Cortex-A72). big.LITTLE uses heterogeneous computing to bring you 40-60% additional energy savings, measured across common mobile use cases on ARMv8-based devices.
Combined with the hardware benefits of moving to the 64-bit architecture on Cortex-A72 and Cortex-A53, the big.LITTLE software model allows multi-processing across all cores.
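A minimal sketch of the big.LITTLE placement idea, with invented per-core power numbers: light tasks run on a LITTLE core, which takes longer but burns far less power, while demanding tasks go to a big core. The energy saving falls out of comparing placements:

```python
# Hypothetical energy model for big.LITTLE task placement.
# All numbers here are invented for illustration; they are not
# measurements of any ARM core.
BIG_POWER_MW = 1000     # big core active power
LITTLE_POWER_MW = 250   # LITTLE core active power
LITTLE_SLOWDOWN = 2.0   # LITTLE takes ~2x as long on the same work

def task_energy_mj(duration_on_big_ms, demanding):
    """Energy to run one task, placed on big if demanding, else LITTLE."""
    if demanding:
        return BIG_POWER_MW * duration_on_big_ms / 1000
    # LITTLE runs slower, but its lower power more than compensates.
    return LITTLE_POWER_MW * (duration_on_big_ms * LITTLE_SLOWDOWN) / 1000

# A light background task: half the energy on LITTLE despite running longer.
print(task_energy_mj(100, demanding=False))  # 50.0 mJ on LITTLE
print(task_energy_mj(100, demanding=True))   # 100.0 mJ if forced onto big
```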
The first System IP component I get to introduce at this point is our recently announced CoreLink™ CCI-500 Cache Coherent Interconnect that makes big.LITTLE compute possible. Neil Parris wrote an excellent in-depth blog on how CoreLink CCI-500’s snoop filter improves system performance.
CoreLink CCI-500 allows both clusters to see the same blocks of memory, enabling flexible, seamless and fast migration of data from the big cluster to the LITTLE cluster and vice versa. It also allows each cluster to snoop into the other cluster’s caches, reducing the time CPUs spend stalled and hence improving performance and saving power. CCI-500 also doubles peak system bandwidth over CCI-400, the semiconductor equivalent of upgrading a highway from two lanes to four, easing congestion when traffic gets busy and saving people time.
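To make the snoop-filter idea from Neil Parris’s blog concrete, here is a highly simplified model of my own (not the CCI-500 design): the interconnect records which cluster may hold each cache line, so a coherent access only snoops clusters that can actually have the data, instead of broadcasting to everyone.

```python
# Simplified snoop-filter model: track which cluster may hold each
# cache line so coherent reads snoop only where necessary.
# An illustrative sketch, not the CoreLink CCI-500 implementation.

class SnoopFilter:
    def __init__(self):
        self.directory = {}  # line address -> set of cluster names

    def record_fill(self, address, cluster):
        """A cluster brought this line into its cache."""
        self.directory.setdefault(address, set()).add(cluster)

    def clusters_to_snoop(self, address, requester):
        """Only clusters recorded as holding the line need a snoop."""
        return self.directory.get(address, set()) - {requester}

sf = SnoopFilter()
sf.record_fill(0x1000, "big")
print(sf.clusters_to_snoop(0x1000, "LITTLE"))  # {'big'}: one targeted snoop
print(sf.clusters_to_snoop(0x2000, "LITTLE"))  # set(): no snoop traffic at all
```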
Given that we have CCI-500 at the core of our system, we can now look at the other System IP components that work in concert with the CCI to help ARM partners build 64-bit systems. When you look at this example representation of a Premium Mobile SoC you can see there is a significant amount of System IP performing multiple tasks.
CoreLink GIC-500 Generic Interrupt Controller manages migration of interrupts between CPUs and allows for virtualization of interrupts in a hypervisor-controlled system. Compared with the previous-generation GIC-400, GIC-500 supports more than eight CPUs, adds message-based interrupts, and connects directly to the ARMv8 Cortex-A72 and Cortex-A53 system register interfaces instead of the ARMv7 IRQ and FIQ inputs.
CoreLink MMU-500 System Memory Management Unit supports a common physical memory view for IO devices by sharing the same page tables as the CPUs.
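The "common physical memory view" can be pictured with a toy single-level translation model (illustrative only, far simpler than a real MMU): the CPU’s MMU and the system MMU consult the same page table, so a virtual address handed to an IO device resolves to the same physical location the CPU sees.

```python
# Toy model of a shared page table: the CPU MMU and the system MMU
# (as in CoreLink MMU-500) walk the same table, so a virtual address
# means the same thing to a CPU and to an IO device.
PAGE_SIZE = 4096
# Shared table: virtual page number -> physical page number (invented values).
page_table = {0x400: 0x8012}

def translate(virtual_address, table):
    """Resolve a virtual address through the shared page table."""
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn not in table:
        raise LookupError("translation fault")
    return table[vpn] * PAGE_SIZE + offset

cpu_view = translate(0x400 * PAGE_SIZE + 0x10, page_table)
io_view = translate(0x400 * PAGE_SIZE + 0x10, page_table)  # same table
print(cpu_view == io_view)  # True: both masters see one physical view
```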
CoreLink TZC-400 TrustZone Address Space Controller and CoreLink DMC-400 Dynamic Memory Controller are used for efficient DRAM access, supporting TrustZone memory protection and end-to-end QoS.
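End-to-end QoS can be sketched as priority-aware arbitration at the memory controller. The model below is my own illustration (not the DMC-400 implementation, and the master names and priorities are invented): latency-critical masters such as the display win arbitration over bandwidth-hungry but latency-tolerant ones.

```python
# Toy QoS arbiter: each pending request carries a priority, and the
# memory controller services the highest-priority request first.
# Master names and priority values are invented for illustration.
import heapq

class QosArbiter:
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker keeps arbitration FIFO within a priority

    def request(self, master, priority):
        # heapq is a min-heap, so negate priority for highest-first order.
        heapq.heappush(self._queue, (-priority, self._seq, master))
        self._seq += 1

    def grant(self):
        """Service the highest-priority pending request."""
        return heapq.heappop(self._queue)[2]

arb = QosArbiter()
arb.request("GPU", priority=1)
arb.request("display", priority=3)   # latency-critical: highest priority
arb.request("CPU", priority=2)
print(arb.grant())  # display
print(arb.grant())  # CPU
```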
The rest of the SoC’s connectivity is serviced by CoreLink NIC-400, which provides a fully configurable interconnect solution for connecting sub-systems such as video, display and peripherals. NIC-400 configurability enables partners to build hierarchical, low latency, low power connectivity for AMBA® 4 AXI4™, AMBA 3 AXI3™, AHB™-Lite and APB™ components.
The fact that all of these System IP components are designed, implemented and validated alongside ARM Cortex processors and Mali media IP reduces overall system latency. These enhancements play a key role in the performance uplift that 64-bit computing brings to mobile.
The increased processing throughput of a 64-bit system affects debug solutions as well, particularly through the increase in output bandwidth from the trace macrocells. Debug and trace System IP is also critical in helping ARM partners debug and optimise software for 64-bit systems.
CoreSight SoC-400 currently provides the most complete on-chip debug and real-time trace solution for the entire system-on-chip (SoC), making ARM processor-based SoCs the easiest to debug and optimize. Mayank Sharma has explained how to build customised debug and trace solutions for multi-core SoCs using CoreSight SoC-400, showing the value that a well-thought-out debug and trace system can offer at all stages of SoC development.
As we’ve discussed, consumers expect something new from 64-bit mobile devices every year, with better and better performance. In this blog I have introduced some of the key IP components that contribute to premium devices becoming faster and more power-efficient each year. 2015 will be the year the 64-bit mobile device reaches a wide audience, thanks to the outstanding work of our ARM partners! Building a 64-bit SoC has never been easier, owing to all of the IP that has been designed and optimized for the purpose.
As system performance increases, so does the need to tightly control the thermal and energy envelope of the system. Whether it is the lowest latency or the highest bandwidth that the processors demand, ARM System IP delivers outstanding efficiency, achieving the required performance with the lowest power and smallest area.
For more information on the System IP portfolio please visit: System IP - ARM