ARM Mali-T628 or how performance sometimes outstrips promises

October 15, 2013


In the fast-paced technology world we are used to hearing about improvements from one product generation to the next. In the past, we have looked in great detail at the various metrics used to compare GPU performance for graphics and compute use cases. This time we want to celebrate the recent release of the Samsung Galaxy Note 3 and the Samsung Galaxy Note 10.1 (2014), based on the Samsung Exynos 5 Octa (5420) platform with ARM® Mali™-T628, and the fact that “performance takes a huge step forward” when compared to previous devices on the market. But we also want to take this opportunity to highlight energy-efficiency optimisations and double-check whether we have delivered on our promise of a 50% improvement. So in this blog we will look more closely at best practices for benchmarking performance in battery- and area-constrained devices, and for comparing improvements in energy efficiency.

Let’s start with the three golden rules of benchmarking GPUs for energy efficiency.

Screen resolution

Let’s imagine two devices with two different form factors. One of them has a 720p screen while the other is equipped with an ultra-high-definition 2.5K screen. In each frame the latter device has to process roughly four times as many pixels as the former (see graph below). This means that in order to deliver the same frame rate it has to provide roughly four times the throughput, and it will potentially consume a proportionally higher amount of energy. That explains why most industry-standard benchmarks tend to use off-screen buffers with a fixed resolution (for instance 1080p in the case of GLBenchmark) and are therefore able to provide an apples-to-apples comparison for devices of different form factors.
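The pixel arithmetic behind this comparison can be checked with a short sketch. It assumes 1280x720 for the 720p screen and 2560x1600 for the 2.5K screen; the exact 2.5K resolution is not stated here, so that figure and the function names are assumptions made for illustration.

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels that must be produced for one full frame."""
    return width * height


def pixel_ratio(high_res: tuple[int, int], low_res: tuple[int, int]) -> float:
    """How many times more pixels the higher-resolution screen has."""
    return pixels_per_frame(*high_res) / pixels_per_frame(*low_res)


# Assumed resolutions: 720p is 1280x720; "2.5K" is taken here as 2560x1600.
HD_720P = (1280, 720)
SCREEN_2_5K = (2560, 1600)

ratio = pixel_ratio(SCREEN_2_5K, HD_720P)
print(f"{ratio:.2f}x pixels per frame")
```

Under those assumed resolutions the ratio comes out a little above four, which is why "four times" is best read as a round figure. Rendering both devices to the same fixed off-screen resolution makes this ratio exactly one, which is the point of the off-screen tests.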

Performance

To render a single frame of given content, a GPU has to process a specific amount of data and consume a given amount of energy. In the typical use case it will be required to deliver 60 frames per second for any content visible on the screen. Even if the GPU is capable of running faster than 60 fps, the frame rate will be capped by the screen refresh rate of 60 Hz. As we pointed out earlier, typical graphics benchmarks often use off-screen buffers to compare performance at the same resolution. This also lets tests run at frame rates beyond 60 fps, so devices can be compared at their top-end performance.
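To see why an on-screen test can hide the difference between two GPUs, here is a minimal sketch of the refresh-rate cap. It models the cap as a simple minimum, which is a simplification of real display synchronisation, and the two GPU frame rates are hypothetical values chosen only for illustration.

```python
REFRESH_HZ = 60.0  # screen refresh rate discussed above


def on_screen_fps(gpu_fps: float, refresh_hz: float = REFRESH_HZ) -> float:
    """Frame rate seen on screen: the GPU rate, capped by the refresh rate."""
    return min(gpu_fps, refresh_hz)


# Hypothetical uncapped (off-screen) frame rates for two devices.
gpu_a_fps = 75.0
gpu_b_fps = 110.0

for name, fps in (("A", gpu_a_fps), ("B", gpu_b_fps)):
    print(f"GPU {name}: off-screen {fps:.0f} fps, on-screen {on_screen_fps(fps):.0f} fps")
```

Both hypothetical devices report the same capped on-screen figure, while the off-screen numbers still show which one has more headroom.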

Use Case

Modern mobile devices enable different use cases with diverse graphics requirements, particularly when it comes to the complexity of the content being processed. Obviously a GPU has to do much less to process a single frame of a user interface or a casual game than it would when running a high-end game or a graphics benchmark designed to stress-test the graphics system. Industry-standard 3D graphics benchmarks provide a good indication of what we can expect from AAA-class content. However, we also have to look at test cases that match more casual use cases, e.g. playing Fruit Ninja or scrolling through the Android™ UI. In the past we covered why metrics such as triangles per second or pixels per second don’t necessarily map onto a real-life balanced use case, and why it is always important to actually run an application that is representative of the use cases we want to characterise.

Energy efficiency

Taking into account the above considerations on performance, screen resolution and use case, it’s much easier to see why average power on its own is not a useful metric for comparing how a given GPU performs within a given power budget. On its own, this metric ignores performance and resolution, and without further context it says nothing about the efficiency of the GPU.

If we had to construct a metric that takes those factors into account we would have to provide an Average Power for a given Performance at a given Screen Resolution running a given Use case. Ignoring the excessive use of upper case in the above sentence, clearly we would need to simplify it a little bit - and we have two choices:

FPS per Watt - in other words, the average performance achieved within a given average power budget when running a given use case at a given screen resolution
- An example would be a GPU that delivers 50 frames per second within 1 watt of average power when running GLBenchmark 2.7 T-Rex 1080p Off-screen
mJ per Frame – in other words, the energy used per frame for a given use case at a given screen resolution
- An example would be a GPU that uses on average 20 mJ per frame when running GLBenchmark 2.7 T-Rex 1080p Off-screen

Leaving aside the academic debate on the advantages of joules per task, or even joules per pixel, over performance per watt, in both of the cases above we are talking about exactly the same GPU; we have just chosen to present it in two different ways.
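The equivalence of the two metrics is simple arithmetic: one watt is one joule per second, so energy per frame is average power divided by frame rate. The sketch below works through the example figures quoted above (50 frames per second at 1 watt); the function names are illustrative only.

```python
def fps_per_watt(avg_fps: float, avg_power_w: float) -> float:
    """Average frames per second delivered per watt of average power."""
    return avg_fps / avg_power_w


def mj_per_frame(avg_fps: float, avg_power_w: float) -> float:
    """Average energy per frame in millijoules (1 W = 1000 mJ per second)."""
    return 1000.0 * avg_power_w / avg_fps


def mj_per_frame_from_fps_per_watt(efficiency_fps_per_w: float) -> float:
    """Convert FPS per watt directly into mJ per frame."""
    return 1000.0 / efficiency_fps_per_w


# Example figures from the text: 50 fps at 1 W average power.
efficiency = fps_per_watt(50.0, 1.0)
energy = mj_per_frame(50.0, 1.0)
print(f"{efficiency:.0f} FPS per watt is {energy:.0f} mJ per frame")
```

Because one metric is just the reciprocal of the other (scaled by 1000), a higher FPS per watt always means a lower mJ per frame, provided the use case and resolution are held fixed.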

Second generation of Mali-T600 series delivers 50% energy efficiency improvement… and then some

Up to now this article has been far from the promised celebration of the market-leading Samsung devices with Mali-T628, and it’s time to change that… So how did we do with the second generation of the Mali-T600 series, and how do we compare to other devices on the market? To answer that, we will compare performance and energy efficiency between a 2012 tablet with the Mali-T604 and a 2013 one with the Mali-T628 MP6.

As you can see, we not only delivered the 50% energy efficiency improvement, but in reality we achieved more than 100%. Obviously the semiconductor fabrication used to create a System on a Chip is constantly improving, and new process nodes bring higher clock speeds and lower power consumption. So we have to take into account manufacturing process improvements, as well as the benefits of using ARM Artisan™ Physical IP, which enables efficient implementation of such complex SoC designs. But even allowing for that, we can safely assume that we have delivered our promised improvement in the second generation of the Mali-T600 series.
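To put those percentages into per-frame terms, here is a minimal sketch. It assumes the improvement is expressed as a gain in FPS per watt at the same use case and resolution, which is an assumption on our part rather than something stated above, and the function name is illustrative.

```python
def energy_per_frame_scale(efficiency_gain_percent: float) -> float:
    """Fraction of the original energy per frame that remains after a
    given percentage gain in FPS per watt (assumed definition of the gain)."""
    return 1.0 / (1.0 + efficiency_gain_percent / 100.0)


for gain in (50.0, 100.0):
    remaining = energy_per_frame_scale(gain)
    print(f"+{gain:.0f}% FPS per watt: {remaining:.0%} of the original energy per frame")
```

Under that assumption, a 50% gain means each frame costs about two thirds of the energy it did before, and a 100% gain means each frame costs half.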

It is also worth bearing in mind that GPU energy efficiency is only half of the story. ARM big.LITTLE™ multi-processing technology and the ARM Cortex™-A series of processors deliver high performance and efficiency across the entire system, and memory bandwidth also contributes to total system power. Mali GPUs with Job Manager, Hierarchical Tiler, Transaction Elimination, Adaptive Scalable Texture Compression and other upcoming features enable further end-to-end savings, resulting in ARM-based systems with longer battery life and a lower thermal budget.

So to summarise, the Mali-T628 GPU has delivered even more than we promised, and in the next few blogs we are going to explain how ARM technology leadership and Mali GPUs further reduce system power.

Lori Kate Smith over 10 years ago

I shouldn't be surprised by now, but with each generation, it is still amazing to me to see the graphics improvements. When I see the same video/game/app play on a device from a generation before (which in devices is often less than a year), the improvements make my eyes dance.
And exciting new products keep coming out on Mali-T604 devices. Another one based on the Samsung Exynos series, the HP Chromebook 11, was reviewed by AnandTech (HP Chromebook 11 Review).
So I continue to be delighted by the innovative products that are based on Mali.
Thanks for the blog Jakub.
