GPU Compute: Dealing with the Elephant in the Room

March 25, 2014


In this first blog of a series about GPU Compute, we look at one of the biggest challenges facing the future development and evolution of smart devices.

Isn’t technology wonderful? It’s incredible to think that the processing power in mobile devices has increased 12 times in the last 4 years. Screen resolutions have increased by over 13 times in the same period. And as our smart devices are capable of more and more, we’re doing more and more with them. Study after study shows a continued shift away from desktops and laptops as internet, gaming and entertainment go increasingly mobile. But with all this innovation there’s a problem: an engineering elephant in the room. In the same 4 years, whilst everything else has increased by an order of magnitude, battery technology has only improved by a factor of 2. In engineering terms this presents a massive challenge: all that processing capacity at our fingertips, cruelly snatched away at the last minute.

Processing power information source: McKinsey & Company, “Making smartphones brilliant: ten trends”, http://goo.gl/rkSP4
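To put that gap in numbers, here is a rough back-of-the-envelope ratio. It rests on a simplifying assumption that is not part of the figures above: that the work a device actually performs grows in step with its processing power, and that battery life should stay the same. Under that assumption, the energy spent per unit of work has to fall by the ratio of the two growth factors:

```latex
\frac{\text{growth in processing power}}{\text{growth in battery capacity}}
  = \frac{12}{2} = 6
\quad\Longrightarrow\quad
\frac{E_{\text{per operation, now}}}{E_{\text{per operation, 4 years ago}}} \approx \frac{1}{6}
```

In other words, each operation would need to cost roughly a sixth of the energy it did four years earlier just to stand still.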

So if we could invent better batteries, we’d be OK, right? Well, although better batteries would be very welcome, sadly it’s not that simple. A bigger problem than battery capacity alone is thermal dissipation. Not the most glamorous subject, maybe (I don’t think anyone has written a Thermal Dissipation folk song, for example), but it’s a critical issue facing mobile engineers today. Put simply, even if we had the power to run our processors harder, they would melt, because there’s no way to get rid of all the heat they would generate. This elephant is not only getting in the way; he’s about to do something unpleasant in the corner.

So to tackle this issue we have to think long and hard about processing efficiency. One way to do this is to add more CPU cores. Indeed, a mixture of faster cores and more energy-efficient cores (ARM big.LITTLE processing) allows devices to ramp up and down depending on demand. But just adding CPU cores doesn’t scale efficiently: after a while we see diminishing returns.
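One standard way to see those diminishing returns is Amdahl’s law, which the paragraph above doesn’t name: if only a fraction p of a program’s work can run in parallel, the serial remainder caps the speedup however many cores are added. The sketch below is illustrative only. It assumes identical cores (so it ignores big.LITTLE’s mix of core types), ignores power and thermal limits, and the 90% parallel fraction is an assumed value, not a measurement.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` identical cores when only `parallel_fraction`
    of the work can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the parallel part
    # is divided across the cores.
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% parallelisable, 10% inherently serial.
    p = 0.9
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} cores: {amdahl_speedup(p, n):.2f}x")
    # However many cores are added, speedup can never exceed 1 / (1 - p).
    print(f"upper bound: {1.0 / (1.0 - p):.1f}x")
```

With these assumptions, going from 8 to 16 cores lifts the speedup only from about 4.7x to 6.4x, and no number of cores gets past 10x.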

The key to all this, and a very effective way to tackle processing efficiency, is to think heterogeneously. The idea of heterogeneous computing is to spread the computing load not only across multiple processors, but across different types of processor. That involves distributing individual parts of your programme to the processors best suited to run them. So, for example, general-purpose programme flow would sit with the CPU, whilst a complex image processing algorithm might run on a specialist processor designed to cope efficiently with highly parallel workloads.
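As an illustration of what makes an image processing step a good candidate for a parallel processor, consider converting an image to greyscale. The sketch below is written in Python purely for readability; the function names, the flat list-of-pixels image layout and the choice of the ITU-R BT.601 luma weights are assumptions made for this example. Each output pixel depends only on its own input pixel, so every iteration is independent. In an OpenCL C version, the body of `luma` would become a `__kernel` function and each index would come from `get_global_id(0)`, one work item per pixel, while the surrounding control flow and I/O stayed on the CPU.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def luma(pixel: Pixel) -> int:
    """Per-pixel work item: convert one RGB pixel to an 8-bit grey level
    using the ITU-R BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_greyscale(image: List[Pixel]) -> List[int]:
    # Each output element depends only on the matching input element, so
    # the iterations could run in any order, or all at once. This is the
    # data-parallel shape a GPU handles efficiently.
    return [luma(p) for p in image]


if __name__ == "__main__":
    # The CPU-side "programme flow": build the input, dispatch, use the result.
    print(to_greyscale([(255, 255, 255), (255, 0, 0), (0, 0, 0)]))  # prints [255, 76, 0]
```

By contrast, code whose every step depends on the one before it (parsing a file, making decisions, driving a UI) has no such independence to exploit and stays a natural fit for the CPU.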

One such processor is of course the GPU. Designed to process millions of vertices and pixels to create user interfaces, games and applications for modern smart devices, the GPU is a master at doing things efficiently in parallel. Early generations of mobile GPUs were limited to graphics only, but back in November 2012 Google’s Nexus 10 – based on the ARM® Mali™-T604 GPU – became the first mobile device capable of running GPU-accelerated general purpose compute.

Google’s Nexus 10 with Mali-T604 GPU

Since then the true benefit of designing applications to run heterogeneously has been demonstrated time after time. Not only can mobile GPUs speed up certain activities, such as image processing, computer vision and video decoding, they can usually do so significantly more efficiently. And using less power to achieve the same thing is all part of tackling that elephant.

But creating applications that make good use of compute on GPUs can be daunting for software engineers used to traditional programming techniques. It requires not only a new way of thinking but also new tools and APIs. And understanding the capabilities of the processors at your disposal is a key step to getting the best out of a platform. In this series of blogs we’ll be going into plenty of detail on this brave new elephant-banishing world. We’ll be covering the Mali-T600 and T700 GPU architectures in detail, explaining how they do what they do so you can get the best out of them. We’ll be looking at optimization techniques, software tools and languages that can help you along the way. This will include blogs on Google’s RenderScript, OpenCL™, ARM NEON™ technology, and much more.

So stay tuned for more on the world of compute on Mali GPUs, and let us know in the comments about any particular areas of interest you would like us to cover.

If you have a Samsung Chromebook, you can try OpenCL on Mali for yourself. Check out this guide on the Mali Developer website: http://malideveloper.arm.com/develop-for-mali/features/graphics-and-compute-development-on-samsung-chromebook/

And if you are interested in RenderScript on the Nexus 10, here’s a good place to start: http://developer.android.com/guide/topics/renderscript/compute.html

Mobile, Graphics, and Gaming blog
