
GPU Compute: Dealing with the Elephant in the Room

Tim Hartley
March 25, 2014
4 minute read time.

In this first blog of a series about GPU Compute we look at one of the biggest challenges facing the future development and evolution of smart devices.


Isn’t technology wonderful? It’s incredible to think that the processing power in mobile devices has increased 12 times in the last 4 years. Screen resolutions have increased by over 13 times in the same period. And as our smart devices become capable of more and more, we’re doing more and more with them. Study after study shows a continued shift away from desktops and laptops as internet, gaming and entertainment go increasingly mobile. But with all this innovation there’s a problem: an engineering elephant in the room. In the same 4 years, whilst everything else has increased by an order of magnitude, battery technology has improved by only a factor of 2. In engineering terms this presents a massive challenge: all that processing capacity at our fingertips, cruelly grabbed away at the last minute.

[Image: tech improvements.png]
Processing power information source: McKinsey&Company, “Making smartphones brilliant: ten trends” http://goo.gl/rkSP4

So if we could invent better batteries, we’d be OK, right? Well, although better batteries would be very welcome, sadly it’s not that simple. A bigger problem than battery power alone is thermal dissipation. Not the most glamorous subject maybe – I don’t think anyone has written a Thermal Dissipation folk song, for example – but it’s a critical issue facing mobile engineers today. Put simply, even if we had the power to run our processors harder, they would melt, because there’s no way to get rid of all the heat they would generate. This elephant is not only getting in the way, he’s about to do something unpleasant in the corner.

So to tackle this issue we have to think long and hard about processing efficiency. One way to do this is to add more CPU cores. Indeed, a mixture of faster cores and more energy-efficient cores (ARM big.LITTLE processing) allows devices to ramp up and down depending on demand. But just adding CPU cores doesn’t scale efficiently – after a while we see diminishing returns.
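One common way to picture those diminishing returns is Amdahl's law. The post does not name it; it is brought in here as a standard textbook model. The sketch below assumes a hypothetical workload in which 80% of the run time can be spread across cores, a figure chosen purely for illustration and not taken from any measurement.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` identical cores under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The serial part of the program gains nothing from extra cores.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumption for illustration only: 80% of the work is parallelisable.
    ASSUMED_PARALLEL_FRACTION = 0.8
    previous = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, 1)
    for cores in (2, 4, 8, 16):
        current = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, cores)
        # Each doubling of the core count buys a smaller extra gain.
        print(f"{cores:2d} cores: {current:.2f}x (gain {current - previous:.2f})")
        previous = current
```

Under these assumptions the speedup can never exceed 1 / 0.2 = 5x, however many cores are added. The model is also idealised: it ignores power and heat entirely, which are the constraints this post is really about.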

The key to all this – and a very effective way to tackle processing efficiency – is to think heterogeneously. The idea of heterogeneous computing is to spread the computing load not only across multiple processors, but across different types of processor. That involves distributing individual parts of your program to the processors that are best suited to run them. So, for example, general purpose program flow would sit with the CPU, whilst a complex image processing algorithm might run on a specialist processor designed to cope efficiently with highly parallel workloads.
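To make "highly parallel" a little more concrete, here is a minimal sketch in plain Python. It is not OpenCL or RenderScript code, and the names `to_grey` and `run_kernel` are hypothetical. It shows the shape of work that suits a parallel processor: a small function applied independently to every pixel, with no pixel depending on any other.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def to_grey(pixel: Pixel) -> int:
    """Per-pixel 'kernel': integer luma using the BT.601 weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def run_kernel(kernel: Callable[[Pixel], int], pixels: Sequence[Pixel]) -> List[int]:
    """Stand-in for a parallel dispatch.

    This loop runs sequentially, but because each call to `kernel` reads
    only its own pixel, the calls could run in any order or all at once.
    """
    return [kernel(p) for p in pixels]


if __name__ == "__main__":
    image = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
    print(run_kernel(to_grey, image))
```

The control flow around the kernel (loading the image, deciding what to do with the result) is the kind of general purpose work the post assigns to the CPU, while the per-pixel function is the part a parallel processor could take on.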

One such processor is of course the GPU.  Designed to process millions of vertices and pixels to create user interfaces, games and applications for modern smart devices, the GPU is a master at doing things efficiently in parallel.  Early generations of mobile GPUs were limited to graphics only, but back in November 2012 Google’s Nexus 10 – based on the ARM® Mali™-T604 GPU – became the first mobile device capable of running GPU-accelerated general purpose compute.

[Image: nexus.png]

Google’s Nexus 10 with Mali-T604 GPU

Since then the true benefit of designing applications to run heterogeneously has been demonstrated time after time. Not only can mobile GPUs speed up certain activities – such as image processing, computer vision and video decoding – they can usually do it significantly more efficiently. And using less power to achieve the same thing is all part of tackling that elephant.

But creating applications that make good use of compute on GPUs can be daunting for software engineers used to traditional programming techniques.  It not only requires a new way of thinking, but new tools and APIs as well.  And understanding the capabilities of the processors at your disposal is a key step to getting the best out of a platform.  In this series of blogs we’ll be going into plenty of detail on this brave new elephant-banishing world.  We’ll be covering the Mali-T600 and T700 GPU architectures in detail, explaining how they do what they do so you can get the best out of them.  We’ll be looking at optimization techniques, software tools and languages that can help you along the way.  This will include blogs on Google’s RenderScript, OpenCL™, ARM NEON™ technology, and much more.

So stay tuned for more on the world of compute on Mali GPUs, and let us know in the comments any particular areas of interest you would like us to cover.

If you have a Samsung Chromebook you can try OpenCL on Mali for yourself.  Check out this guide on the Malideveloper website: http://malideveloper.arm.com/develop-for-mali/features/graphics-and-compute-development-on-samsung-chromebook/


And if you are interested in RenderScript on the Nexus 10, here’s a good place to start: http://developer.android.com/guide/topics/renderscript/compute.html
