Arm Community
Arm Community blogs
Mobile, Graphics, and Gaming blog
Tags
  • OpenCL
  • Mali
  • optimization
  • gpu_compute
  • renderscript
  • compute

Optimizing GPU Compute Kernels.

Johan Gronqvist
November 6, 2013

If you have some experience with GPU compute, or have watched the presentation GPU Compute for Mobile Devices at ARM Techcon Developer Summit, and you have a compute application that you want to optimize, it can be hard to know where to start. You may have been given general advice, but it is not always clear which optimizations are relevant for your particular kernels.

At the ARM Techcon Developer Summit, I talked about that problem and tried to give an intuition for how threads whirl around inside the cores while executing your kernels. As always, a prerequisite for successful optimization is some understanding of where the bottlenecks might be, and the first part of the presentation aims to give that understanding for Mali. Once you understand how execution happens, the hardware counters in the GPU let you look inside the cores and see what is actually going on while your program is running. Streamline gives a timeline view of many kinds of counters, and the second part of the presentation introduces those counters and their use for optimizing compute kernels.

If you have any questions, this website is the place to ask.

Have fun!

Optimizing Compute Kernels for Mobile GPUs.pdf
  • Johan Gronqvist over 11 years ago

    The recording of the presentation is now available: Optimizing Compute Kernels for Mobile GPUs - YouTube
