Arm Community

Mali OpenCL Flag Demo

Jonathan Kirkham
November 11, 2013

This is a demo created internally at ARM by Anthony Barbier.

The demo shows the performance improvements you can achieve when using OpenCL™ on a Mali-powered device.


The application simulates a cloth flag with a ~6000 vertex model. Every frame, for each of these vertices, the application calculates the effect of gravity, wind, and the spring forces between the vertices.
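As a rough illustration of the per-vertex work described above, here is a minimal Python sketch of a mass-spring cloth step. It is not the demo's code: the grid layout, the constants (`GRAVITY`, `WIND`, `STIFFNESS`, `REST_LENGTH`, `MASS`), the helper names, and the semi-implicit Euler integrator are all assumptions made for the example.

```python
import math

# All constants below are illustrative assumptions, not values from the demo.
GRAVITY = (0.0, -9.81, 0.0)   # acceleration due to gravity
WIND = (2.0, 0.0, 0.5)        # uniform wind force applied to every vertex
STIFFNESS = 50.0              # Hooke spring constant between neighbours
REST_LENGTH = 1.0             # rest length of each spring
MASS = 1.0                    # mass of each vertex


def grid_neighbours(i, width, height):
    """Indices of the left/right/up/down neighbours of vertex i in a grid."""
    x, y = i % width, i // width
    out = []
    if x > 0:
        out.append(i - 1)
    if x < width - 1:
        out.append(i + 1)
    if y > 0:
        out.append(i - width)
    if y < height - 1:
        out.append(i + width)
    return out


def vertex_force(i, positions, width, height):
    """Gravity + wind + spring forces acting on vertex i.

    Only the previous frame's positions are read, so every vertex can be
    evaluated independently of the others.
    """
    px, py, pz = positions[i]
    fx = MASS * GRAVITY[0] + WIND[0]
    fy = MASS * GRAVITY[1] + WIND[1]
    fz = MASS * GRAVITY[2] + WIND[2]
    for j in grid_neighbours(i, width, height):
        qx, qy, qz = positions[j]
        dx, dy, dz = qx - px, qy - py, qz - pz
        length = math.sqrt(dx * dx + dy * dy + dz * dz)
        if length == 0.0:
            continue  # coincident vertices: spring direction is undefined
        # Hooke's law: force proportional to the extension, along the spring.
        scale = STIFFNESS * (length - REST_LENGTH) / length
        fx += scale * dx
        fy += scale * dy
        fz += scale * dz
    return (fx, fy, fz)


def step(positions, velocities, width, height, dt):
    """Advance the whole cloth by one frame (semi-implicit Euler)."""
    forces = [vertex_force(i, positions, width, height)
              for i in range(len(positions))]
    new_velocities = [tuple(v[k] + dt * f[k] / MASS for k in range(3))
                      for v, f in zip(velocities, forces)]
    new_positions = [tuple(p[k] + dt * v[k] for k in range(3))
                     for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities


# A flat 3x3 grid with every spring at rest length.
flat_grid = [(float(x), float(y), 0.0) for y in range(3) for x in range(3)]
at_rest = [(0.0, 0.0, 0.0)] * 9
moved_positions, moved_velocities = step(flat_grid, at_rest, 3, 3, 0.01)
```

Because `vertex_force` reads only the previous frame's positions, the loop over vertices in `step` has no dependencies between iterations. A real flag would also pin the vertices attached to the pole, which this sketch leaves out.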

The demo is shown running on the Samsung Exynos 5250 Arndale Board from InSignal, which has a dual-core ARM® Cortex®-A15 CPU and a quad-core ARM Mali™-T604 GPU.

Performance

The version shown first is written in multithreaded C and runs on the CPU (without using ARM Neon™ technology). It uses 100% of both cores of the dual-core Cortex-A15 CPU but achieves only around 4-5 fps. Visually, this is not a nice result: the scene updates too slowly, so the movement of the cloth is not smooth. Meanwhile, the GPU is underutilised in this version (less than 1% utilisation), even though it is a system resource that could be put to good use.

Next, the OpenCL version is shown running on a Mali-T604 GPU. In this version, we render two flags (~12000 vertices) at around 36 fps. The flag looks much better now, and the intended simulation effect is much more obvious. CPU usage in this version has fallen to single-digit percentages, leaving the CPU free for other tasks, for more features, or to sleep and reduce power usage. Taken together, this is roughly a 16x performance improvement over the CPU version of the code (2x the number of vertices at about 8x the frames per second).
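The 16x figure can be checked by comparing vertex updates per second. The short sketch below uses the numbers quoted in the text; the 4.5 fps midpoint of the 4-5 fps range is an assumption, and the helper name is made up for the example.

```python
def vertex_updates_per_second(vertices, fps):
    """Throughput: how many vertices are simulated each second."""
    return vertices * fps


# CPU version: one flag (~6000 vertices) at 4-5 fps; 4.5 is an assumed midpoint.
cpu_throughput = vertex_updates_per_second(6000, 4.5)
# OpenCL version: two flags (~12000 vertices) at around 36 fps.
gpu_throughput = vertex_updates_per_second(12000, 36)

speedup = gpu_throughput / cpu_throughput
print(f"{speedup:.1f}x")  # prints "16.0x"

# Bounds if the CPU version ran at exactly 5 fps or exactly 4 fps.
speedup_low = gpu_throughput / vertex_updates_per_second(6000, 5)
speedup_high = gpu_throughput / vertex_updates_per_second(6000, 4)
```

Across the quoted 4-5 fps range, the improvement falls between about 14x and 18x, so 16x is a reasonable summary figure.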

This shows that for parallel applications such as this one, OpenCL on a Mali device can provide superior performance. Each data point in this application can be calculated independently of all the others. Because the Mali GPU is very good at parallel processing (up to 256 hardware threads per core), it can easily outperform the CPU, which is designed more for good sequential performance (one hardware thread per core).

OpenGL® ES and OpenCL Interoperability

The other interesting thing shown in this demo is efficient OpenGL ES and OpenCL interoperability. In the application, OpenCL is used to manipulate the flag model data and then OpenGL ES is used to render it to the screen. Typically, the model data would be manipulated on the host (CPU) side of the application and then uploaded to the GPU for OpenGL ES to render. The host would upload the data into a VBO (Vertex Buffer Object) so that the GPU has access to it. In a naïve system, you can imagine that in this demo you would have to do the following every frame:

  1. manipulate the data using OpenCL
  2. map the memory to a CPU pointer on the host side
  3. upload the data to a VBO for it to be rendered.

Thankfully, this is not necessary. Copying the data every frame would increase memory usage (and with it power usage) and reduce performance through needless memory copies. Instead, the two APIs can share the same piece of memory directly.

Hopefully we will have an example of this in one of our Mali SDKs soon.
