Inside the Demo: GPU Particle Systems with ASTC 3D textures

Daniele Di Donato
May 8, 2014
8 minute read time.

At SIGGRAPH 2014 we presented the benefits of the OpenGL ES 3.0 API and of the newly introduced OpenGL ES 3.1. The Adaptive Scalable Texture Compression (ASTC) format is one of the biggest additions to the OpenGL ES ecosystem. The demo I'm going to talk about is a case study of the use of 3D textures on mobile, and of how ASTC can compress them to provide a huge memory reduction. 3D textures weren't available in the core OpenGL ES specification up to version 2.0, and the workaround was to use hardware-dependent extensions or 2D texture arrays. Now, with OpenGL ES 3.x, 3D textures are part of the core specification and ready to use… if only they were not so big! Uncompressed 3D textures cost a huge amount of memory (for example, a 256x256x256 texture in RGBA8888 format uses 256x256x256 texels x 4 bytes ≈ 67MB), which a mobile device cannot afford.

Why did we use ASTC?

The same texture can instead be compressed with ASTC at different compression levels, giving a saving of ~80% even at the highest quality settings. For those unfamiliar with the ASTC texture compression format, it is a block-based compression algorithm where LxM (or LxMxN in the case of 3D textures) blocks of texels are compressed together into a single 128-bit block. The L, M, N values are one of the compression quality factors and represent the number of texels per block dimension. For 3D textures, each block dimension can range from 3 to 6 texels, as reported in the table below:

Block Dimension    Bit Rate (bits per texel)
3x3x3              4.74
4x3x3              3.56
4x4x3              2.67
4x4x4              2.00
5x4x4              1.60
5x5x4              1.28
5x5x5              1.02
6x5x5              0.85
6x6x5              0.71
6x6x6              0.59

Since the compressed block size is always 128 bits regardless of block dimensions, the bit rate is simply 128 divided by the number of texels in a block. One of the features of ASTC is that it can also compress HDR values (typically 16 bits per channel). Since in the demo we needed to store high-precision floating-point values in the textures, we converted the float values (32 bits per channel) to half-float format (16 bits per channel) and used ASTC to compress those textures. This way the loss of precision is smaller than with the usual 32-bit to 8-bit conversion and compression. It is worth noting that using the HDR formats doesn't increase the size of the compressed texture, because each compressed block still uses 128 bits. Below you can see a 3D texture rendered simply using slicing planes. The compression formats used are, from left to right: uncompressed, ASTC 3x3x3, ASTC 4x4x4, ASTC 5x5x5.
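As a back-of-the-envelope check of these bit rates, here is a small C sketch (the function names are mine, not from the demo) that computes the bits per texel and the compressed size of a 3D texture. ASTC stores one 16-byte block for every LxMxN group of texels, and partial blocks at the edges still occupy a full block:

#include <stdio.h>

/* Bits per texel for an ASTC LxMxN block: every block compresses to 128 bits. */
static double astc_bits_per_texel(int L, int M, int N)
{
    return 128.0 / (double)(L * M * N);
}

/* Compressed size in bytes of a WxHxD texture: partial blocks at the
 * edges still occupy a full 16-byte block, hence the rounding up. */
static long astc_compressed_bytes(int W, int H, int D, int L, int M, int N)
{
    long bx = (W + L - 1) / L;
    long by = (H + M - 1) / M;
    long bz = (D + N - 1) / N;
    return bx * by * bz * 16;
}

int main(void)
{
    /* The 255x181x243 "Calice" texture from the measurements further below. */
    printf("3x3x3: %.2f bits per texel\n", astc_bits_per_texel(3, 3, 3));
    printf("compressed size: %.2f MB\n",
           astc_compressed_bytes(255, 181, 243, 3, 3, 3) / 1e6);
    return 0;
}

This reproduces the 6.72MB figure reported for the ASTC 3x3x3 version of the Calice texture in the measurement table below.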

For those interested in the details of the algorithm, an open source ASTC evaluation encoder/decoder is available at the ASTC Evaluation Codec page, and a video of an internal demo ported to ASTC is available on YouTube. The demo is also available for viewing at the ARM booth #933 at SIGGRAPH this week.

Demo Overview

The main objective of the demo was to use the new OpenGL ES 3.0 API to build realistic particle systems whose motion physics and collisions are managed entirely on the GPU. The demo shows two scenes: one simulates confetti, the other smoke.

Transform Feedback for physics simulation

The first feature I want to talk about, used for the physics simulation, is Transform Feedback. A physics simulation step typically outputs a set of buffers computed from the results of the previous step. These kinds of algorithms, called explicit methods in numerical analysis, are well suited to Transform Feedback because it allows the results of a vertex shader execution to be written back into a buffer, which can subsequently be mapped for CPU reads or used as the input buffer for other shaders. In the demo, each particle is mapped to a vertex: the input parameters (position, velocity and lifetime) are stored in an input vertex buffer, while the outputs are captured in the transform feedback buffer (a minimal sketch of this update pass follows the measurement table below).

Because the whole physics simulation runs on the GPU, we needed a way to give each particle knowledge of the objects in the scene (this is now less problematic using Compute Shaders; see below for details). 3D textures helped us here because they can represent volumetric information and can easily be sampled in the vertex shader like a classic texture. The 3D textures are generated from the 3D meshes of various objects using a free tool called Voxelizer. For voxels on the mesh surface, the voxel data contains the normal of the surface; for voxels inside the object, it contains the direction of, and distance to, the nearest point on the surface. 3D textures can also represent other types of data, such as a simple mask of occupied and free areas in a scene, density maps, or 3D noise.

When uploading the files generated by Voxelizer, we convert the floating-point values to half-float and then compress the 3D texture using ASTC HDR. In the demo we use different compression block dimensions to show the differences between uncompressed and compressed textures in memory size, memory read bandwidth and energy consumption per frame. Even the smallest block size (3x3x3) gives us a ~90% reduction, and our biggest texture goes down from ~90MB to ~7MB. Below is a table of measurements for the various models we used, taken on a Samsung Galaxy Note 10.1 (2014 Edition).

                          Sphere        Skull         Calice        Rock        Hand
Texture resolution        128x128x128   180x255x255   255x181x243   78x75x127   43x97x127

Texture size (MB)
  Uncompressed            16.78         82.62         89.73         5.94        4.24
  ASTC 3x3x3              1.27          6.12          6.72          0.45        0.34
  ASTC 4x4x4              0.52          2.63          2.87          0.19        0.14
  ASTC 5x5x5              0.28          1.32          1.48          0.10        0.07

Memory read bandwidth (MB/s)
  Uncompressed            644.47        752.18        721.96        511.48      299.36
  ASTC 3x3x3              342.01        285.78        206.39        374.19      228.05
  ASTC 4x4x4              327.63        179.43        175.21        368.13      224.26
  ASTC 5x5x5              323.10        167.90        162.89        366.18      222.76

Energy consumption per frame, DDR2 (mJ)
  Uncompressed            4.35          5.08          4.87          3.45        2.01
  ASTC 3x3x3              2.31          1.93          1.39          2.53        1.54
  ASTC 4x4x4              2.21          1.21          1.18          2.48        1.51
  ASTC 5x5x5              2.18          1.13          1.10          2.47        1.50

Energy consumption per frame, DDR3 (mJ)
  Uncompressed            3.58          4.17          4.01          2.84        1.66
  ASTC 3x3x3              1.90          1.59          1.15          2.08        1.27
  ASTC 4x4x4              1.82          1.00          0.97          2.04        1.24
  ASTC 5x5x5              1.79          0.93          0.90          2.03        1.24
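To make the Transform Feedback pass described above concrete, here is a minimal C sketch of one simulation step. The program, VAO and buffer names are illustrative, not taken from the demo source:

#include <GLES3/gl3.h>

/* One particle simulation step captured via Transform Feedback.
 * The VAO's attributes point at the source buffer (previous state);
 * dstBuf receives the new state and the caller swaps the two buffers
 * each frame. At link time the captured outputs must have been declared:
 *   const char *vars[] = { "outPosition", "outVelocity", "outLifetime" };
 *   glTransformFeedbackVaryings(prog, 3, vars, GL_INTERLEAVED_ATTRIBS);
 *   glLinkProgram(prog);
 */
static void particle_update_pass(GLuint prog, GLuint vao,
                                 GLuint dstBuf, GLsizei numParticles)
{
    glUseProgram(prog);
    glBindVertexArray(vao);

    /* Simulation needs no fragments, so skip rasterization entirely. */
    glEnable(GL_RASTERIZER_DISCARD);

    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, dstBuf);
    glBeginTransformFeedback(GL_POINTS);
    glDrawArrays(GL_POINTS, 0, numParticles);  /* one vertex per particle */
    glEndTransformFeedback();

    glDisable(GL_RASTERIZER_DISCARD);
}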

Instancing for efficiency

Another feature introduced in OpenGL ES 3.0 is Instancing. It lets us specify geometry only once and reuse it multiple times at different locations with a single draw call. In the demo we use it for the confetti rendering: instead of defining a vertex buffer of 2500*4 vertices (we render 2500 particles as quads in the confetti scene), we define a vertex buffer of just 4 vertices and call:

glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, 2500);

where GL_TRIANGLE_STRIP specifies the type of primitive to render, 0 is the starting index within the enabled vertex arrays holding the positions of the quad's vertices, 4 is the number of vertices needed to render one instance of the geometry (4 vertices per quad) and 2500 is the number of instances to render. Inside the vertex shader, the built-in variable gl_InstanceID contains the identifier of the current instance; it can, for example, be used to index an array of matrices or to do per-instance calculations. A divisor can also be specified for each active vertex attribute (with glVertexAttribDivisor), controlling how the vertex shader advances through that attribute's buffer from one instance to the next, as shown in the sketch below.
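Putting the pieces together, a minimal sketch of the confetti draw in C could look as follows; the buffer names and attribute locations are illustrative, not the demo's:

#include <GLES3/gl3.h>

/* Instanced confetti: quadBuf holds the 4 shared corner vertices,
 * instBuf holds one position per particle. */
static void draw_confetti(GLuint quadBuf, GLuint instBuf, GLsizei numParticles)
{
    /* Attribute 0: quad corners, advancing per vertex (divisor 0 by default). */
    glBindBuffer(GL_ARRAY_BUFFER, quadBuf);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);

    /* Attribute 1: particle position, advancing once per instance. */
    glBindBuffer(GL_ARRAY_BUFFER, instBuf);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glVertexAttribDivisor(1, 1);

    /* 4 vertices per quad, numParticles instances, one draw call. */
    glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, numParticles);
}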

The smoke scene

In the smoke scene, the smoke is rendered using a noise texture and some maths to compute the final colour as if it were a 3D volume. To give the smoke a transparent look we need to combine the colours of overlapping particles, so we use additive blending and disable the z-test when rendering the particles. This gives a nice result even without sorting the particles by z-value (which would otherwise require mapping the buffer on the CPU). Another reason for disabling the z-test is to realize soft particles.

The Mali-T6xx series of GPUs can use a specific extension in the fragment shader to read back the values of the framebuffer (colour, depth and stencil) without having to render-to-texture. This makes soft particles easier to realize, and in the demo we use a simple approach. First, we render all the solid objects so that their z-values are written to the depth buffer. Then, when rendering the smoke, thanks to the Mali extension we can read the depth value of the object, compare it with the depth of the current particle fragment (to see whether it is behind the object) and fade the colour accordingly. This technique eliminates the sharp profile that would otherwise be formed by the particle quad intersecting the geometry due to the z-test (another reason we had to disable it). A sketch of such a fragment shader follows.
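The demo's exact shader isn't reproduced here, but assuming the ARM framebuffer-fetch extension for depth (GL_ARM_shader_framebuffer_fetch_depth_stencil, which exposes the built-in gl_LastFragDepthARM), a minimal soft-particle fragment shader could look like the following; the uSoftness fade scale is an illustrative uniform:

/* Soft-particle fragment shader sketch (ESSL 3.00) embedded as a C string. */
static const char *softParticleFS =
    "#version 300 es\n"
    "#extension GL_ARM_shader_framebuffer_fetch_depth_stencil : require\n"
    "precision mediump float;\n"
    "uniform float uSoftness;  /* illustrative fade scale */\n"
    "in vec4 vColour;\n"
    "out vec4 fragColour;\n"
    "void main()\n"
    "{\n"
    "    /* Depth already in the framebuffer minus this fragment's depth: */\n"
    "    /* positive in front of the solid geometry, negative behind it.  */\n"
    "    float dz = gl_LastFragDepthARM - gl_FragCoord.z;\n"
    "    /* Fade the particle out as it approaches the geometry. */\n"
    "    float fade = clamp(dz * uSoftness, 0.0, 1.0);\n"
    "    fragColour = vColour * fade;\n"
    "}\n";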

Blurring the smoke

During development the smoke effect looked nice, but we wanted it to be denser and blurrier. To achieve this we render the smoke into an off-screen render buffer at a lower resolution than the main screen. This gives us blurred smoke (since the lower resolution removes the higher frequencies) and lets us increase the number of particles for a denser look. The current implementation uses a 640x360 off-screen buffer that is up-scaled to 1080p in the final image. A naïve up-scale causes jaggies on the outline of an object when the smoke flows near it, due to the blending of the up-sampled low-resolution buffer. To almost eliminate this effect we apply a bilateral filter to the off-screen buffer: the weight of each neighbour texel is the product of a Gaussian weight on the colour texture and a weighting factor based on the difference in depth. The depth factor is useful on the edges of a model because it gives higher weight to neighbour texels whose depth is similar to that of the current pixel, and lower weight when the difference is larger (for a pixel on the edge of a model, some neighbour pixels are still on the model while others are far in the background).
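A sketch of such a filter (ESSL 3.00, with illustrative uniform names and a small 3x3 Gaussian kernel standing in for whatever kernel the demo actually used) could look like this:

/* Bilateral up-sampling filter: Gaussian spatial weight multiplied by a
 * depth-similarity weight, so smoke does not bleed across object edges. */
static const char *bilateralFS =
    "#version 300 es\n"
    "precision mediump float;\n"
    "uniform sampler2D uSmoke;      /* low-resolution smoke colour */\n"
    "uniform sampler2D uDepth;      /* matching depth buffer       */\n"
    "uniform vec2  uTexelSize;      /* 1.0 / buffer resolution     */\n"
    "uniform float uDepthSharpness; /* strength of the depth term  */\n"
    "in vec2 vUV;\n"
    "out vec4 fragColour;\n"
    "void main()\n"
    "{\n"
    "    float centreDepth = texture(uDepth, vUV).r;\n"
    "    float gauss[3] = float[3](0.25, 0.5, 0.25);\n"
    "    vec4 sum = vec4(0.0);\n"
    "    float wSum = 0.0;\n"
    "    for (int y = -1; y <= 1; ++y) {\n"
    "        for (int x = -1; x <= 1; ++x) {\n"
    "            vec2 uv = vUV + vec2(float(x), float(y)) * uTexelSize;\n"
    "            float d = texture(uDepth, uv).r;\n"
    "            /* Gaussian weight scaled down across depth discontinuities. */\n"
    "            float w = gauss[x + 1] * gauss[y + 1]\n"
    "                    / (1.0 + uDepthSharpness * abs(d - centreDepth));\n"
    "            sum  += w * texture(uSmoke, uv);\n"
    "            wSum += w;\n"
    "        }\n"
    "    }\n"
    "    fragColour = sum / wSum;\n"
    "}\n";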

Bonus track

The recently released OpenGL ES 3.1 specification introduced Compute Shaders as a method for general-purpose computing on the GPU (a sort of subset of OpenCL, but living in the same context as OpenGL, so no context switching is needed!). You can see it in action below:

An introduction to Compute Shaders is also available at:

Get started with compute shaders
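As a hedged illustration of the new API (the shader, names and toy update below are mine, not the demo's simulation), dispatching a compute shader over a particle buffer looks roughly like this:

#include <GLES3/gl31.h>

/* Illustrative compute shader: one thread nudges one particle stored
 * in a shader storage buffer (SSBO). */
static const char *particleCS =
    "#version 310 es\n"
    "layout(local_size_x = 64) in;\n"
    "layout(std430, binding = 0) buffer Particles { vec4 pos[]; };\n"
    "uniform float uDeltaTime;\n"
    "void main()\n"
    "{\n"
    "    uint i = gl_GlobalInvocationID.x;\n"
    "    pos[i].y += 0.5 * uDeltaTime;  /* toy update */\n"
    "}\n";

/* Assumes numParticles is a multiple of the 64-wide work group. */
static void simulate(GLuint prog, GLuint ssbo, GLuint numParticles)
{
    glUseProgram(prog);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
    glDispatchCompute(numParticles / 64, 1, 1);  /* one thread per particle */
    /* Make the writes visible before the buffer is used for drawing. */
    glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
}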

References:

I would like to point out some useful websites that helped me understand Instancing and Transform Feedback:

Transform Feedback:

https://www.opengl.org/wiki/Transform_Feedback
http://prideout.net/blog/?tag=opengl-transform-feedback
http://ogldev.atspace.co.uk/www/tutorial28/tutorial28.html
http://open.gl/feedback

Instancing:

http://www.opengl-tutorial.org/intermediate-tutorials/billboards-particles/particles-instancing/
http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html
https://www.opengl.org/wiki/Vertex_Rendering#Instancing

ASTC Evaluation Codec:
https://developer.arm.com/products/software-development-tools/graphics-development-tools/astc-evaluation-codec

Voxelizer:
http://techhouse.brown.edu/~dmorris/voxelizer/

Soft-particles:
http://blog.wolfire.com/2010/04/Soft-Particles

Comments
Romain, over 10 years ago:

Hi,

Thank you very much for this beautiful demo!

I have three small questions about it, and I'd be glad if you could help me with any of them :)

Is it possible to use a 3D texture in a compute shader (OpenGL ES 3.1), i.e. to access any texel value from a compute shader?

How can one float texel represent both the direction and the distance to the nearest point on the surface? Maybe you need two textures, no?

Finally, how do you know whether the destination of a particle is inside the object by using the 3D texture? I imagine there is some interpolation, but how does it work?

I took a look at the links you gave at the end of the article. If you have any other resource that could help me build an example of such a use of 3D textures, I would be grateful.

Thank you very much,

Romain
