At SIGGRAPH 2014 we presented the benefits of the OpenGL ES 3.0 API and the more newly introduced OpenGL ES 3.1 extension. Adaptive Scalable Texture Compression format (ASTC) is one of the biggest introductions to the OpenGL ES API. The demo I’m going to talk about is a case study of the usage of 3D textures in the mobile space and how ASTC can compress them to provide a huge memory reduction. 3D textures weren’t available in the core OpenGL ES spec up to version 2.0 and the workaround was to use hardware dependent extensions or 2D texture arrays. Now with OpenGL ES 3.x, 3D textures are embedded in the core specification and ready to use…..if only they were not so big! Using uncompressed 3D textures costs a huge amount of memory (for example a 256x256x256 texture with RGBA8888 format uses circa 68MB) which cannot be afforded on a mobile device.
The same texture can instead be compressed using different levels of compression with ASTC, giving a saving of ~80% when using the highest quality settings. For those unfamiliar with the ASTC texture compression format, it is a block-based compression algorithm where LxM (or LxMxN in the case of 3D textures) blocks of pixels are compressed together into a single block of 128 bit. The L,M,N values are one of the compression quality factors and represent the number of texels per block dimension. For 3D textures, the dimensions allowed vary from 3 to 6 as reported in the table below:
Since the block compressed size is always 128 bit for all block dimensions, the bit rate is simply 128/#texel_in_a_block. One of the features of ASTC is that it can also compress HDR values (typically 16 bit per channel). Since we needed to store high precision floating-point values in the textures in the demo, we converted the float values (32 bit per channel) to half-float format (16 bit per channel) and used ASTC to compress those textures. In this way the loss of precision is less compared to the usual 32 bit to 8 bit conversion and compression. It is worth noticing that using the HDR formats doesn’t increase the size of the compressed texture because each compressed block will still use 128 bit. Below you can see a 3D texture rendered simply using slicing planes. The compression formats used are: (from left to right) uncompressed, ASTC 3x3x3, ASTC 4x4x4, ASTC 5x5x5.
For those interested in the details of the algorithm, an open source ASTC evaluation encoder/decoder is available at ASTC Evaluation Codec and a video of an internal demo ported to ASTC is available on YouTube. The demo is also available for viewing on the ARM booth #933 at SIGGRAPH this week.
The main objective of the demo was to use the new OpenGL ES 3.0 API to realize realistic particle systems where motion physics as well as collisions are managed entirely on the GPU. The demo shows two scenes, one which simulates confetti, the other smoke.
The first feature I want to talk about, which is used for the physics simulation, is Transform Feedback. The physics simulation steps typically output a set of buffers using the previous step results as inputs. These kind of algorithms, called explicit methods in numerical analysis, are well suited to being used with Transform Feedback because it allows the results of vertex shader execution to get back into a buffer that can subsequently be mapped for CPU read or used as the input buffer for other shaders. In the demo, each particle is mapped to a vertex and the input parameters (position, velocity and lifetime) are stored in an input vertex buffer while the outputs are bound to the transform feedback buffer. Because the whole physics simulation runs on the GPU, we needed a way to give to each particle the knowledge of the objects in the scene (this is now less problematic using Compute Shaders. See below for details). 3D textures helped us in this case because they can represent volumetric information and can be easily sampled in the vertex shader as a classic texture. The 3D textures are generated from the 3D mesh of various objects using a free tool called Voxelizer and the voxel data contain the normal of the surface for voxels on the mesh surface or the direction and the distance to the nearest point on the surface in the case of voxels inside the object. 3D textures can be used to represent various types of data such as a simple mask for occupied or free areas in a scene, density maps or 3D noise. When uploading the files generated from Voxelizer, we convert the floating point values to half-float and then compress the 3D texture using ASTC HDR. In the demo, we use different compression block dimensions to show the differences between uncompressed and compressed textures. Such differences included memory size, memory read bandwidth reduction and energy consumption per frame. The smallest block size (3x3x3) gives us a ~90% reduction and our biggest texture goes down from ~87MB to ~7MB. Below you can find a table of bandwidth measurements for the various types of models we used on a Samsung Galaxy Note 10.1 (2014 Edition).
Another feature that was introduced in OpenGL ES 3.0 is Instancing. It permits us to specify geometry only once and reuse it multiple times in different locations with a single draw call. In the demo we use it for the confetti rendering where, instead of defining a vertex buffer of 2500*4 vertices (we render 2500 particles as quads in the confetti scene), we just define a vertex buffer of 4 vertices and call the:
where GL_TRIANGLE_STRIP specifies the type of primitive to render, 0 is the start index inside the enabled vertex buffers that represents the positions of the vertices of the quad, 4 specifies the number of indices needed to render one instance of the geometry (4 indices per quad) and 2500 is the number of instances to render. Inside the vertex shader, the gl_InstanceID built-in variable will be available and it will contain the identifier for the current invocation. This variable can, for example, be used to access an array of matrices or do specific calculations for each instance. A divisor can also be specified for each active vertex buffer which specifies how the vertex shader will advance in the vertex buffers for each instance.
In the smoke scene, the smoke is rendered using a noise texture and some math to compute the final colour as if it were a 3D volume. To give the smoke a transparent look we need to combine different overlapping particles’ colours. To do so we use additive blending and disable the z-test when rendering the particles. This gives a nice result even without sorting the particles based on the z-value (otherwise we have to map the buffer in the CPU). Another reason for disabling it is to realize soft particles. The Mali-T6xx series of GPUs can use a specific extension in the fragment shader to read back the values of the framebuffer (colour, depth and stencil) without having to render-to-texture. This feature makes it easier to realize soft particles and in the demo we use a simple approach. First, we render all the solid objects so that their z-value will be written in the depth buffer. After we render the smoke (and thanks to the Mali extension) we can read the depth value of the object and compare it with the current fragment of the particle (to see if it is behind the object) and fade the colour accordingly. This technique eliminates the sharp profile that is formed by the particle quad intersecting the geometry due to the z-test (another reason we had to disable it).
During development the smoke effect looked nice but we wanted it to be more dense and blurry. To achieve all this we decided to render the smoke in an off-screen render buffer with a lower resolution compared to the main screen. This gives us the ability to have a blurred smoke (since the lower resolution removes the higher frequencies) as well as let us increase the number of particles to get a denser look. The current implementation uses a 640x360 off-screen buffer that is up-scaled to 1080p resolution in the final image. A naïve approach causes jaggies on the outline of the object when the smoke is flowing near it due to the blending of the up-sampled low resolution buffer. To almost eliminate this effect, we apply a bilateral filter. The bilateral filter is applied to the off-screen buffer and is given by the product of a Gaussian filter in the colour texture and a linear weighting factor given by the difference in depth. The depth factor is useful on the edge of the model because it gives a higher weight to neighbour texels with depth similar to the one of the current pixel and lower weight when this difference is higher (if we consider a pixel on the edge of a model, some of the neighbour pixels will still be on the model while others will be far in the background).
The recently released OpenGL ES 3.1 spec introduced Compute Shaders as a method for general computing on the GPU (a sort of subset of OpenCL , but in the same context of OpenGL so no context switching needed!!). You can see it in action below:
An introduction to Compute Shaders is also available at:
Get started with compute shaders
I would like to point out some useful websites that helped me understand Indexing and Transform Feedback:
Transform Feedback:
https://www.opengl.org/wiki/Transform_Feedback http://prideout.net/blog/?tag=opengl-transform-feedback http://ogldev.atspace.co.uk/www/tutorial28/tutorial28.html http://open.gl/feedback
Indexing:
http://www.opengl-tutorial.org/intermediate-tutorials/billboards-particles/particles-instancing/ http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html https://www.opengl.org/wiki/Vertex_Rendering#Instancing
ASTC Evaluation Codec: https://developer.arm.com/products/software-development-tools/graphics-development-tools/astc-evaluation-codec
Voxelizer: http://techhouse.brown.edu/~dmorris/voxelizer/
Soft-particles: http://blog.wolfire.com/2010/04/Soft-Particles
Hi Romain,
Happy to hear that!
Yes, storing the gradient in the other floats will remove the computation in the compute shader. About the interpolation, bear in mind that the GPU can do a trilinear interpolation if you use the usual texture2D sampling functions. If you use imageLoad functions it will not do it since it uses integer coordinates (sort of executing the texelFetch function).
DDD