Time sure flies when you’re having fun! It’s been more than two years since SIGGRAPH Asia 2011 in Hong Kong, where I had the pleasure of unveiling our Adaptive Scalable Texture Compression (ASTC) technology. A lot has happened on the ASTC front since then: we made many technical improvements, providing even better quality and finer control of bit rate; we published full technical details of the format at High Performance Graphics 2012; the Khronos Group ratified an ASTC extension; and the first consumer devices with ASTC support began to appear on the market. It fairly boggles the mind. Well, it does mine, anyway.
Why am I wandering down memory lane like this? Two reasons: One is that just before this past Christmas, I went back to Hong Kong for SIGGRAPH Asia 2013, this time to talk more generally about power and memory bandwidth reduction in the latest ARM® Mali™ GPUs. The other reason, and the main one, is that ASTC recently passed another milestone in its march toward ubiquity, as the Khronos Group ratified extensions covering the full functionality of the format.
You thought Khronos had already standardized ASTC? It had, but not completely. Khronos released an extension called KHR_texture_compression_astc_ldr at SIGGRAPH 2012. However, that extension exposed only the low dynamic range (LDR, get it?) pixel formats of ASTC, and only for 2D images. We did that because at the time, details of the high dynamic range (HDR) and 3D features hadn’t been nailed down, and some Khronos members weren’t sure they would work as well as we hoped – they were, after all, pretty revolutionary. Also, some Khronos members wanted to start implementing the 2D LDR features of ASTC right away, before we were ready to freeze the definition of the more advanced features.
I’m happy to say that, in the end, the HDR and 3D features of ASTC turned out to work very well indeed. Recognizing this, the Khronos Group recently ratified two new extensions, adding HDR and 3D functionality respectively. The ASTC family now looks like this:
KHR_texture_compression_astc_ldr is the previously-ratified low dynamic range profile
KHR_texture_compression_astc_hdr extends the LDR profile to include HDR
OES_texture_compression_astc extends the HDR profile to include 3D textures
The extensions are layered, with each new layer requiring the previous layers, so if your implementation supports KHR_texture_compression_astc_hdr, all of the LDR features are supported too. If it supports OES_texture_compression_astc, it supports everything. If you try to use an HDR texture on an implementation that doesn’t support HDR, the LDR portions of the texture decode normally, and the HDR texels come back a lovely shade of radioactive pink.
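Because the layers stack, an application only needs to find the highest one it is offered. Here is a minimal sketch of that check, assuming the extension list is available as a space-separated string and that the names carry the GL_ prefix they normally have there. The helper name astc_profile and the profile labels are made up for illustration.

```python
# Hypothetical helper: find the highest ASTC layer in an extension string.
# Ordered from the most capable layer to the least capable one.
ASTC_LAYERS = [
    ("GL_OES_texture_compression_astc", "full"),     # LDR + HDR + 3D
    ("GL_KHR_texture_compression_astc_hdr", "hdr"),  # LDR + HDR, 2D only
    ("GL_KHR_texture_compression_astc_ldr", "ldr"),  # LDR, 2D only
]


def astc_profile(extension_string):
    """Return 'full', 'hdr', 'ldr', or None for a space-separated list."""
    advertised = set(extension_string.split())
    for name, profile in ASTC_LAYERS:
        if name in advertised:
            return profile
    return None


if __name__ == "__main__":
    example = "GL_OES_rgb8_rgba8 GL_KHR_texture_compression_astc_ldr"
    print(astc_profile(example))
```

Checking from the top layer down means the code never has to rely on the lower layers being listed separately.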
You might be wondering about the extension name prefixes: why KHR-blah-ldr and KHR-blah-hdr, but OES-blah-astc? The OES prefix identifies an extension that is defined and ratified by the OpenGL® ES working group, for use with OpenGL ES. Extensions with the KHR prefix are ratified by both the OpenGL ES and desktop OpenGL® working groups, and can be used with either API. So, you can and will see ASTC LDR- and HDR-capable GPUs on desktop as well as mobile devices, but for the moment there’s no way to ship ASTC 3D textures on the desktop. It’s too bad, but hey, OpenGL ES is shipping in a billion devices a year; and the desktop will catch up eventually.
So, ASTC HDR and 3D are now available as Khronos standards. What does that mean? How does it make life better for mobile device manufacturers, or app developers, or users?
We’ve written at length about the technology – how ASTC offers developers unprecedented flexibility in bit rate and pixel format, as well as a substantial boost in image quality. And Sean has a great article describing how the HDR and 3D features of ASTC work, and why they’re useful – even, potentially, revolutionary. If you aren’t convinced by now, you aren’t going to be, so I won’t repeat that story here.
What Khronos standardization adds to the picture is that it puts ASTC on the road to becoming universally available. By placing the format under the Khronos IP umbrella, it removes the uncertainties that have prevented widespread adoption of proprietary formats like S3TC and PVRTC. It is also, obviously, a powerful endorsement of the technology. Add in the enthusiastic reception the format has received from developers, and the bottom line is that GPU vendors now have many reasons to support it in their hardware, and few reasons not to. ASTC has been available for some time now in the Exynos-based versions of the Samsung Galaxy Note 3, Note Pro and other devices, which feature the ARM Mali-T628 MP6 GPU. We understand that it’ll be supported in upcoming SoCs and IP cores from Qualcomm, NVIDIA, and Imagination Technologies as well. Other implementations are on the way.
I said I wasn’t going to talk about ASTC from a technical point of view, but I can’t resist – after all, you can’t write a blog about texture compression without showing an image, can you? So here’s an image. Actually, here are two:
Figure 1: A (chocolate-free) teapot rendered using a 2MB volume texture
Figure 2: The same teapot with the volume texture compressed to 151KB using ASTC.
What you’re seeing is an implementation of a procedural marble shader, taken from the AMD RenderMonkey™ examples. What’s interesting about it is that it’s not a 2D marble texture uv-mapped onto the surface of the teapot. Instead, the shader samples a 3D noise function at every point on the surface, and uses the result to sample a 1D color gradient texture. The 1D texture is tiny, but the noise function is implemented as a 128x128x128 volume texture. The original 8-bit, single channel texture (used to produce the upper image) occupies 2 MB – not huge, but big enough to make you ask if you really need it, at least on a mobile device. The version in the second image uses the same volume texture, compressed using ASTC at 0.59 bits per pixel, which reduces it to 151 KB. Can you see the difference? I didn’t think so.
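If you want to check the arithmetic, here is a small sketch. It relies on the fact that every ASTC block occupies 128 bits whatever its footprint, and it assumes the 0.59 bits per pixel figure corresponds to the 6x6x6 block size. The function names are illustrative only.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_texel(bx, by, bz=1):
    """Bit rate of a bx x by x bz block footprint."""
    return ASTC_BLOCK_BITS / (bx * by * bz)


def ideal_bytes(width, height, depth, bx, by, bz=1):
    """Size if the texture were an exact multiple of the block footprint."""
    texels = width * height * depth
    return texels * astc_bits_per_texel(bx, by, bz) / 8


def padded_bytes(width, height, depth, bx, by, bz=1):
    """Size when each dimension is rounded up to whole blocks."""
    blocks = (math.ceil(width / bx) * math.ceil(height / by)
              * math.ceil(depth / bz))
    return blocks * ASTC_BLOCK_BITS // 8


if __name__ == "__main__":
    print(round(astc_bits_per_texel(6, 6, 6), 2), "bits per texel")
    print(int(ideal_bytes(128, 128, 128, 6, 6, 6) / 1024), "KB (ideal)")
    print(padded_bytes(128, 128, 128, 6, 6, 6), "bytes (whole blocks)")
```

The ideal figure matches the 151 KB quoted above. Because 128 is not a multiple of 6, a stored texture rounded up to whole blocks would come out slightly larger.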
This is just a toy example, but I hope it shows how ASTC’s low-bit-rate 3D compression can change the game, making previously stressful or even unthinkable algorithms practical. I can’t wait to see how serious game developers will make use of the technology, when it reaches them.
As always – got comments or questions? Got ideas for clever ways to use HDR or 3D textures? Drop me a line…
Tom Olson is Director of Graphics Research at ARM. After a couple of years as a musician (which he doesn't talk about), and a couple more designing digital logic for satellites, he earned a PhD and became a computer vision researcher. Around 2001 he saw the coming tidal wave of demand for graphics on mobile devices, and switched his research area to graphics. He spends his working days thinking about what ARM GPUs will be used for in 2016 and beyond. In his spare time, he chairs the Khronos OpenGL ES Working Group.
It *is* incredibly exciting that such a technique is possible on mobile, and perhaps more so that ASTC is enabling something that would have been considered unrealistic otherwise! But I think we can do better still:
Consider that the video method uses RGBA textures. Given the type of image data held in the volume, this is overkill. In fact, similar to the teapot example in the blog post above, it should be possible to represent the explosion using a single channel and a small 1D gradient lookup. While this wouldn't improve the compression ratio, it may have a positive effect on the compression quality. It also provides an added benefit: for a reusable explosion, varying the 1D gradient lookup would result in different-looking explosions -- a green 'chemical' explosion would be possible with a green gradient rather than an orange one. The downside is that a (very) tiny bit of complexity is added to the fragment shader.
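To make the idea concrete, here is a rough CPU-side sketch of what the extra shader step would do: take the single-channel value sampled from the volume and use it to index a small colour gradient. The gradient contents (FIRE, CHEMICAL) and the function names are invented for illustration, and a real implementation would live in the fragment shader and let the texture unit do the interpolation.

```python
def sample_gradient(gradient, t):
    """Linearly interpolate a list of (r, g, b) tuples at t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(gradient) - 1)
    i = int(pos)                         # lower gradient entry
    j = min(i + 1, len(gradient) - 1)    # upper gradient entry
    f = pos - i                          # blend factor between the two
    return tuple(a + (b - a) * f for a, b in zip(gradient[i], gradient[j]))


# Two hypothetical gradients: the same density volume, different explosions.
FIRE = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (1.0, 1.0, 0.8)]
CHEMICAL = [(0.0, 0.0, 0.0), (0.1, 0.8, 0.1), (0.8, 1.0, 0.8)]


def shade(density, gradient):
    """Colour for one single-channel sample taken from the volume."""
    return sample_gradient(gradient, density)


if __name__ == "__main__":
    print(shade(0.25, FIRE))
    print(shade(0.25, CHEMICAL))
```

Swapping FIRE for CHEMICAL changes the look without touching the compressed volume, which is the reuse benefit described above.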
While the above method may improve quality (but not compression), there's another way to significantly trim down the already manageable 3.7 MB stack. Going back to the ten 256x256x64 "slice" images in the video, you'll notice that there are many rows that hold no data, due to the rising nature of the explosion. In fact, if you split each image into rows, only 51 out of 80 rows actually contain data. Instead of stacking the explosion in 10 slices, the 3D texture could contain 18 slices (only 17 would be used), each holding a 3x8 grid of 256x256 animation frames. This works out well: by eliminating the empty space, the size of the compressed 3D texture is now a hair over 2MB! Of course, there are consequences. This adds a bit of complexity to the fragment shader, which now requires a tiny lookup (slice lookup isn't quite as straightforward, though still very simple). You also lose spatial coherence along the z-axis, but gain a bit of temporal coherence. With a single-channel texture, this may not be a big deal. Lastly, this only really works with explosions that rise gradually or simulations that do not use the full volume over all animation frames.
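Here is a sketch of that arithmetic. It assumes ASTC 6x6x6 blocks at 16 bytes each, and it reads the choice of 18 slices as rounding 17 up to a multiple of the block depth, which is my interpretation. The names are illustrative.

```python
import math

FRAME = 256          # each animation frame is 256x256 texels
FRAMES_PER_ROW = 8   # a 2K-wide image holds 8 frames per row
ROWS_PER_SLICE = 3   # repacked layout: 3 rows of frames per slice
USED_ROWS = 51       # rows that actually contain data
BLOCK = (6, 6, 6)    # assumed ASTC 3D block footprint


def packed_slices(used_rows=USED_ROWS, rows_per_slice=ROWS_PER_SLICE,
                  block_depth=BLOCK[2]):
    """Slices needed, and slices stored after rounding up to whole blocks."""
    needed = math.ceil(used_rows / rows_per_slice)
    stored = math.ceil(needed / block_depth) * block_depth
    return needed, stored


def packed_bytes(stored_slices):
    """Compressed size of the repacked volume, in whole 16-byte blocks."""
    width = FRAMES_PER_ROW * FRAME
    height = ROWS_PER_SLICE * FRAME
    blocks = (math.ceil(width / BLOCK[0]) * math.ceil(height / BLOCK[1])
              * math.ceil(stored_slices / BLOCK[2]))
    return blocks * 16


def packed_location(k):
    """(slice, row within slice) for the k-th non-empty row."""
    return divmod(k, ROWS_PER_SLICE)


if __name__ == "__main__":
    needed, stored = packed_slices()
    print(needed, stored, packed_bytes(stored))
```

packed_location stands in for the "tiny lookup": the shader would first map an original row to its position among the non-empty rows, then to a slice and a row within it.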
Alternatively, you could have 64 stacked 'frames', each holding that frame's full volume. It would make for a very tall 3D texture. But we could eliminate empty spaces by having each frame-volume fill only as many slices as needed and providing an offset for each of the 64 frame-volumes. For example, the first frame-volume would only be one slice, as the explosion would still be quite small. This would have the benefit of needing a very simple shader and would allow the texture unit to interpolate the required texel lookup. ASTC could exploit spatial similarities along the z-axis, because each frame's volume would be stored as a contiguous run of slices. This method would yield comparable savings to the method directly above (around 2MB of storage), but greatly simplify the effect.
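A rough sketch of the per-frame offset idea follows. The slice counts are invented, and the half-texel adjustment is one possible way to keep hardware interpolation from bleeding into the neighbouring frame-volume, not something taken from the video.

```python
from itertools import accumulate


def build_offsets(slice_counts):
    """First slice index of each frame-volume in the tall stacked texture."""
    return [0] + list(accumulate(slice_counts))[:-1]


def texture_z(frame, local_z, slice_counts, offsets):
    """Map a depth in [0, 1] within one frame-volume to a normalized z.

    Sampling runs between the centres of the frame's first and last slices,
    so interpolation never reaches into a neighbouring frame's slices.
    """
    total = sum(slice_counts)
    n = slice_counts[frame]
    z_in_slices = offsets[frame] + 0.5 + local_z * (n - 1)
    return z_in_slices / total


if __name__ == "__main__":
    counts = [1, 2, 4]  # hypothetical: early frames need fewer slices
    offsets = build_offsets(counts)
    print(offsets)
    print(texture_z(2, 1.0, counts, offsets))
```

The offset table is tiny (one entry per frame), so it could be passed as a uniform array rather than a texture.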
Of course, it may be possible to go crazy and attempt to eliminate the empty space around the explosion by breaking the individual 256x256 slice frames into smaller squares, throwing away the empty squares, and stacking the remaining squares into a set of image slices in a way that maximizes coherence in the volume and is friendly to compression. This would be the job of a generally useful automated tool, or would be overkill otherwise. While 3D textures wouldn't strictly be necessary, exploiting spatial coherence along the z-axis could be very beneficial to compression quality at high ratios. I haven't run a histogram on the image slices in the video, but there seems to be a tremendous amount of empty space around each of the slices, which may add up to significant memory savings. This would require a lossless lookup to locate the offset of the required slice, and logic to deal with edges, but the savings could be significant even with a large lookup. For example, splitting each 256x256 slice frame into 64 'tiles' would require a 160K (32-bit index) lookup over the entire texture. If even a 50% saving can be achieved by stacking, it would still yield significant memory savings.
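The lookup size can be checked in a few lines. This sketch assumes the 160K figure comes from 10 slice images, 64 frames per image, 64 tiles per frame and a 4-byte index per tile. The 3.7 MB starting point is the compressed stack size mentioned earlier, taken here as 3,700,000 bytes.

```python
def tile_lookup_bytes(images=10, frames_per_image=64, tiles_per_frame=64,
                      index_bytes=4):
    """Size of a table holding one index per tile over the whole stack."""
    return images * frames_per_image * tiles_per_frame * index_bytes


def net_saving_bytes(compressed_bytes, fraction_removed, lookup_bytes):
    """Bytes saved after paying for the lookup table."""
    return compressed_bytes * fraction_removed - lookup_bytes


if __name__ == "__main__":
    lookup = tile_lookup_bytes()
    print(lookup // 1024, "KB lookup")
    print(net_saving_bytes(3_700_000, 0.5, lookup), "bytes saved at 50%")
```

At a 50% tile reduction the lookup costs under a tenth of what it saves, which supports the point that even a large lookup is affordable.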
Anyway, this technique is quite important in that it could be used to reproduce many densely packed particle simulations. Other uses that come to mind are a section of a waterfall, pouring water, a water splash, a looping water wake, a small cloud volume segment, or even a non-animated clump of leaves as would exist in the inner part of a tree. As a single effect, it is definitely expensive, but if used as a largely reusable effect, it seems positively a bargain given ASTC's outstanding compression ratios!
Thanks Tom! I had good fun.
> Unreal recently showed a very interesting way
Thanks for the link - very nice! If I'm reading the screen shots right, for that particular effect they are using ten 2K x 2K RGBA images, or 42 Mpix, compressed using DXT5 to 42 MB. Yes, a bit large for a quick effect. A straightforward conversion to ASTC 12x12 (2D) compression would knock it down to about 5 MB, still not small, and it's unclear whether the quality would be acceptable. ASTC makes better use of the bits you give it than DXT does, but at 0.89 bits per pixel (bpp) ASTC would still be a good bit noisier than DXT5 at 8bpp. But as you say, the interesting question is what could be done with 3D textures.
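For anyone who wants to reproduce those numbers, here is a quick sketch. It uses the fact that both DXT5 and ASTC store 128 bits per block (4x4 pixels for DXT5, 12x12 for the ASTC mode discussed here), and it ignores padding to whole blocks. The variable names are illustrative.

```python
def bits_per_pixel(block_w, block_h, block_bits=128):
    """Bit rate of a format that stores block_bits per block_w x block_h block."""
    return block_bits / (block_w * block_h)


PIXELS = 10 * 2048 * 2048  # ten 2K x 2K images

dxt5_bytes = PIXELS * bits_per_pixel(4, 4) / 8          # 8 bpp
astc_12x12_bytes = PIXELS * bits_per_pixel(12, 12) / 8  # about 0.89 bpp

if __name__ == "__main__":
    print(round(PIXELS / 1e6), "Mpix")
    print(round(dxt5_bytes / 1e6), "MB as DXT5")
    print(round(astc_12x12_bytes / 1e6, 1), "MB as ASTC 12x12")
```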
The 2Kx2K images in the demo are broken down into sixty-four 256x256 images, where each image is a fixed slice through the simulation volume at a particular time - conceptually, sixty-four frames of a video packed into a 2D array. There is a lot of correlation between successive frames, so there is redundancy there that a 2D compression format cannot capture. One option would be to compress each depth slice of the simulation as a 256x256x64 3D texture, using time as the Z axis of the volume. You'd use the depth slices just as they do in the video - the point sprite at a given depth would use the animation time step as the Z texture coordinate, and would sample the two nearest depth slice volumes and interpolate between them. Because ASTC 3D would be able to exploit frame-to-frame redundancy in the animation, you might get away with a significantly lower bit rate.
The other option would be to compress the ten image slices directly as a 2k x 2k x 10 volume texture. Just as there is redundancy between successive frames of the animation, there is redundancy between neighboring depth slices, which ASTC would exploit. Would this approach be better than the first one? Depends whether there is more correlation between frames of video, or between depth slices. It would have the advantage that ASTC would do the depth interpolation for you, so it might save a shader cycle or two. You'd want to be a little careful, and make sure that the ASTC block size divides evenly into the number of depth slices; for example, the 5x5x5 mode (1.02 bpp) would work, or you could go to twelve depth slices compressed at 6x6x6 (0.59 bpp).
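The divisibility check is easy to automate. This sketch lists the 3D block footprints defined for ASTC and keeps those whose depth divides the number of slices evenly. The helper names are made up.

```python
# The 3D block footprints defined for ASTC (width, height, depth).
ASTC_3D_BLOCKS = [
    (3, 3, 3), (4, 3, 3), (4, 4, 3), (4, 4, 4), (5, 4, 4),
    (5, 5, 4), (5, 5, 5), (6, 5, 5), (6, 6, 5), (6, 6, 6),
]


def bits_per_texel(block):
    """Every ASTC block is 128 bits, so the rate depends only on footprint."""
    bx, by, bz = block
    return 128 / (bx * by * bz)


def blocks_dividing_depth(depth):
    """Footprints whose depth divides the slice count with no remainder."""
    return [b for b in ASTC_3D_BLOCKS if depth % b[2] == 0]


if __name__ == "__main__":
    for depth in (10, 12):
        for block in blocks_dividing_depth(depth):
            print(depth, block, round(bits_per_texel(block), 2))
```

With ten slices only the depth-5 footprints qualify. Going to twelve slices opens up the depth-3, depth-4 and depth-6 footprints, including 6x6x6.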
In the very best case, if you stuck with the image sizes used in the video, but were able to compress using ASTC 6x6x6 mode, you'd compress the original 42 MB DXT5 image stack to 3 MB. A big improvement, but make no mistake, this is an expensive effect! On the other hand, it's pretty exciting that we can talk about this kind of thing at all as a real possibility in a mobile device.
One final neat use for 3D textures would be storing distance fields and using them in conjunction with ray-marching.
While raymarching can be expensive, with sufficiently dense space, it could be effectively used to draw a very high count of objects. I have used a fully ray-marched shader on my T604-powered Nexus 10 quite successfully on 1/4 of the screen at interactive rates. In this case, the shader is using many calculated distance field primitives (e.g. spheres, cylinders with rounded edges), rather than a lookup. A lookup would require far less computation, and provide artistic freedom regarding the renderable structures.
Think of a grassy field. Rather than drawing a multitude of billboards, patches of grass blades can exist as a 3D texture distance field. This volume can be repeated over a field and use a random lookup for blade uniqueness by tweaking patch-rotation, patch-dimension, colour, etc. The 3D-texture distance field can then be easily queried based on the view-ray with a maximum set number of steps.
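As a rough illustration of the query loop, here is a CPU-side sketch of marching a ray through a distance field with a capped step count. The callable distance_at stands in for the 3D texture lookup. A sphere is used here only because its distance is easy to compute, and all names and thresholds are assumptions.

```python
import math


def sphere_distance(p, centre=(0.0, 0.0, 0.0), radius=1.0):
    """Stand-in for a distance-field texture lookup: distance to a sphere."""
    return math.dist(p, centre) - radius


def ray_march(origin, direction, distance_at, max_steps=32,
              hit_eps=1e-3, max_dist=100.0):
    """Return the distance along the ray to the first hit, or None.

    direction is assumed to be unit length. Each step advances by the
    distance stored in the field, which is the largest safe step.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + d * t for o, d in zip(origin, direction))
        dist = distance_at(p)
        if dist < hit_eps:
            return t
        t += dist
        if t > max_dist:
            break
    return None


if __name__ == "__main__":
    print(ray_march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_distance))
    print(ray_march((0.0, 5.0, -3.0), (0.0, 0.0, 1.0), sphere_distance))
```

The fixed max_steps is what keeps the cost predictable: rays that have not hit anything by then are simply treated as misses.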
This goes beyond displacement mapping as the to-be-rendered objects represented by the 3D texture wouldn't be simple elevations from a plane, but complex 'non-functional' shapes.
When viewed from a predictable perspective this could provide an interesting way to render certain types of geometry. In addition to grass, shag carpet fibres, coarse gravel, debris, or even rope braids, can be represented by a 3D texture distance field.
Ok.. I'm done, though I could probably think of more uses.
Another really interesting use of 3D textures would be a pre-computed vector field to drive particle trajectory. Though the effect would be non-dynamic, it could be very sophisticated and would be extremely convincing.
Imagine a scenario where an in-game character or vehicle walks through a space with a thin layer of fog on the floor. The turbulence pattern as the fog spreads and curls behind the character is predictable, but quite expensive to model computationally in real time. A set of vector fields could control a set of particles that displace appropriately. Now the particles need only consult the volume to determine where they should be moving. In fact, using transform feedback, this could be done entirely in GPU vertex shaders!
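In CPU terms, the per-particle update might look something like the sketch below. The field layout (nested lists indexed z, y, x), the nearest-texel sampling and the function names are all assumptions. On a GPU the lookup would be a filtered 3D texture fetch and the update would run in a vertex shader.

```python
def make_field(nx, ny, nz, fn):
    """Build a pre-computed vector field as nested lists indexed [z][y][x]."""
    return [[[fn(x, y, z) for x in range(nx)] for y in range(ny)]
            for z in range(nz)]


def sample_nearest(field, p):
    """Nearest-texel lookup, clamped to the edges of the volume."""
    nz, ny, nx = len(field), len(field[0]), len(field[0][0])
    x = min(max(int(p[0]), 0), nx - 1)
    y = min(max(int(p[1]), 0), ny - 1)
    z = min(max(int(p[2]), 0), nz - 1)
    return field[z][y][x]


def step_particles(particles, field, dt):
    """Move every particle along the velocity stored at its position."""
    return [tuple(c + v * dt for c, v in zip(p, sample_nearest(field, p)))
            for p in particles]


if __name__ == "__main__":
    # Hypothetical field: everything drifts along +x at one unit per second.
    drift = make_field(4, 4, 4, lambda x, y, z: (1.0, 0.0, 0.0))
    print(step_particles([(0.5, 0.5, 0.5)], drift, 0.1))
```

The particles carry no simulation state beyond their positions. All of the expensive fluid behaviour is baked into the field ahead of time.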
Another effective use of this would be liquid simulation. This is tricky to visualize, but bear with me! If you separate a body of water (e.g. a pool) into layers consisting of the lower water layer that determines elevation, and a top layer that represents displaceable particles, we can use pre-computed 3D texture vector fields to drive coarse 'water' particles during separation events -- events where the water breaks from the lower level (e.g. splashes). These vector fields can be stored for standard surfing 'tube' waves, splashes, or patterns of water colliding with static objects. The software then uses the 3D textures to determine the best simulation for the particles. Used in a controlled way where the dynamics are predictable, this should provide startlingly real simulations.
Even storing vector fields around in-scene objects could suggest how particles should interact with them, and forgo the need for extremely expensive computation. A vector field that moves around a cylindrical object could contain the appropriate interference and residual 'wake,' and would make for a very interesting in-game simulation of smoke. The same can be done for other particles such as leaves.
Unreal recently showed a very interesting way to create very realistic explosions using a sequence of "sets" of 2D planes (3D textures): https://www.youtube.com/watch?v=Q_-LrvzhBhM
The idea seems relatively straightforward: A compute-intensive explosion simulation is captured via a volume-set of 2D texture planes (3D texture) for a number of frames. In the realtime scene, a set of randomly distributed static quad particles are placed in the volume where the explosion will take place. The particles sample the intersection points of the volume at the appropriate stage of the animation, thus recreating the subtle parallax of the explosion volume animation in a convincingly 3D way.
Due to the transient and noisy nature of each frame of the captured particle simulation, this seems like a glove-fit for ASTC at high compression ratios (where compression artifacts are far less likely to be noticed). The captured texture 'planes' for each explosion stage could be efficiently stored as a 3D texture. Additionally, the tile-based renderer would work wonderfully with front-to-back sorted quad-particles. This would reduce rendering time and bandwidth by avoiding unnecessary work.
All in all, this sophisticated volumetric effect should be possible on mobile, and re-usable as explosions in a game scene for a very impressive effect. In fact, it would provide a very interesting method of producing other realistic non-compressible fluids in a scene with predictable (and stable) dynamics. It's not a dynamic effect, but it doesn't necessarily need to be to be convincing.
The volume will likely be quite large in memory even with compression, so without high compression ratios it's not a terribly realistic option for mobile. This is why ASTC would enable these types of effects.