Time sure flies when you’re having fun! It’s been more than two years since SIGGRAPH Asia 2011 in Hong Kong, where I had the pleasure of unveiling our Adaptive Scalable Texture Compression (ASTC) technology. A lot has happened on the ASTC front since then: we made many technical improvements, providing even better quality and finer control of bit rate; we published full technical details of the format at High Performance Graphics 2012; the Khronos Group ratified an ASTC extension; and the first consumer devices with ASTC support began to appear on the market. It fairly boggles the mind. Well, it does mine, anyway.
Why am I wandering down memory lane like this? Two reasons: One is that just before this past Christmas, I went back to Hong Kong for SIGGRAPH Asia 2013, this time to talk more generally about power and memory bandwidth reduction in the latest ARM® Mali™ GPUs. The other reason, and the main one, is that ASTC recently passed another milestone in its march toward ubiquity, as the Khronos Group ratified extensions covering the full functionality of the format.
You thought Khronos had already standardized ASTC? It had, but not completely. Khronos released an extension called KHR_texture_compression_astc_ldr at SIGGRAPH 2012. However, that extension exposed only the low dynamic range (LDR, get it?) pixel formats of ASTC, and only for 2D images. We did that because at the time, details of the high dynamic range (HDR) and 3D features hadn’t been nailed down, and some Khronos members weren’t sure they would work as well as we hoped – they were, after all, pretty revolutionary. Also, some Khronos members wanted to start implementing the 2D LDR features of ASTC right away, before we were ready to freeze the definition of the more advanced features.
I’m happy to say that, in the end, the HDR and 3D features of ASTC turned out to work very well indeed. Recognizing this, the Khronos Group recently ratified two new extensions, adding HDR and 3D functionality respectively. The ASTC family now looks like this:
KHR_texture_compression_astc_ldr is the previously-ratified low dynamic range profile
KHR_texture_compression_astc_hdr extends the LDR profile to include HDR
OES_texture_compression_astc extends the HDR profile to include 3D textures
The extensions are layered, with each new layer requiring the previous layers, so if your implementation supports KHR_texture_compression_astc_hdr, all of the LDR features are supported too. If it supports OES_texture_compression_astc, it supports everything. If you try to use an HDR texture on an implementation that doesn’t support HDR, the LDR portions of the texture decode normally, and the HDR texels come back a lovely shade of radioactive pink.
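The layering can be checked at run time from the list of extension names the driver advertises. The sketch below is a minimal illustration in Python, not code from any particular SDK: the function name `astc_profile` and the profile labels are hypothetical, and it assumes the caller has already fetched the extension strings (OpenGL and OpenGL ES report them with a `GL_` prefix, which the sketch strips).

```python
# Extension names come from the list above; everything else is illustrative.
ASTC_LAYERS = [
    ("KHR_texture_compression_astc_ldr", "LDR, 2D"),
    ("KHR_texture_compression_astc_hdr", "LDR + HDR, 2D"),
    ("OES_texture_compression_astc", "LDR + HDR, 2D + 3D"),
]


def astc_profile(extensions):
    """Return a label for the highest ASTC layer advertised, or None."""
    # Drivers report names such as "GL_KHR_texture_compression_astc_ldr".
    advertised = {
        name[3:] if name.startswith("GL_") else name for name in extensions
    }
    best = None
    for name, label in ASTC_LAYERS:  # ordered from lowest to highest layer
        if name in advertised:
            best = label
    return best


if __name__ == "__main__":
    reported = [
        "GL_KHR_texture_compression_astc_ldr",
        "GL_KHR_texture_compression_astc_hdr",
    ]
    print(astc_profile(reported))
```

Because each layer requires the ones below it, finding the highest advertised name is enough to know which features are available.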
You might be wondering about the extension name prefixes: why KHR-blah-ldr and KHR-blah-hdr, but OES-blah-astc? The OES prefix identifies an extension that is defined and ratified by the OpenGL® ES working group, for use with OpenGL ES. Extensions with the KHR prefix are ratified by both the OpenGL ES and desktop OpenGL® working groups, and can be used with either API. So, you can and will see ASTC LDR- and HDR-capable GPUs on desktop as well as mobile devices, but for the moment there’s no way to ship ASTC 3D textures on the desktop. It’s too bad, but hey, OpenGL ES is shipping in a billion devices a year; and the desktop will catch up eventually.
So, ASTC HDR and 3D are now available as Khronos standards. What does that mean? How does it make life better for mobile device manufacturers, or app developers, or users?
We’ve written at length about the technology – how ASTC offers developers unprecedented flexibility in bit rate and pixel format, as well as a substantial boost in image quality. And Sean has a great article describing how the HDR and 3D features of ASTC work, and why they’re useful – even, potentially, revolutionary. If you aren’t convinced by now, you aren’t going to be, so I won’t repeat that story here.
What Khronos standardization adds to the picture is that it puts ASTC on the road to becoming universally available. By placing the format under the Khronos IP umbrella, it removes the uncertainties that have prevented widespread adoption of proprietary formats like S3TC and PVRTC. It is also, obviously, a powerful endorsement of the technology. Add in the enthusiastic reception the format has received from developers, and the bottom line is that GPU vendors now have many reasons to support it in their hardware, and few reasons not to. ASTC has been available for some time now in the Exynos-based versions of the Samsung Galaxy Note 3, Note Pro and other devices, which feature the ARM Mali-T628 MP6 GPU. We understand that it’ll be supported in upcoming SoCs and IP cores from Qualcomm, NVIDIA, and Imagination Technologies as well. Other implementations are on the way.
I said I wasn’t going to talk about ASTC from a technical point of view, but I can’t resist – after all, you can’t write a blog about texture compression without showing an image, can you? So here’s an image. Actually, here are two:
Figure 1: A (chocolate-free) teapot rendered using a 2 MB volume texture.
Figure 2: The same teapot with the volume texture compressed to 151 KB using ASTC.
What you’re seeing is an implementation of a procedural marble shader, taken from the AMD RenderMonkey™ examples. What’s interesting about it is that it’s not a 2D marble texture uv-mapped onto the surface of the teapot. Instead, the shader samples a 3D noise function at every point on the surface, and uses the result to sample a 1D color gradient texture. The 1D texture is tiny, but the noise function is implemented as a 128x128x128 volume texture. The original 8-bit, single-channel texture (used to produce Figure 1) occupies 2 MB – not huge, but big enough to make you ask if you really need it, at least on a mobile device. The version in Figure 2 uses the same volume texture, compressed using ASTC at 0.59 bits per pixel, which reduces it to 151 KB. Can you see the difference? I didn’t think so.
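The arithmetic behind those sizes is easy to reproduce. The sketch below is a back-of-the-envelope check in Python with hypothetical function names. Every ASTC block occupies 128 bits, so the bit rate follows from the block footprint. The 6x6x6 footprint used here is an assumption, chosen because it works out to roughly 0.59 bits per texel; the text above does not say which footprint was used.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def bits_per_texel(bx, by, bz=1):
    """Bit rate implied by a block footprint of bx * by * bz texels."""
    return ASTC_BLOCK_BITS / (bx * by * bz)


def raw_bytes(width, height, depth, bytes_per_texel=1):
    """Uncompressed size of a volume texture."""
    return width * height * depth * bytes_per_texel


def compressed_bytes(texels, rate):
    """Nominal compressed size at `rate` bits per texel (ignores padding)."""
    return texels * rate / 8


if __name__ == "__main__":
    texels = 128 ** 3
    print(f"raw size: {raw_bytes(128, 128, 128) / 2**20:.0f} MB")
    # Assumed footprint; 128 / 216 is close to the quoted 0.59.
    print(f"6x6x6 rate: {bits_per_texel(6, 6, 6):.2f} bits per texel")
    print(f"at 0.59 bits per texel: {compressed_bytes(texels, 0.59) / 1024:.0f} KB")
```

This is the nominal figure. A real encoder rounds each dimension up to a whole number of blocks, so the stored size can differ slightly.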
This is just a toy example, but I hope it shows how ASTC’s low-bit-rate 3D compression can change the game, making previously stressful or even unthinkable algorithms practical. I can’t wait to see how serious game developers will make use of the technology, when it reaches them.
As always – got comments or questions? Got ideas for clever ways to use HDR or 3D textures? Drop me a line…
Tom Olson is Director of Graphics Research at ARM. After a couple of years as a musician (which he doesn't talk about), and a couple more designing digital logic for satellites, he earned a PhD and became a computer vision researcher. Around 2001 he saw the coming tidal wave of demand for graphics on mobile devices, and switched his research area to graphics. He spends his working days thinking about what ARM GPUs will be used for in 2016 and beyond. In his spare time, he chairs the Khronos OpenGL ES Working Group.
It *is* incredibly exciting that such a technique is possible on mobile, and perhaps more so that ASTC is enabling something that would have been considered unrealistic otherwise! But I think we can do better still:
Consider that the video method uses RGBA textures. Given the type of image data held in the volume, this is overkill. In fact, similar to the teapot example in the blog post above, it should be possible to represent the explosion using a single channel and a small 1D gradient lookup. While this wouldn't improve the compression ratio, it may have a positive effect on the compression quality. It also provides an added benefit: for a reusable explosion, varying the 1D gradient lookup would produce different-looking explosions. A green 'chemical' explosion, for example, would be possible with a green gradient rather than an orange one. The downside is that a (very) tiny bit of complexity is added to the fragment shader.
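To make the gradient idea concrete, here is a small CPU-side sketch in Python of what the extra shader step would do: take the single-channel value sampled from the volume and use it to index a 1D color ramp. The function name and both gradients are made up for illustration; on a GPU this would be a second texture fetch in the fragment shader.

```python
def sample_gradient(gradient, t):
    """Linearly interpolate a list of RGB tuples at t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)           # clamp, like CLAMP_TO_EDGE
    pos = t * (len(gradient) - 1)       # position along the ramp
    i = int(pos)                        # lower entry
    j = min(i + 1, len(gradient) - 1)   # upper entry
    f = pos - i                         # blend factor between the two
    return tuple(a + (b - a) * f for a, b in zip(gradient[i], gradient[j]))


# Hypothetical ramps: same density data, two different-looking explosions.
FIRE = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (1.0, 1.0, 0.8)]
CHEMICAL = [(0.0, 0.0, 0.0), (0.1, 0.8, 0.1), (0.8, 1.0, 0.8)]

if __name__ == "__main__":
    density = 0.5  # stand-in for a value read from the single-channel volume
    print(sample_gradient(FIRE, density))
    print(sample_gradient(CHEMICAL, density))
```

Swapping the ramp changes the look without touching the compressed volume.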
While the above method may improve quality (but not compression), there's another way to significantly trim down the already manageable 3.7 MB stack. Going back to the ten 256x256x64 "slice" images in the video, you'll notice that there are many rows that hold no data, due to the rising nature of the explosion. In fact, if you split each image into rows, only 51 out of 80 rows actually contain data. Instead of stacking the explosion in 10 slices, the 3D texture could contain 18 slices (only 17 would be used), each holding a 3x8 grid of 256x256 animation frames. This works out well: by eliminating the empty space, the size of the compressed 3D texture is now a hair over 2 MB! Of course, there are consequences. This adds a bit of complexity to the fragment shader, which now needs a tiny lookup (slice lookup isn't quite as straightforward, though still very simple). You also lose spatial coherence along the z-axis, but gain a bit of temporal coherence. With a single-channel texture, this may not be a big deal. Lastly, this only really works with explosions that rise gradually or simulations that do not use the full volume over all animation frames.
Alternatively, you could stack 64 'frames', each holding that frame's full volume. It would make for a very tall 3D texture, but we could eliminate empty space by having each frame-volume fill only as many slices as it needs and providing an offset for each of the 64 frame-volumes. For example, the first frame-volume would be only one slice, as the explosion would still be quite small. This would have the benefit of needing a very simple shader, and would allow the texture unit to interpolate the required texel lookup. ASTC could exploit spatial similarities along the z-axis, since each volume would be stored intact. This method would yield savings comparable to the method directly above (around 2 MB of storage), but greatly simplify the effect.
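The per-frame offsets are just a running sum of how many slices each frame-volume keeps. Here is a rough sketch in Python, with made-up slice counts and hypothetical function names, of how the offset table could be built and used to find a slice in the tall stacked texture.

```python
from itertools import accumulate


def frame_offsets(slice_counts):
    """Index of the first stored slice of each frame-volume in the stack."""
    # Exclusive prefix sum: frame 0 starts at 0, frame 1 after frame 0, ...
    return list(accumulate([0] + list(slice_counts[:-1])))


def global_slice(frame, local_slice, slice_counts, offsets):
    """Map a slice within one frame-volume to a slice in the tall texture.

    Returns None when the slice was trimmed away as empty space.
    """
    if not 0 <= local_slice < slice_counts[frame]:
        return None
    return offsets[frame] + local_slice


if __name__ == "__main__":
    # Hypothetical counts: the explosion starts small and grows.
    counts = [1, 2, 3, 5]
    offsets = frame_offsets(counts)
    print(offsets, "total slices:", sum(counts))
    print(global_slice(2, 1, counts, offsets))
```

In a shader the table would be a small uniform array or lookup texture, one entry per frame.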
Of course, it may be possible to go crazy and attempt to eliminate the empty space around the explosion by breaking the individual 256x256 slice frames into smaller squares, throwing away the empty squares, and stacking the remaining squares into a set of image slices in a way that maximizes coherence in the volume and is friendly to compression. This would be the job of a generally useful automated tool, or would be overkill otherwise. While 3D textures wouldn't strictly be necessary, exploiting spatial coherence along the z-axis could be very beneficial to compression quality at high ratios. I haven't run a histogram on the image slices in the video, but there seems to be a tremendous amount of empty space around each of the slices, which may add up to significant memory savings. This would require a lossless lookup to locate the offset of the required slice, and logic to deal with edges, but the savings could be significant even with a large lookup. For example, splitting each 256x256 slice frame into 64 'tiles' would require a 160 KB lookup (32-bit indices) over the entire texture. If even a 50% saving can be achieved by stacking, it would still yield significant memory savings.
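The lookup size quoted above can be checked with a few lines of Python. The sketch assumes my reading of the layout: 10 slice images, each holding 64 animation frames, with every frame split into 64 tiles and one 32-bit index per tile. The function name is hypothetical.

```python
def tile_lookup_bytes(slices, frames_per_slice, tiles_per_frame,
                      bytes_per_index=4):
    """Size of a table holding one index per tile across the whole texture."""
    return slices * frames_per_slice * tiles_per_frame * bytes_per_index


if __name__ == "__main__":
    size = tile_lookup_bytes(slices=10, frames_per_slice=64,
                             tiles_per_frame=64)
    print(f"{size} bytes, or {size // 1024} KB")
```

Halving the index width to 16 bits would halve the table, provided the number of surviving tiles fits in 65,536 entries.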
Anyway, this technique is quite important in that it could be used to represent many kinds of densely packed particle simulation. Other uses that come to mind are a section of a waterfall, pouring water, a water splash, a looping water wake, a small cloud volume segment, or even a non-animated clump of leaves as would exist in the inner part of a tree. As a single effect, it is definitely expensive, but as a largely reusable effect, it seems positively a bargain given ASTC's outstanding compression ratios!
Thanks, Tom! I had good fun.