Time sure flies when you’re having fun! It’s been more than two years since SIGGRAPH Asia 2011 in Hong Kong, where I had the pleasure of unveiling our Adaptive Scalable Texture Compression (ASTC) technology. A lot has happened on the ASTC front since then: we made many technical improvements, providing even better quality and finer control of bit rate; we published full technical details of the format at High Performance Graphics 2012; the Khronos Group ratified an ASTC extension; and the first consumer devices with ASTC support began to appear on the market. It fairly boggles the mind. Well, it does mine, anyway.
Why am I wandering down memory lane like this? Two reasons: One is that just before this past Christmas, I went back to Hong Kong for SIGGRAPH Asia 2013, this time to talk more generally about power and memory bandwidth reduction in the latest ARM® Mali™ GPUs. The other reason, and the main one, is that ASTC recently passed another milestone in its march toward ubiquity, as the Khronos Group ratified extensions covering the full functionality of the format.
You thought Khronos had already standardized ASTC? It had, but not completely. Khronos released an extension called KHR_texture_compression_astc_ldr at SIGGRAPH 2012. However, that extension exposed only the low dynamic range (LDR, get it?) pixel formats of ASTC, and only for 2D images. We did that because at the time, details of the high dynamic range (HDR) and 3D features hadn’t been nailed down, and some Khronos members weren’t sure they would work as well as we hoped – they were, after all, pretty revolutionary. Also, some Khronos members wanted to start implementing the 2D LDR features of ASTC right away, before we were ready to freeze the definition of the more advanced features.
I’m happy to say that, in the end, the HDR and 3D features of ASTC turned out to work very well indeed. Recognizing this, the Khronos Group recently ratified two new extensions, adding HDR and 3D functionality respectively. The ASTC family now looks like this:
KHR_texture_compression_astc_ldr is the previously-ratified low dynamic range profile
KHR_texture_compression_astc_hdr extends the LDR profile to include HDR
OES_texture_compression_astc extends the HDR profile to include 3D textures
The extensions are layered, with each new layer requiring the previous layers, so if your implementation supports KHR_texture_compression_astc_hdr, all of the LDR features are supported too. If it supports OES_texture_compression_astc, it supports everything. If you try to use an HDR texture on an implementation that doesn’t support HDR, the LDR portions of the texture decode normally, and the HDR texels come back a lovely shade of radioactive pink.
You might be wondering about the extension name prefixes: why KHR-blah-ldr and KHR-blah-hdr, but OES-blah-astc? The OES prefix identifies an extension that is defined and ratified by the OpenGL® ES working group, for use with OpenGL ES. Extensions with the KHR prefix are ratified by both the OpenGL ES and desktop OpenGL® working groups, and can be used with either API. So, you can and will see ASTC LDR- and HDR-capable GPUs on desktop as well as mobile devices, but for the moment there’s no way to ship ASTC 3D textures on the desktop. It’s too bad, but hey, OpenGL ES is shipping in a billion devices a year, and the desktop will catch up eventually.
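To make the layering concrete, here is a minimal sketch of how an application might work out which ASTC features it can rely on. It assumes you already have the space-separated extension string from the driver (for example, from glGetString(GL_EXTENSIONS)), where the names carry a GL_ prefix. The helper name astc_support is hypothetical and is not part of any Khronos API.

```python
def astc_support(extension_string):
    """Return the ASTC feature levels implied by a GL extension string.

    Hypothetical helper: it encodes only the layering rule described in
    the article (3D implies HDR, HDR implies LDR).
    """
    # Compare whole tokens, not substrings, and accept names with or
    # without the "GL_" prefix used in real extension strings.
    names = {n[3:] if n.startswith("GL_") else n
             for n in extension_string.split()}
    if "OES_texture_compression_astc" in names:
        return {"ldr", "hdr", "3d"}
    if "KHR_texture_compression_astc_hdr" in names:
        return {"ldr", "hdr"}
    if "KHR_texture_compression_astc_ldr" in names:
        return {"ldr"}
    return set()


# Example: a made-up extension string from an LDR-only implementation.
exts = "GL_OES_rgb8_rgba8 GL_KHR_texture_compression_astc_ldr"
print(sorted(astc_support(exts)))
```

Checking the most capable extension first means the caller never has to reason about the layers itself. It only has to ask whether "hdr" or "3d" is in the returned set.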
So, ASTC HDR and 3D are now available as Khronos standards. What does that mean? How does it make life better for mobile device manufacturers, or app developers, or users?
We’ve written at length about the technology – how ASTC offers developers unprecedented flexibility in bit rate and pixel format, as well as a substantial boost in image quality. And Sean has a great article describing how the HDR and 3D features of ASTC work, and why they’re useful – even, potentially, revolutionary. If you aren’t convinced by now, you aren’t going to be, so I won’t repeat that story here.
What Khronos standardization adds to the picture is that it puts ASTC on the road to becoming universally available. By placing the format under the Khronos IP umbrella, it removes the uncertainties that have prevented widespread adoption of proprietary formats like S3TC and PVRTC. It is also, obviously, a powerful endorsement of the technology. Add in the enthusiastic reception the format has received from developers, and the bottom line is that GPU vendors now have many reasons to support it in their hardware, and few reasons not to. ASTC has been available for some time now in the Exynos-based versions of the Samsung Galaxy Note 3, Note Pro and other devices, which feature the ARM Mali-T628 MP6 GPU. We understand that it’ll be supported in upcoming SoCs and IP cores from Qualcomm, NVIDIA, and Imagination Technologies as well. Other implementations are on the way.
I said I wasn’t going to talk about ASTC from a technical point of view, but I can’t resist – after all, you can’t write a blog about texture compression without showing an image, can you? So here’s an image. Actually, here are two:
Figure 1: A (chocolate-free) teapot rendered using a 2 MB volume texture.
Figure 2: The same teapot with the volume texture compressed to 151 KB using ASTC.
What you’re seeing is an implementation of a procedural marble shader, taken from the AMD RenderMonkey™ examples. What’s interesting about it is that it’s not a 2D marble texture uv-mapped onto the surface of the teapot. Instead, the shader samples a 3D noise function at every point on the surface, and uses the result to sample a 1D color gradient texture. The 1D texture is tiny, but the noise function is implemented as a 128x128x128 volume texture. The original 8-bit, single-channel texture (used to produce Figure 1) occupies 2 MB – not huge, but big enough to make you ask if you really need it, at least on a mobile device. The version in Figure 2 uses the same volume texture, compressed using ASTC at 0.59 bits per pixel, which reduces it to 151 KB. Can you see the difference? I didn’t think so.
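For anyone who wants to check the arithmetic, here is a rough sketch. It relies on the fact that every ASTC block occupies 128 bits whatever its footprint, so the bit rate is simply 128 divided by the number of texels in the block. The 6x6x6 footprint is an assumption: it is the 3D block size whose nominal rate matches the 0.59 figure quoted above. The function names are made up for illustration.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def bits_per_texel(block):
    """Nominal bit rate for a block footprint such as (6, 6, 6) or (12, 12)."""
    return ASTC_BLOCK_BITS / math.prod(block)


def nominal_size_bytes(dims, block):
    """Size implied by the nominal bit rate alone, ignoring edge padding."""
    return math.prod(dims) * bits_per_texel(block) / 8


def padded_size_bytes(dims, block):
    """Size when each axis is rounded up to a whole number of blocks."""
    blocks = math.prod(math.ceil(d / b) for d, b in zip(dims, block))
    return blocks * ASTC_BLOCK_BITS // 8


volume = (128, 128, 128)   # the noise texture: one byte per texel uncompressed
block = (6, 6, 6)          # assumption: the footprint behind the 0.59 figure

print(f"uncompressed: {math.prod(volume) / 1024:.0f} KB")
print(f"bit rate:     {bits_per_texel(block):.2f} bits per texel")
print(f"nominal size: {nominal_size_bytes(volume, block) / 1024:.1f} KB")
print(f"padded size:  {padded_size_bytes(volume, block) / 1024:.1f} KB")
```

The nominal size is the one that lines up with the 151 KB quoted above. Because 128 is not a multiple of 6, a real encoder would have to round each axis up to 22 blocks, so the stored texture would come out somewhat larger than the nominal figure.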
This is just a toy example, but I hope it shows how ASTC’s low-bit-rate 3D compression can change the game, making previously stressful or even unthinkable algorithms practical. I can’t wait to see how serious game developers will make use of the technology, when it reaches them.
As always – got comments or questions? Got ideas for clever ways to use HDR or 3D textures? Drop me a line…
Tom Olson is Director of Graphics Research at ARM. After a couple of years as a musician (which he doesn't talk about), and a couple more designing digital logic for satellites, he earned a PhD and became a computer vision researcher. Around 2001 he saw the coming tidal wave of demand for graphics on mobile devices, and switched his research area to graphics. He spends his working days thinking about what ARM GPUs will be used for in 2016 and beyond. In his spare time, he chairs the Khronos OpenGL ES Working Group.
> Unreal recently showed a very interesting way
Thanks for the link - very nice! If I'm reading the screen shots right, for that particular effect they are using ten 2K x 2K RGBA images, or 42 Mpix, compressed using DXT5 to 42 MB. Yes, a bit large for a quick effect. A straightforward conversion to ASTC 12x12 (2D) compression would knock it down to about 5 MB, still not small, and it's unclear whether the quality would be acceptable. ASTC makes better use of the bits you give it than DXT does, but at 0.89 bits per pixel (bpp) ASTC would still be a good bit noisier than DXT5 at 8 bpp. But as you say, the interesting question is what could be done with 3D textures.
The 2K x 2K images in the demo are broken down into sixty-four 256x256 images, where each image is a fixed slice through the simulation volume at a particular time - conceptually, sixty-four frames of a video packed into a 2D array. There is a lot of correlation between successive frames, so there is redundancy there that a 2D compression format cannot capture. One option would be to compress each depth slice of the simulation as a 256x256x64 3D texture, using time as the Z axis of the volume. You'd use the depth slices just as they do in the video - the point sprite at a given depth would use the animation time step as the Z texture coordinate, and would sample the two nearest depth slice volumes and interpolate between them. Because ASTC 3D would be able to exploit frame-to-frame redundancy in the animation, you might get away with a significantly lower bit rate.
The other option would be to compress the ten image slices directly as a 2K x 2K x 10 volume texture. Just as there is redundancy between successive frames of the animation, there is redundancy between neighboring depth slices, which ASTC would exploit. Would this approach be better than the first one? That depends on whether there is more correlation between frames of video, or between depth slices. It would have the advantage that ASTC would do the depth interpolation for you, so it might save a shader cycle or two. You'd want to be a little careful, and make sure that the ASTC block size divides evenly into the number of depth slices; for example, the 5x5x5 mode (1.02 bpp) would work, or you could go to twelve depth slices compressed at 6x6x6 (0.59 bpp).
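Here is a quick sketch of why the divisibility matters. It counts how many padding slices a given block depth adds, and what the whole volume costs once every axis is rounded up to whole 16-byte blocks. The function names are made up for illustration, and the 2048 x 2048 slice size is my reading of "2K x 2K".

```python
import math

BLOCK_BYTES = 16  # an ASTC block is always 128 bits


def depth_padding(slices, block_depth):
    """Number of padding slices needed to fill the last layer of blocks."""
    return -slices % block_depth


def volume_size_bytes(width, height, slices, block):
    """Stored size with every axis rounded up to a whole number of blocks."""
    bx, by, bz = block
    blocks = (math.ceil(width / bx)
              * math.ceil(height / by)
              * math.ceil(slices / bz))
    return blocks * BLOCK_BYTES


# Assumption: each slice is 2048 x 2048 texels.
for slices, block in [(10, (5, 5, 5)), (10, (6, 6, 6)), (12, (6, 6, 6))]:
    pad = depth_padding(slices, block[2])
    size_mb = volume_size_bytes(2048, 2048, slices, block) / 1e6
    print(f"{slices} slices at {block}: "
          f"{pad} padding slices, {size_mb:.2f} MB")
```

Ten slices at 6x6x6 cost exactly as much as twelve, because the second layer of blocks has to be stored either way. That is the reason for either picking 5x5x5 or going up to twelve real slices. The 2048-texel axes don't divide evenly by 5 or 6 either, but there the padding is a tiny fraction of the total.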
In the very best case, if you stuck with the image sizes used in the video, but were able to compress using ASTC 6x6x6 mode, you'd compress the original 42 MB DXT5 image stack to 3 MB. A big improvement, but make no mistake, this is an expensive effect! On the other hand, it's pretty exciting that we can talk about this kind of thing at all as a real possibility in a mobile device.
regards,
Tom