ARM's GPU architects and engineers regularly push the envelope in mobile graphics technology, which is why the latest Mali GPU cores offer such a unique combination of best-in-class graphics performance, an aggressively forward-looking feature set, and unprecedented scalability. But the engineers also make more fundamental contributions to graphics technology. This week at SIGGRAPH Asia, we're disclosing a new approach to texture compression. This technology enables deep reductions in GPU memory bandwidth and application memory footprint, which in turn allows improved performance and lower power. In this blog, I'll talk about where the technology came from, why it's important, and where we're going with it.
Why I Love My Job
Early in my engineering career, my boss/mentor at the time told me something I've never forgotten: According to polls (he said), among engineers who say they love their jobs, the thing they like best about them is that "they get to work with smart people". I don't know where he got this, but I'm sure it's true, and it explains why I love my job so much. In my dual role as ARM's Director of Graphics Research, and chair of the Khronos OpenGL ES Working Group, I get to work with some of the smartest people on the planet.
I've actually been enjoying this aspect of my job for years, but the last nine months or so have been in a class by themselves. The fun started back in March, with an email from one of our senior graphics architects, Jørn Nystad. He had come up with some ideas for texture compression, which he thought were significantly different from any previous texture compression format, and also interesting enough to start putting serious work into. He was calling the compression format "ASTC", for Adaptive Scalable Texture Compression.
When Jørn says something is interesting, he is invariably right, but in this case it was a massive understatement. Since that email, I've watched in amazement as he has pulled one rabbit after another out of the hat, raising ASTC image quality higher and higher, making the software codec ever faster, and reducing the hardware cost. The upshot is that as I write this, I'm packing my bags for Hong Kong, where I'll be giving a technical sketch presentation on ASTC at SIGGRAPH Asia, because ASTC isn't just interesting - it is revolutionary.
Why Texture Compression Matters
In order to see why ASTC is so significant, we have to look at what texture compression is and why it matters. And in order to do that, we have to talk about GPUs and memory. First, let's forget for a moment that GPUs are devices for making pretty pictures, and think of them instead the way systems engineers do: as devices for generating ridiculous numbers of memory accesses. Computer memory systems today are characterized by high bandwidth (they let you read or write a lot of bytes per second), but also high latency (the time between when you ask to read some data, and when you actually get it, is relatively long). Conventional computers (CPUs) deal with this by keeping as much data as possible in fast memory, close to the processor; but when they need to ask for something that is in high-latency main memory, they issue the request and then stop working (stall) until the data arrives. GPUs try not to do this; instead, if they can't get a piece of data they want, they issue a request for it and then switch to working on something else - and since there are a whole lot of pixels on the screen, there is almost always something else to switch to. But that something else usually involves reading and writing memory too! So they issue another request and switch to yet another "something else", and so on. The result is that given a complicated scene to render, typical GPUs will emit a blizzard of memory requests, happily eating up all the memory bandwidth and outstanding-request capacity the system is willing to let them have.
There is of course a lot of magic you can do under the hood to reduce how much memory bandwidth a GPU needs to render a given scene, and we pride ourselves on the fact that the ARM Mali GPUs are very, very bandwidth-efficient. But there are limits to what you can do under the hood. If the application says that a given pixel needs to read a given texture sample, you just have to read that sample.
Which brings us back to texture compression. Ever since texture mapping took off back in the 90's, texture access has been recognized as one of the most important consumers of memory bandwidth in graphics systems - to the point where the amount of bandwidth available for texture fetches ends up limiting the performance of the GPU, and often the best way to make a graphics application run faster is to reduce the size of the textures. On mobile devices, bandwidth is even more important, because reading main memory costs a lot of power. So, if you can compress your textures, you reduce memory bandwidth requirements, and if you do that you improve performance and save power. Sounds like a good idea, right?
Since texture compression is such a good idea, it isn't surprising that people have been working on it for a long time. But doing it well isn't easy. Because texture samplers have to support random access in real time, compressed texture formats have a lot of constraints that don't apply to image compression formats like JPEG. And, because satisfying those constraints is hard, the set of compressed texture formats available to developers today forms a chaotic patchwork - a patchwork with a lot of holes in it. You have to trade compression ratio (bits per pixel, or bpp) against quality - more compressed images (lower bpp) have poorer quality. And, you have to choose formats that have the right number of color channels for your application - if you can find any at the bit rate you want.
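To make the random-access constraint concrete, here is a minimal sketch. It assumes a fixed-rate format that packs each 4x4 group of texels into an 8-byte block (4bpp) and stores the blocks row by row; the function name and its parameters are illustrative, not part of any real API. Because every block is the same size, the location of any texel's block follows from plain arithmetic, which is exactly what a variable-rate format like JPEG cannot offer.

```python
def block_byte_offset(x, y, width, block_w=4, block_h=4, block_bytes=8):
    """Byte offset of the compressed block that contains texel (x, y).

    Assumes fixed-size blocks stored row by row. The defaults describe a
    4bpp format with 4x4-texel blocks of 8 bytes each (illustrative values).
    """
    blocks_per_row = (width + block_w - 1) // block_w  # round up partial blocks
    block_x = x // block_w
    block_y = y // block_h
    return (block_y * blocks_per_row + block_x) * block_bytes


# Texel (10, 5) in a 256-texel-wide texture lives in block (2, 1).
print(block_byte_offset(10, 5, 256))  # 528
```

A sampler built this way never has to decode neighbouring blocks to find the one it wants, which is why fixed-rate block formats dominate texture compression.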
The Texture Compression Landscape
Here's a quick Cook's tour of what's available today:
- On many mobile platforms, you can use the Khronos-endorsed ETC1 format to compress color (RGB) images at 4 bits per pixel (bpp). ETC1 is royalty-free when used with OpenGL ES, but is not a required part of the standard and is not available on all platforms.
- On desktop and a few mobile platforms, you can use S3TC (aka DXTn) to compress color (RGB) or color-plus-mask images at 4bpp, or color-plus-transparency (RGBA) at 8bpp. These formats are proprietary, so they aren't available on all platforms either.
- On some mobile platforms, you can use PVRTC to compress RGB or RGBA images at 2bpp or 4bpp. PVRTC is also proprietary.
- On desktop platforms, if you have one- or two-channel data, you can use RGTC at 4bpp for one channel or 8bpp for two.
- If you want really high quality, desktop platforms can use BPTC/BC7 for RGB and RGBA at 8bpp.
- All of the above formats are for images with 8-bit color components. If you want to compress floating-point (High Dynamic Range or HDR) images, you need BPTC/BC6H, also available only on desktop platforms, and only at 8bpp.
As you can see, it's a mess.
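To put the bit rates in the list above into perspective, the following rough sketch computes the size of a single 1024x1024 texture level at each rate. It ignores mipmaps and any padding, and the helper name is made up for illustration.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of one texture level at a whole-number bit rate."""
    return width * height * bits_per_pixel // 8


rates = [("uncompressed RGBA8", 32), ("8bpp", 8), ("4bpp", 4), ("2bpp", 2)]
for label, bpp in rates:
    size_kib = texture_bytes(1024, 1024, bpp) // 1024
    print(f"{label:>18}: {size_kib:5d} KiB")
```

Going from uncompressed 32bpp to 4bpp cuts the footprint by a factor of eight, and the bytes fetched per texel shrink in the same proportion.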
ASTC to the rescue: increased flexibility and better quality
Now (finally!) we're ready to talk about ASTC. Under the hood, there is some very clever engineering that's a little too deep for a blog post, so we'll just talk about what it does: it gives you flexibility. Where other formats provide one (or a small number of) bit rates and one or two color formats (e.g., RGB and RGBA), ASTC gives you your choice of six bit rates from 8bpp all the way down to less than 1bpp. At any bit rate, you can have from one to four color components; so you get RGB and RGBA formats like DXT or PVRTC, but also one- and two-component formats like RGTC. And if that isn't enough, you get HDR (floating point) as well as 8-bit color components; and if that still isn't enough, you also get 3D images (volumetric textures).
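This post does not describe the encoding itself, so the following is only a sketch based on an outside fact: in the ASTC specification later published through Khronos, every compressed block occupies 128 bits, and the bit rate is chosen by changing how many texels a block covers. Assuming that layout, the range from 8bpp to under 1bpp falls out of simple division. The block footprints below are a sample taken from that specification, not a list given in this post, and the function name is illustrative.

```python
ASTC_BLOCK_BITS = 128  # fixed block size in the published ASTC specification


def astc_bits_per_pixel(block_w, block_h):
    """Bit rate of a 2D block footprint, assuming 128-bit blocks."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


for w, h in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    print(f"{w}x{h}: {astc_bits_per_pixel(w, h):.2f} bpp")
```

The smallest footprint gives the 8bpp top rate, and the largest lands just under 1bpp.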
If you're an engineer (at least a deeply suspicious engineer like me) you'll be expecting that this flexibility has a price, possibly in silicon area, but almost certainly in quality. And indeed, ASTC isn't small, but it isn't much bigger than high-end formats like BPTC. But what's really amazing is its quality. At four bits per pixel, ASTC's Peak Signal to Noise Ratio (PSNR) beats DXT1 by a decibel and a half (1.5 dB). At 2 bpp, ASTC beats PVRTC by 2.3 dB. Most human observers can easily detect a quality difference of about a quarter of a decibel, so these are huge margins. So, in addition to offering unheard-of flexibility, ASTC offers a huge step up in image quality compared to the leading existing formats.
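For readers who have not met PSNR before, here is a minimal sketch of the standard definition for 8-bit samples: ten times the base-10 logarithm of the squared peak value divided by the mean squared error. The sample values are invented for illustration and have nothing to do with the measurements quoted above.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sequences of samples."""
    if len(original) != len(decoded):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images have no noise
    return 10 * math.log10(peak ** 2 / mse)


# Invented sample values, standing in for a row of 8-bit pixels.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 66, 69, 62, 64, 71]
print(f"{psnr(original, decoded):.1f} dB")
```

Because the scale is logarithmic, a gain of 1.5 dB corresponds to roughly 29% lower mean squared error.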
As an example of what ASTC can do, below (Figure 1) is an image I took on vacation a few years back. Figure 2 shows three versions of a detail from that image. At top (2a) is the original. Below that (2b) is the detail image compressed with PVRTC at 2bpp, and below that (2c) is ASTC at the same bit rate. I think the quality difference is obvious.
It won't surprise you to hear that we're patenting the various clever tricks that make ASTC work. But fundamental advances like this are more valuable if they're shared, so we don't plan to keep it to ourselves; instead, we're going to offer it for inclusion in industry graphics standards. We've had some great feedback and suggestions from software developers and from other GPU vendors we've talked to, and we're taking all of that into account. We've made significant progress since the SIGGRAPH Asia paper was written, and hope to make ASTC even better. Watch this space!
Got questions? Got ideas for what you'll do with, say, 3D floating point textures, when ASTC makes them small enough to fit in memory?