ASTC Does It - Part II: How To Use It

December 17, 2014

11 minute read time.

This year at GDC I gave a presentation on our exhibition booth about using ASTC with different types of textures to get the best visual results. It’s interesting that in the past whenever I spoke about ASTC it was always about how it works, rather than how to use it, which is bizarre because that’s not really what developer education is about.

It would be like a driving instructor turning up and lecturing you for the full hour on the science behind the internal combustion engine.

I did go on to write a fairly long guide to understanding the various settings and options you get when compressing in ASTC for GPU Pro, and the release of that roughly coincided with my booth talk at GDC. Those present on the day may have noticed the presentation wasn’t up to my usual standard. I can only apologise, I was very ill and dragged myself out of the hotel to give the talk before immediately returning at the end.

I’d like to use this as an opportunity to reiterate some of that content in the form of a blog, to clarify some parts I missed or stumbled over on the day. For those who weren’t there or just want to relive the presentation, I’ve attached a recording of it here.

The first topic I covered is a really basic introduction to texture compression in general, including a few notes on why textures should be compressed in the first place. With the ubiquity of gif, jpg and png image compression formats, surprisingly few people stop to think about the size of raw pixel data. Whether you have an alpha channel or not, cache alignment means you’re essentially always packing one in raw image data, making a 32 bits per pixel (bpp) cost. With even modest textures weighing in around a million pixels each, you can see how this might get quite heavy.

It’s not the size of the texture that causes the real problem, it’s the fact that you have to constantly look that data up, as the GPU taps into the main memory to pull that data into its cache whilst shading fragments, all of which compounds the bandwidth usage of the application. The solution to this is not compressed image files unpacked into GPU memory, it’s compressed textures in the GPU memory that the GPU can unpack as needed. This places interesting constraints on the compression formats used.

Firstly the pixels need to be accessed randomly. PNG is all well and good for compressing a whole image but to unpack a single pixel you have to unpack the entire line it’s on. Maybe it’d be worthwhile if you were reading in order along that axis, but if you’re sampling across the lines, you end up unpacking far more data than you need. Compression relies on grouping data to compressed bundles, so optimally these bundles need to be blocks of pixels, not lines, allowing the block to be decompressed into the cache and sampled randomly in any direction.

As this implies, the blocks have to be completely standalone. Other than the compression format, there must be no external information such as a dictionary or symbol table to decode the block. Finally, blocks have to line up in memory in a regular formation, or your decompressor won’t know where to look in the data to find a specific block. Scanning through to find it is not an option.

This is why texture compression has its own specialist formats for the task. In the older ARM® Mali™ GPUs, we only supported the ETC and later ETC2 formats because those are Khronos standards. There’s a pretty good reason for sticking to standards because the capability and availability of different compression formats is rather sparsely populated. Your choice of format might not just lock your application into a certain bitrate or channel specification; a proprietary format could also lock it to specific hardware.

ASTC is a texture compression format designed to solve this problem from the ground up, allowing different bit rates, different pixel formats, even different combinations of spatial dimensions to be picked for any given texture. So maybe you want a 2D high bit rate normal map with just X and Y data, or maybe you want a low bit rate 3D HDR RGBA texture? ASTC can do both, and more.

If you want to know how that even works, I already wrote about that at length here. If you want to know how to get the best results from it, you’re in the right place.

The quality of a compressed texture is controlled with three main factors: the bit rate, the limits and the error factors. I’ll tackle these from the easiest to understand to the hardest.

Bit Rates and Block Size

ASTC, as you may know, can encode in different block modes. The dimensions of a single block are called its footprint. Whereas other texture formats have fixed footprints, ASTC has various block footprints from 4x4 to 12x12 (and from 3x3 to 6x6x6). What stays the same in ASTC is the data size used to encode it, at exactly 128 bits. So if those 128 bits encode a 4x4 block (16 pixels), that’s 8bpp, whereas the 12x12 block (144 pixels) is a staggering 0.98bpp. If you think that’s impressive, a 6x6x6 block is 216 pixels, making it 0.59 bits per pixel. Ordinarily at this point there would just be a reminder that higher bit rate leads to higher quality and move on, but you’ve spent the time to read this far so I’ll explode that myth for you. A 128 bit block can represent 2128 different binary combinations, each of which will map to a specific layout of pixels. The smallest block size, 4x4, contains 16 pixels, which at 32bpp (RGBA) can represent 2512 different combinations of pixel data. For those not used to thinking in binary, that means you have less than one in a googol’s chance of getting an exact match (a googol is one with a hundred zeros). That may seem very small, but the whole point is that you don’t need an exact match for every outcome, and the best texture compression formats are geared towards the 2128 pixel layouts most likely to make sense as part of a larger image.

The point is, if you’re using 12x12 blocks, there are 24608 combinations. The probability of getting an exact match on a block that size is less than one in one with a thousand zeroes, which we don’t even have a proper name for; it also means there’s a much lower chance of even getting a passable match for it. The compressor will have to pick the best configuration it can, and hope you don’t notice.

Limits and Leeway

Which leads us neatly onto limits, or how hard should the compressor try to find a good match? The whole point of texture compression algorithms is that they have a fast deterministic decompression function, but after a few intelligent choices, the best the compressor can do is try out different combinations and see how close they are. This means the more it checks, the more likely it is to find a good one. You don’t necessarily want to check them all; that would take a very long time. This is why you have to set limits. The limits can be things like “only try so many combinations, then give up and pick the best we found” or “if you find one that’s suitably close to the original, use that and stop looking” or even “if you try a few patterns with two partitions and it’s no better than those using a single partition, don’t bother trying three or four partitions” (the concept of partitions is explained in this blog post)

It’s fair to say most people wouldn’t know where to begin setting signal to noise decibel ratings for these kinds of decisions so, handily, the compressor has a few in-built presets from very fast to exhaustive. There’s a chance that it will find the best combination in the very fast presets, but it’s a very low chance. The probability is much higher if you’re willing to wait. The best advice therefore is to iterate your assets with fast or very fast compression, then ship with thorough or exhaustive compression. Curiously there’s very little difference between the result from thorough and exhaustive but exhaustive will take a lot longer, this again is down to the relative probabilities involved.

The one question remaining, therefore, is if it’s trying all these different blocks of pixels to see how close they are to the same block in the raw image, how is it comparing them?

Priority and Perception

In order to tell which one out of a hundred or a thousand or even a hundred thousand proposed blocks is the best, you need to be able to compare any pair and say that one is objectively better than the other, then repeat with the best and the next attempt. The standard way to compare two images is called PSNR or percentage signal to noise ratio, so you take your original image, subtract all of the colour values from your resulting image, convert all the negative numbers to positive (the absolute difference) and then sum them. The ratio part comes from a sort of imaginary maximum error, which would be if an all white image came out all black or vice versa.

But there are different things you might want to preserve.

When the numbers are added together they can have weightings applied to them. Little known fact, the human optic system is more sensitive to high frequency detail in green light than red or blue. Using this knowledge you can add a pre-multiplier to different channels. If you gave a weight of two to the green error, and there were two tiles which differed by roughly the same amount, one mostly in the red channel, one mostly in the green channel, the error in the green channel would be doubled, meaning the one with the red error would be considered a better match. Alternatively, you could be more concerned about angular error. This is particularly relevant in normal maps where the pixels represent not a colour to be displayed on screen but a field of vectors. In this scenario the ratio of the channels is far more important than simple per channel or overall magnitude differences, and this can be reflected by giving a weight to the angular component.

One interesting thing that arises as the result of block based comparisons is that errors near the edge of a tile may have positive error within the defined limits, and the errors on the adjacent tile may be negative error within the defined limits, making the step change between two blocks, which should match up of course, larger than the desired error bounds. Block weighting reduces that error by applying additional error weight to boundary mismatches. If you really want to get under the hood, there are a few settings that tinker with the way individual pixel errors are combined into a full block error. These work by applying weights and pre-multipliers to the mean average error and the standard deviation of the error in a certain radius. I could talk at length about how this may be weighted to favour a tile with a few big errors over a tile with lots of little errors, or certain settings can favour a noisy looking tile over one which smoothes minor details out, and I haven’t even researched all the possibilities yet. Either way it’s a huge topic and one that although I touch upon in the presentation, I’m going to leave alone for now and go into much greater detail at a later date.

Getting Started with ASTC

If you want to try out ASTC you’ve got quite a few options. There are commercial devices available right now with the appropriate hardware to decode ASTC on the GPU even though it’s still a very new technology.

If you’d like to see how it looks without the hardware advantages of memory bandwidth reduction, the OpenGL ES 3.0 emulator can handle ASTC textures (although its underlying technique is to decode them to raw images, the compression artefacts are left intact) so you can try them out in your shaders. To generate ASTC images you have two options: the command line evaluation codec or the texture compression tool. Both of these tools have a lot of preset modes and switches for different use cases.

Things already mentioned like block or channel weighting can be set easily in either tool to clean up specific error cases. Also there are preset modes for normal maps, which map the angular weighted X and Y of the normal to Luminance and Alpha for better compression, and data masking, which tells the encoder to treat errors in different channels separately so that they can encode unrelated non colour data.

Both tools are also capable of encoding volumetric 3D textures. Either of them will accept an uncompressed 3D image file, and the command line tool has commands for accepting an array of 2D slices to build the 3D volume.

In my main auditorium talk at GDC I gave a few more tips on working with compressed textures, and I’ll share those in another blog real soon. For now, download a compressor and have fun playing around with the future of texture compression.

-Stacy

Mobile, Graphics, and Gaming blog

Unlock the power of SVE and SME with SIMD Loops

Vidya Praveen

SIMD Loops is an open-source project designed to help developers learn SVE and SME through hands-on experimentation. It offers a clear, practical pathway to mastering Arm’s most advanced SIMD technologies…
- September 19, 2025
What is Arm Performance Studio?

Jai Schrem

Arm Performance Studio gives developers free tools to analyze performance, debug graphics, and optimize apps on Arm platforms.
- August 27, 2025
How Neural Super Sampling works: Architecture, training, and inference

Liam O'Neil

A deep dive into a practical, ML-powered approach to temporal super sampling.
- August 12, 2025

AI blog

Announcements

Architectures and Processors blog

Automotive blog

Embedded and Microcontrollers blog

Internet of Things (IoT) blog

Laptops and Desktops blog

Mobile, Graphics, and Gaming blog

Operating Systems blog

Servers and Cloud Computing blog

SoC Design and Simulation blog

Tools, Software and IDEs blog

ASTC Does It - Part II: How To Use It

Bit Rates and Block Size

Limits and Leeway

Priority and Perception

Getting Started with ASTC

Unlock the power of SVE and SME with SIMD Loops

What is Arm Performance Studio?

How Neural Super Sampling works: Architecture, training, and inference