In 2011 I attended the ARM Global Engineering Conference, where I saw a presentation about a new algorithm used in texture compression. I expected it to be about colour space conversions and perceptual filters, but the entire talk was about number encoding, and how non power of two number ranges could be stored more efficiently in trinary and quinary digits (trits and quints respectively) rather than binary bits.

Although the talk was very interesting I came away a little disappointed, feeling like I learned something very simple when I expected something complicated. But since then I’ve had to explain it to several people and realised it’s actually not as simple as it seemed, tomolson just explained it very well. For a while it still seemed kind of useless though.

Turning the clock forward again, I was recently tasked with writing up an introductory document for the new Adaptive Scalable Texture Compression (ASTC) algorithm. The document served two purposes, firstly as a user guide for the compressor on our developer resources site, and secondly as an explanation of how the compression algorithm itself works.

I think it’s important for people to know how ASTC works because only then can they confirm that it actually does work. If you instead just say “This is ASTC, it can compress an RGBA image to less than one bit per pixel” most people will just instinctively call shenanigans at the very thought of it. It barely seems possible to get it down to 1 bit per pixel, when you start quoting figures of 0.89 people’s brains go out of the window. It’s lossy compression, so you don’t have to burn the entire history of mathematics and data representation just yet, but the popular consensus is that it’s still better than pretty much anything else out there.

Low bit rates in texture compression are achieved by compressing pixels in blocks. Typically the blocks are completely self contained, so you don’t need any other data to decompress it; no lookup tables or dictionaries. Taking ETC as an example, 4 bits per pixel are achieved by compressing a 4x4 block into a 64 bit word using a couple of low bit rate base colours and even lower bit rate per texel offsets. This known data rate means a texel coordinate can quickly be converted into a block position, and therefore into a data offset to decompress. The 4 bits per pixel is fixed for ETC though, as is the maximum quality of the algorithm.

ASTC, by comparison, allows you to trade off quality against bit rate. As the name suggests, it is also scalable, so you can increase the bit rate to improve quality, anywhere from 8 bits to 0.86 bits per pixel. Actually it can go as low as 0.59 if you’re using 3D textures, but let’s not get ahead of ourselves.

The way this is achieved is by using variable block sizes. Not variable in a data size, every block is 128 bits regardless of the bit rate. The footprint of the blocks varies from a 4x4 block to a 12x12 block (with a couple of fruity rectangles in the mix too, giving you even more bit-rate choice) so the bit rate is 128 bits divided by the number of pixels in the block. 16 pixels in 128 bits is 8 bits per pixel, and 144 pixels in 128 bits is 0.89 bits per pixel. ASTC is quite unique in that it can also compress textures in 3D blocks, which go from 3x3x3 to 6x6x6. That’s 216 values, in 128 bits, or 0.59 bits per pixel.

I guess I’ve not really explained the ‘how’ part yet though. Every block has sets of endpoint colours. So if you had a block that entirely consisted of different shades of blue, your end points would be a light and a dark blue. If the block was something more interesting , like flames, the end points would be more like yellow and red or, if it’s a small block, a yellowish orange and a reddish orange; after all there’s no point in the range containing colours that aren’t in the block. Then the individual samples are an offset between these two values, so the block can have all kinds of different gradient patterns. Since you’re probably aiming for less than one bit per pixel, you obviously can’t sample every pixel exactly, so it samples intermittently and interpolates between them.

That probably sounds like it might lead to blurry looking images with blocky colour leaking, but that’s only half the story. Because the block can have up to 4 different colour gradients, each with a start and end point. There’s a single set of samples for where in the gradient the texel lies, but which gradient a specific texel is taken from comes from a second layer, an index pattern which tells it per texel which gradient to pick from. The great thing is, this set of indices doesn’t have any interpolation so it gives nice crisp edges on contrasting colours between the soft parts of the gradients.

The obvious follow on question is how do you fit any kind of per texel value into a texture if you don’t even have a bit? The answer is hash patterns. I’m going to drop my favourite picture here.

ASTC generates these little colour partition blocks from a 10 bit seed. You’ll notice there’s a mix of two-partition, three-partition and four-partition blocks. There’s also a one partition block but it’s not as interesting.

The multiple partition count is quite important; the fewer partitions the texture has, the more data is available to fine tune the end points and the gradient values in between. So while more partitions might give crisper details, the softer parts between those details will suffer slightly.

All this talk of tradeoffs are actually a strong hint of how compression works. Decompression happens in a single fast pass, often in the graphics hardware itself so you don’t even notice it. The gradient offset is pulled from the sample map, the hash is generated to look up the gradient end points and the final colour is calculated from them. Compression, on the other hand has to take a block and figure out how many partitions to use, which hash pattern to pick for them, what the end point colours should be for each one and, for those awkward times when the partitions don’t match the colours perfectly, where exactly does a blue pixel lay in a gradient between red and yellow?

This is why compression takes so long. After all, the most important thing is how quickly a texture decompresses during the rendering. Compression actually leverages the speed of decompression to its advantage, as the algorithm tests out partitions, picks end points from the colours in the block and fills the samples with the a value calculated in comparison to the corresponding gradient, and then decompresses the result into a block of pixels. It then uses error metrics to give a single value of how wrong this attempt was. It does this lots and lots of times, always holding onto the least wrong result.

The block is considered ‘done’ either after a certain number of attempts or when the error value falls below a given threshold. I like to imagine that the algorithm either gets fed up and takes the best of a bad bunch or finds a decent one and says “close enough”.

The compression time can be varied by telling it how close is close enough and how many attempts to try before it gives up. Some images may actually come out pretty good with a lower compression time, as the early attempts fit pretty well, others may look awful with the first few attempts and need longer. The user control of these factors means you get to make the call if it needs it.

Back the 128 bits, I know that some of the readers probably still don’t quite believe me. You know that 10 bits are used for the hash pattern, and in the worst case you need 4 sets of colour end points. These are RGBA, so 4 values each for 8 colours, plus a grid of 16 interpolated samples. That means you only have 2.45 bits for each value. But you can’t have 2.45 bits, you can only have 2 bits, so that means you have to quantise it oddly so that some get 3 bits and some get 2, otherwise you’re wasting 22 whole bits, and you can’t just waste bits if you’re going for a low bit rate. Then the algorithm needs to know which 22 values are 3 bits and which 26 are 2 bits. Any way you slice it, the math ends up strange. Since the number of partitions changes the number of end points, it would also need different known data sizes for different types and different weights and it all gets horribly complicated.

Except you don’t have to do that, because in 2011 I attended a talk at the ARM General Engineering conference, where Tom Olson explained a new technique in texture compression. By representing values in trinary and quinary numbers they can pack more densely. If you’re working in base 5, your digit runs from 0 to 4. The first quint is 1s, the second quint is 5s, and the third quint is 25s. In these three digits you have a numerical range from 0 to 124, and 124 can be represented in 7 bits.

If you wanted to hold a value from 0-4 in binary, you’d need 3 bits. Therefore those 3 values, would take 9 bits. But 9 bits can hold a value from 0 to 511, not to mention the fact that it’s really hard to move 9 bits around, so you’d need to really use 16 bits. 16 bits can hold 5 3 bit blocks, with one unused bit at the end. Use that space for quints and it’ll hold 6 of them, with 2 unused bits at the end. Similar results are seen with trits.

It’s not just about the uniform representation of non power of two value ranges that makes this so interesting. The choice of base 3 and base 5 are quite clever from numerical reasons. If you normalise the value and make it a representation of a gradient between 0 and 1, the values in a quint represent ( 0, ¼, ½, ¾, 1 ) and a trit will represent ( 0, ½, 1 ), so it even makes the blending faster by using power of two divisors.

When I mentioned blocks of 3 bits, some of you might have said, “Yes, but those hold 8 values”. What eight normalized values would they have held? They would have held sevenths. Have you ever tried to get a computer to work with sevenths?

I didn’t think so.

If you’d like to know more about how all this fits together and try out ASTC for yourself, there are a number of documents available on the Mali developer website alongside the evaluation codec encoder, and you can also find our OpenGL® ES emulator which supports ASTC.

If you’d like to talk about any of this, my colleagues and I regularly attend game development events where we’re not hard to find. Keep an eye on the ARMMultimedia twitter feed to see what events we’re attending next. Alternatively, start the conversation in the comments section below.