ARM Unveils Details of ASTC Texture Compression at HPG Conference - Part 1

September 11, 2013

The internal details of ARM's Adaptive Scalable Texture Compression (ASTC) technology were unveiled this week at the High Performance Graphics conference in Paris, France.

Tom Olson presented his paper entitled "Adaptive Scalable Texture Compression", as part of the session on texture and appearance on Wednesday 27 June 2012.

This is the first of two blog posts giving an overview of the ASTC technology as presented in the paper.

Features and Quality

As a quick recap, ASTC offers a number of advantages over existing texture compression schemes:

It is flexible, allowing bit rates from 8 bits per pixel (bpp) down to less than 1 bpp. This allows content developers to fine-tune the tradeoff of space against quality.
It supports from 1 to 4 color channels, together with modes for uncorrelated channels for use in mask textures and normal maps.
It supports both low dynamic range (LDR) and high dynamic range (HDR) images.
It supports both 2D and 3D images.
All of these features are interoperable. You can choose any combination that suits your needs.

Despite all this flexibility, quality is universally better than existing texture compression schemes for LDR images, and is comparable to the de facto industry standard for HDR.

How Does It Work?

ASTC, like all current texture compression schemes, divides the image into fixed-size blocks. These blocks cover a fixed-size "footprint" in the texture image, and are encoded using a fixed number of bits. This feature makes it possible to access texels quickly in any order, with a well-bounded cost for that access.

This is in contrast to stream-based, variable-bitrate image formats such as PNG, where the decoding process requires that you have decoded the previous texels in the image. Obviously, this would be a problem if the texels you wish to access are at the bottom right of the texture.

The 2D block footprints in ASTC range from 4x4 texels up to 12x12. By dividing the 128 bits by the number of texels in the footprint, we derive bit rates from 8 bpp (128 bits / 16 texels) down to 0.89 bpp (128 bits / 144 texels).
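The bit-rate arithmetic is easy to check. Here is a minimal Python sketch, assuming only the fixed 128-bit block size stated above; the function name `bits_per_texel` is illustrative and not part of any ASTC tool.

```python
BLOCK_BITS = 128  # every ASTC block is stored in 128 bits (from the text above)

def bits_per_texel(width: int, height: int) -> float:
    """Bit rate of a 2D block footprint of width x height texels."""
    return BLOCK_BITS / (width * height)

# Footprints chosen for illustration: the two extremes plus two in between.
for w, h in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    print(f"{w}x{h}: {bits_per_texel(w, h):.2f} bpp")
```

The 4x4 and 12x12 footprints give the 8 bpp and 0.89 bpp limits quoted above, while 6x6 and 8x8 work out to 3.56 bpp and 2 bpp.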

Original example image

Detail of ASTC compressed image, at 8 bpp, 3.56 bpp and 2 bpp

In the simplest case, the encoder analyses each block in isolation and selects two colors which define the end points of a line in the color space. The approximate colors of texels can then be reconstructed from these color endpoints by interpolating between them. For each texel in the footprint, a weight value is stored, and the texel color is calculated as the weighted average of the two endpoints. The weight, mathematically, is a value in the range 0 to 1, but for storage this is quantized to a few bits. Selecting the endpoint colors and the weights to make an optimal match to the texel colors in the original block is the job of the encoder.
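The endpoint-and-weight idea can be sketched in a few lines of Python. This is a floating-point illustration only: real ASTC decoders use integer arithmetic and their own unquantization rules, and the function names here are hypothetical.

```python
def dequantize_weight(q: int, bits: int) -> float:
    """Map a stored integer weight in 0..2**bits - 1 back to the range 0..1."""
    return q / ((1 << bits) - 1)

def reconstruct(e0, e1, q, bits):
    """Decoder side: weighted average of two endpoint colors (tuples of channels)."""
    w = dequantize_weight(q, bits)
    return tuple((1 - w) * a + w * b for a, b in zip(e0, e1))

def nearest_weight(texel, e0, e1, bits):
    """Encoder side: project the texel onto the line e0 -> e1, then quantize."""
    direction = [b - a for a, b in zip(e0, e1)]
    length_sq = sum(c * c for c in direction)
    if length_sq == 0:  # both endpoints identical: any weight gives the same color
        return 0
    t = sum((p - a) * c for p, a, c in zip(texel, e0, direction)) / length_sq
    t = min(1.0, max(0.0, t))  # clamp to the segment between the endpoints
    return round(t * ((1 << bits) - 1))

black, white = (0, 0, 0), (255, 255, 255)
q = nearest_weight((100, 100, 100), black, white, bits=2)
print(q, reconstruct(black, white, q, bits=2))
```

With only 2 bits per weight, a mid-gray texel snaps to the nearest of four positions on the line, which is where the quantization error comes from.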

Most of the existing formats use similar methods, and it is possible to trace the origins of this technique as far back as 1979. However, most schemes use a fixed split between the number of bits used to represent the endpoint colors and the number of bits used to represent the color weights. Some formats offer different precision at different bit rates, but the number of bits for endpoints and weights is determined globally by the block footprint.

Tom's previous blog post on ASTC goes into some detail about the constraints of each of the existing texture compression methods.

Trading Spaces

The "Adaptive" part of ASTC allows the encoder to tune the number of bits assigned to each piece of data, on a block-by-block basis. There are sixteen different color endpoint modes, any of which can be chosen for any block in the image. If a block is largely gray, then it is possible to encode the color endpoints more efficiently and devote the resulting extra bits to representing the texel weights more accurately.

But here, apparently, is a problem. If we have a 4x4 texel footprint, and we want to increase the amount of data in each weight, then it would seem that the minimum increment would be one extra bit per texel. The resulting shift in the balance between weights and endpoints would thus be 16 bits, which is rather crude. For larger block footprints, the problem gets worse. For finer control, we need a way to add less than one bit per texel.

Bounded Integer Sequence Encoding

At first, fractional bits per pixel sounds implausible, or even impossible, but it's not quite as strange as it sounds.

In principle we can choose our quantization of the texel weights (and the color components of the endpoint colors) to use any number of values. For the sake of illustration, let us assume that we can best represent each texel in a particular block using one of five weight values - 0.0, 0.25, 0.5, 0.75 and 1.0. We can easily quantize these to the integer values 0..4. Two bits is insufficient to represent these, as that would only represent four values, so conventionally we would need to allocate 3 bits. We would then either expand the quantization to use all eight possible 3-bit values, or leave three of the values unused.

However, a combination of any three of these texels has one of 5³ possible values, or 125. This is very close to the number of values that it is possible to encode in 7 bits (2⁷ = 128). So if we can group the texels into triplets, and find an appropriate encoding scheme for these base-5 values ("quints"), we can use just 7 bits, instead of the 9 we would need for storing three bits per value. This is a significant saving, and has the somewhat weird property of assigning a non-integer number of bits - 2.33 - to each value.

Similar reasoning shows that it is possible to pack base-3 values (trits) in groups of five, each group taking 8 bits (3⁵ = 243, 2⁸ = 256), for 1.6 bits per value.

The Bounded Integer Sequence Encoding (BISE) technique used in ASTC always quantizes values to ranges which conform to one of three patterns: values from 0 up to 2ⁿ-1, using n bits; up to 3 x 2ⁿ-1, using n bits and a trit; or up to 5 x 2ⁿ-1 using n bits and a quint. This allows us to encode any ideal quantization range with much less waste than the traditional whole-number-of-bits approach.
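To make the counting argument concrete, here is a small Python sketch that packs three quints into 7 bits and five trits into 8 bits. It uses plain positional (mixed-radix) packing, which is an assumption made for illustration; the actual ASTC bit layout for trit and quint groups is different and is not described in this post. The names `pack_digits` and `unpack_digits` are hypothetical.

```python
def pack_digits(values, base):
    """Pack integers in 0..base-1 into one integer, first value least significant."""
    packed = 0
    for v in reversed(values):
        if not 0 <= v < base:
            raise ValueError(f"value {v} is outside 0..{base - 1}")
        packed = packed * base + v
    return packed

def unpack_digits(packed, base, count):
    """Inverse of pack_digits: recover `count` values from the packed integer."""
    values = []
    for _ in range(count):
        packed, v = divmod(packed, base)
        values.append(v)
    return values

quints = [4, 0, 3]                 # three weights, each in 0..4
code = pack_digits(quints, 5)      # always below 125, so it fits in 7 bits
assert code < 2 ** 7
assert unpack_digits(code, 5, 3) == quints

trits = [2, 1, 0, 2, 2]            # five values, each in 0..2
code = pack_digits(trits, 3)       # always below 243, so it fits in 8 bits
assert code < 2 ** 8
assert unpack_digits(code, 3, 5) == trits
```

Storing the same three quints at 3 bits each would take 9 bits, so the grouped form saves 2 bits per triplet.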

When the number of values is not a multiple of three or five, we need to avoid wastage at the end of the sequence. Thus, we have another constraint on the chosen encoding. If the last few values in the sequence to encode are zero, the last few bits in the encoded bit string must also be zero. Ideally, the number of non-zero bits should be easily calculated and not depend on the magnitudes of the previous encoded values. This is a little tricky to arrange, but it is possible. This means that we do not need to store any padding after the end of the bit sequence, as we can happily assume that the padding bits are zero, safe in the knowledge that they will not affect the decoding of the actual values.
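The trailing-zero property can be illustrated with a toy encoding. The Python sketch below again assumes simple positional packing with the first value in the least significant position, which is not the real ASTC layout. Under that assumption, a partial group needs a number of bits that depends only on how many values it holds, and those counts happen to match ceiling(7S/3) for quints and ceiling(8S/5) for trits.

```python
from math import ceil

def pack_digits(values, base):
    """Positional packing, first value least significant (illustrative only)."""
    packed = 0
    for v in reversed(values):
        packed = packed * base + v
    return packed

def bits_needed(count, base):
    """Bits needed for the largest code that `count` base-`base` values can produce."""
    return (base ** count - 1).bit_length()

# A partial group never needs more bits than the fractional rate suggests.
for s in range(1, 4):
    assert bits_needed(s, 5) == ceil(7 * s / 3)   # quints: 3, 5, 7 bits
for s in range(1, 6):
    assert bits_needed(s, 3) == ceil(8 * s / 5)   # trits: 2, 4, 5, 7, 8 bits

# Padding a short sequence with zero values does not change the code,
# so the unused high bits are zero and need not be stored.
assert pack_digits([3], 5) == pack_digits([3, 0, 0], 5)
assert pack_digits([4, 2], 5) == pack_digits([4, 2, 0], 5)
```

In this toy scheme the decoder can treat missing high bits as zero and still recover the real values, which is the behavior the constraint asks for.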

With this constraint in place, and by interleaving the bits, trits and quints appropriately, BISE encodes a sequence of length S (i.e. an array of S integer values) using a fixed number of bits:

For S values in the range 0 up to 2ⁿ-1, it uses nS bits.
For S values in the range 0 up to 3 x 2ⁿ-1, it uses nS + ceiling(8S/5) bits.
For S values in the range 0 up to 5 x 2ⁿ-1, it uses nS + ceiling(7S/3) bits.
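The three formulas above translate directly into code. In this Python sketch the function name `bise_bits` and the `extra` parameter are illustrative choices, not names from the ASTC specification.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)

def bise_bits(count: int, n: int, extra: str = "none") -> int:
    """Bits used to encode `count` values with n plain bits each, plus an
    optional trit or quint per value, following the three formulas above."""
    if extra == "none":    # range 0 .. 2**n - 1
        return n * count
    if extra == "trit":    # range 0 .. 3 * 2**n - 1
        return n * count + ceil_div(8 * count, 5)
    if extra == "quint":   # range 0 .. 5 * 2**n - 1
        return n * count + ceil_div(7 * count, 3)
    raise ValueError(f"unknown extra digit type: {extra!r}")

# Sixteen weights, as in a 4x4 footprint with one weight per texel.
for label, n, extra in [("0..3", 2, "none"), ("0..4", 0, "quint"),
                        ("0..5", 1, "trit"), ("0..7", 3, "none")]:
    print(f"range {label}: {bise_bits(16, n, extra)} bits")
```

For sixteen weights this gives 32, 38, 42 and 48 bits, so the trit and quint ranges supply two intermediate steps between 2 and 3 bits per weight instead of a single 16-bit jump.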

This is the key innovation that allows ASTC to efficiently trade space between color endpoint values and texel weights.

Stepping Daintily Along Quality Street

The other thing that this gives us is the ability to change the number of values represented in relatively small increments. With other texture compression techniques, if the artists are not happy with the quality of their image at one bit rate, their only options are to double its bit rate, or to try another compression scheme entirely. Each of these has its own problems.

Paying a 100% penalty for storage, transmission and memory bandwidth on a texture is a serious proposition, especially if the result at the lower bit rate is only just below the acceptable quality threshold.

Moving to a different compression scheme is not always possible, as the deployment platform may not support the scheme that the designer wants to use. Even when it does, requalifying the image takes time and effort, as the kinds of errors introduced will be different.

ASTC, by exploiting the fractional bits offered by BISE, allows the footprint to vary more smoothly. The available bit rates range from 0.89 bpp up to 8 bpp in fine steps, with no step requiring more than 25% more memory than the previous one. Thus, the content developer has much more control over the space-to-quality trade-off than with other formats.
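The 25% claim can be checked numerically. The footprint list in this Python sketch is not given in this post; it is the set of 2D footprints listed in the Khronos ASTC extension specification, reproduced here as an assumption. The helper names are illustrative.

```python
# 2D block footprints (width, height), assumed from the Khronos ASTC specification.
FOOTPRINTS = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
              (10, 5), (10, 6), (8, 8), (10, 8), (10, 10), (12, 10), (12, 12)]

def bit_rates(footprints, block_bits=128):
    """Bit rates in bits per texel, sorted from highest to lowest."""
    return sorted((block_bits / (w * h) for w, h in footprints), reverse=True)

def largest_step(rates):
    """Largest ratio between two neighboring bit rates in a descending list."""
    return max(hi / lo for hi, lo in zip(rates, rates[1:]))

rates = bit_rates(FOOTPRINTS)
print(", ".join(f"{r:.2f}" for r in rates))
print(f"largest step: {largest_step(rates):.2f}x")
```

Under that assumed list, the largest step between neighboring bit rates is a factor of 1.25, consistent with the statement above.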

This means that the ASTC encoder and decoder have to work with non-power-of-two block footprint sizes, but the power of base-3 and base-5 arithmetic comes into play again, and this is used to implement fast division and modulo operations. This makes address calculations for a 6x8 block footprint just as fast as for 8x8.
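The post does not describe how the fast division is implemented. One well-known technique for dividing by small constants such as 3 and 5 is multiplication by a scaled reciprocal followed by a shift, sketched below in Python. The constants, the 16-bit input range, and the function names are assumptions made for illustration, not details of any ASTC hardware.

```python
def div3(x: int) -> int:
    """x // 3 for 0 <= x < 65536, using a multiply and a shift."""
    return (x * 0xAAAB) >> 17   # 0xAAAB * 3 == 2**17 + 1

def div5(x: int) -> int:
    """x // 5 for 0 <= x < 65536, using a multiply and a shift."""
    return (x * 0xCCCD) >> 18   # 0xCCCD * 5 == 2**18 + 1

def block_coords_6x8(x: int, y: int):
    """Locate texel (x, y) in a texture tiled with 6x8 block footprints.

    Returns (block column, block row, offset in block x, offset in block y).
    """
    bx = div3(x) >> 1           # x // 6 == (x // 3) // 2
    by = y >> 3                 # 8 is a power of two, so a shift suffices
    return bx, by, x - 6 * bx, y & 7

print(block_coords_6x8(100, 50))
```

The only extra work for the 6-wide dimension is one small multiplication, which is why a footprint like 6x8 need not be noticeably slower to address than 8x8.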

To Be Continued...

In ARM Unveils Details of ASTC Texture Compression at HPG Conference - Part 2 we will see how Bounded Integer Sequence Encoding, working in tandem with other features, gives rise to the extraordinary flexibility of ASTC.

Meng over 5 years ago

are there some examples for ABFC and ASTC on ARM platform？Thanks in advance