ARM's ASTC has been adopted by Khronos as the new industry-standard texture compression scheme. As I mentioned in a previous blog, ASTC supports both High Dynamic Range and three-dimensional texture compression, and I thought I would take some time to show how these features work and what cool effects they can be used for in content.
Normal (Low Dynamic Range or LDR) colour images represent the brightness of colour on the screen as a value in the relatively low dynamic range between 0 (minimum brightness) and 1 (maximum brightness). There's no point representing values outside this range because you can't get darker than black, and you can't make your monitor emit more light than its maximum brightness.
However, during the calculations that determine the final colour of a pixel on the screen, it's quite often useful to represent brightness outside this range. For example, a light shining on a surface might contribute twice the maximum brightness, before being modulated downward again by the relatively dark colour of the surface. If you can only represent values in the range 0 to 1, the surface can never get brighter than its base colour, no matter how powerful the light you shine on it. For simple lighting this is easy to work around, as you can assign any brightness you like to the light during the calculation. More advanced techniques, however, often use textures to supply the light intensity and colour, and if those textures are low dynamic range this becomes a problem. Simply scaling the texture values works for the bright colours but, because the interval between brightness steps is linear, it produces nasty banding in the darker areas. Similarly, for textures representing scenes containing both light sources and shadow, LDR forces a choice: favour the bright areas and lose definition in the dark regions, or keep the accuracy of the shadows and wash out the highlights.
What we need is a representation which allows a larger dynamic range, but which also represents colours using smaller steps in the darker regions. Enter the High Dynamic Range (HDR) texture.
In OpenGL ES, it is possible to specify HDR textures using 16-bit floating point values, but for a typical RGB texture, this equates to 48 bits per pixel. Given that most designers are unhappy with the space taken by typical textures with only 24 bits per pixel, this 2x space penalty is a big disadvantage. Clearly, we need compression.
As you may recall, ASTC can use a different encoding mode for each block of texels in an image, so the most efficient mode can be chosen for the content of that block. In previous blogs we talked about the LDR encoding schemes, but ASTC also includes a number of HDR encoding modes which modify the way that colours are stored and interpolated. If a block contains a suitable distribution of both dark and light texels, or texels with values outside the range 0..1, then the colours are stored in a pseudo-logarithmic 12-bit representation which can easily be converted to a true floating-point form in the decoder. The beauty of this representation is that the same linear interpolation we use for LDR values can be applied to the logarithmic values, and this produces exactly the value spacing we need to represent both the bright and dark values with minimal visual distortion. The only additional steps are to convert the incoming floating-point texel values into the pseudo-logarithmic form during encoding, and to convert them back to floating point at the end of decoding. The encoding is chosen so that both conversions are quite simple. Once converted, ASTC treats these values in the same way as LDR values, which conserves silicon area and retains all of the flexibility of choice available in the LDR modes.
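To make that interpolation property concrete, here is a small sketch. The 12-bit width matches the text above, but the mapping and luminance range below are illustrative assumptions, not ASTC's actual bit-level format; the point is what happens when you interpolate the codes linearly.

```python
import math

# Hypothetical 12-bit pseudo-logarithmic code (illustrative, not ASTC's
# real encoding): map log2 of the luminance linearly onto the code range.
BITS = 12
MAX_CODE = (1 << BITS) - 1
LOG_MIN, LOG_MAX = -8.0, 8.0   # assumed luminance range: 2**-8 .. 2**8

def encode(v):
    """Positive float -> 12-bit pseudo-log code."""
    t = (math.log2(v) - LOG_MIN) / (LOG_MAX - LOG_MIN)
    return round(t * MAX_CODE)

def decode(c):
    """12-bit pseudo-log code -> float."""
    t = c / MAX_CODE
    return 2.0 ** (LOG_MIN + t * (LOG_MAX - LOG_MIN))

# Linearly interpolating the *codes* halfway between two endpoints lands
# near the geometric mean of the endpoint values (sqrt(0.01 * 4.0) = 0.2),
# so the steps are finer at the dark end - exactly the spacing HDR needs.
lo, hi = encode(0.01), encode(4.0)
mid = decode((lo + hi) // 2)
print(round(mid, 3))  # ~0.2
```

The same linear-interpolation hardware used for LDR endpoints can therefore serve HDR blocks, which is where the silicon-area saving comes from.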
The texture creator can decide whether they want HDR or LDR, with the same access to colour formats and bit rates in both cases. Of course, in many cases an HDR image will contain more information than an LDR one, so a higher bit rate will be appropriate. Our experiments with HDR images show that ASTC does approximately as well as BC6H, the current de facto HDR compression standard, when compared at the same bit rate of 8 bits per pixel. However, some classes of HDR content (light maps being one of them) can also benefit from the lower bit rates that ASTC makes available.
The OpenGL, OpenGL ES and DirectX APIs have supported 3D textures for some time, but they don't seem to get much love. It's not hard to see why - they are slightly colossal. A 256x256 RGB texture is on the small side these days, and occupies 192KiB uncompressed. To get the same pixel resolution in 3D would require 256x256x256 texels, occupying a worryingly large 48MiB. For most applications, this is what 3D professionals call far too big. Clearly, compression is required not merely to make 3D textures smaller, but to make them feasible at all.
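The sizes quoted above are straightforward uncompressed byte counts; a quick sanity check:

```python
# Uncompressed sizes for 24-bit RGB (3 bytes per texel), as in the text above.
bytes_2d = 256 * 256 * 3
bytes_3d = 256 * 256 * 256 * 3

kib_2d = bytes_2d // 1024
mib_3d = bytes_3d // (1024 * 1024)
print(kib_2d, "KiB")  # 192 KiB for the 2D texture
print(mib_3d, "MiB")  # 48 MiB for its 3D equivalent
```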
Up until now, compression schemes have half-heartedly supported 3D by simply compressing 2D slices. In our 256x256x256 case, we simply treat it as 256 separate slices, each of which is 256x256 pixels in size, compress each slice separately, and stick them in an array. There are two major disadvantages to this: one is performance, and one is quality.
The performance problem manifests itself depending on how you look at your 3D texture. If you are displaying a surface which is parallel to the direction in which you encoded the slices, then one texel decode operation only requires access to blocks of data within a single slice. You're also quite likely to find nearby pixels in the same block, or in the cache.
But if your surface is mapped in a perpendicular direction, you will need to access many more texture blocks, impacting bandwidth and thrashing the texture cache. This leads to a large performance dependency, not on something controllable like the complexity or structure of your content, but instead on the unpredictable angle of the viewpoint. Nasty.
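The view-direction dependency is easy to see by counting how many distinct compressed blocks a short run of texels touches. A rough sketch (the footprints are the 8x8 sliced and 4x4x4 native blocks discussed in this article; the counting itself is just coordinate arithmetic):

```python
def blocks_touched(texels, block):
    """Count the distinct compressed blocks covering a set of (x, y, z) texels.
    `block` is the footprint (bx, by, bz); a sliced 2D scheme has bz = 1."""
    bx, by, bz = block
    return len({(x // bx, y // by, z // bz) for x, y, z in texels})

# Eight neighbouring texels along x (within one slice) vs along z (across slices).
along_x = [(x, 0, 0) for x in range(8)]
along_z = [(0, 0, z) for z in range(8)]

sliced = (8, 8, 1)   # 2D 8x8 blocks, one slice per z value
native = (4, 4, 4)   # a cubic 3D footprint

print(blocks_touched(along_x, sliced))  # 1 block  - the friendly direction
print(blocks_touched(along_z, sliced))  # 8 blocks - one fetch per slice
print(blocks_touched(along_x, native))  # 2 blocks - same cost...
print(blocks_touched(along_z, native))  # 2 blocks - ...in every direction
```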
The second problem is that, in 3D textures, we are usually representing real volume data, where features in the texture really are three-dimensional. The values in the texture will therefore correlate across slices, and that's something that a compression algorithm can exploit. If you separate the slices, you handicap the compression and this results in lower quality for a given bit rate.
The 3D modes in ASTC introduce a new idea - each block of 128 bits of compressed data now covers a 3D footprint, from 3x3x3 texels up to 6x6x6, with the usual fine steps in between. This corresponds to bit rates from 4.74 down to 0.59 bits per pixel. Since the blocks are cubic (or near cubic, as sizes like 5x4x4 are also allowed), they look similar no matter which direction you view them from. This almost completely eliminates the drastic variability of performance that you see with the sliced approach.
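Since every block occupies 128 bits regardless of footprint, the bit rates follow directly from the footprint volumes; a quick check:

```python
def bpp(bx, by, bz=1):
    """ASTC stores every block in exactly 128 bits, whatever its footprint."""
    return 128 / (bx * by * bz)

print(round(bpp(3, 3, 3), 2))   # 4.74 bpp - smallest 3D footprint
print(round(bpp(6, 6, 6), 2))   # 0.59 bpp - largest 3D footprint
print(round(bpp(8, 8), 2))      # 2.0 bpp  - 2D 8x8 slices
print(round(bpp(4, 4, 4), 2))   # 2.0 bpp  - the equivalent 3D 4x4x4 block
```

The last two lines show why the 2-bits-per-pixel comparison below pits 8x8 slices against native 4x4x4 blocks: both hold 64 texels per 128-bit block.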
We have performed experiments encoding 3D volume textures using ASTC at 2 bits per pixel. At that rate you can either compress 2D slices with 8x8 texel blocks, or use native 3D encoding with 4x4x4 texel blocks. The native 3D encoding gives a quality improvement of around 2dB, showing that the additional correlation between layers can be exploited by the compression algorithm to produce a significant increase in quality.
For 3D textures with sharply defined regions of different colour, we also see an advantage of the way we define the texture partitions. By implementing this as a function instead of a table, it is easy (once you have realised that it is possible) to define the function in terms of not just X and Y, but X, Y and Z. This gives us a 3D partition table for absolutely minimal additional hardware cost, instead of requiring a huge ROM.
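To illustrate the procedural idea (this is a toy stand-in, not ASTC's actual partition function, which is specified exactly in the standard): hash the partition seed together with the texel coordinates, and extending from 2D to 3D is just one more term in the mix.

```python
def partition_of(seed, x, y, z, partition_count):
    """Toy partition-selection function: mix the block's partition seed with
    the texel coordinates and reduce to a partition index. The constants and
    mixing steps here are made up for illustration; the point is that no
    stored table is needed, and going from (x, y) to (x, y, z) costs only
    one extra multiply-xor term."""
    h = (seed * 0x9E3779B1) ^ (x * 0x85EBCA6B) ^ (y * 0xC2B2AE35) ^ (z * 0x27D4EB2F)
    h &= 0xFFFFFFFF
    h ^= h >> 16
    return h % partition_count

# Every texel in a 4x4x4 block gets a deterministic, in-range partition index.
labels = {(x, y, z): partition_of(7, x, y, z, 3)
          for x in range(4) for y in range(4) for z in range(4)}
```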
3D blocks do require a modification to the arrangement of the texel weight arrays, but this fits neatly into the 128-bit block structure and reuses a lot of the functional units from the 2D case. Another clever trick is to calculate the effective weights using simplex interpolation rather than trilinear interpolation. This reduces the amount of work to the same as a 2D bilinear interpolation, so we can reuse the same interpolation hardware, saving power and area.
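The saving is easy to see in scalar code: trilinear interpolation reads all 8 corners of a cell, while simplex (tetrahedral) interpolation visits only 4 - the same count as 2D bilinear. A sketch of both, as a general illustration of the technique rather than ASTC's exact weight-grid maths:

```python
def trilinear(c, fx, fy, fz):
    """Standard trilinear blend: 8 corner reads, 7 lerps."""
    def lerp(a, b, t):
        return a + (b - a) * t
    x00 = lerp(c[0][0][0], c[1][0][0], fx)
    x10 = lerp(c[0][1][0], c[1][1][0], fx)
    x01 = lerp(c[0][0][1], c[1][0][1], fx)
    x11 = lerp(c[0][1][1], c[1][1][1], fx)
    return lerp(lerp(x00, x10, fy), lerp(x01, x11, fy), fz)

def tetrahedral(c, fx, fy, fz):
    """Simplex (tetrahedral) blend: only 4 corner reads, the same cost as a
    2D bilinear. Sort the axes by fractional coordinate, largest first, and
    walk from corner (0,0,0) towards (1,1,1), accumulating one step per axis."""
    steps = sorted([(fx, (1, 0, 0)), (fy, (0, 1, 0)), (fz, (0, 0, 1))],
                   reverse=True)
    ix = iy = iz = 0
    value = c[0][0][0]
    for frac, (dx, dy, dz) in steps:
        nx, ny, nz = ix + dx, iy + dy, iz + dz
        value += frac * (c[nx][ny][nz] - c[ix][iy][iz])
        ix, iy, iz = nx, ny, nz
    return value
```

Both reproduce any linear field exactly and agree at the cell corners; they differ only on how the interior of the cube is partitioned, which is the price paid for halving the number of reads.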
And to head off the obvious question: yes, this is another independent choice, so ASTC also supports 3D HDR textures! Of course, what else did you expect?
The ASTC evaluation codec is available at Arm Developer and supports LDR, HDR and 3D texture compression.
Do you have any novel applications of HDR or 3D textures? Why not share them with us in the comments?