ASTC Evaluation Codec

Mali has just published an evaluation codec for the new ARM Adaptive Scalable Texture Compression (ASTC) standard.

For more information on ASTC, take a look at the ARM Multimedia Blog posts "ASTC Texture Compression: ARM Pushes the Envelope in Graphics Technology" and "ARM Unveils Details of ASTC Texture Compression at HPG Conference".

I have started this thread for users of this evaluation tool to ask questions. Here's a very quick "getting started" guide:

Getting Started

First, accept the license, download the tarball and unpack it. The subdirectories Win32, Mac OS X and Linux32 contain binaries for, you guessed it, Windows, Mac OS X, and Linux (x86 versions). If you are running on another system, you might like to try compiling from source; take a look at Source/buildinstructions.txt.

Open a terminal, change to the appropriate directory for your system, and run the astcenc encoder program, like this on Linux or Mac OS:

./astcenc

Or like this on Windows:

astcenc

Invoking the tool with no arguments gives a very extensive help message, including usage instructions, and details of all the possible options.

How do I run the tool?

First, find a 24-bit .png or .tga file you wish to use, say /images/example.png (or on Windows C:\images\example.png).

You can compress it using the -c option, like this (use the first line for Linux or Mac OS, second line for Windows users):

./astcenc -c /images/example.png /images/example-compressed.astc 6x6 -medium
astcenc -c C:\images\example.png C:\images\example-compressed.astc 6x6 -medium

The -c indicates a compression operation, followed by the input and output filenames. The block footprint size follows, in this case 6x6 pixels, then the requested compression speed, medium.

To decompress the file again, use (first line for Linux or Mac OS, second line for Windows):

./astcenc -d /images/example-compressed.astc /images/example-decompressed.tga
astcenc -d C:\images\example-compressed.astc C:\images\example-decompressed.tga

The -d indicates decompression, followed by the input and output filenames. The output file will be an uncompressed TGA image.

If you just want to test what compression and decompression are like, use the test mode:

./astcenc -t /images/example.png /images/example-decompressed.tga 6x6 -medium
astcenc -t C:\images\example.png C:\images\example-decompressed.tga 6x6 -medium

This is equivalent to compressing and then immediately decompressing again, and it also prints out statistics about the fidelity of the resulting image, using the peak signal-to-noise ratio.
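For readers unfamiliar with the metric, here is a minimal sketch of the textbook peak signal-to-noise ratio for 8-bit samples. It is an illustration only: the function name `psnr` and the flat-list input are assumptions made for this example, and the tool's own calculation may differ (for instance in how it combines colour channels).

```python
import math

def psnr(original, decoded, peak=255):
    """Textbook PSNR in dB between two equal-length sequences of samples."""
    if not original or len(original) != len(decoded):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(peak * peak / mse)
```

Higher values mean the decompressed image is closer to the original; identical images give infinity.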

Take a look at the input and output images.

Experimenting

The block footprints go from 4x4 (8 bits per pixel) all the way up to 12x12 (0.89 bits per pixel). As with any lossy codec, such as JPEG, there comes a point where compressing too aggressively causes unacceptable quality loss. Finding the best balance between size and quality is one place where ASTC excels, since its compression ratio is adjustable in much finer steps than other texture codecs.
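The bit rates quoted above follow from the fact that every ASTC block occupies 128 bits regardless of its footprint. A small sketch of the arithmetic (the function name `bits_per_pixel` is just for this example, and only a few of the available footprints are listed):

```python
BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint

def bits_per_pixel(width, height):
    """Bit rate of a 2D block footprint of width x height pixels."""
    return BLOCK_BITS / (width * height)

# A selection of footprints, not the full list.
for w, h in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    print(f"{w}x{h}: {bits_per_pixel(w, h):.2f} bits per pixel")
```

This prints 8.00 for 4x4, 3.56 for 6x6, 2.00 for 8x8 and 0.89 for 12x12.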

The compression speed runs from -veryfast, through -fast, -medium and -thorough, up to -exhaustive. In general, the more time the encoder has to spend looking for good encodings, the better the results.
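If you want to compare several footprints and speed settings in one go, something like the following sketch could generate the test-mode command lines. It only uses the options described above; the helper name `sweep_commands` and the output file naming scheme are made up for this example.

```python
PRESETS = ["-veryfast", "-fast", "-medium", "-thorough", "-exhaustive"]

def sweep_commands(encoder, image, footprints, presets=PRESETS):
    """Build one test-mode (-t) argument list per footprint and preset."""
    commands = []
    for footprint in footprints:
        for preset in presets:
            # Hypothetical naming scheme so that outputs do not overwrite each other.
            output = f"{image}.{footprint}{preset}.tga"
            commands.append([encoder, "-t", image, output, footprint, preset])
    return commands
```

Each returned list can be handed to `subprocess.run`, for example `subprocess.run(cmd, check=True)`, with `encoder` set to `./astcenc` on Linux or Mac OS and `astcenc` on Windows.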

So, download, run, have a play, and post any questions or results on this thread.

  • That was the only use case I had in mind.

    I have a few other questions about the OpenGL ES spec that hopefully you can shed some light on. If this is better suited for a separate thread, I'd be happy to move the conversation.

    1. The weight infill section (C.2.18) covers the method by which the texel weights are converted from a low resolution texel grid into a full block resolution weight grid. I'm a little bit confused about how to calculate the neighboring weights for bilinear interpolation for this step. The outline is given as:

    v0 = js + jt*N;

    p00 = decode_weight(v0);

    p01 = decode_weight(v0 + 1);

    p10 = decode_weight(v0 + N);

    p11 = decode_weight(v0 + N + 1);

    For a 4x4 texel grid expanding to a 4x4 block grid, I assume the texels correspond one-to-one. However, if I follow this procedure, I get out-of-bounds indices when trying to decode weights in the bottom-right portion of the block grid. The corresponding blend values end up being zero, but in the decode step that's not clear until later. My question: is this a lazy procedure? In other words, if the blend values are zero, do we avoid decoding the weights? It seems like you'd run into issues at the bottom and right borders of the texel weight grid regardless.
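To make the situation concrete, here is a rough sketch of the 4x4-to-4x4 case. The scaling constants and blend-factor formulas are written from my reading of section C.2.18 and should be checked against the spec; the function names and the "skip taps with a zero factor" guard are assumptions for illustration, not a statement of what the spec requires.

```python
def infill_coords(s, block_dim, grid_dim):
    """Map block texel coordinate s to (grid index, 4-bit fraction)."""
    d = (1024 + block_dim // 2) // (block_dim - 1)
    g = (d * s * (grid_dim - 1) + 32) >> 6
    return g >> 4, g & 0xF

def infill_weight(weights, s, t, block_w, block_h, grid_w, grid_h):
    """Bilinear infill that never reads a weight whose blend factor is zero."""
    js, fs = infill_coords(s, block_w, grid_w)
    jt, ft = infill_coords(t, block_h, grid_h)
    v0 = js + jt * grid_w
    w11 = (fs * ft + 8) >> 4
    w10 = ft - w11
    w01 = fs - w11
    w00 = 16 - fs - ft + w11
    taps = [(v0, w00), (v0 + 1, w01), (v0 + grid_w, w10), (v0 + grid_w + 1, w11)]
    total = 8  # rounding term
    for index, factor in taps:
        if factor != 0:  # the "lazy" guard: index may be out of range when factor is 0
            total += weights[index] * factor
    return total >> 4
```

With a 4x4 grid and a 4x4 block, every fraction comes out as zero, so only the `v0` tap contributes. For block texel (3, 3), `v0` is 15 and the other three taps would index 16, 19 and 20 in a 16-entry array, which is exactly the out-of-bounds read described above.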


    2. I'm also a bit confused about how to interpret the texel weight data (C.2.16). The spec says "The weight information is stored as a stream of bits, growing downwards from the most significant bit in the block. Bit n in the stream is thus bit 127-n in the block." Does this mean that the weight data is read in reverse order from the rest of the block information? I.e. does the decompressor start at the most significant bit of the block data and read backwards towards the least significant? How does this work with the little-endian byte storage? I assume it's something like this:

    Memory ---->

    Bit |120 - 112|128 - 121

    ----|---------|

    -------------- ^ Start here reading this way ---->
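One way to pin the question down is to write out the mapping in code. The sketch below takes the quoted sentence literally (stream bit n is block bit 127-n) and assumes the 128-bit block sits in memory as 16 little-endian bytes, so that byte 0 holds block bits 0 to 7 and byte 15 holds bits 120 to 127. The function names are made up for this example.

```python
def block_bit(block, k):
    """Bit k of a 128-bit block stored as 16 little-endian bytes."""
    return (block[k >> 3] >> (k & 7)) & 1

def weight_stream_bit(block, n):
    """Bit n of the weight stream, taken as block bit 127 - n."""
    return block_bit(block, 127 - n)
```

Under those assumptions the weight stream starts at the top bit of the last byte in memory and runs backwards towards the first byte, while the rest of the block fields are read upwards from bit 0.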

    3. Finally, I'd like clarification on a small redundancy with respect to partition selection. Section C.2.21 says that the partition selection algorithm takes as input a seed which is initialized as the block's partition index (bits 11-22 in table C.2.6). However, it also takes as input the number of partitions in the block (bits 11-12 in table C.2.4). This means that the number of partitions is always one more than the value stored in the two least significant bits of the partition index, and the partition function could be simplified to:

    int new_select_partition(int seed, int x, int y, int z, int small_block) {

    return select_partition(seed, x, y, z, (seed & 0x3) + 1, small_block);

    }

    Is this correct?

    Thanks!
