
ASTC Evaluation Codec

Mali has just published an evaluation codec for the new ARM Adaptive Scalable Texture Compression (ASTC) standard.

For more information on ASTC, take a look at the ARM Multimedia Blog posts "ASTC Texture Compression: ARM Pushes the Envelope in Graphics Technology" and "ARM Unveils Details of ASTC Texture Compression at HPG Conference".

I have started this thread for users of this evaluation tool to ask questions. Here's a very quick "getting started" guide:

Getting Started

First, accept the license, download the tarball and unpack it. The subdirectories Win32, Mac OS X and Linux32 contain binaries for, you guessed it, Windows, Mac OS X, and Linux (x86 versions). If you are running on another system, you might like to try compiling from source: take a look at Source/buildinstructions.txt.

Open a terminal, change to the appropriate directory for your system, and run the astcenc encoder program, like this on Linux or Mac OS:

./astcenc

Or like this on Windows:

astcenc

Invoking the tool with no arguments gives a very extensive help message, including usage instructions, and details of all the possible options.

How do I run the tool?

First, find a 24-bit .png or .tga file you wish to use, say /images/example.png (or on Windows, C:\images\example.png).

You can compress it using the -c option, like this (use the first line for Linux or Mac OS, second line for Windows users):

./astcenc -c /images/example.png /images/example-compressed.astc 6x6 -medium
astcenc -c C:\images\example.png C:\images\example-compressed.astc 6x6 -medium

The -c indicates a compression operation, followed by the input and output filenames. The block footprint size follows, in this case 6x6 pixels, then the requested compression speed, medium.

To decompress the file again, use the following (again, the first line is for Linux or Mac OS, the second for Windows):

./astcenc -d /images/example-compressed.astc /images/example-decompressed.tga
astcenc -d C:\images\example-compressed.astc C:\images\example-decompressed.tga

The -d indicates decompression, followed by the input and output filenames. The output file will be an uncompressed TGA image.

If you just want to test what compression and decompression are like, use the test mode:

./astcenc -t /images/example.png /images/example-decompressed.tga 6x6 -medium
astcenc -t C:\images\example.png C:\images\example-decompressed.tga 6x6 -medium

This is equivalent to compressing and then immediately decompressing again, and it also prints out statistics about the fidelity of the resulting image, using the peak signal-to-noise ratio.

Take a look at the input and output images.

Experimenting

The block footprints go from 4x4 (8 bits per pixel) all the way up to 12x12 (0.89 bits per pixel). Like any lossy codec, such as JPEG, there will come a point where selecting too aggressive a compression results in unacceptable quality loss, and ASTC is no exception. Finding the optimum balance between size and quality is one place where ASTC excels, since its compression ratio is adjustable in much finer steps than those of other texture codecs.
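These bit rates follow from the fact that every ASTC block is stored in 128 bits whatever its footprint, so the rate is 128 / (width * height) bits per pixel: 128/16 = 8 for 4x4 and 128/144 ≈ 0.89 for 12x12. A small C sketch, listing the 2D footprints as I understand them from the ASTC specification (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Every ASTC block occupies 128 bits, regardless of its footprint. */
#define ASTC_BLOCK_BITS 128.0

/* The 2D block footprints (width x height) defined by ASTC. */
static const int astc_2d_footprints[14][2] = {
    {4, 4},  {5, 4},  {5, 5},  {6, 5},  {6, 6},  {8, 5},   {8, 6},
    {8, 8},  {10, 5}, {10, 6}, {10, 8}, {10, 10}, {12, 10}, {12, 12},
};

static bool is_valid_2d_footprint(int w, int h)
{
    for (int i = 0; i < 14; i++)
        if (astc_2d_footprints[i][0] == w && astc_2d_footprints[i][1] == h)
            return true;
    return false;
}

/* Bits per pixel: 128 bits spread over the w * h texels of one block. */
static double astc_bits_per_pixel(int w, int h)
{
    assert(is_valid_2d_footprint(w, h));
    return ASTC_BLOCK_BITS / (double)(w * h);
}
```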

The compression speed runs from -veryfast, through -fast, -medium and -thorough, up to -exhaustive. In general, the more time the encoder has to spend looking for good encodings, the better the results.

So, download, run, have a play, and post any questions or results on this thread.

  • Devendran,

    You are right that this is strictly not necessary, as the bit-transfer procedure does indeed guarantee that the unsigned value b is in the correct range. You do need to clamp e1, because the previous addition operation may overflow. I suspect that the description is down to the way the hardware works - the LDR endpoints which clamp their output share a common unit which clamps both values, and it is more expensive to special-case this decoding mode than it is to allow half of the unit to operate as a "no-op".

    Sean.
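A sketch of the bit-transfer step, modelled on the bit_transfer_signed pseudocode in the ASTC specification, shows why only e1 needs clamping in a base+offset mode: after the transfer, the base b is rebuilt from seven bits of b plus the top bit of a, so it always lies in 0..255, while the offset a becomes a signed value in -32..31, and base + offset can fall outside 0..255. decode_base_offset is a hypothetical single-channel helper, not a function from the specification or the codec:

```c
#include <assert.h>

/* Moves the top bit of a into b and turns a into a signed 6-bit offset,
   following the specification's bit_transfer_signed pseudocode. */
static void bit_transfer_signed(int *a, int *b)
{
    *b >>= 1;
    *b |= *a & 0x80;         /* b is now 0..255 */
    *a >>= 1;
    *a &= 0x3F;
    if (*a & 0x20)
        *a -= 0x40;          /* a is now -32..31 */
}

static int clamp_unorm8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Hypothetical helper: decode one base+offset channel from raw 8-bit values. */
static void decode_base_offset(int v_base, int v_offset, int *e0, int *e1)
{
    assert(v_base >= 0 && v_base <= 255 && v_offset >= 0 && v_offset <= 255);
    bit_transfer_signed(&v_offset, &v_base);
    *e0 = v_base;                          /* already in range: no clamp needed */
    *e1 = clamp_unorm8(v_base + v_offset); /* the sum may leave 0..255 */
}
```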

  • Mokosha,

    I am not aware of any plans to output compressed data to .ktx or .dds. Do you have a specific use case in mind? Reusing a generic .ktx loader in your application would be attractive in itself, of course.

    Sean.

  • Hi Sean Ellis

    I am writing a WebGL-based HTML application that loads ASTC-compressed textures onto my triangle. Is there a way to tell, by parsing the ASTC header, whether the internal format of a compressed ASTC image (which in my case may be located on a remote web server) is linear or sRGB encoded? I could then pass that internalFormat to glCompressedTexImage2D(). In other words, I want to know from the header of any ASTC-compressed image whether my internal format is, e.g., "COMPRESSED_RGBA_ASTC_4x4_KHR" or "COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR". Any clues?

    Thank you

    Regards,

    Deepak
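As far as I know, the .astc file header written by astcenc is 16 bytes: a 4-byte magic number (0x5CA1AB13, little-endian), the block dimensions in x, y and z (one byte each), and the image width, height and depth as 24-bit little-endian values. There is no colour-space field, so the header alone cannot distinguish COMPRESSED_RGBA_ASTC_4x4_KHR from COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR; that choice would have to be carried separately, for example by a naming or metadata convention in your own asset pipeline. A sketch of parsing the header in C (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned block_x, block_y, block_z;  /* block footprint, e.g. 4x4x1 */
    unsigned size_x, size_y, size_z;     /* image size in texels */
} astc_header;

static unsigned read_u24le(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8) | ((unsigned)p[2] << 16);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
static int parse_astc_header(const uint8_t *buf, size_t len, astc_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 16)
        return -1;
    uint32_t magic = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    if (magic != 0x5CA1AB13u)
        return -1;
    out->block_x = buf[4];
    out->block_y = buf[5];
    out->block_z = buf[6];
    out->size_x = read_u24le(buf + 7);
    out->size_y = read_u24le(buf + 10);
    out->size_z = read_u24le(buf + 13);
    return 0;  /* note: nothing here says linear or sRGB */
}
```

The block dimensions give the ASTC_WxH part of the format name; whether to use the RGBA or the SRGB8_ALPHA8 variant still has to come from elsewhere.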

  • That was the only use case I had in mind.

    I have a few other questions about the OpenGL ES spec that hopefully you can shed some light on. If this is better suited for a separate thread, I'd be happy to move the conversation.

    1. The weight infill section (C.2.18) covers the method by which the texel weights are converted from a low-resolution texel grid into a full-block-resolution weight grid. I'm a little confused about how to calculate the neighboring weights for bilinear interpolation in this step. The outline is given as:

    v0 = js + jt*N;
    p00 = decode_weight(v0);
    p01 = decode_weight(v0 + 1);
    p10 = decode_weight(v0 + N);
    p11 = decode_weight(v0 + N + 1);

    For a 4x4 texel grid expanding to a 4x4 block grid, I assume the texels correspond one-to-one. However, if I follow this procedure, I get index-out-of-bounds problems when decoding weights in the bottom-right portion of the block grid. The corresponding blend values end up being zero, but that is not clear until later in the decode. My question: is this a lazy procedure? In other words, if the blend values are zero, do we skip decoding the weights? It seems you would run into issues at the bottom-right borders of the texel weight grid regardless.


    2. I'm also a bit confused about how to interpret the texel weight data (C.2.16). The spec says "The weight information is stored as a stream of bits, growing downwards from the most significant bit in the block. Bit n in the stream is thus bit 127-n in the block." Does this mean that the weight data is read in the reverse order from the rest of the block? That is, does the decompressor start reading at the most significant bit and work backwards to the least significant bit of the block data? How does this work with little-endian byte storage? I assume it's something like this:

    Memory (increasing address) ---->

    Byte |   ...   |    14     |    15     |
    Bit  |   ...   | 119 - 112 | 127 - 120 |
                                 ^ start here at bit 127, reading towards bit 0

    3. Finally, I'd like clarification on a small redundancy w.r.t. partition selection. Section C.2.21 says that the partition selection algorithm takes as input a seed, which is initialized to the block's partition index (bits 11-22 in table C.2.6). However, it also takes as input the number of partitions in the block (bits 11-12 in table C.2.4). This means that the number of partitions is always one more than the value stored in the two least significant bits of the partition index, and the partition function could be simplified to:

    int new_select_partition(int seed, int x, int y, int z, int small_block) {
        return select_partition(seed, x, y, z, (seed & 0x3) + 1, small_block);
    }

    Is this correct?
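On question 3: as I read the block layout, the partition count minus one occupies bits 11-12, and the partition index that seeds the pattern is the 10 bits at 13-22, so the two fields do not overlap. In the specification's select_partition pseudocode the count is then folded into the hash input as seed + 1024 * (partitioncount - 1). If that reading is correct, the low two bits of the index are unrelated to the partition count, and the proposed simplification would not be equivalent. A C sketch of the field extraction under those assumptions (all helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Read `count` bits starting at bit `first` of a little-endian 128-bit block
   (block bit i is bit i % 8 of byte i / 8). */
static unsigned read_bits(const uint8_t block[16], int first, int count)
{
    assert(first >= 0 && count > 0 && first + count <= 128);
    unsigned v = 0;
    for (int k = 0; k < count; k++) {
        int i = first + k;
        v |= (unsigned)((block[i / 8] >> (i % 8)) & 1) << k;
    }
    return v;
}

/* Assumed layout: partition count minus one in bits 11-12. */
static int partition_count(const uint8_t block[16])
{
    return (int)read_bits(block, 11, 2) + 1;
}

/* Assumed layout: 10-bit partition index in bits 13-22 (multi-partition blocks). */
static int partition_index(const uint8_t block[16])
{
    return (int)read_bits(block, 13, 10);
}

/* The value the partition hash is applied to: index with the count folded in. */
static int partition_hash_seed(int index, int count)
{
    assert(index >= 0 && index < 1024 && count >= 1 && count <= 4);
    return index + (count - 1) * 1024;
}
```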

    Thanks!
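On question 1 (C.2.18): working the pseudocode through suggests that the out-of-range neighbours are only ever paired with a zero blend factor. At the last column of the block the computed position lands exactly on js = N - 1 with fs = 0, so w01 and w11 are 0; the same holds for the last row with ft = 0. A decoder can therefore clamp the neighbour index (or skip the fetch) without changing the result. A C sketch, assuming decode_weight() is replaced by a lookup into an array of already-unquantized weights (0..64); infill_weight and edge_neighbour_unused are hypothetical names:

```c
#include <assert.h>

/* Infill for texel (s, t) of a Bs x Bt block from an N x M weight grid,
   following the C.2.18 pseudocode, with the +1 neighbours clamped. */
static int infill_weight(const int *grid, int N, int M, int Bs, int Bt, int s, int t)
{
    assert(Bs > 1 && Bt > 1 && N > 1 && M > 1);
    int Ds = (1024 + Bs / 2) / (Bs - 1);
    int Dt = (1024 + Bt / 2) / (Bt - 1);
    int gs = (Ds * s * (N - 1) + 32) >> 6;
    int gt = (Dt * t * (M - 1) + 32) >> 6;
    int js = gs >> 4, fs = gs & 0x0F;
    int jt = gt >> 4, ft = gt & 0x0F;

    int w11 = (fs * ft + 8) >> 4;
    int w10 = ft - w11;
    int w01 = fs - w11;
    int w00 = 16 - fs - ft + w11;

    /* Clamp so we never read outside the grid; clamped reads get weight 0. */
    int js1 = js + 1 < N ? js + 1 : N - 1;
    int jt1 = jt + 1 < M ? jt + 1 : M - 1;

    int p00 = grid[js + jt * N];
    int p01 = grid[js1 + jt * N];
    int p10 = grid[js + jt1 * N];
    int p11 = grid[js1 + jt1 * N];
    return (p00 * w00 + p01 * w01 + p10 * w10 + p11 * w11 + 8) >> 4;
}

/* Returns 1 if, for every s in a block of width Bs, js never exceeds N - 1
   and the neighbour js + 1 is only out of range when fs == 0. */
static int edge_neighbour_unused(int N, int Bs)
{
    assert(Bs > 1 && N > 1 && N <= Bs);
    int Ds = (1024 + Bs / 2) / (Bs - 1);
    for (int s = 0; s < Bs; s++) {
        int gs = (Ds * s * (N - 1) + 32) >> 6;
        int js = gs >> 4, fs = gs & 0x0F;
        if (js > N - 1) return 0;             /* p00 itself out of range */
        if (js == N - 1 && fs != 0) return 0; /* p01 out of range but weighted */
    }
    return 1;
}
```

For a 4x4 grid in a 4x4 block every texel lands exactly on a grid point, so the infill reproduces the grid weights unchanged.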

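On question 2 (C.2.16): if the 16 bytes of a block are read little-endian, so that block bit i is bit i % 8 of byte i / 8, then "bit n of the stream is bit 127 - n of the block" means the weight stream starts at the top bit of byte 15 and walks towards bit 0. One convenient approach is to reverse the whole block once (byte order and the bits within each byte) and then read the weight stream from bit 0 upwards. A C sketch of that mapping (helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Block bit i is bit (i % 8) of byte (i / 8). */
static int block_bit(const uint8_t block[16], int i)
{
    assert(i >= 0 && i < 128);
    return (block[i / 8] >> (i % 8)) & 1;
}

/* Bit n of the weight stream is bit 127 - n of the block. */
static int weight_stream_bit(const uint8_t block[16], int n)
{
    return block_bit(block, 127 - n);
}

static uint8_t reverse_byte(uint8_t v)
{
    uint8_t r = 0;
    for (int k = 0; k < 8; k++)
        r |= (uint8_t)(((v >> k) & 1) << (7 - k));
    return r;
}

/* Reverse byte order and the bits within each byte: afterwards, bit n of
   `out` equals bit n of the weight stream. */
static void reverse_block(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = reverse_byte(in[15 - i]);
}
```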
  • Hi Sean,

    I was doing some basic testing with the ASTC encoder, and with the configuration below I am finding visible artifacts:

    Encoder settings: medium, with up to 4 partitions and 1024 partition indices.

    Other tools, such as one/two weight planes and two refinement iterations, are also enabled.

    There are visible patches at the boundaries of objects. Is this expected?

    Note: I could not upload the image due to a system issue.

    What kinds of artifacts are expected for the 4x4 and 8x8 block sizes with the best and medium quality settings?

    Thanks,

    Devendran Mani.

  • ASTC is a lossy compression - if you reduce the bitrate you will expect to get more artefacts, in particular around edges which often have high-frequency components.

  • Hi Peter,

    Thanks for the info.

    One more question related to an ASTC encoder issue.

    The ARM Texture Compression Tool v4.2.0 and v4.1.0 give different PSNR values for the same test image:

    v4.1.0 = 62 dB, v4.2.0 = ~50 dB

    [Image: ARM_TcToolv4.1.0.png, output of v4.1.0]

    [Image: ARM_TcToolv4.2.0.png, output of v4.2.0]

    We observed some line artifacts in v4.2.0. After analysis, we found that a 0xFF00 mask is applied to void-extent blocks at the end of decoding.

    If we remove the 0xFF00 mask, the PSNR matches and the line artifacts disappear.

    Please share your views on this issue.

    Thanks,

    Devendran Mani.

  • Hi Pete,

    Kindly clarify the below:

    We find that the images are flipped (in the Y direction) before they are encoded.

    Please let me know why the image is flipped. Is the flipping function needed in the encoder?

    Thanks,

    Devendran Mani.

  • Like most things in graphics, much depends on where you think your origin is. In OpenGL ES the texture origin is at the bottom left (in Direct3D and many related texture encoding formats it is at the top left).
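Converting between the two conventions is a vertical flip: row 0 swaps with the last row, row 1 with the second-to-last, and so on. A generic C sketch for tightly packed rows (flip_rows is a hypothetical helper, not the encoder's own code):

```c
#include <assert.h>
#include <stddef.h>

/* Flip an image vertically in place, swapping rows from the outside in.
   row_bytes is the size of one row (width * bytes per pixel). */
static void flip_rows(unsigned char *pixels, size_t row_bytes, size_t height)
{
    assert(pixels != NULL);
    if (height < 2)
        return;                          /* nothing to swap */
    for (size_t top = 0, bottom = height - 1; top < bottom; top++, bottom--) {
        unsigned char *a = pixels + top * row_bytes;
        unsigned char *b = pixels + bottom * row_bytes;
        for (size_t i = 0; i < row_bytes; i++) {
            unsigned char tmp = a[i];
            a[i] = b[i];
            b[i] = tmp;
        }
    }
}
```

Applying the flip twice restores the original image, so the same routine converts in either direction.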

  • Hi Devmani,

    These two releases probably ship with different builds of the ASTC encoder, hence the difference in output. Thanks for flagging this,

    Chris

  • Hi seanellis,

    I've played with astcenc and found that recompiling it for an x64 target, or for an x86 target with SSE/AVX support, gives up to a 2x speedup in the thorough and exhaustive modes (I used VS2012).

    But astcenc.exe is a 32-bit binary even in the 64-bit distribution for Windows (Mali_Texture_Compression_Tool_v4.2.0.445f5f1_Windows_x64.exe). So it seems we could get a 1.5x-2x speedup for free.

    Could you please recompile astcenc.exe and update the MaliTCT installer?

    Could you also update the ASTC Evaluation Codec source code? The astcenc.exe in MaliTCT differs slightly from it.

  • hi kirpich30000,

    Thanks for highlighting this, I think for now you should be able to replace the astcenc binary shipped with TCT with one of your faster builds, and it will use it.

    Thanks,

    Chris

  • Yep, I've already done it. Here are some numbers for an Intel Core i5 660:

    astcenc_tct_v42.exe -ts turret_diffuse_map.png out.astc 4x4 -thorough -time -showpsnr -silentmode
    PSNR (LDR-RGBA): 47.471549 dB
    Alpha-Weighted PSNR: 47.471548 dB
    PSNR (LDR-RGB): 46.222219 dB
    Elapsed time: 28.51 seconds, of which coding time: 28.46 seconds

    astcenc_eval_x64.exe -ts turret_diffuse_map.png out.astc 4x4 -thorough -time -showpsnr -silentmode
    PSNR (LDR-RGBA): 47.471834 dB
    Alpha-Weighted PSNR: 47.471834 dB
    PSNR (LDR-RGB): 46.222503 dB
    Elapsed time: 18.58 seconds, of which coding time: 18.52 seconds
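From the elapsed times above, the x64 build is roughly 1.5x faster on this image, with the PSNR essentially unchanged:

```latex
\text{speedup} = \frac{28.51\,\text{s}}{18.58\,\text{s}} \approx 1.53,
\qquad
\text{speedup}_{\text{coding only}} = \frac{28.46\,\text{s}}{18.52\,\text{s}} \approx 1.54
```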

  • Hi, Sean Ellis, chrisvarns

    I've found a small bug in the ASTC Evaluation Codec. In astc_find_best_partitioning.cpp, lines 860-862 should be reordered

    from:

    860  best_partition = ((best_partition >> PARTITION_BITS) << PARTITION_BITS) | partition_sequence[best_partition & (PARTITION_COUNT - 1)];
    861  best_partitions_dual_weight_planes[i] = best_partition;
    862  separate_errors[best_partition] = 1e30f;

    to:

    860  separate_errors[best_partition] = 1e30f;
    861  best_partition = ((best_partition >> PARTITION_BITS) << PARTITION_BITS) | partition_sequence[best_partition & (PARTITION_COUNT - 1)];
    862  best_partitions_dual_weight_planes[i] = best_partition;

    The original code "invalidates" the wrong partition candidate. As a result, in most cases partition_indices_2planes[0] and partition_indices_2planes[1] would be the same (because on the second iteration separate_errors[1st_best_partition] would still be the best), so effectively only one two-plane partition set would be tested.

    This change provides about a 0.01 dB quality improvement.
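A stripped-down model of that selection loop illustrates the effect. It keeps only the error array and the sequence remapping (the bits above PARTITION_BITS are left out), and argmin and pick_two are hypothetical names. When the remap happens before the invalidation, the wrong slot is cleared and the same candidate can be picked twice:

```c
#include <assert.h>
#include <stddef.h>

#define CANDIDATES 4

/* Index of the smallest error (the first one wins on ties). */
static int argmin(const float *err, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (err[i] < err[best])
            best = i;
    return best;
}

/* Pick the two best candidates, invalidating each pick with 1e30f.
   remap_first != 0 mimics the original order (remap, store, invalidate);
   remap_first == 0 invalidates the chosen slot before remapping. */
static void pick_two(const float errors_in[CANDIDATES], const int sequence[CANDIDATES],
                     int remap_first, int out[2])
{
    assert(errors_in != NULL && sequence != NULL && out != NULL);
    float err[CANDIDATES];
    for (int i = 0; i < CANDIDATES; i++)
        err[i] = errors_in[i];           /* work on a copy */
    for (int k = 0; k < 2; k++) {
        int best = argmin(err, CANDIDATES);
        if (remap_first) {
            best = sequence[best];       /* remapped too early... */
            out[k] = best;
            err[best] = 1e30f;           /* ...so this may clear the wrong slot */
        } else {
            err[best] = 1e30f;           /* clear the slot actually chosen */
            best = sequence[best];
            out[k] = best;
        }
    }
}
```

With errors {0.5, 1.0, 2.0, 3.0} and sequence {2, 0, 1, 3}, the original order returns candidate 2 twice, while invalidating first returns 2 and then 0.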