For more information on ASTC, take a look at the ARM Multimedia Blog posts "ASTC Texture Compression: ARM Pushes the Envelope in Graphics Technology" and "ARM Unveils Details of ASTC Texture Compression at HPG Conference".
I have started this thread for users of this evaluation tool to ask questions. Here's a very quick "getting started" guide:
First, accept the license, download the tarball and unpack. In the subdirectories Win32, Mac OS X and Linux32 are binaries for, you guessed it, Windows, Mac OS X, and Linux (x86 versions). If you are running on another system, you might like to try compiling from source - take a look at Source/buildinstructions.txt .
Open a terminal, change to the appropriate directory for your system, and run the astcenc encoder program, like this on Linux or Mac OS:
./astcenc
Or like this on Windows:
astcenc
Invoking the tool with no arguments gives a very extensive help message, including usage instructions, and details of all the possible options.
First, find a 24-bit .png or .tga file you wish to use, say /images/example.png (or on Windows C:\images\example.png).
You can compress it using the -c option, like this (use the first line for Linux or Mac OS, second line for Windows users):
./astcenc -c /images/example.png /images/example-compressed.astc 6x6 -medium
astcenc -c C:\images\example.png C:\images\example-compressed.astc 6x6 -medium
The -c indicates a compression operation, followed by the input and output filenames. The block footprint size follows, in this case 6x6 pixels, then the requested compression speed, medium.
To decompress the file again, you should use:
./astcenc -d /images/example-compressed.astc /images/example-decompressed.tga
astcenc -d C:\images\example-compressed.astc C:\images\example-decompressed.tga
The -d indicates decompression, followed by the input and output filenames. The output file will be an uncompressed TGA image.
If you just want to test what compression and decompression are like, use the test mode:
./astcenc -t /images/example.png /images/example-decompressed.tga 6x6 -medium
astcenc -t C:\images\example.png C:\images\example-decompressed.tga 6x6 -medium
This is equivalent to compressing and then immediately decompressing again, and it also prints out statistics about the fidelity of the resulting image, using the peak signal-to-noise ratio.
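For reference, the PSNR figure is the standard peak signal-to-noise ratio for 8-bit channel data; here is a minimal sketch of the formula given the mean squared error (the alpha-weighted figure the tool also prints is computed slightly differently):
#include <math.h>
/* Standard PSNR for 8-bit data: 10 * log10(peak^2 / MSE), with peak = 255. */
double psnr_8bit(double mse)
{
    return 10.0 * log10((255.0 * 255.0) / mse);
}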
Take a look at the input and output images.
The block footprints go from 4x4 (8 bits per pixel) all the way up to 12x12 (0.89 bits/pixel). Like any lossy codec, such as JPEG, there will come a point where selecting too aggressive a compression results in unacceptable quality loss, and ASTC is no exception. Finding the optimum balance between size and quality is one place where ASTC excels, since its compression ratio is adjustable in much finer steps than in other texture codecs.
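All of these bitrates fall directly out of the fact that every ASTC block is exactly 128 bits regardless of footprint, so the bits per pixel is simply 128 divided by the footprint area; a quick sketch:
/* 128 bits per block: 4x4 -> 8.0 bpp, 6x6 -> 3.56 bpp, 8x8 -> 2.0 bpp, 12x12 -> 0.89 bpp. */
float astc_bits_per_pixel(int block_w, int block_h)
{
    return 128.0f / (float)(block_w * block_h);
}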
The compression speed runs from -veryfast, through -fast, -medium and -thorough, up to -exhaustive. In general, the more time the encoder has to spend looking for good encodings, the better the results.
So, download, run, have a play, and post any questions or results on this thread.
Devendran,
You are right that this is strictly not necessary, as the bit-transfer procedure does indeed guarantee that the unsigned value b is in the correct range. You do need to clamp e1, because the previous addition operation may overflow. I suspect that the description is down to the way the hardware works - the LDR endpoints which clamp their output share a common unit which clamps both values, and it is more expensive to special-case this decoding mode than it is to allow half of the unit to operate as a "no-op".
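To illustrate (just a sketch, not the spec's exact pseudocode): the transferred offset is already in range, but adding it to the base can push the result outside 0..255, so the sum still has to be clamped:
static int clamp_unorm8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}
/* Illustrative base+offset endpoint pair for this kind of mode. */
void decode_base_offset(int base, int offset, int *e0, int *e1)
{
    *e0 = clamp_unorm8(base);          /* strictly unnecessary here, but harmless  */
    *e1 = clamp_unorm8(base + offset); /* may overflow, so this clamp is required  */
}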
Sean.
Mokosha,
I am not aware of any plans to output compressed data to .ktx or .dds. Do you have a specific use case in mind? Reusing a generic .ktx loader in your application would be attractive in itself, of course.
Hi Sean Ellis
I am writing a WebGL-based HTML application that loads ASTC compressed textures onto a triangle. Is there a way to tell, by parsing the ASTC header, whether the internal format of a compressed ASTC image (which in my case might be located on a remote web server) is linear or sRGB encoded? I could then use that internalFormat information to pass my ASTC texture to glCompressedTexImage2D(). In other words, I want to know from the header of any ASTC compressed image whether my internal format is "COMPRESSED_RGBA_ASTC_4x4_KHR" or "COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR". Any clues?
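For reference, the 16-byte .astc file header I am parsing looks like this, as far as I can tell from the evaluation codec sources (the field names are my own), and I cannot see any colour-space information in it:
#include <stdint.h>
/* My reading of the .astc header: the magic number 0x5CA1AB13 stored
   little-endian, then block dimensions, then 24-bit little-endian image sizes. */
struct astc_header
{
    uint8_t magic[4];      /* 0x13, 0xAB, 0xA1, 0x5C */
    uint8_t blockdim_x;
    uint8_t blockdim_y;
    uint8_t blockdim_z;
    uint8_t xsize[3];
    uint8_t ysize[3];
    uint8_t zsize[3];
};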
Thank you
Regards,
Deepak
That was the only use case I had in mind.
I have a few other questions about the OpenGL ES spec that hopefully you can shed some light on. If this is better suited for a separate thread, I'd be happy to move the conversation.
1. The weight infill section (C.2.18) covers the method by which the texel weights are converted from a low resolution texel grid into a full block resolution weight grid. I'm a little bit confused about how to calculate the neighboring weights for bilinear interpolation for this step. The outline is given as:
v0 = js + jt*N;
p00 = decode_weight(v0);
p01 = decode_weight(v0 + 1);
p10 = decode_weight(v0 + N);
p11 = decode_weight(v0 + N + 1);
For a 4x4 texel grid expanding to a 4x4 block grid, I assume each of the texels correspond one-to-one. However, if I follow this procedure, I get index out of bounds problems when trying to decode weights in the bottom right portion of the block grid. The corresponding blend values end up being zero, but in the decode step that's not clear until later. My question: is this a lazy procedure? In other words, if the blend values are zero, do we avoid decoding the weights? It seems like you'd run into issues in the bottom right borders of the texel weight grid regardless.
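For reference, here is the blend that follows this step, wrapped up as I read C.2.18 (any transcription errors are mine):
int infill_weight(int fs, int ft, int p00, int p01, int p10, int p11)
{
    /* fs and ft are the 4-bit fractional positions. In the 4x4-grid-on-4x4-block
       case above they come out as 0, so w01, w10 and w11 are all 0 and only p00
       contributes to the result. */
    int w11 = (fs*ft + 8) >> 4;
    int w10 = ft - w11;
    int w01 = fs - w11;
    int w00 = 16 - fs - ft + w11;
    return (p00*w00 + p01*w01 + p10*w10 + p11*w11 + 8) >> 4;
}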
2. I'm also a bit confused about how to interpret the texel weight data. (C.2.16) The spec says "The weight information is stored as a stream of bits, growing downwards from the most significant bit in the block. Bit n in the stream is thus bit 127-n in the block." Does this mean that the weight data is read in reverse order from the rest of the block information? I.e. does the decompressor start reading from the most significant bit backwards to the least significant of the block data? How does this work with the little endian byte storage? I assume it's something like this:
Memory ---->
Bit  |120 - 112|128 - 121|
     |---------|---------|
               ^ Start here, reading this way ---->
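In code, my guess is something like this (assuming byte 0 of the 16-byte block is the least significant byte):
#include <stdint.h>
/* Weight-stream bit n is block bit 127-n; with little-endian byte order,
   block bit k sits at bit (k & 7) of byte k >> 3. */
int weight_stream_bit(const uint8_t block[16], int n)
{
    int k = 127 - n;
    return (block[k >> 3] >> (k & 7)) & 1;
}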
3. Finally, I'd like clarification on a small redundancy w.r.t. partition selection. Section C.2.21 says that the partition selection algorithm takes as input a seed, which is initialized to the block's partition index (bits 11-22 in table C.2.6). However, it also takes as input the number of partitions in the block (bits 11-12 in table C.2.4). This means that the number of partitions is always one more than the value stored in the two least significant bits of the partition index, and the partition function could be simplified to:
int new_select_partition(int seed, int x, int y, int z, int small_block) {
return select_partition(seed, x, y, z, (seed & 0x3) + 1, small_block);
}
Is this correct?
Thanks!
Hi Sean,
I was doing basic testing with the ASTC encoder, and I am finding visible artifacts for the config below:
Encode settings are: medium, with up to 4 possible partitions and 1024 partition indices.
Other options such as 1/2 plane mode and 2 refinement iterations are also enabled.
There are visible patches at the boundaries of the objects. Is this expected?
Note: I could not upload the image, due to a system issue.
What kind of behaviour/artifacts should be expected for 4x4 and 8x8 configurations with best and medium quality settings?
Thanks,
Devendran Mani.
ASTC is a lossy compression format - if you reduce the bitrate you should expect to get more artefacts, in particular around edges, which often have high-frequency components.
Hi Peter,
Thanks for the info.
One more question, related to an ASTC encoder issue.
The ARM Texture Compression Tool v4.2.0 and v4.1.0 give different PSNR values for a test image, as below:
V4.1.0 = 62 dB & V4.2.0 = ~50 dB
[Image: output from version 4.1.0]
[Image: output from version 4.2.0]
We observed some line artifacts in V4.2.0. After analysis we found that a 0xFF00 mask is applied in the void-extent block at the end of decoding.
If we remove the 0xFF00 mask, then the PSNR matches and the line artifacts are not seen.
Please share your views on this issue.
Hi Pete,
Kindly clarify the below:
We find that the images are flipped (in the Y direction) before being encoded.
Please let me know why the image is flipped - is the flipping function needed in the encoder?
Like most things in graphics, much depends on where you think your origin is. In OpenGL ES the texture origin is in the bottom left (in Direct3D and many related texture encoding formats it is in the top left).
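In other words the flip is just a re-indexing of the rows, along the lines of:
/* Convert a row index between top-left-origin and bottom-left-origin
   conventions for an image with the given number of rows. */
int flip_row(int y, int height)
{
    return height - 1 - y;
}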
Hi Devmani,
These 2 releases probably ship with different builds of the astc encoder, hence the difference in output. Thanks for flagging this,
Chris
Hi seanellis,
I've played with astcenc and found that recompiling it for an x64 target, or for an x86 target with SSE/AVX support, gives up to a 2x speedup for the thorough and exhaustive modes (I used VS2012).
But astcenc.exe is a 32-bit binary even in the 64-bit distribution for Windows (Mali_Texture_Compression_Tool_v4.2.0.445f5f1_Windows_x64.exe). So it seems we can get a 1.5x-2x speedup for free.
Could you please recompile astcenc.exe and update the MaliTCT installer?
And could you also please update the ASTC Evaluation Codec source code, because the astcenc.exe in MaliTCT differs slightly from it?
Hi kirpich30000,
Thanks for highlighting this, I think for now you should be able to replace the astcenc binary shipped with TCT with one of your faster builds, and it will use it.
Yep, I've already done it. Here are some numbers for an Intel Core i5 660:
astcenc_tct_v42.exe -ts turret_diffuse_map.png out.astc 4x4 -thorough -time -showpsnr -silentmode
PSNR (LDR-RGBA): 47.471549 dB
Alpha-Weighted PSNR: 47.471548 dB
PSNR (LDR-RGB): 46.222219 dB
Elapsed time: 28.51 seconds, of which coding time: 28.46 seconds
astcenc_eval_x64.exe -ts turret_diffuse_map.png out.astc 4x4 -thorough -time -showpsnr -silentmode
PSNR (LDR-RGBA): 47.471834 dB
Alpha-Weighted PSNR: 47.471834 dB
PSNR (LDR-RGB): 46.222503 dB
Elapsed time: 18.58 seconds, of which coding time: 18.52 seconds
Hi, Sean Ellis, chrisvarns
I've found a small bug in the ASTC Evaluation Codec. In astc_find_best_partitioning.cpp, lines 860-862 should be reordered
from:
separate_errors[best_partition] = 1e30f;
to:
The original code causes "invalidation" of the wrong partition candidate. Because of this, in most cases partition_indices_2planes[0] and partition_indices_2planes[1] will be the same (on the second iteration separate_errors[1st_best_partition] would still be the best), so effectively only one 2-plane partition set gets tested.
This change provides about a 0.01 dB quality improvement.