I'm looking into the details of ASTC and planning to develop it in hardware. The spec version 1.0 says it has LDR and full profiles. What is the difference between the LDR and HDR modes? What is the full profile, then?
How does each mode process data? What is the input data format of LDR mode? Does HDR accept only 32-bit IEEE 754 floating-point numbers? How many partitions can a block have? How about the decoding? The spec says the decode has to be bit-exact. If a lot of floating-point operations are required, is it possible to get a bit-exact decode, given the approximation involved in fixed-point implementations of floating-point calculations?
Ben,
The three profiles for ASTC are supersets of each other: the full profile includes HDR, which in turn includes LDR.
For input data, the evaluation codec (available from ASTC Evaluation Codec) takes 8-bit UNORM values for LDR, input as an image, usually in TGA, BMP, GIF or PNG format. For HDR, we take 16-bit pixel values in KTX or DDS formats. The encoder itself works with 16-bit IEEE 754 floats, which are stored internally in the ASTC data in a pseudo-logarithmic format.
A block may have from 1 to 4 partitions, each with a separate set of color endpoints. In addition, there is the option to specify a second set of weights for one channel of the image data. This allows more flexible encoding for textures with uncorrelated channels, such as X+Y for normal mapping, or L+A or RGB+A for decals.
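The second set of weights can be sketched as follows. This is an illustrative model rather than the spec's exact decode path: every texel carries two interpolation weights, and the block selects one component (the "plane 2" component) to use the second weight while the other three components share the first. The 0..64 weight range and the rounding blend follow the spec's integer interpolation; the function name and signature are mine.

```cpp
#include <array>
#include <cstdint>

// Illustrative dual-plane interpolation for one texel. Endpoints are
// UNORM16 per channel; weights are in the range 0..64 as in the spec.
std::array<int, 4> interpolate_texel(const std::array<int, 4>& ep0, // low endpoint
                                     const std::array<int, 4>& ep1, // high endpoint
                                     int w0, int w1,                // plane 1 and plane 2 weights
                                     int plane2_component)          // 0=R, 1=G, 2=B, 3=A
{
    std::array<int, 4> out{};
    for (int c = 0; c < 4; ++c) {
        // The selected component uses the second weight plane.
        int w = (c == plane2_component) ? w1 : w0;
        // ASTC-style integer blend with round-to-nearest.
        out[c] = (ep0[c] * (64 - w) + ep1[c] * w + 32) >> 6;
    }
    return out;
}
```

For a normal map stored as X+Y, for example, one channel can follow the second weight plane independently of the others.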
The requirement for bit-exactness was requested by the content developers, as it makes it very much easier to qualify content for multiple platforms if the output is guaranteed. The decoder is specified very exactly using integer operations on the internal representation of HDR data, which synthesise the floating-point operations. This allows us to specify the exact bit-patterns delivered to the filtering stage of the texture pipeline. After that, of course, we have to place our trust in the filter implementation.
I hope that this is helpful.
Hi Sean,
Thanks for the details.
So, can I assume that in LDR mode all the internal operations of the encoder/decoder are in integer format, with no floating-point operations?
I'm not clear on partitions. The spec says a block can use one of 2048 partition patterns; what does this 2K (11-bit) value correspond to?
> A block may have from 1 to 4 partitions, each with a separate set of color endpoints. In addition, there is the option to specify a second set of weights for one channel of the image data.
Is this the dual-plane encoding (the second set of weights) mentioned in the spec?
In the HDR case, are only 16-bit IEEE 754 floating-point operations performed, and no 32-bit ones?
I understood that each block can be encoded/decoded in parallel because there are no dependencies. However, doesn't the concept of a "void-extent block" create a dependency on neighboring blocks? How does the encoding of void-extent blocks happen if it has to signify the presence of neighboring colors? How many neighbors does it consider?
-ben
The latest spec is the extension specification on the Khronos website - https://www.khronos.org/registry/gles/extensions/OES/OES_texture_compression_astc.txt
The 16-bit output is to allow filtering without having to know which profile is being decoded. If your filtering unit only takes 8-bit input, then I think it is acceptable to take the top 8 bits of the 16-bit UNORM result without converting it to floating point.
Similarly, the internal precision of the interpolation between weights is 16 bits, so even with LDR input it is possible to get 16-bit outputs.
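The two points above can be sketched together, using the spec's integer blend for linear LDR endpoints: 8-bit endpoints are first bit-replicated to 16 bits, then interpolated with a weight in 0..64, so intermediate texels carry genuine 16-bit precision even for LDR input, and an 8-bit filtering unit can simply keep the top byte. The function below is my own illustration of that path, not code from the evaluation codec.

```cpp
#include <cstdint>

// Interpolate two linear-LDR 8-bit endpoints at 16-bit precision.
// weight is in the range 0..64, as in the ASTC decode specification.
uint16_t interpolate_ldr(uint8_t e0, uint8_t e1, int weight)
{
    uint16_t c0 = static_cast<uint16_t>((e0 << 8) | e0); // bit-replicate to UNORM16
    uint16_t c1 = static_cast<uint16_t>((e1 << 8) | e1);
    return static_cast<uint16_t>((c0 * (64 - weight) + c1 * weight + 32) >> 6);
}
```

Note that interpolating between 8-bit endpoints 0 and 1 at weight 32 yields 129 in UNORM16, a value that has no exact 8-bit equivalent, which is why even LDR decode produces meaningful 16-bit output.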
The evaluation codec supports full profile, which includes the HDR and LDR profiles. By default, only LDR endpoints are considered, so if you want to encode as HDR you should also supply the "-hdr" command line switch.
Sean
Are there any fields which specify the maximum width and height of the image being encoded? Because of area limitations we won't be able to support the HDR profile in hardware for either encoder or decoder, only LDR at 8x8 block size (2bpp). What is the best quality I can get at a 2bpp bitrate for natural/synthetic images? Do you see any consequences in not supporting HDR and full profile, and supporting LDR only at 2bpp?
While debugging the code, I noticed the image is flipped when accessed for encoding, in file astc_stb_tga.cpp line no. 46 ("y_flip"). Why is the image accessed bottom to top? Is this a requirement?
Are there any real-time application use cases for ASTC encoding?
Could you please respond to my queries posted above.
Regards,
Sorry for the delay - I haven't checked in for a while.
If you are not supporting 2bpp (or only supporting 2bpp, I'm not sure which), the consequence is that you will not pass the conformance test, and will not be able to claim ASTC support. Khronos rules are pretty clear on spec conformance matters.
One of the main reasons we introduced ASTC in the first place was feedback from the content developers that the market was too fragmented. The Khronos group therefore ratified the specs with very little room for deviation, so that developers could guarantee results across different platforms. Decode must be bit exact, and all features in the spec must be present - this includes support for all block sizes.
The layout of the blocks in the image is defined in the spec and should start with the block closest to the (s=0, t=0) corner of the image. How we map that to the (x,y) pixels in the image, however, is not specified. I will have to investigate the vertical flip.
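The block layout described above can be sketched as follows, assuming the usual row-major traversal of blocks starting from the (s=0, t=0) corner. Image dimensions that are not multiples of the block footprint round up, with edge blocks still occupying a full compressed block. The struct and function names are illustrative, not from the spec or codec.

```cpp
#include <cstdint>

struct BlockIndex { uint32_t bx, by, linear; };

// Map a texel coordinate to the compressed block that contains it,
// assuming row-major block order from the (s=0, t=0) corner.
BlockIndex block_for_texel(uint32_t x, uint32_t y,
                           uint32_t image_w,
                           uint32_t block_w, uint32_t block_h)
{
    uint32_t blocks_per_row = (image_w + block_w - 1) / block_w; // ceiling division
    uint32_t bx = x / block_w;
    uint32_t by = y / block_h;
    return { bx, by, by * blocks_per_row + bx };
}
```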
There may possibly be a real-time use case for ASTC, which is to use render-to-texture to create relatively long-lived textures such as a skybox. However this will require a specialized encoder optimized for the specific type of images being produced in order to constrain the search space and approach real-time frame rates.
Sean.
Thanks for the details. I need some help understanding the encoder source code; are there any detailed implementation notes? As I mentioned earlier, we are planning to support only the LDR profile and 8-bit inputs.
While going through the encoder source code, I see a lot of floating-point usage. For example, in the function fetch_imageblock() there are float variables (float data[6] and float *fptr = pb->orig_data;, etc.). These floats are 32-bit IEEE 754 values. From the earlier messages I understood that everything was implemented as a fixed-point version of float, but I don't see any fixed-point conversion happening in the code. Is a fixed-point implementation of the ASTC code available? For our hardware realization it would be helpful if we could have that source code.
The Visual Studio solution does not compile in VS2008. Which version of VS should I use?
Kindly provide the required details.
The comments I made about floating point usage and spec conformance apply to the decoder; sorry if this wasn't clear. If you are implementing an encoder, you are free to accept whatever subset of input best suits your needs, as long as the output is validly encoded.
We don't have another publicly available version of the encoder. It would be possible to restrict the floating point operations, but we were targeting a desktop machine where a 32-bit floating point add or multiply is approximately as fast as the equivalent 32-bit integer operation. We therefore felt that the encoder code would be more performant and more readable if we just kept everything in native floating point. The same should be true on an ARM A-class core.
As far as speeding things up is concerned, I think that your best bet will be to restrict the number of encoding modes that have to be searched. If you have a corpus of representative images, you could analyse the output to see what encoding points are most often used, and which are not. (How many times do you need a 4-partition block, for example?) Then work to remove the rarely used modes from the search algorithm, whilst checking that these don't introduce unacceptable artefacts.
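The profiling step suggested above could be sketched like this: encode a representative corpus with the full search enabled, record how many partitions each block finally used, and drop the rarely chosen modes from the hardware search. All names here are hypothetical; nothing below comes from the evaluation codec.

```cpp
#include <array>
#include <cstddef>

// Tally how often each partition count (1..4) is chosen across a corpus,
// to decide which modes a reduced hardware search can safely drop.
struct PartitionStats {
    std::array<std::size_t, 5> count{}; // index 1..4 = partitions per block

    void record(int partitions) {
        if (partitions >= 1 && partitions <= 4) ++count[partitions];
    }

    // Fraction of blocks that needed more than n partitions.
    double fraction_above(int n, std::size_t total_blocks) const {
        std::size_t sum = 0;
        for (int p = n + 1; p <= 4; ++p) sum += count[p];
        return total_blocks ? static_cast<double>(sum) / total_blocks : 0.0;
    }
};
```

If, say, only a tiny fraction of blocks in your corpus ever use 4 partitions, removing that mode from the search shrinks the hardware at a quality cost you can measure directly, as suggested above.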
The Visual Studio solution was created with VS 2010, and I believe that it is compatible with VS 2010 Express.
Even the decoder retrieves the RGB values as 32-bit IEEE 754 floats (file astc_image_load_store.cpp, function write_imageblock()), then scales them by multiplying by 255 and stores them as int. So the decoder is also treating the data as float. The data structure which the decoder uses is as follows:
float orig_data[MAX_TEXELS_PER_BLOCK * 4]; // original input data
The software decoder does use floats. However, in the hardware, everything is defined as integer operations.
But then a fixed-point implementation of the decoder won't be bit-exact with the reference floating-point implementation; there will be a difference of +/-1. How, then, can we pass the bit-exactness conformance criterion?
To implement all the floating-point operations in fixed point, how many bits of precision would be good enough? Can I use an 8-bit fixed-point format for the intermediate representation to do all the float work?
Could you please help me finalize my design decisions? My ASTC encoder design will be based on a fixed-point implementation with no floating-point hardware unit, and my fixed-point representation will use 8 bits. What will be the impact on encoding if I restrict my encoder to 8-bit fixed point? Is it possible to implement it this way?
Please respond to my queries.
Happy New Year, Ben. I have been away over the Christmas break and so haven't seen your latest questions. Give me a little while and I will get back to you.
Happy New Year Sean
You are free to use any method you like to encode an image, as long as the resulting encoding is legal, and you are happy with how the result looks. The bit-exact requirements are for decoding, so that what you see on hardware from manufacturer X will be the same as from manufacturers Y and Z. This was a primary requirement from content developers, who wanted to make sure that they didn't have to separately requalify their texture assets on all the different target devices.
I hope I can also put your mind at rest about the floating-point reference decoder. The decoder as written does produce the exact same results as the hardware - we have verified this using our internal test suites and at least two external implementations of the decoder.
I understood the decoder part: as you mentioned earlier, the output will be UNORM16, which is then converted to half float (16-bit) and then to full float (32-bit). So if I take the output at the UNORM16 stage and keep the upper 8 bits, it should be bit-exact, and I can avoid all float operations. Please correct me if I'm wrong.
I believe that you are correct, yes.
Thanks for clearing up my doubts on the decoder. I will start looking into encoding and will have more questions.
By the way, I want to know more about AFBC (Arm Frame Buffer Compression). Is it part of the Mali graphics IP? What is the typical use case for AFBC?
Here are a couple of blogs which help to explain how AFBC is used in the system. First, one by me which fits AFBC into our strategy for reducing whole-system power, and then one from Ola Hugosson which talks about how AFBC is used inside the Mali-V500 video codec.
In the reference ASTC decoder, why is a fixed-point (8-bit) output option not provided? Is any optimized reference code available for the encoder? What options can be tried to reduce the hardware complexity?
The evaluation codec is a proof-of-concept of the encode and decode processes. I'm not sure what you mean by 8 bit output not being supported. The LDR output is in 8-bit format, so do you mean for HDR? HDR 8-bit output is supported in the spec in order to cater for sRGB encoded images, and this is supported in the codec too.
The encoder has already been quite extensively optimised - particularly in the selection of color endpoints and other exhaustive inner loops.
When you say "reduce the hardware complexity", do you mean for decode or encode? Encoding hardware isn't mandated, so you are free to take whatever shortcuts you like, as long as you are happy with the result and it's a valid encoding. However, decoding hardware must support all the possibilities at your chosen feature level (LDR, HDR, or full profile) in order to pass conformance, so your only real option here is to decide which profile to support.
I mean that in the case of 8-bit sRGB LDR output, UNORM16 is converted to fp16 and then to fp32, and then scaled back to 8 bits. What I need is the output before the floating-point conversion, i.e. (UNORM16 >> 8).
It would certainly be possible for you to change the data written by the function to 8-bits, but this would require fairly extensive changes elsewhere too. The reason that we return floats is because this is the more general format and allows us to get all the data flowing through the codec in a common format without having to have typed unions or other annoyances. Strictly speaking, you are right - the conversions from UNORM16 (top 8 bits) to fp16 to fp32 and back to 8-bit are redundant, but they should all be value-preserving and thus result in the same 8-bit value out as went in. In real hardware, of course, if you only require 8-bit output, you can legitimately skip these steps.
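The value-preserving round trip described above can be checked directly for the linear LDR case: an 8-bit input expands to UNORM16 by bit replication, and taking the top 8 bits of that UNORM16 recovers the original byte exactly, so skipping the fp16/fp32 detour is safe when only 8-bit output is needed. The helper names below are mine.

```cpp
#include <cstdint>

// Linear LDR expansion: replicate the byte into both halves of a UNORM16.
uint16_t expand_unorm8(uint8_t v)
{
    return static_cast<uint16_t>((v << 8) | v);
}

// The hardware shortcut: keep only the top byte of the UNORM16 result.
uint8_t truncate_unorm16(uint16_t v)
{
    return static_cast<uint8_t>(v >> 8);
}
```

Running truncate_unorm16(expand_unorm8(v)) over all 256 input values returns v unchanged in every case, which is the sense in which the conversion chain is redundant but harmless.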
While going through the encoder code, I understood that the computations are done on floating-point values. Are all these calculations just for making decisions, or are the float values also used in coding the weights? I can compute the direction and best vector using fixed-point operations, but what about the weight calculations?
As I say, it's possible in the encoder to take whatever shortcuts you like, as long as you are happy with the output quality. So it should be possible to calculate the weights using fixed point as long as the result is in the allowed range and encodes legally.
The ASTC decoder always outputs the data as four components (ARGB). For example, even if the image is grayscale it still outputs ARGB. Is this a requirement for GPU hardware?
Is it possible to have block-level color settings? For example, block 0 is luma, block 1 is RGB, block 2 is luminance + alpha, and block 3 is ARGB?
What is the use of the swizzle pattern in ASTC?
Is sRGB conversion mandatory as part of ASTC, or can it be handled on the display side?
Could you please respond?
Again, sorry for the slow response - I had a few days off to move house, and am currently on the road in the USA.
Yes, the ASTC decoder always decodes to RGBA, as this is what is passed from the texture unit into the OpenGL ES Shading Language. A sampler will always return a vec4 of appropriate type, with appropriate conversions. A luminance-only texture access that returns value L will actually be passed to the pipeline as {r,g,b,a} = {L,L,L,1}. This is so that the shader code and the sampler type can be specified separately without having to recompile the shader, which could be costly.
Each block has its own color endpoint modes, so you are correct. Imagine, for example, encoding a picture of the Amundsen Scott South Pole Research Station. The majority of blocks in the image would be snow or grey sky and would be encoded as luma only, and only those containing the brightly colored pixels of the base itself would be encoded as RGB. Similarly, if an image is completely opaque, the encoder should never choose a color endpoint mode which includes alpha, but the converse is not true: in an image with variable opacity, the encoder should pick color endpoints without alpha for opaque areas, and only use LA or RGBA encoding modes for those blocks which are actually semi-transparent.
The swizzle pattern in the encoder just allows you to switch the input or output color channels around, in case your inputs are (for example) encoded in BGR mode. Or are you referring to the swizzle mentioned in HDR Endpoint Mode 7? If so, that ensures that the component with the largest value is in a consistent position in the encoding.
In OpenGL ES, the sRGB conversion happens before the values are handed across to the shader, so the converter must be placed in this position in the pipeline. It is likely that this hardware already exists to service uncompressed sRGB, however, in which case it should not be expensive in terms of area or implementation.
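The per-channel conversion that this pipeline stage applies is the standard sRGB-to-linear transfer function from IEC 61966-2-1 (applied to the RGB channels only, not alpha). A reference-style sketch:

```cpp
#include <cmath>

// Standard sRGB-to-linear transfer function (IEC 61966-2-1).
// Input and output are normalized to the range [0, 1].
double srgb_to_linear(double c)
{
    return (c <= 0.04045) ? c / 12.92
                          : std::pow((c + 0.055) / 1.055, 2.4);
}
```

Hardware implementations typically realize this with a small lookup table or piecewise-linear approximation rather than evaluating the power function directly, which is one reason the area cost is modest.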