I'm looking into the details of ASTC and planning to develop it in hardware. The spec version 1.0 says it has an LDR profile and a full profile. What is the difference between the LDR and HDR modes? What is the full profile then?
How does each mode process data? What is the input data format of LDR mode? Does HDR accept only 32-bit IEEE 754 floating-point numbers? How many partitions can a block have? And what about decoding: the spec says it has to be bit-exact. If a lot of floating-point operations are required, is it still possible to get a bit-exact decode, given the approximations of a fixed-point implementation of floating-point calculations?
Hi Sean,
I understood the decoder part. As you mentioned earlier, the output will be UNORM16, which is then converted to half float (16-bit) and then to full float (32-bit). So if I take the output at the UNORM16 stage and keep the upper 8 bits, it should be bit-exact and I can avoid all float operations. Please correct me if I'm wrong.
-ben
Ben,
I believe that you are correct, yes.
Sean.
Thanks for clearing up my doubts about the decoder. I will start looking into encoding and will probably have more questions.
BTW, I want to know more about AFBC (Arm Frame Buffer Compression). Is it part of the Mali graphics IP? What is the typical use case for AFBC?
Here are a couple of blogs which help to explain how AFBC is used in the system. First, one by me which fits AFBC into our strategy for reducing whole-system power, and then one from Ola Hugosson which talks about how AFBC is used inside the Mali-V500 video codec.
In the reference ASTC decoder, why is a fixed-point (8-bit) output option not provided? Is any optimized reference code available for the encoder? What options can be tried to reduce the hardware complexity?
The evaluation codec is a proof-of-concept of the encode and decode processes. I'm not sure what you mean by 8 bit output not being supported. The LDR output is in 8-bit format, so do you mean for HDR? HDR 8-bit output is supported in the spec in order to cater for sRGB encoded images, and this is supported in the codec too.
The encoder has already been quite extensively optimised - particularly in the selection of color endpoints and other exhaustive inner loops.
When you say "reduce the hardware complexity", do you mean for decode or encode? Encoding hardware isn't mandated, so you are free to take whatever shortcuts you like, as long as you are happy with the result and it's a valid encoding. However, decoding hardware must support all the possibilities at your chosen feature level (LDR, HDR, or full profile) in order to pass conformance, so your only real option here is to decide which profile to support.
I mean that in the case of 8-bit sRGB LDR output, the UNORM16 value is converted to fp16, then to fp32, and then scaled back to 8-bit. What I need is the output before the floating-point conversion, i.e. (UNORM16 >> 8).
It would certainly be possible for you to change the data written by the function to 8-bits, but this would require fairly extensive changes elsewhere too. The reason that we return floats is because this is the more general format and allows us to get all the data flowing through the codec in a common format without having to have typed unions or other annoyances. Strictly speaking, you are right - the conversions from UNORM16 (top 8 bits) to fp16 to fp32 and back to 8-bit are redundant, but they should all be value-preserving and thus result in the same 8-bit value out as went in. In real hardware, of course, if you only require 8-bit output, you can legitimately skip these steps.
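To make that concrete, the shortcut amounts to something like the following minimal sketch (the function name is mine, not something from the evaluation codec):

    #include <stdint.h>

    /* Take a decoded UNORM16 channel value and produce the 8-bit LDR output
     * directly, skipping the UNORM16 -> fp16 -> fp32 -> 8-bit round trip.
     * Since those conversions are value-preserving, the top byte of the
     * UNORM16 result is the same byte the float path would eventually
     * produce. */
    static inline uint8_t unorm16_to_u8(uint16_t v)
    {
        return (uint8_t)(v >> 8);
    }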
While going through the encoder code, I see that the computations are done on floating-point values. Are all these calculations just for making decisions, or are the float values also used in coding the weights? I can compute the direction and best vector using fixed-point operations, but what about the weight calculations?
As I say, it's possible in the encoder to take whatever shortcuts you like, as long as you are happy with the output quality. So it should be possible to calculate the weights using fixed point as long as the result is in the allowed range and encodes legally.
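As a very rough illustration of the idea, integer-only and single-channel, ignoring the weight-grid infill and the final quantization to your chosen weight range (so treat it as a sketch of the principle, not encoder code):

    #include <stdint.h>

    /* Decoder-style interpolation: unquantized weights live on a 0..64
     * scale, so the encoder only needs one integer in that range per
     * texel (c0 and c1 are the 16-bit expanded endpoint values). */
    static inline uint16_t interpolate(uint16_t c0, uint16_t c1, int w)
    {
        return (uint16_t)((c0 * (64 - w) + c1 * w + 32) >> 6);
    }

    /* Pick the "ideal" weight for one 8-bit texel between two 8-bit
     * endpoints using only fixed-point arithmetic (assumes e0 <= e1). */
    static inline int ideal_weight_fixed(uint8_t texel, uint8_t e0, uint8_t e1)
    {
        int range = (int)e1 - (int)e0;
        if (range == 0)
            return 0;                   /* degenerate: endpoints are equal */

        int w = ((texel - e0) * 64 + range / 2) / range;  /* rounded */

        if (w < 0)  w = 0;              /* clamp to the legal 0..64 range */
        if (w > 64) w = 64;
        return w;
    }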
Sean
The ASTC decoder always outputs the data as 4 components (RGBA). For example, even if the image is grayscale, it still outputs RGBA. Is this a requirement for GPU hardware?
Is it possible to have block-level color settings? For example, block 0 is luminance only, block 1 is RGB, block 2 is luminance + alpha, and block 3 is RGBA.
What is the use of the swizzle pattern in ASTC?
Is sRGB conversion mandatory as part of ASTC, or can it be handled on the display side?
Could you please respond?
Again, sorry for the slow response - I had a few days off to move house, and am currently on the road in the USA.
Yes, the ASTC decoder always decodes to RGBA as this is what is passed from the texture unit into the OpenGL ES Shading Language. A sampler will always return a vec4 of appropriate type, with appropriate conversions. A luminance-only texture access that returns value L will actually be passed to the pipeline as {r,g,b,a}={L,L,L,1}. This is so that the shader code and the sampler type can be separately specified without having to recompile the shader, which could be costly.
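Purely as an illustration of that expansion (this is not driver code, just the mapping written out):

    /* How single-channel samples are presented to the shader as a
     * 4-component vector. */
    typedef struct { float r, g, b, a; } vec4;

    static inline vec4 expand_luminance(float L)                /* L  -> {L,L,L,1} */
    {
        vec4 v = { L, L, L, 1.0f };
        return v;
    }

    static inline vec4 expand_luminance_alpha(float L, float A) /* LA -> {L,L,L,A} */
    {
        vec4 v = { L, L, L, A };
        return v;
    }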
Each block has its own color endpoint modes, so you are correct. Imagine, for example, encoding a picture of the Amundsen-Scott South Pole research station. The majority of blocks in the image would be snow or grey sky and would be encoded as luma only, and only those containing the brightly colored pixels of the base itself would be encoded as RGB. Similarly, if an image is completely opaque, the encoder should never choose a color endpoint mode which includes alpha, but the converse is not true. In an image with variable opacity, the encoder should pick color endpoints without alpha for opaque areas, and only use LA or RGBA encoding modes for those blocks which are actually semi-transparent.
The swizzle pattern in the encoder just allows you to switch the input or output color channels around, in case your inputs are (for example) encoded in BGR mode. Or are you referring to the swizzle mentioned in HDR Endpoint Mode 7? If so, that ensures that the component with the largest value is in a consistent position in the encoding.
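In code terms it is nothing more exotic than this kind of channel reordering (the pattern-string convention here is my own shorthand, not the codec's command-line syntax):

    #include <stdint.h>

    /* Reorder the components of one RGBA pixel according to a pattern
     * such as "bgra"; '0' and '1' select constant channels. */
    static void apply_swizzle(const uint8_t in[4], uint8_t out[4],
                              const char pattern[4])
    {
        for (int i = 0; i < 4; i++) {
            switch (pattern[i]) {
            case 'r': out[i] = in[0]; break;
            case 'g': out[i] = in[1]; break;
            case 'b': out[i] = in[2]; break;
            case 'a': out[i] = in[3]; break;
            case '0': out[i] = 0;     break;
            case '1': out[i] = 255;   break;
            default:  out[i] = in[i]; break;
            }
        }
    }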
In OpenGL ES, the sRGB conversion happens before the values are handed across to the shader, so the converter must be placed in this position in the pipeline. It is likely that this hardware already exists to service uncompressed sRGB, however, in which case it should not be expensive in terms of area or implementation.
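For reference, the conversion itself is just the standard sRGB transfer function; in hardware it would normally be implemented as a small lookup table rather than a per-texel powf():

    #include <math.h>

    /* Standard sRGB-to-linear transfer function for a channel value in
     * the [0,1] range. */
    static float srgb_to_linear(float c)
    {
        return (c <= 0.04045f) ? c / 12.92f
                               : powf((c + 0.055f) / 1.055f, 2.4f);
    }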