Are there practical examples of Half-float (FP16)?

Greetings,

After reading “PHENOMENAL COSMIC POWERS! Itty-bitty living space!” by edplowman, I'm wondering how the FP16 type can actually be used.

While reading the ARMv7 and ARMv8 Architecture Reference Manuals, I found that the only instructions referring to half-precision floating-point values are VCVT (ARMv7) and FCVT (ARMv8).

So, my questions are:

  • Can the CPU do anything with half-precision floating-point values besides converting them? Can you add/subtract/multiply/divide half-precision floating-point values natively?
  • How do you use half-precision floating-point values efficiently with OpenGL? Do you do all the operations with single-precision floats and convert before sending the data to the GPU?
  • Is there any example showing how to use this data type efficiently?
  • For most graphics use cases you generally don't need to process bulk data on the CPU at all. For example, vertex attribute data tends to be exported from the content-creation tools in fp16 as part of the application build, and can then be copied directly into a vertex buffer object without any processing during level load. Pushing the down-conversion to asset-creation time also saves download bandwidth and install size, which your users will be grateful for too!
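
    A minimal sketch of that build-time down-conversion in C, assuming the asset tool runs on a host with IEEE 754 32-bit floats: the hypothetical function float_to_half below turns one fp32 value into the IEEE 754 binary16 bit pattern using round-to-nearest-even, so the tool can write the results straight into the vertex data it ships. NaN payloads are not preserved (every NaN becomes a quiet NaN).

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Convert an fp32 value to fp16 bits (round to nearest, ties to even). */
    static uint16_t float_to_half(float f)
    {
        uint32_t x;
        memcpy(&x, &f, sizeof x);                 /* reinterpret the float's bits */
        uint32_t sign = (x >> 16) & 0x8000u;      /* sign moves to bit 15 */
        uint32_t fexp = (x >> 23) & 0xFFu;
        uint32_t mant = x & 0x7FFFFFu;

        if (fexp == 0xFFu)                        /* Inf stays Inf, NaN -> quiet NaN */
            return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

        int32_t e = (int32_t)fexp - 127 + 15;     /* re-bias 8-bit exponent to 5 bits */
        if (e >= 31)                              /* too large: overflow to infinity */
            return (uint16_t)(sign | 0x7C00u);

        if (e <= 0) {                             /* fp16 subnormal (or zero) result */
            if (e < -10)                          /* below half the smallest subnormal */
                return (uint16_t)sign;
            mant |= 0x800000u;                    /* make the implicit leading 1 explicit */
            uint32_t shift = (uint32_t)(14 - e);  /* 14..24 low bits are dropped */
            uint32_t m = mant >> shift;
            uint32_t rem = mant & ((1u << shift) - 1u);
            uint32_t halfway = 1u << (shift - 1u);
            if (rem > halfway || (rem == halfway && (m & 1u)))
                m++;                              /* may carry into the smallest normal */
            return (uint16_t)(sign | m);
        }

        /* Normal case: keep the top 10 mantissa bits, round on the 13 dropped. */
        uint32_t h = ((uint32_t)e << 10) | (mant >> 13);
        uint32_t rem = mant & 0x1FFFu;
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
            h++;                                  /* a carry can bump the exponent, even to Inf */
        return (uint16_t)(sign | h);
    }
    ```

    For example, float_to_half(1.0f) gives 0x3C00 and float_to_half(65504.0f), the largest finite fp16 value, gives 0x7BFF.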

    For data you touch regularly on the CPU, such as uniform matrices, it's likely that you're dealing with positional data and need higher precision than fp16 anyway.

    > Meanwhile, I wonder if there are OpenGL examples, compiled for ARM architectures, that use half-float for texture coordinates.

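    For most real textures, fp16 coordinates are not precise enough, especially on larger screen sizes. You generally want enough precision to cope with (1) texture UV coordinate wrapping for tiled textures and (2) about 16 sub-pixel divisions for good-quality filtering; fp16 simply runs out of bits long before that. With only 10 stored mantissa bits, adjacent fp16 values just below 1.0 are 1/2048 apart, which is only two positions per texel for a 1024-texel-wide texture, and every doubling of the coordinate range (for example through wrapping) halves that again.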
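
    To put rough numbers on that, here is a small C sketch with hypothetical helper names, assuming positive normalized UVs that stay inside the fp16 range: fp16_spacing returns the gap between adjacent fp16 values near a coordinate, and fp16_steps_per_texel turns that into how many distinct fp16 positions fall within one texel of a texture of a given width.

    ```c
    /* Gap between adjacent fp16 values near x (assumes 0 < x <= 65504). */
    static double fp16_spacing(double x)
    {
        if (x < 1.0 / 16384.0)          /* below 2^-14: fp16 subnormal range */
            return 1.0 / 16777216.0;    /* constant spacing of 2^-24 */
        double p = 1.0;                 /* find p = 2^k with p <= x < 2p */
        while (p > x) p *= 0.5;
        while (p * 2.0 <= x) p *= 2.0;
        return p / 1024.0;              /* 10 stored mantissa bits: 2^(k-10) */
    }

    /* Distinct fp16 coordinate values per texel near coordinate uv. */
    static double fp16_steps_per_texel(int texture_width, double uv)
    {
        double texel = 1.0 / texture_width;   /* one texel in normalized UV space */
        return texel / fp16_spacing(uv);
    }
    ```

    For instance, fp16_steps_per_texel(2048, 0.9) is 1.0, and once wrapping pushes the coordinate into [8, 16), even a 256-texel texture gets fp16_steps_per_texel(256, 8.5) = 0.5, i.e. less than one representable position per texel, against the roughly 16 sub-pixel divisions suggested above.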

    In general, for anything related to position (texture coordinates, vertex positions, uniform matrices for position transforms, distance computations for lighting, etc.) we'd recommend using highp/fp32. For anything related to color, or intermediate values that will turn into a color at some point (such as normals for lighting), fp16 is probably fine.
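
    One way to apply that split to an interleaved vertex buffer is sketched below in C; the struct and field names are hypothetical, and it assumes the fp16 normal and color bits were produced offline. Position and texture coordinates stay fp32, while normal and color are stored as raw fp16 bits that would be described to OpenGL ES 3.0 with type GL_HALF_FLOAT (and the fp32 attributes with GL_FLOAT) when calling glVertexAttribPointer.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical interleaved vertex: fp32 for anything positional,
       raw IEEE 754 binary16 bits for values that end up as a color. */
    struct Vertex {
        float    position[3];  /* fp32: GL_FLOAT, size 3 */
        float    uv[2];        /* fp32: GL_FLOAT, size 2 */
        uint16_t normal[4];    /* fp16: GL_HALF_FLOAT, size 3; 4th slot is
                                  padding so the next attribute stays 4-byte aligned */
        uint16_t color[4];     /* fp16: GL_HALF_FLOAT, size 4 (RGBA) */
    };

    /* Byte offsets for glVertexAttribPointer's pointer argument when a VBO
       is bound; the stride would be sizeof(struct Vertex). */
    static const size_t kPositionOffset = offsetof(struct Vertex, position);
    static const size_t kUvOffset       = offsetof(struct Vertex, uv);
    static const size_t kNormalOffset   = offsetof(struct Vertex, normal);
    static const size_t kColorOffset    = offsetof(struct Vertex, color);
    ```

    With this layout each vertex takes 36 bytes, versus 48 bytes if the normal (three floats) and color (four floats) were also fp32.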

  • > You generally want enough precision to cope with (1) texture UV coordinate wrapping for tiled textures and (2) about 16 sub-pixel divisions for good quality filtering; fp16 simply runs out of bits long before that...

    Does it affect automatic filtering (GL_SAMPLES, GL_LINEAR_MIPMAP_LINEAR, anisotropic filtering extensions) or only hand-written filtering algorithms?

    I mean, can the visual quality of some applied textures be improved just by declaring "precision highp float;" instead of "precision mediump float;" in the fragment shader and sending fp32 coordinates?

    So, is the basic rule:

    • assets used 'as-is' → fp16
    • assets used in computations → fp32?
  • > Does it affect automatic filtering (GL_SAMPLES, GL_LINEAR_MIPMAP_LINEAR, anisotropic filtering extensions) or only hand-written filtering algorithms?

    It will affect everything; it's simply a quantization problem. Larger floating-point values give less accurate sample points, because as the exponent gets bigger, fewer of the mantissa bits are left for the fractional part.
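
    A small C sketch of that effect, assuming positive values in the fp16 normal range (2^-14 up to 65504): the hypothetical helper round_to_fp16 rounds a value to the nearest fp16 value (ties to even) in software, so you can see the same fractional offset lose precision as the integer part, for example from UV wrapping, grows.

    ```c
    /* Round a positive x (2^-14 <= x <= 65504) to the nearest fp16 value,
       ties to even, and return the result as a double. */
    static double round_to_fp16(double x)
    {
        double p = 1.0;                      /* find p = 2^k with p <= x < 2p */
        while (p > x) p *= 0.5;
        while (p * 2.0 <= x) p *= 2.0;
        double step = p / 1024.0;            /* fp16 spacing in this binade */

        double q = x / step;                 /* exact: step is a power of two */
        long long n = (long long)q;          /* truncate (q is positive) */
        double frac = q - (double)n;
        if (frac > 0.5 || (frac == 0.5 && (n & 1)))
            n++;                             /* round to nearest, ties to even */
        return (double)n * step;
    }
    ```

    For example, round_to_fp16(0.3) gives 0.300048828125 (an error of about 0.00005), while round_to_fp16(64.3) gives 64.3125 and round_to_fp16(1000.3) gives 1000.5, an error of 0.2.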

    > I mean, can the visual quality of some applied textures be improved just by declaring "precision highp float;" instead of "precision mediump float;" in the fragment shader and sending fp32 coordinates?

    Potentially, yes; it depends on how the texture is being used (do you have UV wrapping?) and on the size of the texture (a bigger texture means more texels to cover with the same 0-1 number range, so effectively fewer bits per texel). The driver can help automatically here (we know which inputs are used as texture coordinates), so we can prevent the worst of the issues without the application changing anything.

  • Thanks for these clarifications!

    That said, what would be the general best practices for using CPU←→GPU bandwidth efficiently while retaining enough quality?

    FP32 (highp) for close-range / high-detail assets and FP16 (mediump) for landscape and random filler decoration, I guess?

  • See previous answer:

    > In general, for anything related to position (texture coordinates, vertex positions, uniform matrices for position transforms, distance computations for lighting, etc.) we'd recommend using highp/fp32. For anything related to color, or intermediate values that will turn into a color at some point (such as normals for lighting), fp16 is probably fine.

    =)