Greetings,
After reading “PHENOMENAL COSMIC POWERS! Itty-bitty living space!” by edplowman, I'm wondering how the FP16 type can actually be used.
When reading the ARMv7 and ARMv8 architecture manuals, the only instructions I found that refer to half-precision floating-point values are VCVT (ARMv7) and FCVT (ARMv8).
So, my questions are:
At the moment the most one can do is use them to save space, without worrying much about the conversion cost. They are useful in artificial-intelligence-type applications and in graphics, where high bandwidth is often required but high accuracy is not. There is more extensive support for them in graphics units, and a later version of ARMv8 will also add support for calculations using them:
ARMv8-A architecture evolution
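To illustrate the "storage only" usage described above, here is a minimal sketch in Python rather than ARM code. It relies on the standard struct module's "e" format (IEEE 754 half precision); the helper names and the sample values are made up for illustration. The data is stored as fp16 and widened again before any arithmetic is done on it.

```python
import struct

def to_fp16_bytes(values):
    """Pack Python floats as little-endian IEEE 754 half-precision values."""
    return struct.pack(f"<{len(values)}e", *values)

def from_fp16_bytes(blob):
    """Unpack half-precision bytes back to Python floats for computation."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))

# Made-up sample data, e.g. color channels in the 0-1 range.
colors = [0.0, 0.25, 0.5, 0.1, 0.9, 1.0]

half_blob = to_fp16_bytes(colors)                        # 2 bytes per value
single_blob = struct.pack(f"<{len(colors)}f", *colors)   # 4 bytes per value

restored = from_fp16_bytes(half_blob)
max_error = max(abs(a - b) for a, b in zip(colors, restored))

print(len(single_blob), len(half_blob))   # storage in bytes: fp32 vs fp16
print(max_error)                          # worst rounding error after the round trip
```

The storage is halved, and for values in the 0-1 range the rounding error stays below 2^-12, which is usually acceptable for color-like data.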
Oh, interesting! I guess that the first ARMv8-A boards will be released during the first quarter of 2017 then?
Meanwhile, I wonder if there are OpenGL examples, compiled for ARM architectures, that use half-floats for texture coordinates.
I'd like to try using fp16 for texture coordinates, since fp32 seems overkill. However, I don't know how to define the data type with GCC.
While grep'ing the GCC source code, I found the -mfp16-format option, the __fp16 type and the float16x4_t type. Should I use those types in data structures containing UV coordinates?
It seems that GCC only understands the __fp16 type when the -mfp16-format=ieee option is used, but this option seems to work only with the armv7 version of the compiler. With the aarch64 one it does not.
For most graphics use cases you generally don't need to process bulk data on the CPU at all; e.g. vertex attribute data tends to be exported from the content creation tools in fp16 as part of the application build, and can then be copied directly into a vertex buffer object without any processing during level load. By pushing down-conversion to asset creation time, you also save download bandwidth and install size, which your users will be grateful for too!
For data you touch regularly on the CPU, such as uniform matrices, it's likely that you're dealing with positional data and need higher precision than fp16 anyway.
For most real textures fp16 coordinates are not precise enough, especially on larger screen sizes. You generally want enough precision to cope with (1) texture UV coordinate wrapping for tiled textures and (2) about 16 sub-pixel divisions for good quality filtering - fp16 simply runs out of bits long before that ...
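As a rough back-of-the-envelope check of the two requirements above, the following Python sketch estimates how many bits are needed to address every sub-texel step across a single 0-1 span, and compares that with the significand width of each format. The texture sizes and the function name are assumptions chosen for illustration; the comparison is a worst case (coordinates just below 1.0), and UV wrapping would need additional integer bits on top.

```python
import math

def fraction_bits_needed(texture_size, subpixel_steps=16):
    """Bits needed to address every sub-texel step across one 0-1 span."""
    return math.ceil(math.log2(texture_size * subpixel_steps))

FP16_SIGNIFICAND_BITS = 11   # 10 stored bits plus the implicit leading 1
FP32_SIGNIFICAND_BITS = 24   # 23 stored bits plus the implicit leading 1

# Hypothetical texture widths in texels.
for size in (256, 1024, 4096):
    needed = fraction_bits_needed(size)
    print(size, needed,
          needed <= FP16_SIGNIFICAND_BITS,   # does fp16 have enough bits?
          needed <= FP32_SIGNIFICAND_BITS)   # does fp32 have enough bits?
```

Under these assumptions even a 256-texel texture needs 12 bits, one more than fp16 offers near 1.0, while fp32 has headroom for all three sizes.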
In general, for anything related to position (texture coordinates, vertex positions, uniform matrices for position transforms, distance computation for lighting, etc.) we'd recommend using highp/fp32. For anything related to color, or intermediate values which will turn into a color at some point (such as normals for lighting), fp16 is probably fine.
You generally want enough precision to cope with (1) texture UV coordinate wrapping for tiled textures and (2) about 16 sub-pixel divisions for good quality filtering - fp16 simply runs out of bits long before that ...
Does it affect automatic filtering (GL_SAMPLES, GL_LINEAR_MIPMAP_LINEAR, anisotropic extensions) or only hand-written filtering algorithms?
I mean, can the visual quality of some applied textures be improved just by setting precision highp float; instead of precision mediump float; in the fragment shader and sending fp32 coordinates?
Meanwhile, the basic rule is:
It will affect everything; it's just a problem with quantization causing less accurate sample points with higher floating point values (as the exponent gets bigger you get fewer and fewer decimal places).
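The quantization effect can be shown with a short Python sketch that measures the gap between neighbouring fp16 values at a few magnitudes. It uses the standard struct module's "e" (half-precision) format; the helper names and sample coordinates are made up for illustration, and fp16_step assumes positive, normal, finite inputs.

```python
import struct

def fp16_round(value):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def fp16_step(value):
    """Gap between the fp16 value nearest `value` and the next one above it."""
    bits = struct.unpack("<H", struct.pack("<e", value))[0]
    next_up = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return next_up - fp16_round(value)

# Hypothetical texture coordinates; values above 1.0 correspond to a
# wrapped (tiled) texture repeated several times.
for u in (0.75, 1.5, 3.0, 12.0, 100.0):
    print(u, fp16_step(u))
```

The step doubles every time the value crosses a power of two. At u = 12.0 the step is 2^-7, so with a 1024-texel tiled texture (one texel = 2^-10) adjacent representable coordinates are 8 texels apart.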
Potentially yes; it depends on how the texture is being used (whether you have UV wrapping) and on the size of the texture (bigger texture = more pixels to cover with the same 0-1 number range, so effectively fewer bits per pixel). The driver can help automatically here (we know which inputs are used as texture coordinates), so we can prevent the worst of the issues without the application changing anything.
Thanks for these clarifications!
That said, what would be the general best practices for good CPU←→GPU bandwidth usage while retaining enough quality then?
FP32 (highp) for close-range / high-detail assets and FP16 (mediump) for landscape and random filler decoration, I guess?
See previous answer:
=)
Alright then