Hi! Half a year later, I'm back with the same question, albeit regarding Vulkan :)
We're developing for G76/G77 devices, such as the Redmi Note 8 Pro and the Samsung S20FE, and we're unable to see any effect of F16 support on any of our Mali GPUs. By "effect" I mean that even intentional half overflows in shader calculations don't produce any artifacts on Mali GPUs, while they do show up on Adreno and desktop hardware. There's also no difference in performance with or without the F16 extensions.
Here's how we create the VkDevice:
    VkPhysicalDeviceFeatures deviceFeatures = {};
    deviceFeatures.imageCubeArray = true;
    deviceFeatures.independentBlend = true;

    devCreateInfo.pEnabledFeatures = &deviceFeatures;
    devCreateInfo.enabledLayerCount = 0;
    devCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();
    devCreateInfo.enabledExtensionCount = (uint32_t)deviceExtensions.size();

    VkPhysicalDeviceFloat16Int8FeaturesKHR float16Features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR };
    float16Features.shaderFloat16 = true;
    devCreateInfo.pNext = &float16Features;

    VK_CALL( vkCreateDevice( physDevice, &devCreateInfo, nullptr, &m_device.device ) );
We also include VK_KHR_shader_float16_int8 in ppEnabledExtensionNames.
Strangely, while the GLES extensions app reports support for the VK_KHR_shader_float16_int8 extension, RenderDoc fails when we replay captures on Mali hardware, stating that VK_KHR_shader_float16_int8 is not supported. We've also tried forcing the compiler to emit RelaxedPrecision decorations, but this didn't produce any visible difference.
Can you please clarify:
We've continued the discussion in email and eventually found the core issue -- drivers for the Redmi Note 8 Pro (G76) and the Samsung S20FE (G77) ignore true half precision types, but work with relaxed-precision ops, such as min16float. Using min16float instead of half types produces SPIR-V with RelaxedPrecision decorations, and running spvtools::CreateRelaxFloatOpsPass() produces shaders that exhibit F16 precision artifacts, such as distorted vertices and broken texture scrolling. It's just that the driver expects RelaxedPrecision decorations, and the visual artifacts are different from those on Adreno. They even differ between GPU generations -- with spvtools::CreateRelaxFloatOpsPass() both G76 and G77 output broken results, but the ways they are broken are slightly different.
We're in the process of evaluating the energy/performance impact of F16 in our case, but, in the end, my original claim was wrong and F16 operations on Mali do work.
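For anyone wondering why the artifacts look the way they do (broken scrolling, distorted vertices, overflows), here is a minimal CPU-side sketch that rounds a float to the nearest IEEE binary16 value. The function name quantizeToHalf is hypothetical and this is only an emulation of the 16-bit format, not what any driver actually does. As far as I understand, RelaxedPrecision only permits reduced precision rather than mandating exactly 16 bits, which may be part of why results differ between GPUs.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: rounds a 32-bit float to the nearest value
// representable as an IEEE 754 binary16 ("half") number.
// Assumes the default round-to-nearest-even floating-point mode.
float quantizeToHalf(float x) {
    if (std::isnan(x) || std::isinf(x) || x == 0.0f) return x;

    int exp = 0;
    std::frexp(x, &exp);            // |x| = m * 2^exp with 0.5 <= m < 1

    // Half has 11 significant bits. Below the smallest normal (2^-14)
    // the spacing stays fixed at 2^-24, so clamp the exponent.
    int e = std::max(exp, -13);
    float ulp = std::ldexp(1.0f, e - 11);   // spacing of half values near x

    float r = std::nearbyint(x / ulp) * ulp;

    // Anything that rounds past the largest finite half (65504) overflows.
    if (std::fabs(r) > 65504.0f)
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    return r;
}
```

With this, quantizeToHalf(70000.0f) comes back as infinity, and a scrolling UV offset of 1024.3f comes back as exactly 1024.0f, i.e. above 1024 there are no fractional bits left, which is the kind of thing that breaks time-based texture scrolling.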
For now, we've settled on the following definitions at the beginning of our HLSL shaders:
    "#define half min16float\n";
    "#define half2 min16float2\n";
    "#define half3 min16float3\n";
    "#define half4 min16float4\n";
    "#define half3x3 min16float3x3\n";
    "#define half4x4 min16float4x4\n\n";
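In case it helps someone, here is a rough sketch of how such a prelude could be prepended to the shader source before it goes to the HLSL compiler. The function name withHalfPrelude is made up for illustration and is not our actual engine code; it just concatenates the defines above with the shader text.

```cpp
#include <string>

// Hypothetical helper: prepends the half -> min16float remapping to an
// HLSL source string before compilation. Adjacent string literals are
// concatenated by the compiler into a single prelude.
std::string withHalfPrelude(const std::string& hlslSource) {
    static const char* const kPrelude =
        "#define half min16float\n"
        "#define half2 min16float2\n"
        "#define half3 min16float3\n"
        "#define half4 min16float4\n"
        "#define half3x3 min16float3x3\n"
        "#define half4x4 min16float4x4\n\n";
    return std::string(kPrelude) + hlslSource;
}
```

One thing to keep in mind with this approach: the prelude adds seven lines in front of the shader, so line numbers in compiler error messages will be shifted by that amount unless you compensate for it.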