Vulkan F16 shader calculations support on G76/G78 GPUs

Hi! Half a year later, I'm back with the same question, albeit regarding Vulkan :)

We're developing for G76/G78 devices, such as the Redmi Note 8 Pro or Samsung S20 FE, and we can't see any effect of F16 support on any of our Mali GPUs. By that I mean that even intentional half overflows in shader calculations show no artifacts on Mali GPUs, while they do show up on Adreno and desktop hardware. Performance is also the same with or without the F16 extensions.

Here's how we create the VkDevice:

	VkPhysicalDeviceFeatures deviceFeatures = {};
	deviceFeatures.imageCubeArray = VK_TRUE;
	deviceFeatures.independentBlend = VK_TRUE;

	devCreateInfo.pEnabledFeatures = &deviceFeatures;
	devCreateInfo.enabledLayerCount = 0;
	devCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();
	devCreateInfo.enabledExtensionCount = (uint32_t)deviceExtensions.size();

	// Request 16-bit float arithmetic in shaders by chaining the feature struct into pNext.
	VkPhysicalDeviceFloat16Int8FeaturesKHR float16Features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR };
	float16Features.shaderFloat16 = VK_TRUE;

	devCreateInfo.pNext = &float16Features;

	VK_CALL( vkCreateDevice( physDevice, &devCreateInfo, nullptr, &m_device.device ) );

We also include VK_KHR_shader_float16_int8 in ppEnabledExtensionNames.

Strangely, while the GLES extensions app reports support for the VK_KHR_shader_float16_int8 extension, RenderDoc fails when we replay captures on Mali hardware, stating that VK_KHR_shader_float16_int8 is not supported. We've also tried forcing the compiler to emit RelaxedPrecision decorations, but this produced no visible difference.
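One way to cross-check conflicting reports like this is to test the extension list that the driver itself returns at runtime. The following is a minimal sketch, not code from our project: `isExtensionAdvertised` is a hypothetical helper, and it assumes the names were already copied out of the `VkExtensionProperties::extensionName` fields returned by `vkEnumerateDeviceExtensionProperties`. The `shaderFloat16` feature bit is separate from the extension string and would still need to be queried through `vkGetPhysicalDeviceFeatures2`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns true when `wanted` appears in the list of
// extension names reported by the driver for the physical device.
// `advertised` is assumed to hold the extensionName strings obtained from
// vkEnumerateDeviceExtensionProperties.
bool isExtensionAdvertised(const std::vector<std::string>& advertised,
                           const std::string& wanted) {
    return std::find(advertised.begin(), advertised.end(), wanted) != advertised.end();
}
```

Calling it with "VK_KHR_shader_float16_int8" before adding that name to ppEnabledExtensionNames shows what the driver reports to the running process, independent of third-party tools.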

Can you please clarify:

  • Are F16 calculations supported on G76/G78 hardware?
  • If they are, what is the correct way to enable F16 support on Mali?
  • For SPIR-V, what does the driver look for: float16_t types or RelaxedPrecision decorations?
  • Should F16 overflows even be visible on Mali GPUs, or is there some internal handling of such situations (e.g. clamping) that would explain why we can't see overflow artifacts?
  • I don't know if this is confidential, but in general, do vendors receive drivers as part of some kind of hardware support package, so that we can expect identical feature sets from different vendors on the same hardware (e.g. Xiaomi and Samsung releasing phones on the same SoC)? Or can vendors choose to block specific features, such as F16, and have they done so?

  • Hi Peter! I've sent an email to the address you mentioned. It contains SPIR-V assembly extracted from RenderDoc. It was compiled from HLSL using DXC v. 1.6.2104.52 with these command-line arguments:

    -spirv -fspv-target-env=vulkan1.0 -fvk-use-dx-layout -Zpr -HV 2018 -enable-16bit-types -O3

    Is there any general advice you can give about using DXC, and potentially SPIRV-Tools, with Mali hardware, such as recommended optimization layers, input arguments for DXC, etc.?

  • We continued the discussion over email and eventually found the core issue: the drivers for the Redmi Note 8 Pro (G76) and Samsung S20 FE (G77) ignore true half-precision types but do honor relaxed-precision operations, such as those on min16float. Using min16float instead of half types produces SPIR-V with RelaxedPrecision decorations, and running spvtools::CreateRelaxFloatOpsPass() likewise produces shaders that exhibit F16 precision artifacts, such as distorted vertices and broken texture scrolling. The driver simply expects RelaxedPrecision decorations, and the resulting visual artifacts differ from those on Adreno. They even differ between GPU generations: with spvtools::CreateRelaxFloatOpsPass(), both G76 and G77 output broken results, but they are broken in slightly different ways.

    We're still evaluating the energy/performance impact of F16 in our case but, in the end, my original claim was wrong: F16 operations on Mali do work.
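To make the kind of artifact described above concrete, here is a minimal CPU-side sketch of what rounding a value to IEEE 754 binary16 does. It is only an illustration under stated assumptions: `roundToHalf` is a hypothetical helper, it assumes the default round-to-nearest-even floating-point mode, and real GPUs may behave differently (for example, with RelaxedPrecision the driver is free to compute at either precision).

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: rounds a 32-bit float to the nearest value that is
// representable as an IEEE 754 binary16 (half) number and returns it as float.
// Assumes the default round-to-nearest-even rounding mode.
float roundToHalf(float x) {
    if (std::isnan(x) || std::isinf(x) || x == 0.0f) return x;

    int exp = 0;
    std::frexp(x, &exp);  // |x| = m * 2^exp with m in [0.5, 1)

    // binary16 keeps 11 significant bits, so the spacing between neighbours is
    // 2^(exp - 11); below the smallest normal (2^-14) the spacing is fixed at 2^-24.
    const int quantumExp = std::max(exp - 11, -24);
    const float quantum = std::ldexp(1.0f, quantumExp);

    const float rounded = std::nearbyint(x / quantum) * quantum;

    // 65504 is the largest finite half value; anything that rounds past it overflows.
    if (std::fabs(rounded) > 65504.0f) {
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    }
    return rounded;
}
```

With this helper, roundToHalf(1024.25f) returns 1024.0f, so a texture scroll offset in that range loses its fractional part entirely, and roundToHalf(70000.0f) returns infinity, which is the sort of intentional overflow we originally used as a test.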

    For now, we've settled on the following definitions at the beginning of our HLSL shaders:

    	"#define half min16float\n";
    	"#define half2 min16float2\n";
    	"#define half3 min16float3\n";
    	"#define half4 min16float4\n";
    	"#define half3x3 min16float3x3\n";
    	"#define half4x4 min16float4x4\n\n";
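A minimal sketch of how such a prelude could be prepended to shader source before it is handed to the compiler. The function names `relaxedHalfPrelude` and `withRelaxedHalf` are hypothetical, not our actual code, and the sketch assumes the HLSL source is available as a single string.

```cpp
#include <string>

// Hypothetical helper: the defines that remap half types to min16float types.
std::string relaxedHalfPrelude() {
    return
        "#define half min16float\n"
        "#define half2 min16float2\n"
        "#define half3 min16float3\n"
        "#define half4 min16float4\n"
        "#define half3x3 min16float3x3\n"
        "#define half4x4 min16float4x4\n\n";
}

// Hypothetical helper: returns the shader source with the prelude in front,
// so every later use of half, half2, ... is compiled as a min16float type.
std::string withRelaxedHalf(const std::string& hlslSource) {
    return relaxedHalfPrelude() + hlslSource;
}
```

Keeping the remapping in one place means it can be switched off again for comparison runs without touching the shaders themselves.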