Vulkan F16 shader calculations support on G76/G78 GPUs

Hi! Half a year later, I'm back with the same question, albeit regarding Vulkan :)

We're developing for G76/G78 devices, such as Note 8 Pro or Samsung S20FE, and we're unable to see any effect of F16 support on any of our Mali GPUs. By that I mean that even intentional half-precision overflows in shader calculations don't show any artifacts on Mali GPUs, while they do show up on Adreno and desktop hardware. There's also no effect on performance, with or without the F16 extensions.

Here's how we create the VkDevice:

    VkPhysicalDeviceFeatures deviceFeatures = {};
    deviceFeatures.imageCubeArray = true;
    deviceFeatures.independentBlend = true;

    devCreateInfo.pEnabledFeatures = &deviceFeatures;
    devCreateInfo.enabledLayerCount = 0;
    devCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();
    devCreateInfo.enabledExtensionCount = (uint32_t)deviceExtensions.size();

    VkPhysicalDeviceFloat16Int8FeaturesKHR float16Features = { VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT16_INT8_FEATURES_KHR };
    float16Features.shaderFloat16 = true;

    devCreateInfo.pNext = &float16Features;

    VK_CALL( vkCreateDevice( physDevice, &devCreateInfo, nullptr, &m_device.device ) );

We also include VK_KHR_shader_float16_int8 in ppEnabledExtensionNames.

What's strange is that, while the GLES extensions app reports support for the VK_KHR_shader_float16_int8 extension, RenderDoc fails when we replay captures on Mali hardware, stating that VK_KHR_shader_float16_int8 is not supported. We've also tried forcing the compiler to use RelaxedPrecision decorations, but this didn't produce any visible difference.

Can you please clarify:

  • Are F16 calculations supported on G76/G78 hardware?
  • If they are, what is the correct way to enable F16 support for Mali?
  • If there is a way, what does the driver look for in SPIR-V: float16_t types or RelaxedPrecision decorations?
  • Should F16 overflows even be visible on Mali GPUs, or is there some internal handling of such situations (e.g. clamping) that could explain why we can't see overflow artifacts?
  • I don't know if this is confidential, but generally, do vendors receive drivers as part of some kind of hardware support package, so that we can expect identical feature sets from different vendors on the same hardware (e.g. Xiaomi and Samsung releasing phones on the same SoC)? Or can vendors choose to block specific features, such as F16, and have they done so in the past?
  • Hi Ivan, 

    A relatively long answer, sorry ... 

    From the GPU side of things, all Mali GPUs support 16-bit calculations, generally implemented as vec2 issue down a 32-bit data path. Not every hardware instruction has a 16-bit variant, and there are cases where we force 32-bit texture coordinates in fragment shaders because of the many content issues seen in the wild. The compiler may choose not to use a 16-bit operation if it doesn't make sense (e.g. the cost of type conversion is higher than the saving from switching to narrower precision).

    Lack of performance gain can occur for a couple of reasons:

    • Issuing a scalar fp16 operation isn't any faster than issuing a scalar fp32 operation (it doesn't benefit from the vec2 width).
    • The operation isn't available in fp16, so it is actually a 32-bit operation anyway.
    • Casting between precisions isn't always free, so the compiler may choose not to do it if the overhead exceeds the benefit.

    I'm not aware of any vendors blocking fp16 support - it would be a major power efficiency and performance hit with no obvious upside.  

    In the general case there isn't automatic overflow clamping, so fp16 overflows should be visible if an fp16 type is being used.
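
    To illustrate what an unclamped overflow looks like numerically, here is a minimal CPU-side sketch in C++ that rounds a float to the nearest IEEE 754 binary16 value. It only models the arithmetic and says nothing about what the Mali compiler or hardware actually does; the function name round_to_fp16 is made up for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper: rounds x to the nearest value representable in
// IEEE 754 binary16 (round-to-nearest-even, assuming the default
// floating-point rounding mode) and returns the result as a float.
float round_to_fp16(float x) {
    if (std::isnan(x) || std::isinf(x) || x == 0.0f) return x;

    int exp = 0;
    std::frexp(x, &exp);  // x = m * 2^exp with 0.5 <= |m| < 1

    // binary16 keeps 11 significant bits; below the smallest normal value
    // (2^-14, which is exp == -13 here) the spacing stops shrinking.
    const int e = std::max(exp, -13);
    const float quantum = std::ldexp(1.0f, e - 11);

    const float r = std::nearbyint(x / quantum) * quantum;

    // The largest finite binary16 value is 65504; anything that rounds
    // past it becomes infinity rather than being clamped.
    if (std::fabs(r) > 65504.0f)
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    return r;
}
```

    For example, round_to_fp16(300.0f) * round_to_fp16(300.0f) is 90000, and rounding that product back with round_to_fp16 gives infinity. That is the kind of result an intentional overflow test should produce if the calculation really runs in fp16.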

    For Vulkan, RelaxedPrecision is fine. On newer drivers that expose the extension for explicit types, those should work too (with the same limitation: the operation might not really run at 16 bits if the hardware operation isn't physically available in a 16-bit flavor).

    What toolchain are you using to generate your SPIR-V? We have seen a few cases where the SPIR-V has lost its RelaxedPrecision annotations by the time the final output reaches us.
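
    One way to check this on your side is to scan the final SPIR-V binary for the relevant instructions. The C++ sketch below shows how such a check could look; it is an illustration, not an Arm tool. The names HalfUsage and scanSpirv are hypothetical, the numeric constants are the opcode and enum values from the SPIR-V specification, and the code assumes the module has already been loaded as 32-bit words in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the 16-bit related content found in a module.
struct HalfUsage {
    bool valid = false;                     // header and instruction stream parsed cleanly
    bool float16Capability = false;         // OpCapability Float16 present
    bool float16Type = false;               // OpTypeFloat with width 16 present
    std::size_t relaxedPrecisionCount = 0;  // OpDecorate ... RelaxedPrecision
};

HalfUsage scanSpirv(const std::vector<std::uint32_t>& words) {
    constexpr std::uint32_t kMagic = 0x07230203u;
    constexpr std::uint32_t kOpCapability = 17;
    constexpr std::uint32_t kOpTypeFloat = 22;
    constexpr std::uint32_t kOpDecorate = 71;
    constexpr std::uint32_t kCapabilityFloat16 = 9;
    constexpr std::uint32_t kDecorationRelaxedPrecision = 0;

    HalfUsage usage;
    // The header is five words: magic, version, generator, bound, schema.
    if (words.size() < 5 || words[0] != kMagic) return usage;
    usage.valid = true;

    std::size_t i = 5;
    while (i < words.size()) {
        // Each instruction starts with (word count << 16) | opcode.
        const std::uint32_t wordCount = words[i] >> 16;
        const std::uint32_t opcode = words[i] & 0xFFFFu;
        if (wordCount == 0 || i + wordCount > words.size()) {
            usage.valid = false;  // truncated or corrupt stream
            break;
        }
        if (opcode == kOpCapability && wordCount >= 2) {
            if (words[i + 1] == kCapabilityFloat16) usage.float16Capability = true;
        } else if (opcode == kOpTypeFloat && wordCount >= 3) {
            if (words[i + 2] == 16) usage.float16Type = true;  // operands: result id, width
        } else if (opcode == kOpDecorate && wordCount >= 3) {
            // operands: target id, decoration
            if (words[i + 2] == kDecorationRelaxedPrecision) ++usage.relaxedPrecisionCount;
        }
        i += wordCount;
    }
    return usage;
}
```

    If relaxedPrecisionCount comes back as zero and float16Type is false, the module contains nothing for the driver to narrow, whatever the original shader source said. The spirv-dis tool from SPIRV-Tools shows the same information in readable form.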

    If you're able to share a SPIR-V file we can check what's going on (feel free to email me via "mobilestudio at arm dot com" if you can't share publicly).

    Cheers, 
    Pete
