Why is the uniform buffer limit so small in the Arm GPU Best Practices Developer Guide?

https://developer.arm.com/documentation/101897/0301/Shader-code/Uniforms

It says:

"Keep your uniform data small. 128 bytes is a good general rule for how much data can be promoted to registers in any given shader."

That seems too small; the OpenGL ES UBO max size is 16 KB.

128 bytes is only 32 floats, which is easy to exceed.

Is it because the mobile GPU's SGPR register file is too small?

  • Mali can promote uniforms to constant registers, which are effectively free to access (no per-thread load in the shader code). Arm GPUs in the Bifrost family or newer actually have 512 bytes of uniform storage (used by both user uniforms and driver-managed uniforms), so the actual limit on modern hardware is a lot higher than the 128 bytes in the document.

    If you use more uniforms than this it works fine, but falls back to memory reads from the uniform buffer in shader code, which will be slightly slower and put more pressure on work register storage.

    "Seems to be too small, opengles ubo max size is 16kb."

    I doubt any single shader is actually loading 16KB of uniform data though - that's going to go slow on any GPU architecture ...

    Cheers,
    Pete
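To make the numbers in this thread concrete, here is a rough sketch that adds up the size of a shader's uniforms and compares the total against the 128-byte rule of thumb from the guide and the 512-byte figure Pete mentions for Bifrost and newer. The type table, the `classify` helper, and the example uniform list are all hypothetical, and the sum assumes tight packing with no std140 padding. Because the 512 bytes are shared with driver-managed uniforms, the amount actually available to user uniforms is smaller by an amount this sketch does not know.

```python
# Rough uniform budget check. Sizes assume tightly packed 32-bit floats and
# ignore std140 padding rules (an assumption made for illustration only).
TYPE_SIZES = {"float": 4, "vec2": 8, "vec3": 12, "vec4": 16, "mat3": 36, "mat4": 64}

GUIDE_BUDGET = 128      # rule of thumb quoted from the guide, in bytes
BIFROST_STORAGE = 512   # figure from the reply; shared with driver-managed uniforms


def uniform_bytes(uniforms):
    """Sum the byte sizes of a list of (glsl_type, count) pairs."""
    return sum(TYPE_SIZES[glsl_type] * count for glsl_type, count in uniforms)


def classify(total_bytes):
    """Place a byte total relative to the two figures discussed in the thread."""
    if total_bytes <= GUIDE_BUDGET:
        return "within the 128-byte guideline"
    if total_bytes <= BIFROST_STORAGE:
        return "over the guideline, under 512 bytes (driver uniforms share this space)"
    return "over 512 bytes, so some uniforms will be read from memory"


# Hypothetical shader: two matrices, three vec4 parameters, two scalars.
example = [("mat4", 2), ("vec4", 3), ("float", 2)]
total = uniform_bytes(example)
print(total, "bytes:", classify(total))
```

This only estimates how much data a shader declares. Whether a given uniform is actually promoted to a register is decided by the driver and compiler, so treat the result as a guide, not a guarantee.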
