Register spilling for different thread counts

Hi. According to the Arm Mali GPU Datasheet 2020.pdf document, there are several modes for the maximum thread count. For Mali G76 there are two such modes: 768 threads for 0-32 work registers, and 384 threads for 33-64 work registers.

Can register spilling happen if the compiler decides that it is better to use only 0-32 work registers and double the number of threads for higher performance? For example, a kernel exceeds the 0-32 range by only one register, and that register could be spilled to global memory in exchange for double the number of threads. If that is the case, is there a way to tell the compiler not to spill, but instead to use the full 0-64 register range with only half the threads? Can the cl_arm_thread_limit_hint extension be used for this?
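
To make the trade-off in the question concrete, here is a minimal Python sketch of the two Mali G76 modes quoted above. The function name max_threads_g76 is hypothetical, and the 768/384 figures are simply the ones cited from the datasheet in this post; nothing here queries a driver or compiler.

```python
def max_threads_g76(work_registers: int) -> int:
    """Maximum thread count for a given work register count.

    Uses the two Mali G76 modes quoted in the post:
    0-32 work registers -> 768 threads, 33-64 -> 384 threads.
    """
    if not 0 <= work_registers <= 64:
        raise ValueError("work register count must be in the range 0..64")
    # Crossing the 32-register boundary halves the thread count.
    return 768 if work_registers <= 32 else 384


if __name__ == "__main__":
    # A kernel needing 33 registers sits just past the boundary.
    for regs in (32, 33, 64):
        print(f"{regs} work registers -> {max_threads_g76(regs)} threads")
```

The case in the question is the 33-register kernel: one register over the boundary costs half the threads unless that register is spilled.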

  • Hi Yury, 

    The compiler is responsible for making the trade-off decision between thread count and register allocation. It definitely might choose to spill in some cases if the alternative is dropping thread count; quite a common trade-off for graphics fragment shading for example.

    There isn't a direct hint to force the register count, but the (cl_arm_thread_limit_hint) might help for OpenCL programs. Restricting the thread count will allow more registers, and reduce concurrent pressure on the GPU data caches. 

    You can always verify the impact using Mali Offline Compiler, which is part of Mobile Studio (including OpenCL support on macOS and Linux, but not Windows).

    Cheers, 
    Pete

  • Hi Yury,

    >The compiler is responsible for making the trade-off decision between thread count and register allocation. It definitely might choose to spill in some cases if the alternative is dropping thread count; quite a common trade-off for graphics fragment shading for example.

    From Bifrost onwards the compiler never chooses to spill with 32 registers and always chooses to drop the thread count and use 64 registers.

    >There isn't a direct hint to force the register count,

    At the moment that is true but we are working on an OpenCL extension to expose just that. I encourage you to get in touch via support if you want to know more.

    >but the (cl_arm_thread_limit_hint) might help for OpenCL programs. Restricting the thread count will allow more registers, and reduce concurrent pressure on the GPU data caches. 

    On Bifrost GPUs, the cl_arm_thread_limit_hint extension has no bearing on the number of registers allocated and, strictly speaking, despite its name, it does not limit the number of threads that a GPU core can execute. It can still reduce the pressure on data caches, but it does so by reducing the amount of work a given GPU core has access to. Overall, this extension is a holdover from the Midgard era, and we are working on a replacement that is a better fit for modern Mali GPUs.

    Regards,

    Kévin
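
The allocation behaviour Kévin describes for Bifrost can be sketched as a small Python model. This is only an illustration of the rule stated in the reply, not the compiler's actual logic: the names Allocation and bifrost_allocation are hypothetical, the thread counts are the Mali G76 figures from the question, and the assumption that spilling only starts once a kernel needs more than 64 registers is mine, since the reply does not cover that case.

```python
from typing import NamedTuple


class Allocation(NamedTuple):
    registers: int  # work registers made available per thread (32 or 64)
    threads: int    # maximum thread count (Mali G76 figures from the question)
    spills: bool    # whether any values must be spilled to memory


def bifrost_allocation(registers_needed: int) -> Allocation:
    """Model of the rule in the reply: never spill just to stay at 32 registers."""
    if registers_needed < 0:
        raise ValueError("registers_needed must be non-negative")
    if registers_needed <= 32:
        # Fits in the small allocation: full thread count, no spilling.
        return Allocation(registers=32, threads=768, spills=False)
    # Per the reply, the compiler drops the thread count and uses 64 registers
    # instead of spilling. ASSUMPTION: spilling happens only beyond 64.
    return Allocation(registers=64, threads=384, spills=registers_needed > 64)


if __name__ == "__main__":
    for needed in (32, 33, 64, 70):
        print(needed, bifrost_allocation(needed))
```

Under this model, the 33-register kernel from the question gets 64 registers and 384 threads with no spilling, which is the behaviour the question was asking how to force.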
