Register spilling for different threads count

Hi. According to the Arm Mali GPU Datasheet 2020.pdf document, there are several modes for the maximum thread count. For Mali G76 there are two such modes: 768 threads for 0-32 work registers, and 384 threads for 33-64 work registers.

Is it possible for register spilling to happen because the compiler decides it is better to use only 0-32 work registers and double the number of threads for higher performance? For example, a kernel exceeds the 0-32 range by only one register, and that register could be spilled to global memory in exchange for twice as many threads. If that is the case, is there a way to tell the compiler not to spill, and instead use the full 0-64 register range with only half the threads? Can the cl_arm_thread_limit_hint extension be used for this?
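To make the scenario concrete, here is a minimal sketch of the two modes described above. It only encodes the figures quoted from the datasheet for Mali G76; the function name `max_threads` is hypothetical, and this is not how the compiler actually makes its decision.

```python
# Figures quoted above for Mali G76 (Arm Mali GPU Datasheet 2020).
LOW_REG_LIMIT = 32        # upper bound of the 0-32 work register mode
HIGH_REG_LIMIT = 64       # upper bound of the 33-64 work register mode
THREADS_LOW_REGS = 768    # maximum thread count with 0-32 work registers
THREADS_HIGH_REGS = 384   # maximum thread count with 33-64 work registers


def max_threads(work_registers: int) -> int:
    """Return the maximum thread count for a given work register count.

    Hypothetical helper: it models only the two modes listed in the
    datasheet, not any spilling heuristic.
    """
    if not 0 <= work_registers <= HIGH_REG_LIMIT:
        raise ValueError("work register count must be in the range 0-64")
    if work_registers <= LOW_REG_LIMIT:
        return THREADS_LOW_REGS
    return THREADS_HIGH_REGS


# The case in question: a kernel that needs 33 registers.
no_spill = max_threads(33)    # keep all 33 registers, fewer threads
with_spill = max_threads(32)  # spill one register, more threads
print(f"33 registers, no spill: {no_spill} threads")
print(f"32 registers + 1 spill: {with_spill} threads")
```

So a single extra register halves the thread count, which is why spilling that one register might look attractive to the compiler.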

  • Hi Yury, 

    The compiler is responsible for making the trade-off between thread count and register allocation. It may well choose to spill in some cases if the alternative is dropping the thread count; this is quite a common trade-off in graphics fragment shading, for example.

    There isn't a direct hint to force the register count, but the cl_arm_thread_limit_hint extension might help for OpenCL programs. Restricting the thread count will allow more registers per thread, and reduce concurrent pressure on the GPU data caches.

    You can always verify the impact using Mali Offline Compiler, which is part of Mobile Studio (including OpenCL support on macOS and Linux, but not Windows).

    Cheers, 
    Pete
