I would like to ask whether a work group with 192 work items can run on multiple G76 cores?
I thought that, as on other GPUs, one work-group can only run on one shader core. However, that does not seem to be the case.
I measured similar latency for a work-group with 192 work items and a work-group with 24 work items. But one core should only be able to run 24 (3x8) work items in parallel.
Therefore, I guess the 192 work items are actually running on multiple cores?
Individual work-groups run wholly on a single core. Work-groups are batched before being distributed to cores. That batching is controlled by the driver. By default the driver will configure batching such that each core has access to enough work to be fully loaded (where possible). You can change how that batching operates using .
If you want to run a kernel with only 192 work-items for example, you probably want to reduce the work-group size (to make spreading the work across cores possible) and maybe reduce batch sizes further than the driver's default using  to spread the work around more. Note that this example assumes that you are not running other kernels in parallel and that the GPU was idle at the time the kernel is submitted.
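To make the effect of work-group size and batch size concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not the driver's actual scheduling policy: the function name `cores_used`, the rule that each batch goes to a single core, and the core count of 4 are all assumptions made for illustration.

```python
import math

def cores_used(total_items, wg_size, batch_size, num_cores):
    """Toy model (assumption, not the real driver policy):
    work-groups are packed into batches of `batch_size`, and each
    batch is handed to one core. Returns how many cores get work."""
    # A work-group always runs wholly on one core.
    num_groups = math.ceil(total_items / wg_size)
    # Work-groups are batched before being distributed to cores.
    num_batches = math.ceil(num_groups / batch_size)
    # No more cores can be busy than there are batches or cores.
    return min(num_batches, num_cores)

# Hypothetical 4-core GPU, 192 work-items in total.
print(cores_used(192, wg_size=192, batch_size=1, num_cores=4))  # one big work-group
print(cores_used(192, wg_size=24, batch_size=8, num_cores=4))   # small groups, large batches
print(cores_used(192, wg_size=24, batch_size=1, num_cores=4))   # small groups, small batches
```

In this model a single 192-item work-group can only ever occupy one core, and splitting it into 24-item work-groups helps only if the batches are also small enough to leave work for the other cores.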
Hope this helps.
Thanks for your quick response! I understand now. It is really helpful.
So batching is why there is no latency difference between 24 and 192 work items. That makes sense.