A question about the Mali-G76 MP12 GPU and its micro-architecture.
1) Does it context-switch between warps to hide memory access latency when a kernel performs memory operations?
2) The datasheet gives a maximum thread count of 768. Does that mean 256 threads per execution engine?
As far as I know, each execution engine has 8 lanes (an 8-wide warp). How can it run 768 threads simultaneously? (By context switching? Or are there more lanes?)
I want to understand how threads are executed at the micro-architecture level.
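As a rough sketch of the arithmetic behind question 2 (Python, using only the numbers quoted in this post, which I have not verified against Arm documentation; 3 engines per core and an even split of threads across engines are my assumptions):

```python
# Figures as quoted in this post; treat them as assumptions, not verified specs.
MAX_THREADS_PER_CORE = 768   # datasheet "max thread count"
ENGINES_PER_CORE = 3         # assumed: 3 execution engines per shader core
WARP_WIDTH = 8               # assumed: 8 lanes per engine, i.e. 8-wide warps


def resident_warps_per_engine(max_threads=MAX_THREADS_PER_CORE,
                              engines=ENGINES_PER_CORE,
                              warp_width=WARP_WIDTH):
    """Number of warps that must be resident per engine to reach max_threads."""
    threads_per_engine = max_threads // engines
    return threads_per_engine // warp_width


threads_per_engine = MAX_THREADS_PER_CORE // ENGINES_PER_CORE
print("threads per engine:", threads_per_engine)
print("resident warps per engine:", resident_warps_per_engine())
```

If these assumptions hold, each engine would have to keep many more warps resident than it has lanes to issue in one cycle, which is why I suspect "simultaneously" means resident and interleaved rather than executing in the same cycle.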
3) If it can run 768 threads simultaneously and the work-group size is only 24, does each core run those 24 threads (one 8-wide warp on each of the 3 engines) with the same work-group ID?
If the work-group size is 8, do the remaining lanes (24 lanes - 8 = 16 lanes) sit idle?
(On Nvidia GPUs, multiple warps from the same work-group run on one SM.)
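To make question 3 concrete, here is a minimal Python sketch of how a work-group would split into warps, assuming 8-wide warps and that a partially filled last warp leaves its unused lanes idle. The helper name `warps_for_work_group` is hypothetical, and the sketch says nothing about whether the hardware fills the other engines with warps from other work-groups, which is exactly what I am asking.

```python
import math

WARP_WIDTH = 8  # assumed: 8-wide warps


def warps_for_work_group(work_group_size, warp_width=WARP_WIDTH):
    """Return (warp_count, idle_lanes_in_last_warp) for one work-group."""
    warp_count = math.ceil(work_group_size / warp_width)
    idle_lanes = warp_count * warp_width - work_group_size
    return warp_count, idle_lanes


for size in (24, 8):
    warp_count, idle_lanes = warps_for_work_group(size)
    print(f"work-group size {size}: {warp_count} warp(s), "
          f"{idle_lanes} idle lane(s) in the last warp")
```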
Please help!