I have some questions about the Mali-G76 MP12 GPU and its micro-architecture.
1) Does the GPU context-switch between warps to hide memory access latency when a kernel performs memory operations?
2) The datasheet gives a maximum thread count of 768. Does that mean 256 threads per execution engine?
As far as I know, each execution engine has 8 lanes (an 8-wide warp). How can the core run 768 threads simultaneously? (By context switching? Or are there more lanes?)
I want to understand how threads are executed at the micro-architecture level.
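To make question 2 concrete, here is the arithmetic I am assuming, written as a small Python sketch. The constants are only the figures quoted in this post (768 threads, 3 execution engines per core, 8 lanes per engine), and the function name `resident_warps` is made up for illustration; none of it is confirmed against Arm documentation.

```python
# Figures quoted in this post; assumptions, not verified against Arm documentation.
MAX_THREADS_PER_CORE = 768  # maximum thread count from the datasheet
ENGINES_PER_CORE = 3        # assumed number of execution engines per core
WARP_WIDTH = 8              # assumed lanes per execution engine (8-wide warp)


def resident_warps(max_threads, engines, warp_width):
    """Return (threads per engine, warps per engine, lanes per core)."""
    threads_per_engine = max_threads // engines
    warps_per_engine = threads_per_engine // warp_width
    lanes_per_core = engines * warp_width
    return threads_per_engine, warps_per_engine, lanes_per_core


threads_per_engine, warps_per_engine, lanes_per_core = resident_warps(
    MAX_THREADS_PER_CORE, ENGINES_PER_CORE, WARP_WIDTH
)
print(threads_per_engine, warps_per_engine, lanes_per_core)
```

If these numbers are right, 768 threads would mean 32 warps held per engine while the core has only 24 lanes, which is why I am asking whether the remaining warps are covered by switching between them.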
3) If the core can run 768 threads simultaneously and the work-group size is only 24, does it spread that work-group across all 24 lanes (8-wide warp × 3 engines), all with the same work-group ID, on one core?
If the work-group size is 8, do the remaining lanes (24 lanes - 8 = 16 lanes) sit idle?
(On Nvidia GPUs, multiple warps from the same work-group run on one SM.)
Any help would be appreciated!