In this discussion, Peter Harris explained that the ARM Mali-G72 MP3 GPU can run 1152 threads concurrently. Can someone please explain where this number comes from? I am just starting to learn about this stuff, and all I can understand from the specs is that it has 3 cores, which I thought was very low for any parallelization.
I appreciate the fast reply, but can you please elaborate on this? This 384 number is per core, right? To my understanding, every core has 3 execution engines that can group instructions into groups of 4, so 3 × 4 is 12 per core. Why is it 384?
384 threads per core.
3 engines per core = 128 threads per engine (384 / 3).
4-wide warp = 32 warps per engine (128 / 4), with 4 threads per warp.
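A minimal Python sketch of the arithmetic above, assuming the figures quoted in this thread (3 cores, 384 threads per core, 3 engines per core, 4-wide warps). The constant names are illustrative only and do not come from any Arm tool or API. It also shows, as I understand it, where the 12 from the earlier question fits: 3 engines × 4 lanes is roughly how many threads a core can issue in a single cycle, while 384 counts threads resident on the core at once, so 384 = 12 × 32 (each engine holds 32 warps to switch between).

```python
# Figures quoted in this thread for a Mali-G72 MP3; names are illustrative only.
CORES = 3                # shader cores in the MP3 configuration
THREADS_PER_CORE = 384   # threads resident on one core at the same time
ENGINES_PER_CORE = 3     # execution engines in each core
WARP_WIDTH = 4           # threads per warp

# Top-down: split one core's threads across its engines, then into warps.
threads_per_engine = THREADS_PER_CORE // ENGINES_PER_CORE  # 384 / 3 = 128
warps_per_engine = threads_per_engine // WARP_WIDTH        # 128 / 4 = 32

# The "3 * 4 = 12" from the question: roughly the lanes a core issues per cycle.
lanes_per_core = ENGINES_PER_CORE * WARP_WIDTH             # 12
# Resident threads = lanes * warps held per engine: 12 * 32 = 384.
assert lanes_per_core * warps_per_engine == THREADS_PER_CORE

# Bottom-up: all cores together give the figure from the original question.
total_threads = CORES * THREADS_PER_CORE                   # 3 * 384 = 1152

print(threads_per_engine, warps_per_engine, lanes_per_core, total_threads)
```

Running it prints `128 32 12 1152`.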
Oh, thanks so much! One final question, though: where exactly does it say there are 32 warps per engine, so I can read more on this?
It doesn't, but you can work it out from the other counts: warps per engine = threads per core / (engines per core × warp width).
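The same formula as a small reusable Python function, a sketch that assumes all inputs are whole-number counts; the function name `warps_per_engine` is hypothetical. With the Mali-G72 figures from this thread it gives 384 / (3 × 4) = 32.

```python
def warps_per_engine(threads_per_core: int, engines_per_core: int, warp_width: int) -> int:
    """Return warps per engine = threads / (engines * warp width).

    Raises ValueError if the threads do not split into whole warps per engine,
    which would suggest one of the input figures is wrong.
    """
    lanes = engines_per_core * warp_width
    if threads_per_core % lanes != 0:
        raise ValueError(
            f"{threads_per_core} threads do not divide evenly by {lanes} lanes"
        )
    return threads_per_core // lanes


if __name__ == "__main__":
    print(warps_per_engine(384, 3, 4))  # 32
```

The divisibility check is a design choice: a remainder would mean the inputs don't describe whole warps, so the function raises instead of silently rounding down.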