In this discussion Peter Harris explained that the ARM Mali-G72 MP3 GPU can run 1152 threads concurrently. Can someone please explain where this number comes from? I am just starting to learn about this stuff, and all I can understand from the specs is that it has 3 cores, which I thought was very low for any parallelization.
1152 threads / 3 cores = 384 threads per core.
384 threads / 3 engines per core = 128 threads per engine.
Warps are 4 threads wide, so 128 / 4 = 32 warps per engine.
Oh, thanks so much! One final question though: where exactly does it say there are 32 warps per engine? I'd like to read more on this.
It doesn't - but you can work it out from the other counts: (threads per core) / (engines per core * warp width) = 384 / (3 * 4) = 32.
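If it helps, here is the same arithmetic as a short Python sketch. It only uses the figures quoted in this thread (1152 threads, 3 cores, 3 engines per core, 4-wide warps). The variable names are my own, and nothing here is queried from the hardware or a driver.

```python
# Figures quoted in this thread for the Mali-G72 MP3 (assumed, not read from hardware).
TOTAL_THREADS = 1152     # concurrent threads across the whole GPU
CORES = 3                # the "MP3" in the product name
ENGINES_PER_CORE = 3     # engines inside each core
WARP_WIDTH = 4           # threads per warp

# Work down from the GPU-wide count to per-core and per-engine counts.
threads_per_core = TOTAL_THREADS // CORES
threads_per_engine = threads_per_core // ENGINES_PER_CORE

# (threads) / (engines * warp width), using the per-core thread count.
warps_per_engine = threads_per_core // (ENGINES_PER_CORE * WARP_WIDTH)

print(threads_per_core, threads_per_engine, warps_per_engine)
```

Going the other way, 3 cores * 3 engines * 32 warps * 4 threads gets you back to 1152.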