Hi,
If I remember correctly, work group processing order used to be random. Is that still the case on the Mali G715?
Or is there a way to force the GPU to work like a CPU, I mean processing the groups in row order?
Example: for a 2*2 group and a 10*10 buffer, the GPU would process data (0,1,10,11), then (2,3,12,13), and so on until (8,9,18,19), then (20,21,30,31), etc.
That would be great ;))
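To make the requested ordering concrete, here is a minimal CPU-side sketch in Python that lists the flat buffer indices each 2*2 group would cover if groups were visited in row order over a 10*10 buffer. The helper name `group_indices` is made up for illustration; it is not part of any GPU API.

```python
def group_indices(width, height, gw, gh):
    """Yield, in row-major group order, the flat indices each group covers."""
    for gy in range(0, height, gh):        # step down one group row at a time
        for gx in range(0, width, gw):     # walk the groups left to right
            yield [(gy + dy) * width + (gx + dx)
                   for dy in range(gh) for dx in range(gw)]

groups = list(group_indices(10, 10, 2, 2))
print(groups[0])  # [0, 1, 10, 11]
print(groups[1])  # [2, 3, 12, 13]
print(groups[4])  # [8, 9, 18, 19]
print(groups[5])  # [20, 21, 30, 31]
```

This only shows which elements belong to which group. It says nothing about the order in which a real GPU would schedule them.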
No, there is no way to guarantee workgroup execution order.
Doing "all(workgroup_A) then all(workgroup_B) then ..." would be hideously slow. GPUs are data parallel processors and need that data parallelism with lots of concurrently executing threads to fill the available hardware. Lots of small things running serially is a poor fit for a GPU architecture, and it may well be faster to run that workload on the CPU because GPUs are really not good at that style of processing.
What's the actual problem you are trying to solve?
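As a rough illustration of why the ordering should not matter when groups are independent, here is a small CPU-side Python simulation (not GPU code). It assumes each group reads and writes only its own 2*2 tile; `process_group` and the "multiply by 2" work are placeholders invented for the example.

```python
import random

def process_group(src, dst, width, gx, gy, gw=2, gh=2):
    # Each group touches only its own tile, so groups do not depend on each other.
    for dy in range(gh):
        for dx in range(gw):
            i = (gy + dy) * width + (gx + dx)
            dst[i] = src[i] * 2  # placeholder per-element work

def run(src, width, height, order):
    dst = [0] * len(src)
    for gx, gy in order:
        process_group(src, dst, width, gx, gy)
    return dst

width = height = 10
src = list(range(width * height))
row_order = [(gx, gy) for gy in range(0, height, 2) for gx in range(0, width, 2)]
shuffled = row_order[:]
random.Random(0).shuffle(shuffled)  # stand-in for an arbitrary scheduling order

same = run(src, width, height, row_order) == run(src, width, height, shuffled)
print(same)  # True
```

If an algorithm only gives the right answer for one particular group order, that dependency is what would need to be restructured before it maps well onto a GPU.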
Thanks for the answer.
In fact the CPU-style processing could be done on the GPU by changing the size of the work group. If we have 16*16 we can work on 2*128.
Going to 32*32 we could go to 2*512, and with 64*64 to 2*2048. So in the future the GPU will nearly work like a CPU, not for 8K images yet, but an HD image is already a lot of work done ;))
I wanted to improve the contours for image computation to reduce the amount of data to process. But it is not a priority at the moment.
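The reshaping arithmetic above can be checked with a few lines of Python. This is only a sketch of the idea that a square work group and a 2-row strip hold the same number of work items; `strip_shape` is a hypothetical helper, and whether a given device actually accepts these work group sizes is a separate question.

```python
def strip_shape(n, rows=2):
    """Reshape an n*n work group into `rows` rows with the same item count."""
    items = n * n
    return rows, items // rows

shapes = {n: strip_shape(n) for n in (16, 32, 64)}
for n, (rows, cols) in shapes.items():
    print(f"{n}*{n} = {n * n} items -> {rows}*{cols}")
```

For 16*16 this gives 2*128, for 32*32 it gives 2*512, and for 64*64 it gives 2*2048.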
Most work could be done on the GPU; the question is whether it can be done efficiently. Running a GPU significantly under-threaded because of a need to enforce some fine-grained work ordering will seriously reduce performance.
So it would be better to run many kernels, if I understood correctly?
Within reason - GPUs will have some implementation-defined limit on how many things can run in parallel in their queue design.
I have never tried to run kernels in parallel. I don't even know how to do it. But I'd like to know ;))