Hi,
If I remember correctly, the work group processing order used to be random. Is that still the case on the Mali G715?
Or is there a way to force the GPU to work like a CPU, I mean processing the groups in row order?
Example: for a 2*2 group and a buffer of 10*10, the GPU would process data (0,1,10,11), then (2,3,12,13), and so on until (8,9,18,19), then (20,21,30,31), etc.
That would be great ;))
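To make the ordering in the example concrete, here is a small Python sketch that only enumerates it on the CPU; it does not control how a GPU schedules work groups. The function name `row_order_groups` and its parameters are hypothetical, and it assumes a row-major buffer whose dimensions are multiples of the group size.

```python
def row_order_groups(width, height, group_w, group_h):
    """Yield the linear buffer indices covered by each work group, in row order."""
    for gy in range(0, height, group_h):        # rows of groups, top to bottom
        for gx in range(0, width, group_w):     # groups within a row, left to right
            yield tuple(
                (gy + ly) * width + (gx + lx)   # row-major linear index
                for ly in range(group_h)
                for lx in range(group_w)
            )


# 10*10 buffer, 2*2 groups, as in the example above
groups = list(row_order_groups(10, 10, 2, 2))
print(groups[0], groups[1], groups[4], groups[5])
```

The first row of groups ends at index 4 of the list, and the next row of groups starts at buffer element 20.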
Thanks for the answer.
In fact, the CPU-style processing could be done on the GPU by changing the shape of the work group. If we have 16*16, we can work on 2*128.
Going to 32*32 we could go to 2*512, and with 64*64 to 2*2048. So in the future GPUs will work nearly like CPUs, not for 8K images, but HD images are already a lot of work done ;))
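The arithmetic behind these shapes is keeping the thread count of the square work group while fixing the height at two rows. A minimal sketch, with the hypothetical helper name `two_row_shape`; the real maximum work group size depends on the device and is not checked here.

```python
def two_row_shape(side, rows=2):
    """Return (rows, columns) holding the same thread count as a side*side group."""
    threads = side * side
    if threads % rows:
        raise ValueError("thread count is not divisible by the row count")
    return rows, threads // rows


# Square work group sizes mentioned in the post
shapes = {side: two_row_shape(side) for side in (16, 32, 64)}
print(shapes)
```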
I wanted to improve the contours for image computation to reduce the amount of data to process, but it is not a priority at the moment.
Most work could be done on the GPU; the question is whether it can be done efficiently. Running a GPU significantly under-threaded because of a need to enforce some fine-grained work ordering will seriously reduce performance.
So it would be better to run many kernels, if I understood correctly.
Within reason - GPUs will have some implementation-defined limit on how many things can run in parallel in their queue design.
I have never tried to run kernels in parallel. I don't even know how to do it, but I'd like to know. ;))