Hi,
I have a technical question about loops. Let's take an example.
int A[3000][4];
int B[3000][4];
int C[3000][4];
Using the CPU is very simple: I compare every entry of A with every entry of B.
for (int x = 0; x < 3000; x++) {
    for (int y = 0; y < 3000; y++) {
        // check whether A[x] matches B[y] and write the result to C
    }
}
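For reference, a minimal runnable sketch of the CPU loop above in C. The post does not say what counts as a "match" or how C is filled, so both are assumptions here: two rows match when all 4 ints are equal, and each row of A that matches some row of B is copied once to the next free row of C. The names `rows_match` and `match_all` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define W 4 /* ints per row, as in the post */

/* Assumption: two rows "match" when all W ints are equal. */
static int rows_match(const int *a, const int *b) {
    return memcmp(a, b, W * sizeof(int)) == 0;
}

/* Compares rows of A against rows of B (up to n * n comparisons).
 * Assumption: each row of A that matches at least one row of B is
 * copied once into the next free row of C.
 * Returns the number of rows written to C. */
size_t match_all(int A[][W], int B[][W], int C[][W], size_t n) {
    size_t out = 0;
    for (size_t x = 0; x < n; x++) {
        for (size_t y = 0; y < n; y++) {
            if (rows_match(A[x], B[y])) {
                memcpy(C[out], A[x], sizeof C[0]);
                out++;
                break; /* one output row per matching row of A */
            }
        }
    }
    return out;
}
```

With n = 3000 this would be called as `match_all(A, B, C, 3000)` on the three arrays declared above.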
If I want to do the same thing on the GPU, I will need to call the same kernel 3000 times, sending each A to be compared with all of B. In this case, which would be faster, the CPU or the GPU?
On the CPU I can use multi-core threading, and I need to do it 8 times. So on the GPU I will need to run 24 000 kernels with a range of (16*16) and a buffer of (400,32), so 50 work-groups per kernel and 1 200 000 work-groups altogether for the whole processing.
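The work-group numbers above can be checked with a small calculation. This sketch assumes that (16*16) is the work-group (local) size and (400,32) is the global size, which is the reading that gives 50 work-groups per kernel; the figure of 24 000 kernels (3000 * 8) is taken as given from the post. The helper names `groups_1d` and `groups_2d` are hypothetical.

```c
#include <stddef.h>

/* Number of work-groups needed to cover `global` work-items
 * with groups of `local` work-items, rounded up. */
size_t groups_1d(size_t global, size_t local) {
    return (global + local - 1) / local;
}

/* Work-groups for a 2D range: the product of the per-dimension counts. */
size_t groups_2d(size_t gx, size_t gy, size_t lx, size_t ly) {
    return groups_1d(gx, lx) * groups_1d(gy, ly);
}
```

Under these assumptions, `groups_2d(400, 32, 16, 16)` is 25 * 2 = 50 work-groups per kernel, and 24 000 kernels * 50 work-groups gives the 1 200 000 total.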
I hope the question is not stupid.
Thanks in advance.
As for CPU vs GPU speed, it looks like it depends on the amount of data to be processed. Small amounts of data look to be faster on the CPU than on the GPU. I will check it and let you know; I think it is good to know.
After a lot of testing, I would say that for this kind of X^2 problem it depends on the size of X and, on mobile, on the moment when CPU scaling starts to lower the CPU frequency.
For X under 1000 to 1500 the CPU performs a lot better. Once scaling starts, at around X = 2000, the two look equal, and above 2000 the GPU performs better.
The problem is the CPU scaling. So I will try to use only the GPU, for both small and large X.
Maybe the trick on mobile is to avoid massive CPU work. It looks like it does not like it too much, but now I know why. And if I run the loop as a function of the amount of data to process, it is 1 kernel for every 256 data items to check.
The question was not so stupid ;))