Hi,
I have a technical question about loops. Let's take an example.
int A[3000][4];
int B[3000][4];
int C[3000][4];
Using the CPU is very simple: I compare every row of A with every row of B.
for (int x = 0; x < 3000; x++) {
    for (int y = 0; y < 3000; y++) {
        // check whether A[x] matches B[y] and write the result to C
    }
}
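To make the CPU loop concrete, here is a minimal sketch. It assumes that "match" means all 4 values of a row are equal and that a matching row of A is copied to the next free row of C; both rules, and the names rows_match and compare_all, are only illustrations and not the real comparison.

```c
#define COLS 4  /* each row holds 4 ints, as in A, B and C above */

/* Hypothetical match rule: two rows match when all 4 values are equal. */
int rows_match(const int a[COLS], const int b[COLS]) {
    for (int k = 0; k < COLS; k++) {
        if (a[k] != b[k]) {
            return 0;
        }
    }
    return 1;
}

/* Compare every row of A with every row of B.
 * Assumption: each row of A that matches some row of B is copied to the
 * next free row of C. Returns the number of rows written to C. */
int compare_all(int A[][COLS], int B[][COLS], int C[][COLS], int rows) {
    int out = 0;
    for (int x = 0; x < rows; x++) {
        for (int y = 0; y < rows; y++) {
            if (rows_match(A[x], B[y])) {
                for (int k = 0; k < COLS; k++) {
                    C[out][k] = A[x][k];
                }
                out++;
                break;  /* one output row per row of A is enough here */
            }
        }
    }
    return out;
}
```

With the arrays from the post this would be called as compare_all(A, B, C, 3000), which is 3000 × 3000 = 9,000,000 row comparisons in the worst case.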
If I want to do the same thing on the GPU, I will need to call the same kernel 3000 times, each time sending one row of A to be compared against all of B. In this case, which would be faster: the CPU or the GPU?
With the CPU I can use multi-core threading, and I need to do it 8 times. So with the GPU I will need to run 24,000 kernels (3000 × 8), each with a local range of 16×16 and a buffer of (400, 32), which gives 50 work-groups per kernel and 1,200,000 work-groups in total for the whole processing.
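The work-group count can be checked with a small sketch. It assumes that (400, 32) is the global size of one kernel launch and that 16×16 is the work-group size; the function names are made up for this example.

```c
/* Work-groups needed to cover a buf_w x buf_h global range with
 * local_w x local_h work-groups, rounding up in each dimension. */
long work_groups_per_kernel(long buf_w, long buf_h, long local_w, long local_h) {
    long groups_x = (buf_w + local_w - 1) / local_w;
    long groups_y = (buf_h + local_h - 1) / local_h;
    return groups_x * groups_y;
}

/* Total work-groups over all kernel launches. */
long total_work_groups(long kernel_launches, long groups_per_kernel) {
    return kernel_launches * groups_per_kernel;
}
```

For the numbers above, work_groups_per_kernel(400, 32, 16, 16) is 25 × 2 = 50, and total_work_groups(24000, 50) is 1,200,000.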
I hope the question is not stupid.
Thanks in advance.