Double loop on CPU vs GPU

Hi,

I have a technical question about loops. Let's take an example.

int A[3000][4];

int B[3000][4];

int C[3000][4];

On the CPU this is very simple: I compare every row of A with every row of B.

for (int x = 0; x < 3000; x++) {

    for (int y = 0; y < 3000; y++) {

        // check whether A[x] matches B[y] and write the result to C

    }

}

If I want to do the same thing on the GPU, I will need to call the same kernel 3000 times, sending each row of A to be compared with all of B. In this case, which would be faster, the CPU or the GPU?

On the CPU I can use multi-core threading, and I need to do it 8 times. So on the GPU I will need to run 24 000 kernels, each with a range of (16*16) per work group and a buffer of (400,32), which gives 50 work groups per kernel and 1 200 000 work groups in total for the whole processing.

I hope the question is not stupid.

Thanks in advance.

Parents
  • Hi,

    After porting the CPU work to the GPU, I found that the problem is not CPU versus GPU. I tried GPU only, many small CPU cores only, and the big CPU cores only. The big CPU cores are still a little faster. The worst case is running the GPU and the CPU at the same time, which just doubles the processing time.

    So I think that on mobile devices and laptops the problem is the number of instructions that can be processed per unit of time. Whether to use the GPU or the CPU just depends on the kind of work you need to do. But you cannot do more work than the processor can sustain before it overheats. ;))

    So I understand why they try to shrink the processor's feature size and use RISC instructions.

    Conclusion: mobile devices are limited by the number of instructions per unit of time. That is why the time can change from one frame to the next: one goes faster, the next slower, but on average it is the same. And the 6 seconds are just the time needed to work out the right clock frequency to use. Still, removing as many "if" statements as you can and reducing the array size is a good idea.
