Double loop on CPU vs GPU

Hi,

I have a technical question about loops. Let's take an example.

int A[3000][4];
int B[3000][4];
int C[3000][4];

Using the CPU is very simple: I compare every A with every B.

for (int x = 0; x < 3000; x++) {
    for (int y = 0; y < 3000; y++) {
        // compare A[x] with B[y] and write any match to C
    }
}
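A runnable version of the loop above might look like the sketch below. The post does not define what counts as a "match", so this assumes two rows match when all four values are equal, and that a matching row of A is copied into C. The names `rows_match` and `compare_all` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 4

/* Assumption: two rows "match" when all four values are equal. */
int rows_match(const int a[COLS], const int b[COLS])
{
    return memcmp(a, b, sizeof(int) * COLS) == 0;
}

/* Compare every row of A with every row of B (up to n * n comparisons).
 * Each row of A that matches at least one row of B is copied to C.
 * Returns the number of rows written to C (at most n). */
size_t compare_all(int A[][COLS], int B[][COLS], int C[][COLS], size_t n)
{
    size_t out = 0;
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t x = 0; x < n; x++) {
        for (size_t y = 0; y < n; y++) {
            if (rows_match(A[x], B[y])) {
                memcpy(C[out++], A[x], sizeof(int) * COLS);
                break; /* one match is enough for this row of A */
            }
        }
    }
    return out;
}
```

With the arrays from the example, the call would be `compare_all(A, B, C, 3000)`.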

If I want to do the same thing on the GPU, I will need to call the same kernel 3000 times, sending each row of A to be compared with all of B. In this case, which would be faster, the CPU or the GPU?

On the CPU I can use multi-core threading, and I need to do this 8 times. So on the GPU I will need to run 24 000 kernels with a range of (16*16) and a buffer of (400,32), which gives 50 work-groups per kernel and 1 200 000 work-groups altogether for the whole processing.
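The figures above can be checked with a few lines of C. This sketch assumes that (400,32) is the global range of one kernel launch and (16*16) is the work-group size, which is one reading of the post. The function name `work_groups_2d` is made up.

```c
#include <assert.h>

/* Number of work-groups needed to cover a 2D global range (gx, gy) with a
 * work-group size of (lx, ly), rounding up in each dimension. */
unsigned long work_groups_2d(unsigned long gx, unsigned long gy,
                             unsigned long lx, unsigned long ly)
{
    assert(lx > 0 && ly > 0);
    return ((gx + lx - 1) / lx) * ((gy + ly - 1) / ly);
}
```

Under that assumption, `work_groups_2d(400, 32, 16, 16)` is 25 * 2 = 50 work-groups per launch, 8 * 3000 gives the 24 000 launches, and 24 000 * 50 gives the 1 200 000 work-groups in total.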

I hope the question is not stupid.

Thanks in advance.

Parents
  • Hi,

    I tried this for three days, and my conclusion is that a GPU does not work like a CPU. I knew that already, but I tried anyway.

    So, comparing A[3000] with B[3000] can be done on the GPU, but it is complicated, and the output buffer must hold 3000*3000 entries in case every A matches every B. The work is also done in no particular order, so nothing is sequential. The GPU tends to be faster when the amount of data is huge. It is really made for massive matrix calculations.
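    The 3000*3000 output described above can be sketched on the CPU as follows. This is plain C that only emulates the idea, not OpenCL code: `compare_item` stands in for one work-item, and it is assumed that a "match" means all four values of the two rows are equal. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 4

/* Stand-in for one GPU work-item. It is given its own (x, y) pair, the way
 * get_global_id(0) and get_global_id(1) would supply it in an OpenCL kernel,
 * and writes only to its own slot of the n * n result.
 * Assumption: a "match" means all four values of the two rows are equal. */
void compare_item(size_t x, size_t y, int A[][COLS], int B[][COLS],
                  unsigned char *match, size_t n)
{
    match[x * n + y] = (memcmp(A[x], B[y], sizeof(int) * COLS) == 0);
}

/* Emulate the whole range on the CPU, deliberately visiting the pairs in
 * reverse order: the result does not depend on the order of execution. */
void compare_all_pairs(int A[][COLS], int B[][COLS],
                       unsigned char *match, size_t n)
{
    assert(match != NULL);
    for (size_t k = n * n; k-- > 0; )
        compare_item(k / n, k % n, A, B, match, n);
}
```

    For n = 3000 the `match` buffer needs 9 000 000 entries, which is the worst-case output size mentioned above.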

    But on the CPU you can use an index file and sequential work, so there are more possibilities for a double loop, like:

    for (int X = 0; X < end; X++) {
        for (int Y = X; Y < end; Y++) {
        }
    }   // about 1/2 * end^2 iterations if the data is ordered

    I could not get this to work on the GPU, because a running global index X and Y cannot be shared between all threads of all work-groups. I tried it last week (see post about debug on khronos).
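    A possible workaround that avoids sharing X and Y between threads is to launch the full square range and let each work-item skip itself when Y < X. The sketch below only emulates that idea on the CPU in plain C; it is not OpenCL code, the function names are made up, and whether it fits the original problem is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for one work-item of a full n * n range. Each item decides from
 * its own (x, y) whether it belongs to the triangle Y >= X. Returns 1 if it
 * did the work and 0 if it skipped. No index is shared between items. */
int triangle_item(size_t x, size_t y)
{
    if (y < x)
        return 0;   /* lower triangle: this pair is skipped */
    /* ... the comparison of element x with element y would go here ... */
    return 1;
}

/* CPU emulation: run all n * n items and count those that did work. */
size_t count_triangle_pairs(size_t n)
{
    size_t done = 0;
    for (size_t x = 0; x < n; x++)
        for (size_t y = 0; y < n; y++)
            done += (size_t)triangle_item(x, y);
    assert(done == n * (n + 1) / 2);   /* same count as the double loop */
    return done;
}
```

    The cost of this approach is that roughly half of the items are launched only to return immediately: for end = 3000, 4 501 500 of the 9 000 000 items do real work.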

    So the question was not so stupid, but the GPU world is very different from the CPU world. Both have their advantages and disadvantages. The GPU is for calculation and massive matrix work done in no particular order. The CPU is for logic work done in sequential or indexed order.

    The real problem is frequency scaling on the CPU. It would be a very good idea to produce a mobile device for gaming and AI with a good cooling system to avoid that scaling. This would be a step towards laptops and desktops.

    Frequency scaling is the real bottleneck on mobile. We have CPUs that run very fast, but we can only use, let's say, 25% of their potential.

    Let's wait for the NVIDIA N1X and see what we can do with it.

    GPU versus CPU is not really a question of speed; it is a question of what you need done and how you plan to do it.

    The problem I am trying to solve is associating vectors with one another. Loops are good on the CPU. I will try to find out whether I can do this on the GPU, but I need to find another way. I will always need to do some work on the CPU, though, because GPU work runs in no particular order and the output is not indexed: a global index does not work between work-groups because they run in parallel.

    PS: I may be wrong on some points, so do not hesitate to let me know.
