Hi,
I am trying to implement an OpenCL kernel on a Mali-G76 with DDK r16.
I find that if I declare and use an array such as "half A[16];", performance is poor.
But if I use "half16 A;" instead, performance is very good.
I wonder whether the array is being mapped into global memory, which would explain the poor performance.
However, I need an array rather than a vector because the algorithm indexes the ith element, "A[i]", inside a for loop.
I think it is impossible to index a vector that way, e.g. "A.si", because vector component selectors must be compile-time constants.
Can anyone help me?
Thank you very much in advance!
A lot depends on what your code looks like and how the array is used at runtime. Some arrays can be promoted to registers and some cannot; it all depends on what the kernel code does with them.
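To make that concrete, here is a minimal CPU-side C sketch (not an OpenCL kernel and not Mali-specific). In "weighted_sum_constant" every index into the 16-element array comes from a loop with a constant trip count, so after full unrolling each access uses a constant index and a compiler is free to keep the elements in registers, much as it does with a "half16". In "weighted_sum_dynamic" the index comes from runtime data ("perm"), which is the pattern most likely to force the array into addressable private memory. All names here are hypothetical, and whether the Mali compiler promotes a particular array is an assumption to check against your own kernel. In OpenCL C the equivalent advice is to keep loop bounds compile-time constants, optionally requesting unrolling (e.g. "#pragma unroll") where the compiler supports it.

```c
#include <assert.h>

#define N 16

/* Constant-index pattern: both loops have a compile-time trip count, so
 * after unrolling every A[i] becomes A[0], A[1], ... A[15] and the array
 * can be replaced by 16 scalar registers. */
static float weighted_sum_constant(const float *in) {
    float A[N];
    for (int i = 0; i < N; ++i)
        A[i] = in[i];
    float acc = 0.0f;
    for (int i = 0; i < N; ++i)
        acc += A[i] * (float)i;
    return acc;
}

/* Data-dependent pattern: which element is read depends on perm[i], which
 * is only known at runtime, so the compiler generally cannot map each
 * access to a fixed register and may keep A in addressable memory. */
static float weighted_sum_dynamic(const float *in, const int *perm) {
    float A[N];
    for (int i = 0; i < N; ++i)
        A[i] = in[i];
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) {
        assert(perm[i] >= 0 && perm[i] < N); /* guard the runtime index */
        acc += A[perm[i]] * (float)i;
    }
    return acc;
}
```

Both functions compute the same kind of result; only the indexing pattern, and therefore what the compiler can do with "A", differs.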
Can you share a kernel?
Cheers,
Pete
Hi Peter, thank you very much! It is just as you said: it depends on my code!
I have another question:
Do you know whether there is any dedicated hardware for local memory on the G76?
I want to use local memory to optimize GEMM. Do you think that is a good idea?
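Mali GPUs do not have a dedicated local memory for compute shaders. Local memory is backed by the same caches and system memory as global memory, so copying GEMM tiles from global into local memory is unlikely to give the speed-up it does on desktop GPUs.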
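Since local memory is not a separate on-chip store here, one common alternative for GEMM is register blocking: each work-item computes a small tile of C, keeps its accumulators in a fixed-size private array, and reads A and B directly from global memory, relying on the caches for reuse. The C sketch below models that on the CPU: "gemm_tile" is the work one hypothetical work-item would do and "gemm_blocked" stands in for the NDRange. The 4x4 tile size and all function names are assumptions, not tuned values; in a real OpenCL kernel the row of B would typically be fetched with a vector load such as "vload4".

```c
#include <assert.h>
#include <stddef.h>

#define TM 4 /* rows of C per hypothetical work-item (untuned) */
#define TN 4 /* columns of C per hypothetical work-item (untuned) */

/* Reference GEMM: C = A * B, row-major; A is M x K, B is K x N. */
static void gemm_ref(size_t M, size_t N, size_t K,
                     const float *A, const float *B, float *C) {
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

/* One work-item's share: a TM x TN tile at tile coordinates (ti, tj).
 * The accumulators are a small fixed-size array indexed only by
 * constant-trip loops, so they are candidates for register promotion.
 * A and B are read straight from "global" memory; no local-memory copy. */
static void gemm_tile(size_t ti, size_t tj, size_t N, size_t K,
                      const float *A, const float *B, float *C) {
    float acc[TM][TN] = {{0.0f}};
    for (size_t k = 0; k < K; ++k) {
        float a[TM], b[TN];
        for (int r = 0; r < TM; ++r) a[r] = A[(ti * TM + r) * K + k];
        for (int c = 0; c < TN; ++c) b[c] = B[k * N + tj * TN + c];
        for (int r = 0; r < TM; ++r)
            for (int c = 0; c < TN; ++c)
                acc[r][c] += a[r] * b[c]; /* each a[r] reused TN times */
    }
    for (int r = 0; r < TM; ++r)
        for (int c = 0; c < TN; ++c)
            C[(ti * TM + r) * N + tj * TN + c] = acc[r][c];
}

/* Host loop standing in for the NDRange: one call per work-item.
 * Assumes M and N are multiples of the tile size. */
static void gemm_blocked(size_t M, size_t N, size_t K,
                         const float *A, const float *B, float *C) {
    assert(M % TM == 0 && N % TN == 0);
    for (size_t ti = 0; ti < M / TM; ++ti)
        for (size_t tj = 0; tj < N / TN; ++tj)
            gemm_tile(ti, tj, N, K, A, B, C);
}
```

The design choice is that data reuse comes from registers (each loaded element of A is used TN times and each element of B TM times) rather than from a local-memory tile, which fits hardware where local and global memory share the same backing store.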