Hi,
I am trying to implement an OpenCL kernel on a Mali-G76 with DDK r16.
I find that if I declare and use an array such as "half A[16];", performance is poor.
But if I use "half16 A;" instead, performance is very good.
I wonder whether the array is being placed in global memory, which would explain the poor performance.
However, I need an array rather than a vector, because the algorithm indexes the i-th element "A[i]" inside a for loop.
As far as I know, a vector cannot be indexed that way: "A.si" with a variable i is not valid.
Can anyone help me?
Thank you very much in advance!
Mali GPUs do not have a dedicated local memory for compute shaders.
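One thing that may be worth trying is sketched below: keep the data in a half16, view it as 16 scalars through a union, and ask the compiler to fully unroll the loop so that every index becomes a compile-time constant. This is only a rough sketch and not a confirmed fix for DDK r16. The kernel name, the "weights" argument, and the weighted-sum computation are hypothetical and stand in for the real algorithm. "#pragma unroll" is only a hint, so whether the data ends up in registers depends on the compiler and is worth checking by profiling.

```c
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: each work-item reads 16 halves and computes
// a weighted sum over them. Names and arguments are illustrative only.
__kernel void weighted_sum(__global const half *in,
                           __constant half *weights, // assumed to hold 16 values
                           __global half *out)
{
    const size_t gid = get_global_id(0);

    // View the same 16 values either as one vector or as 16 scalars.
    union {
        half16 v;
        half   s[16];
    } A;

    // Vector load: reads in[16 * gid] .. in[16 * gid + 15].
    A.v = vload16(gid, in);

    half acc = (half)0.0f;

    // The trip count is a constant, so the compiler can unroll the loop
    // and turn every A.s[i] into an access with a constant index.
    // The pragma is a hint; the compiler may ignore it.
    #pragma unroll
    for (int i = 0; i < 16; ++i) {
        acc += weights[i] * A.s[i];
    }

    out[gid] = acc;
}
```

If the index really has to be a run-time value (for example, it depends on input data), unrolling will not help, and the array access may still be slow.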