Hi,
I am trying to implement an OpenCL kernel on a Mali-G76 with DDK r16.
I find that if I define and use an array like "half A[16];", the performance is poor.
But if I use a vector, "half16 A;", the performance is very good.
I wonder whether the array is being mapped to global memory, which would explain the poor performance.
However, I need to use an array instead of a vector because the algorithm indexes the i-th element, "A[i]", in a for loop.
As far as I know, a vector cannot be indexed with a variable in that way, e.g. "A.si".
Can anyone help me?
Thank you very much in advance!
Mali GPUs do not have a dedicated local memory for compute shaders.
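
On the indexing question, one thing that may be worth trying is to keep the data in a half16 and select the component through a small helper with a switch, since vector component names such as .s0 must be compile-time constants. The sketch below is only an illustration: get_elem and sum16 are hypothetical names, it assumes the device exposes cl_khr_fp16, and I have not measured it on a G76, so whether it actually beats the array version is something you would need to profile.

```c
// Sketch only: assumes cl_khr_fp16 is available on the device.
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical helper: returns component i (0..15) of v.
// Every case uses a constant component name, so v never has to be
// addressed like an array in memory.
half get_elem(half16 v, int i)
{
    switch (i & 15) {
    case 0:  return v.s0;
    case 1:  return v.s1;
    case 2:  return v.s2;
    case 3:  return v.s3;
    case 4:  return v.s4;
    case 5:  return v.s5;
    case 6:  return v.s6;
    case 7:  return v.s7;
    case 8:  return v.s8;
    case 9:  return v.s9;
    case 10: return v.sa;
    case 11: return v.sb;
    case 12: return v.sc;
    case 13: return v.sd;
    case 14: return v.se;
    default: return v.sf;
    }
}

// Hypothetical kernel: each work-item loads 16 halves and sums them
// by looping over the vector with a runtime index.
__kernel void sum16(__global const half *in, __global half *out)
{
    size_t gid = get_global_id(0);
    half16 A = vload16(gid, in);   // reads in[16*gid .. 16*gid + 15]
    half acc = (half)0;
    for (int i = 0; i < 16; ++i) {
        acc += get_elem(A, i);
    }
    out[gid] = acc;
}
```

If the loop bounds are fixed at compile time, the compiler may be able to unroll the loop and reduce each get_elem call to a direct component read. Fully unrolling the original array loop by hand, so that every A[i] uses a constant index, is another option worth comparing.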