Are arrays in an OpenCL kernel mapped to global memory instead of registers?

Hi,

I am trying to implement an OpenCL kernel on G76 with DDK r16.

I find that if I declare and use an array such as "half A[16];", performance is poor.

But if I use a vector such as "half16 A;" instead, performance is very good.
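
To make the comparison concrete, here is a minimal sketch of the two variants. It is not my real kernel: the kernel names, the "in" and "out" buffers, and the doubling operation are made up for illustration, and it assumes the cl_khr_fp16 extension is available.

```opencl
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Variant 1 (slow for me): 16 halves held in a private array.
// Hypothetical kernel; each work-item handles 16 consecutive elements.
__kernel void scale_array(__global const half *in, __global half *out)
{
    const size_t base = get_global_id(0) * 16;
    half A[16];

    for (int i = 0; i < 16; ++i)
        A[i] = in[base + i];

    for (int i = 0; i < 16; ++i)
        out[base + i] = A[i] * (half)2.0f;
}

// Variant 2 (fast for me): the same 16 halves held in one vector.
__kernel void scale_vector(__global const half *in, __global half *out)
{
    const size_t gid = get_global_id(0);
    half16 A = vload16(gid, in);          // reads in[gid*16 .. gid*16 + 15]

    vstore16(A * (half)2.0f, gid, out);   // writes out[gid*16 .. gid*16 + 15]
}
```

Both kernels are meant to produce the same output; only the way the 16 values are held differs.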

Is the array being placed in global memory rather than in registers, and is that why performance is poor?

However, I need to use an array instead of a vector because the algorithm has to access the ith element, "A[i]", inside a for loop.

As far as I know, a vector cannot be indexed that way; something like "A.si" is not valid.
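
A minimal sketch of the kind of access I mean, using a running sum as a stand-in for my real algorithm. The kernel name and buffers are hypothetical, and it assumes cl_khr_fp16. With a vector, components are picked by fixed names such as A.s0 or A.sf, so I do not see how to write the middle loop with half16.

```opencl
#pragma OPENCL EXTENSION cl_khr_fp16 : enable

// Hypothetical kernel: in-place running sum over 16 halves per work-item.
__kernel void running_sum16(__global const half *in, __global half *out)
{
    const size_t base = get_global_id(0) * 16;
    half A[16];

    for (int i = 0; i < 16; ++i)
        A[i] = in[base + i];

    // The element touched in each step is chosen by the loop variable i.
    // A half16 only offers fixed component names (s0 .. sf), not a form
    // where the component is selected by i.
    for (int i = 1; i < 16; ++i)
        A[i] += A[i - 1];

    for (int i = 0; i < 16; ++i)
        out[base + i] = A[i];
}
```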

Can anyone help me?

Thank you very much in advance!