Using NEON with OpenCL data types

I'm having a problem using an OpenCL `cl_uchar4` array as input to NEON intrinsics. I use `gQueue.enqueueMapBuffer` to map the data from the GPU into a CPU buffer declared as `cl_uchar4 bufligne_512[65536][4]`. Then I try to load four `cl_uchar4` values at once with `uint8x16_t xv = vld4q_u8(buf_512->bufligne_512[i][0])`, and I get this error: `error: no viable conversion from 'cl_uchar4' to 'const void *'`.

I then tried with `cl_uchar16 bufligne_512[65536]` and `uint8x16_t xv = vld4q_u8(buf_512->bufligne_512[i])`, and I get this error: `error: no viable conversion from 'cl_uchar16' to 'const void *'`.

There's something I haven't understood yet. It looks like I must convert the OpenCL data type to something else, but I'm not sure, and I don't know what to convert it to.

I can't seem to make progress right now. I always feel like I'm missing something.

  • I have found the solution after sleeping on it ;))

    I declared:

    static void * __restrict__ point_ptr_512    = (void * __restrict__ ) malloc(65536*sizeof(cl_uchar4));

    struct conv_uchar4{
        uint8_t A;
        uint8_t B;
        uint8_t C;
        uint8_t D;
    };
    static conv_uchar4 conv_neuro512[512*512];
    static conv_uchar4 * __restrict__ buf_512 = &conv_neuro512[0];

    then

    void* point_ptr_512 = gQueue.enqueueMapBuffer(buffer512, CL_TRUE, CL_MAP_READ,0,(512*512)*sizeof(cl_uchar4), 0,NULL);
    buf_512 = (conv_uchar4 *)point_ptr_512;

    and

    uint8x16x4_t xv = vld4q_u8(&buf_512->A);

    or

    uint8x16_t xv = vld1q_u8(&buf_512->A);

    PS: I searched the internet for a solution yesterday but found nothing very efficient. This works here.
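For anyone landing here with the same error: the NEON load intrinsics expect a pointer to bytes, while the code in the question passed a `cl_uchar4` value, which is why the compiler could not convert it to `const void *`. Below is a minimal sketch of the pointer-based approach from the answer. `Pixel4` and `split_channels` are hypothetical names, not from the thread, and `Pixel4` is only a stand-in for `cl_uchar4` / `conv_uchar4` so the example builds without the OpenCL headers. Note also that `vld4q_u8` reads 64 bytes and de-interleaves them into four registers, whereas `vld1q_u8` reads 16 contiguous bytes (four pixels) as they sit in memory. The NEON path is only compiled on ARM targets; elsewhere the scalar loop does all the work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for cl_uchar4 / conv_uchar4: four packed bytes.
struct Pixel4 {
    uint8_t a, b, c, d;
};
static_assert(sizeof(Pixel4) == 4, "Pixel4 must be exactly four bytes");

// Splits `count` interleaved pixels into four separate channel arrays.
// Each output array must have room for `count` bytes.
void split_channels(const Pixel4* pixels, std::size_t count,
                    uint8_t* a, uint8_t* b, uint8_t* c, uint8_t* d) {
    assert(pixels && a && b && c && d);
    std::size_t i = 0;

#if defined(__ARM_NEON)
    // The intrinsics take a byte pointer, not a pixel value.
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(pixels);
    for (; i + 16 <= count; i += 16) {
        // Reads 64 bytes (16 pixels) and de-interleaves them:
        // val[0] holds the 16 A bytes, val[1] the 16 B bytes, and so on.
        uint8x16x4_t v = vld4q_u8(bytes + 4 * i);
        vst1q_u8(a + i, v.val[0]);
        vst1q_u8(b + i, v.val[1]);
        vst1q_u8(c + i, v.val[2]);
        vst1q_u8(d + i, v.val[3]);
    }
#endif

    // Scalar path: handles the tail on ARM and everything on other targets.
    for (; i < count; ++i) {
        a[i] = pixels[i].a;
        b[i] = pixels[i].b;
        c[i] = pixels[i].c;
        d[i] = pixels[i].d;
    }
}
```

With a real mapped buffer, the pointer returned by `enqueueMapBuffer` would be cast to `const Pixel4*` (or directly to `const uint8_t*`) before the call, assuming the buffer really holds tightly packed 4-byte elements.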