I'm having a problem using an OpenCL `cl_uchar4` array as input to NEON intrinsics. I use `gQueue.enqueueMapBuffer` to transfer data from the GPU to a CPU buffer declared as `cl_uchar4 bufligne_512[65536][4]`. Then I try to load four `cl_uchar4` elements at once using `uint8x16_t xv = vld4q_u8(buf_512->bufligne_512[i][0])`, and I get this error: `error: no viable conversion from 'cl_uchar4' to 'const void *'`. I then tried `cl_uchar16 bufligne_512[65536]` with `uint8x16_t xv = vld4q_u8(buf_512->bufligne_512[i])`, and I get this error: `error: no viable conversion from 'cl_uchar16' to 'const void *'`. There's something I haven't understood yet. It looks like I must convert the OpenCL data type to something else, but I'm not sure, and I don't know to what.
I can't seem to make progress right now. I always feel like I'm missing something.
I have found the solution. Time to wake up ;))
I declared:

    static void * __restrict__ point_ptr_512 = (void * __restrict__) malloc(65536*sizeof(cl_uchar4));
    struct conv_uchar4 { uint8_t A; uint8_t B; uint8_t C; uint8_t D; };
    static conv_uchar4 conv_neuro512[512*512];
    static conv_uchar4 * __restrict__ buf_512 = &conv_neuro512[0];

then

    void* point_ptr_512 = gQueue.enqueueMapBuffer(buffer512, CL_TRUE, CL_MAP_READ, 0, (512*512)*sizeof(cl_uchar4), 0, NULL);
    buf_512 = (conv_uchar4 *)point_ptr_512;

and

    uint8x16x4_t xv = vld4q_u8(&buf_512->A);

or

    uint8x16_t xv = vld1q_u8(&buf_512->A);
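
For anyone comparing the two loads, here is a minimal sketch of what each one gives you. It is not my real code: `Pixel4` is a hypothetical stand-in for `cl_uchar4` (so it compiles without the OpenCL headers), and `Planes`, `deinterleave16`, `load_raw16` and `example_matches_expectation` are names made up for the illustration. The point is that both intrinsics want a pointer to bytes, not the vector value itself. On a non-ARM machine the sketch falls back to a plain loop that, as far as I understand the intrinsics, produces the same layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical stand-in for cl_uchar4: four tightly packed bytes per element.
struct Pixel4 { uint8_t A, B, C, D; };
static_assert(sizeof(Pixel4) == 4, "Pixel4 must be exactly 4 packed bytes");

// Four planes of 16 bytes, i.e. what vld4q_u8 returns in val[0]..val[3].
struct Planes { uint8_t a[16], b[16], c[16], d[16]; };

// De-interleave 16 consecutive elements (64 bytes) into per-channel planes.
Planes deinterleave16(const Pixel4* px) {
    assert(px != nullptr);
    // The loads take a byte pointer, so pass an address, not the element value.
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(px);
    Planes out;
#if HAVE_NEON
    uint8x16x4_t v = vld4q_u8(bytes);  // val[0]=all A, val[1]=all B, ...
    vst1q_u8(out.a, v.val[0]);
    vst1q_u8(out.b, v.val[1]);
    vst1q_u8(out.c, v.val[2]);
    vst1q_u8(out.d, v.val[3]);
#else
    // Scalar fallback (assumption: same layout as the NEON path).
    for (int i = 0; i < 16; ++i) {
        out.a[i] = bytes[4 * i + 0];
        out.b[i] = bytes[4 * i + 1];
        out.c[i] = bytes[4 * i + 2];
        out.d[i] = bytes[4 * i + 3];
    }
#endif
    return out;
}

// Load the first 16 bytes (4 elements) in memory order, as vld1q_u8 does.
void load_raw16(const Pixel4* px, uint8_t out[16]) {
    assert(px != nullptr);
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(px);
#if HAVE_NEON
    vst1q_u8(out, vld1q_u8(bytes));
#else
    std::memcpy(out, bytes, 16);
#endif
}

// Usage example: element i is {i, 100+i, 200+i, 50+i}.
bool example_matches_expectation() {
    std::vector<Pixel4> buf(16);
    for (int i = 0; i < 16; ++i) {
        buf[i] = {uint8_t(i), uint8_t(100 + i), uint8_t(200 + i), uint8_t(50 + i)};
    }
    Planes p = deinterleave16(buf.data());
    uint8_t raw[16];
    load_raw16(buf.data(), raw);
    // Planes are grouped by channel; raw stays interleaved (A0,B0,C0,D0,A1,...).
    return p.a[5] == 5 && p.b[5] == 105 && p.c[15] == 215 && p.d[0] == 50 &&
           raw[0] == 0 && raw[1] == 100 && raw[2] == 200 && raw[3] == 50 &&
           raw[4] == 1;
}
```

So `vld4q_u8` is the one to use when you want each channel in its own register, and `vld1q_u8` when you just want 16 bytes in memory order. Note that `vld4q_u8` reads 64 bytes, so the buffer must hold at least 16 elements from the pointer you pass.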
PS: I looked on the internet for a solution yesterday but found nothing very efficient. This one works for me.