Synchronization between the host and kernel

I am developing a program on a Mali-T764 (Rockchip RK3288), and I have a problem with synchronization between the host (CPU) and my kernel running on the Mali-T764. In my program, the host does some calculations on an array that is shared with the kernel. The kernel should start only after the host has finished its operations on the array and passed it to the kernel, so I use a user event and clSetUserEventStatus() to control the process. However, it does not work. My code is below:

//initializing the program
...

a=(cl_float*)clEnqueueMapBuffer(cmdQueue,buffer_a,CL_TRUE,CL_MAP_READ|CL_MAP_WRITE,0,sizeof(cl_float)*N,0,NULL,NULL,NULL);
err=clSetKernelArg(test,0,sizeof(cl_mem),&buffer_a);
cl_event userEvt=clCreateUserEvent(context,&err);

for(int x=0;x<N;x++)
*(a+x)=x;

clSetUserEventStatus(userEvt,CL_COMPLETE);
err=clEnqueueNDRangeKernel(cmdQueue,test,2,NULL,globalWorkSize,localWorkSize,1,&userEvt,&fEvt);
clWaitForEvents(1,&fEvt);

//the kernel
const char*test[] = {

                   "__kernel void test (__global float * a, __global float *d ,__global float * out)\n"
                   " {\n"
                            ...
     "}\n"
         };

In this program, the values of array “a” loaded into the kernel “test” are wrong. It seems the kernel starts before the host has finished operating on array “a”. However, if I add a sleep() after the for loop, the values of array “a” loaded into the kernel are correct. The modified code is below:

//initializing the program
...

a=(cl_float*)clEnqueueMapBuffer(cmdQueue,buffer_a,CL_TRUE,CL_MAP_READ|CL_MAP_WRITE,0,sizeof(cl_float)*N,0,NULL,NULL,NULL);
err=clSetKernelArg(test,0,sizeof(cl_mem),&buffer_a);
cl_event userEvt=clCreateUserEvent(context,&err);
for(int x=0;x<8;x++)
*(a+x)=x;
Sleep(30);
clSetUserEventStatus(userEvt,CL_COMPLETE);
err=clEnqueueNDRangeKernel(cmdQueue,test,2,NULL,globalWorkSize,localWorkSize,1,&userEvt,&fEvt);
clWaitForEvents(1,&fEvt);

I also tried deleting the clSetUserEventStatus(userEvt,CL_COMPLETE) call, and that leaves the program deadlocked: clWaitForEvents() waits forever, because the test kernel cannot start until userEvt completes. I am confused. It seems the host runs clSetUserEventStatus(userEvt,CL_COMPLETE) before it has finished the for loop, even though the order of the statements does not suggest this. Could anyone please tell me what is wrong with my code?

I would like to know how to synchronize the host and the kernel, and how to force a kernel to start only after a certain point in the host code. I would be grateful if anyone could help me figure this out.

Many thanks!

Tan

  • Hi Tan,

      Could you try unmapping the buffer before enqueuing the kernel? Something like this:

    a=(cl_float*)clEnqueueMapBuffer(cmdQueue,buffer_a,CL_TRUE,CL_MAP_READ|CL_MAP_WRITE,0,sizeof(cl_float)*N,0,NULL,NULL,NULL);
    for(int x=0;x<N;x++)
    *(a+x)=x;
    
    clEnqueueUnmapMemObject(cmdQueue, buffer_a, a, 0, NULL, NULL);
    
    
    err=clSetKernelArg(test,0,sizeof(cl_mem),&buffer_a);
    cl_event userEvt=clCreateUserEvent(context,&err);
    ...
    

      The unmap is needed to flush the CPU cache out to memory, so that the kernel sees the values the host wrote. Please refer to the SDK here: Mali OpenCL SDK v1.1.0: Memory Buffers

    Neil