This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Questions about kernel queued time and profiling info in mali T764

I am developing a project based on Mali T764 (RK3288). I have problems in the kernel queued time and the function clGetEventProfilingInfo() seems returning wrong value. My codes are below:

cl_ulong time_begin[20], time_finish[20],time_total[20];
//...initializing

cmdQueue=clCreateCommandQueue(context,devices[0],CL_QUEUE_PROFILING_ENABLE,&err);

err=clEnqueueNDRangeKernel(cmdQueue,k1,2,NULL,globalWorkSize,localWorkSize,0,NULL,&Evt1[0]);

err=clEnqueueNDRangeKernel(cmdQueue,k2,2,NULL,globalWorkSize,localWorkSize,0,NULL,&Evt2[0]); err=clEnqueueNDRangeKernel(cmdQueue,ki2,2,NULL,globalWorkSize,localWorkSize,0,NULL,&iEvt2[0]);

err=clEnqueueNDRangeKernel(cmdQueue,ki1,2,NULL,globalWorkSize,localWorkSize,0,&iEvt1[0],&iEvt1[0]);

err=clEnqueueNDRangeKernel(cmdQueue,k1,2,NULL,globalWorkSize,localWorkSize,0,NULL,&Evt1[1]);

err=clEnqueueNDRangeKernel(cmdQueue,k2,2,NULL,globalWorkSize,localWorkSize,0,NULL,&Evt2[1]);

err=clEnqueueNDRangeKernel(cmdQueue,ki1,2,NULL,globalWorkSize,localWorkSize,0,NULL, &iEvt1[5]);

clFinish(cmdQueue);

for(int xx=0;xx<6;xx++)

{

         timeCount(Evt1[xx],&time_begin[xx*4],&time_finish[xx*4]);

         timeCount(Evt2[xx],&time_begin[xx*4+1],&time_finish[xx*4+1]);

         timeCount(iEvt2[xx],&time_begin[xx*4+2],&time_finish[xx*4+2]);

timeCount(iEvt1[xx],&time_begin[xx*4+3],&time_finish[xx*4+3]);

         time_total[xx*4]=(time_finish[xx*4]-time_begin[xx*4]);

         time_total[xx*4+1]=(time_finish[xx*4+1]-time_begin[xx*4+1]);

         time_total[xx*4+2]=(time_finish[xx*4+2]-time_begin[xx*4+2]);

         time_total[xx*4+3]=(time_finish[xx*4+3]-time_begin[xx*4+3]);           

}

void timeCount(cl_event event,cl_ulong *time_begin,cl_ulong *time_finish)
{
*time_begin=0;
*time_finish=0;
err=clGetEventProfilingInfo(event,CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong ),time_begin,NULL);
//err=clGetEventProfilingInfo(event,CL_PROFILING_COMMAND_START,sizeof(cl_ulong) ,time_begin,NULL);
err=clGetEventProfilingInfo(event,CL_PROFILING_COMMAND_END,sizeof(cl_ulong),t ime_finish,NULL);
}

These k1,k2,ik1,ik2 kernels do similar calculations and the execution time of them are supposed to be about 1ms. After running this program, I get weird result.

  1. If I use clGetEventProfilingInfo(event,CL_PROFILING_COMMAND_START,sizeof(cl_ulong) ,time_begin,NULL), the time_begin[] I get has the same value as the time_finish[], and the time_total[] array is all 0. If I use clGetEventProfilingInfo(event,CL_PROFILING_COMMAND_QUEUED,sizeof(cl_ulong ),time_begin,NULL), the return value is fine, so the phenomenon below are all based on using CL_PROFILING_COMMAND_QUEUED.
  2. In the time_finish[] array , some elements with smaller index have the larger value than the one with bigger index. For example, time_finish[2] < time_finish[1]. However, I do not use CL_QUEUE_OUT_OF_ORDER_EXEC_MODE. All kernels should be run in sequence, and all elements in the time_finish[] array are supposed to be increased.
  3. 3. In the time_total[] array, I get

time_total[0]=1838000 ns

time_total[1]=2096000 ns

time_total[2]=75492000 ns

time_total[3]=26357000 ns

time_total[2]=199767000 ns

time_total[3]=228629000 ns

The first two elements may be correct, but the following results are unreasonable. There is no other program running on the system, and a similar kernel shouldn’t be queued so long time before it executed. I also tried other method to measure the whole execution time. It takes about 400ms for all kernels executed, which is 10 times more than it supposed.

My questions are

  1. Is there anyway that I can reduced the kernel queued time in Mali system? Some kernel’s queued time is even more than the time that all kernels are supposed to be finished.
  2. How can I get the correct value of clGetEventProfilingInfo(), especially using CL_PROFILING_COMMAND_START.

I will be very grateful if anyone could help me figure this out.