OpenCL: why is there such a big difference between the cl_profiling_info timestamps?

I run an OpenCL task on a Mali-G57 GPU and profile the kernel's performance with the code below:

cl_int err = clEnqueueNDRangeKernel(pocl->getCommandQueue(), kernel, 2, NULL, globalsize, localsize, 0, NULL, &event_kernel);
clWaitForEvents(1, &event_kernel);

// Read the four profiling timestamps (in nanoseconds) of the kernel event.
cl_ulong queue_time = 0, submit_time = 0, start_time = 0, end_time = 0;
err = clGetEventProfilingInfo(event_kernel, CL_PROFILING_COMMAND_QUEUED, sizeof(queue_time), &queue_time, NULL);
checkErr(err, "CL_PROFILING_COMMAND_QUEUED", true);
err = clGetEventProfilingInfo(event_kernel, CL_PROFILING_COMMAND_SUBMIT, sizeof(submit_time), &submit_time, NULL);
checkErr(err, "CL_PROFILING_COMMAND_SUBMIT", true);
err = clGetEventProfilingInfo(event_kernel, CL_PROFILING_COMMAND_START, sizeof(start_time), &start_time, NULL);
checkErr(err, "CL_PROFILING_COMMAND_START", true);
err = clGetEventProfilingInfo(event_kernel, CL_PROFILING_COMMAND_END, sizeof(end_time), &end_time, NULL);
checkErr(err, "CL_PROFILING_COMMAND_END", true);
// cl_ulong is not guaranteed to be unsigned long long, so cast for %llu.
printf("time value=%llu %llu %llu %llu\n",
       (unsigned long long)queue_time, (unsigned long long)submit_time,
       (unsigned long long)start_time, (unsigned long long)end_time);
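A side note on printing these values: `%llu` only matches `unsigned long long`, while the Khronos headers typically define `cl_ulong` as `uint64_t` on non-Windows platforms, which is often `unsigned long` on 64-bit Android/Linux. A portable option is the `PRIu64` macro from `<cinttypes>`. The sketch below assumes `cl_ulong` is a 64-bit unsigned type; `format_profile_line` is a hypothetical helper name, not part of OpenCL.

```cpp
#include <cinttypes>  // PRIu64
#include <cstdint>    // std::uint64_t
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical helper: formats the four profiling timestamps with PRIu64,
// which always matches std::uint64_t. Assumes cl_ulong is a 64-bit unsigned
// integer, so cl_ulong arguments convert to std::uint64_t without loss.
std::string format_profile_line(std::uint64_t queued, std::uint64_t submit,
                                std::uint64_t start, std::uint64_t end) {
    // "time value=" (11 chars) + 4 numbers of at most 20 digits + 3 spaces fits.
    char buf[128];
    std::snprintf(buf, sizeof buf,
                  "time value=%" PRIu64 " %" PRIu64 " %" PRIu64 " %" PRIu64,
                  queued, submit, start, end);
    return std::string(buf);
}
```

Usage: `printf("%s\n", format_profile_line(queue_time, submit_time, start_time, end_time).c_str());`. Since the host code is C++, streaming the values with `std::cout` would also avoid the format-specifier question entirely.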
The log shows:
D/OLOG:cl_success:CL_PROFILING_COMMAND_QUEUED
D/OLOG:cl_success:CL_PROFILING_COMMAND_SUBMIT
D/OLOG:cl_success:CL_PROFILING_COMMAND_START
D/OLOG:cl_success:CL_PROFILING_COMMAND_END
D/OLOG:time value=71608492999037 71608493875422 8517593063107986718 8517593063119119891
We can see that the values of CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END are far larger than those of CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_SUBMIT. Converting them to time, the interval from CL_PROFILING_COMMAND_SUBMIT to CL_PROFILING_COMMAND_START comes out to about 8.5 × 10^12 ms (roughly 270 years), which is impossible.
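Working the logged numbers out exactly makes the problem more precise: only the SUBMIT to START gap is absurd, while the interval within each pair looks reasonable (QUEUED to SUBMIT is about 0.88 ms, START to END about 11.1 ms). One possible reading is that the two pairs are not measured from a common origin, but that is an observation from the numbers, not something the log confirms. The constants below are copied from the log line; `delta_ns` is a hypothetical helper.

```cpp
#include <cstdint>

// Timestamps copied from the log above, in nanoseconds.
constexpr std::uint64_t kQueued = 71608492999037ULL;
constexpr std::uint64_t kSubmit = 71608493875422ULL;
constexpr std::uint64_t kStart  = 8517593063107986718ULL;
constexpr std::uint64_t kEnd    = 8517593063119119891ULL;

// Elapsed nanoseconds from a to b; assumes b >= a, which holds for these values.
constexpr std::uint64_t delta_ns(std::uint64_t a, std::uint64_t b) { return b - a; }

// Exact integer differences, computed at compile time.
constexpr std::uint64_t kQueuedToSubmit = delta_ns(kQueued, kSubmit);  // 876385 ns, about 0.88 ms
constexpr std::uint64_t kSubmitToStart  = delta_ns(kSubmit, kStart);   // 8517521454614111296 ns, about 270 years
constexpr std::uint64_t kStartToEnd     = delta_ns(kStart, kEnd);      // 11133173 ns, about 11.1 ms
```

Doing the subtraction in 64-bit integers rather than floating point keeps every digit exact; a single-precision float, for instance, cannot represent these 19-digit values precisely.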
Can anybody tell me what causes the START and END timestamps to be so large?