I am using OpenCL 1.1 on a Mali-T628 (Exynos 5422).
1.
First, I create a buffer:
buffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, 1280*720*4, NULL, &errorNumber);
Next, I run the kernel on the buffer and wait for the command queue to finish.
Finally, I map the buffer to the CPU:
*pCPU = (unsigned char *)clEnqueueMapBuffer(command_queue,buffer,
CL_TRUE,
CL_MAP_READ,
0,
1280*720*4,
0, NULL, NULL, &errorNumber);
Mapping the buffer to the CPU takes 1843 us.
2.
Then I changed CL_MAP_READ to CL_MAP_WRITE in the same call:
CL_MAP_WRITE,
I thought this would save time, but it still takes 1891 us.
As I understand it, CL_MEM_ALLOC_HOST_PTR means that mapping/unmapping the buffer only translates the pointer (and maybe flushes the cache) and does not copy any memory. So why does the map take so long?
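As a rough sanity check, the two timings can be turned into an effective throughput for the 1280*720*4-byte buffer. This is only a minimal sketch: the function name `effective_bandwidth_gbps` is hypothetical, and 1 GB is taken as 1e9 bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of OpenCL: effective throughput in GB/s
 * (1 GB = 1e9 bytes) for `bytes` bytes handled in `microseconds` us. */
double effective_bandwidth_gbps(size_t bytes, double microseconds)
{
    if (microseconds <= 0.0) {
        return 0.0; /* avoid dividing by zero or a negative time */
    }
    /* bytes per microsecond is numerically MB/s; divide by 1000 for GB/s */
    return ((double)bytes / microseconds) / 1000.0;
}
```

With the numbers above, `effective_bandwidth_gbps(1280*720*4, 1843.0)` is about 2.0 and the 1891 us case is about 1.95. A figure in that range may suggest that the map touches the whole buffer (cache maintenance or a copy) rather than only translating a pointer, but that is an inference from the timing, not something I have confirmed.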
Thanks!
Thank you for your answer.
I checked the config files:
$cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
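The same check can be done from C for all eight cores at once. This is a minimal sketch, assuming the sysfs layout used in the commands above; the helper names `governor_path`, `read_governor` and `print_governors` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the sysfs path of one CPU's scaling governor.
 * Returns 0 on success, -1 if the path does not fit into `out`. */
int governor_path(int cpu, char *out, size_t out_len)
{
    int n = snprintf(out, out_len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                     cpu);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Reads the governor name of one CPU into `out` (newline stripped).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_governor(int cpu, char *out, size_t out_len)
{
    char path[128];
    FILE *f;

    if (out_len == 0 || governor_path(cpu, path, sizeof path) != 0) {
        return -1;
    }
    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(out, (int)out_len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0'; /* drop the trailing newline */
    return 0;
}

/* Prints the governor of CPUs 0..cpu_count-1 and returns how many
 * could be read. */
int print_governors(int cpu_count)
{
    char name[64];
    int cpu, readable = 0;

    for (cpu = 0; cpu < cpu_count; cpu++) {
        if (read_governor(cpu, name, sizeof name) == 0) {
            printf("cpu%d: %s\n", cpu, name);
            readable++;
        } else {
            printf("cpu%d: <unreadable>\n", cpu);
        }
    }
    return readable;
}
```

Calling `print_governors(8)` on the board prints one line per core, which makes it easy to see whether every core is set to the same governor.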