I am using OpenCL 1.1 on a Mali-T628 (Exynos 5422).
1.
First, I create a buffer:
buffer = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, 1280*720*4, NULL, &errorNumber);
Next, I run the kernel on the buffer and wait for the command queue to finish.
Finally, I map the buffer to the CPU:
*pCPU = (unsigned char *)clEnqueueMapBuffer(command_queue,buffer,
CL_TRUE,
CL_MAP_READ,
0,
1280*720*4,
0, NULL, NULL, &errorNumber);
Mapping the buffer to the CPU takes 1843 us.
2.
Then I changed CL_MAP_READ to CL_MAP_WRITE:
CL_MAP_WRITE,
I thought this would save time, but it still takes 1891 us.
As I understand it, CL_MEM_ALLOC_HOST_PTR means that mapping/unmapping the buffer only translates the pointer (and may flush the cache) but does not copy memory. Why does it take so long?
Thanks!
Hi,
You're right: map/unmap doesn't copy any memory. What takes time is the CPU cache maintenance.
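As a rough sanity check on the numbers in the question, the sketch below computes the buffer size and the effective rate implied by the two map timings. The helper names (buffer_bytes, throughput_mb_per_s, print_map_cost) are made up for this illustration; only the image dimensions and the timings come from the post. Treating the whole map time as work proportional to the buffer size is an assumption, not something measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Size in bytes of a width x height image with bytes_per_pixel bytes per pixel. */
static size_t buffer_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Effective rate in MB/s (1 MB = 10^6 bytes) for `bytes` handled in `micros`
 * microseconds. Bytes per microsecond is numerically equal to MB/s. */
static double throughput_mb_per_s(size_t bytes, double micros)
{
    assert(micros > 0.0);
    return (double)bytes / micros;
}

/* Prints the figures for the two timings quoted in the question. */
static void print_map_cost(void)
{
    size_t bytes = buffer_bytes(1280, 720, 4);
    printf("buffer: %zu bytes\n", bytes);
    printf("CL_MAP_READ : %.0f MB/s\n", throughput_mb_per_s(bytes, 1843.0));
    printf("CL_MAP_WRITE: %.0f MB/s\n", throughput_mb_per_s(bytes, 1891.0));
}
```

For the 3,686,400-byte buffer this works out to roughly 2.0 GB/s for the CL_MAP_READ case and about 1.95 GB/s for CL_MAP_WRITE, so the two flags cost about the same here.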
To get the cache maintenance done as quickly as possible, make sure your CPUs are set to run in performance mode:
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Hope this helps,
Anthony
Thank you for your answer.
I checked the configuration files:
$cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu7/cpufreq/scaling_governor
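Checking each core by hand is easy to get wrong, so here is a minimal sketch in C that reads the same sysfs files for cpu0 to cpu7 and counts how many report a given governor. The function names (governor_path, read_first_line, count_cpus_with_governor) are hypothetical helpers written for this post; the sysfs path is the one used in the commands above, and a core whose file cannot be read is simply reported as unreadable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the sysfs path of the governor file for one CPU.
 * Returns 0 on success, -1 if the path did not fit in `out`. */
static int governor_path(int cpu, char *out, size_t out_len)
{
    assert(out != NULL && cpu >= 0);
    int n = snprintf(out, out_len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                     cpu);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Reads the first line of the file at `path` into `out`, without the
 * trailing newline. Returns 0 on success, -1 if the file is missing or empty. */
static int read_first_line(const char *path, char *out, size_t out_len)
{
    assert(path != NULL && out != NULL && out_len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(out, (int)out_len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Prints the governor of cpu0..cpu_count-1 and returns how many of them
 * report exactly `wanted` (for example "performance"). */
static int count_cpus_with_governor(int cpu_count, const char *wanted)
{
    int matches = 0;
    for (int cpu = 0; cpu < cpu_count; ++cpu) {
        char path[128], gov[64];
        if (governor_path(cpu, path, sizeof path) != 0 ||
            read_first_line(path, gov, sizeof gov) != 0) {
            printf("cpu%d: <unreadable>\n", cpu);
            continue;
        }
        printf("cpu%d: %s\n", cpu, gov);
        if (strcmp(gov, wanted) == 0)
            ++matches;
    }
    return matches;
}
```

Calling count_cpus_with_governor(8, "performance") on the board should return 8 if every core is in performance mode; any smaller value points at the cores that still need to be changed.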