Hi everyone,
I'm using OpenCL on an Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU (Samsung S7 edge), and I'm seeing a high overhead when creating buffers with clCreateBuffer. I'd like to understand this issue better. Is it something in the driver that takes all this time? Why does it take so long to create the buffer?
Below are the example I used and the sizes with their respective times. Note that I'm creating two buffers, each holding N*N elements of type float.
#define DATA_TYPE float
int N = 8192;
t_start = rtclock();
#ifdef OFFLOAD
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#else
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#endif
t_end = rtclock();
printf("Total time of clCreateBuffer %lf \n" , t_end - t_start);
P.S. Executing the same program on an Intel GPU doesn't take nearly as long as it does on the Mali GPU.
Thanks!!!
Hi Anthony Barbier,
Running it with the performance governor made almost no difference.
I've used the following commands to set performance:
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
...
After setting performance, I've checked:
$cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
I'd appreciate any additional feedback.
Thanks.
"cat" is for reading, not for writing.
If you do
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
does it say the processor is in performance mode?
If not you need to do
I've updated the previous message with the correct information about what I did to run in performance mode.
Briefly, I did:
$echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
Is there anything more I can do to check or reduce the time taken by calling clCreateBuffer?
The source file that I've used to measure the times is attached.