Hi everyone,
I'm using OpenCL on an Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU (Samsung S7 edge), and I'm seeing a high overhead in calls to clCreateBuffer. I'd like to understand this issue better. Is the time being spent in the driver? Why does it take so long to create a buffer?
Below is the example I used, along with the sizes and their respective times. Note that I'm creating two buffers, each holding N*N elements of type float.
#define DATA_TYPE float
int N = 8192;
t_start = rtclock();
#ifdef OFFLOAD
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#else
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#endif
t_end = rtclock();
printf("Total time of clCreateBuffer %lf \n" , t_end - t_start);
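For reference, the size arithmetic and the timing in the snippet above can be sketched in plain C. This is only a minimal sketch under stated assumptions: wall_seconds and matrix_bytes are hypothetical helper names, and wall_seconds is a stand-in for rtclock(), whose definition is not shown in the post (it is assumed to return wall-clock seconds). With N = 8192 and a 4-byte float, each buffer comes to 268,435,456 bytes (256 MiB), so the two calls together request 512 MiB.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on strict C compilers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for rtclock(): wall-clock seconds read from a
 * monotonic clock, so the result is not affected by system time changes. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: bytes needed for an n x n matrix whose elements
 * are elem_size bytes each. The assert guards against size_t overflow. */
size_t matrix_bytes(size_t elem_size, size_t n)
{
    assert(n == 0 || elem_size <= SIZE_MAX / n / n);
    return elem_size * n * n;
}
```

Calling wall_seconds() before and after each clCreateBuffer call separately, rather than once around both, would show whether the two allocations cost about the same. Checking errcode after each call would also confirm that both allocations of matrix_bytes(sizeof(float), 8192) bytes actually succeeded, since the second call overwrites the first result.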
P.S. Running the same program on an Intel GPU does not show this overhead; creating the buffers there is much faster than on the Mali GPU.
Thanks!!!
I've updated my previous message with the correct information about how I enabled performance mode.
In brief, I ran:
$ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$ echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
...
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$ cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
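The per-core check above could also be done from the benchmark program itself, so that every run confirms the governor setting before timing anything. The following is a rough sketch rather than a tested tool for this device: read_first_line and count_governor_mismatches are hypothetical helper names, and the sysfs path is the one used in the commands above. Cores that cannot be read (for example because they are offline or the file is not accessible) are counted as mismatches.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads the first line of a text file into out,
 * strips the trailing newline, and returns 0 on success or -1 on failure. */
int read_first_line(const char *path, char *out, size_t out_len)
{
    FILE *f;

    if (out_len == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(out, (int)out_len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: counts the cores cpu0..cpu(max_cpus-1) whose
 * scaling governor differs from expected or cannot be read at all. */
int count_governor_mismatches(const char *expected, int max_cpus)
{
    char path[128];
    char gov[64];
    int cpu;
    int mismatches = 0;

    for (cpu = 0; cpu < max_cpus; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor",
                 cpu);
        if (read_first_line(path, gov, sizeof gov) != 0 ||
            strcmp(gov, expected) != 0)
            mismatches++;
    }
    return mismatches;
}
```

On the octa-core Exynos 8890, count_governor_mismatches("performance", 8) returning 0 would mean all eight cores report the performance governor. Note that this only covers the CPU governors set above; it says nothing about the GPU's own frequency scaling.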
Is there anything else I can do to check or reduce the time taken by clCreateBuffer?
The source file I used to measure the times is attached.