Hi everyone,
I'm using OpenCL on an Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU (Samsung S7 edge), and calls to clCreateBuffer have a very high overhead. I'd like to understand this issue better. Is it something in the driver that takes all this time? Why does it take so long to create a buffer?
Below are the example I used and the buffer sizes with their respective times. Note that I'm creating two buffers, each holding N*N elements of type float.
#define DATA_TYPE float
int N = 8192;
t_start = rtclock();
#ifdef OFFLOAD
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#else
a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
#endif
t_end = rtclock();
printf("Total time of clCreateBuffer %lf \n" , t_end - t_start);
P.S. The same program on an Intel GPU takes far less time in clCreateBuffer than it does on the Mali GPU.
Thanks!!!
Hi,
When you create a buffer, the driver needs to map the corresponding pages, then do some cache maintenance and zero these pages, which is where all this time goes. However, on some platforms these operations don't trigger the CPU governor, so they are all performed with the CPU running at its minimum frequency.
So make sure your device is running with the CPU in performance mode and it should be much quicker.
I would expect N=10000 to take about 300ms (I just tried on a Samsung Chromebook).
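To get a feel for how much memory the driver has to map and zero, here is a minimal sketch of the arithmetic for the buffers in the question. The 4 KiB page size is an assumption (it is a common default but is not stated anywhere in this thread), and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 4 KiB pages. Check the real value with sysconf(_SC_PAGESIZE). */
#define ASSUMED_PAGE_SIZE ((size_t)4096)

/* Bytes needed for an n x n matrix whose elements are elem_size bytes each. */
static size_t buffer_bytes(size_t n, size_t elem_size)
{
    return n * n * elem_size;
}

/* Number of pages needed to back `bytes`, rounded up. */
static size_t pages_for(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Hypothetical helper: print the footprint of one N x N float buffer. */
static void print_footprint(size_t n)
{
    size_t bytes = buffer_bytes(n, sizeof(float));
    printf("N=%zu: %zu bytes (%zu MiB), %zu pages per buffer\n",
           n, bytes, bytes >> 20, pages_for(bytes, ASSUMED_PAGE_SIZE));
}
```

With N = 8192 and 4-byte floats, each buffer is 8192 * 8192 * 4 = 268,435,456 bytes (256 MiB), so the two buffers together are 512 MiB. Under the 4 KiB assumption that is 65,536 pages per buffer.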
Hope this helps,
Anthony
"cat" is to read, not to write
If you do
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
does it say the processor is in performance mode?
If not, you need to do
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
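Since scaling_governor is exposed per CPU, it can be worth checking every core rather than only cpu0. The following is a rough sketch of such a check in C. The function names are hypothetical, and it assumes the usual sysfs layout shown in the commands above; cores that are offline or have no cpufreq entry are simply reported as unreadable.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` into `out`, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_first_line(const char *path, char *out, size_t out_len)
{
    assert(out_len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(out, (int)out_len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Count the CPUs in [0, ncpus) whose governor differs from `wanted`.
 * `cpu_dir` would normally be "/sys/devices/system/cpu".
 * Unreadable entries are counted as mismatches. */
static int count_governor_mismatches(const char *cpu_dir, int ncpus,
                                     const char *wanted)
{
    int mismatches = 0;
    for (int cpu = 0; cpu < ncpus; ++cpu) {
        char path[256];
        char gov[64];
        snprintf(path, sizeof path, "%s/cpu%d/cpufreq/scaling_governor",
                 cpu_dir, cpu);
        int ok = (read_first_line(path, gov, sizeof gov) == 0);
        if (!ok || strcmp(gov, wanted) != 0) {
            printf("cpu%d: %s\n", cpu, ok ? gov : "(unreadable)");
            ++mismatches;
        }
    }
    return mismatches;
}
```

For an octa-core device, calling count_governor_mismatches("/sys/devices/system/cpu", 8, "performance") should return 0 once every core is in performance mode. The core count of 8 is taken from the "octa-core" description in the question.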
Hi Anthony Barbier,
Running it in performance mode made almost no difference.
I've used the following commands to set performance:
echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
...
After setting performance, I've checked:
$cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
$cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
I'd appreciate any additional feedback.
Thanks.
I've updated the previous message with the correct information about what I did to run in performance mode.
Briefly, I did:
$echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
$echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
Is there anything more I can do to check or reduce the time taken by calling clCreateBuffer?
The source file that I've used to measure the times is attached.