Overhead generated by calling clCreateBuffer

Hi everyone,

I'm using OpenCL on a Samsung S7 edge (Exynos 8890 octa-core CPU with an ARM Mali-T880 MP12 GPU), and calls to clCreateBuffer are taking a surprisingly long time. I'd like to understand this better. Is the time being spent in the driver? Why does it take so long to create a buffer?

The example I used and the time measured for each size are shown below. Note that I create two buffers, each holding N*N elements of type float.

    #define DATA_TYPE float

    int N = 8192;

    t_start = rtclock();

    #ifdef OFFLOAD
    a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
    b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
    #else
    a_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
    b_mem_obj = clCreateBuffer(clGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, sizeof(DATA_TYPE) * N * N, NULL, &errcode);
    #endif

    t_end = rtclock();

    printf("Total time of clCreateBuffer %lf \n", t_end - t_start);

   

    N (size)    clCreateBuffer (seconds)
    2048        0.010235
    4096        0.251183
    8192        1.385209
    9000        1.622948
    10000       2.054119
    11000       2.501804

P.S. Running the same program on an Intel GPU, clCreateBuffer takes far less time than it does on the Mali GPU.

Thanks!!!

  • I've updated the previous message with the correct information about what I did to run in performance mode.

    Briefly, I did:

    • To set performance mode:

              $ echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
              $ echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
              $ echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
              ...

    • After setting the performance governor, I verified it by running the following commands:

              $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
              performance
              $ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
              performance
              $ cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
              performance
              ...

    Is there anything else I can do to check or reduce the time taken by clCreateBuffer?

    The source file that I've used to measure the times is attached.