
Memory Optimization on Mali GPU

Hi everyone,

Recently I have been working on a GPU application. It will run on an Arndale board and use the Mali GPU. To make the program run faster, I wanted to optimize its memory usage. According to the OpenCL guide, CL_MEM_ALLOC_HOST_PTR should be used to improve performance, and the use of CL_MEM_USE_HOST_PTR is discouraged.

But in my experiments, I found that using CL_MEM_USE_HOST_PTR actually reduces data transfer time but increases kernel execution overhead. I also found that a data copy is inevitable in both cases (CL_MEM_ALLOC_HOST_PTR and CL_MEM_USE_HOST_PTR).

Can anyone confirm? Is it possible at all to have a zero copy?

The Mali OpenCL guide says that using CL_MEM_ALLOC_HOST_PTR requires no copy. But there is a copy. Let’s say I have a pointer A and I create a buffer using CL_MEM_ALLOC_HOST_PTR. To make the data in A available to the GPU, I have to do a memcpy from A into the allocated space I get from CL_MEM_ALLOC_HOST_PTR.

So a data copy is needed. Is there a way to access the data directly from the GPU without any copying?
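To make the two host-side patterns in this question concrete, here is a minimal sketch in plain C. It assumes that `mapped` stands for the pointer returned by clEnqueueMapBuffer on a CL_MEM_ALLOC_HOST_PTR buffer; an ordinary array can stand in for it so the example runs without an OpenCL runtime. The function names are hypothetical and not part of any API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Pattern described in the question: the data is first produced in a
   separate array A and then copied into the buffer's host-visible memory.
   Returns 0 on success, -1 if the staging allocation fails. */
static int fill_via_staging(float *mapped, size_t n, float value)
{
    float *a = malloc(n * sizeof *a);   /* the separate pointer A */
    if (a == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = value;
    memcpy(mapped, a, n * sizeof *a);   /* the extra copy */
    free(a);
    return 0;
}

/* Alternative: the producer writes straight into the mapped pointer,
   so no staging array and no memcpy are involved. */
static void fill_in_place(float *mapped, size_t n, float value)
{
    for (size_t i = 0; i < n; i++)
        mapped[i] = value;
}
```

Both functions leave the same contents in `mapped`; only the first one pays for the staging array and the memcpy. Whether the second pattern is possible depends on whether the code that produces A can be pointed at the mapped region instead.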

PS: I have attached my code for your feedback.


UPDATE: I have uploaded a version that uses CL_MEM_ALLOC_HOST_PTR for your review.


This is the code snippet:


   #ifdef mem_alloc_host
   start = getTime();

   /* Phase A: create the input buffers and map them into host address space. */
   a_st = getTime();
   bufferA = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float) * ELE_NUM, NULL, &err);
   cl_float *src_a = (cl_float *)clEnqueueMapBuffer(commandQueue, bufferA, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float) * ELE_NUM, 0, NULL, NULL, &err);
   bufferB = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float) * ELE_NUM, NULL, &err);
   cl_float *src_b = (cl_float *)clEnqueueMapBuffer(commandQueue, bufferB, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float) * ELE_NUM, 0, NULL, NULL, &err);
   clFinish(commandQueue);
   a_en = getTime();
   a_time = a_time + (a_en - a_st);

   /* Fill phase: write the input data directly through the mapped pointers. */
   pfill_s = getTime();
   for (int i = 0; i < ELE_NUM; i++) {
       src_a[i] = 100.0f;
       src_b[i] = 11.1f;
   }
   pfill_e = getTime();
   pfill_time = pfill_time + (pfill_e - pfill_s);

   /* Phase B: unmap both buffers so the GPU can use them. */
   b_st = getTime();
   clEnqueueUnmapMemObject(commandQueue, bufferB, src_b, 0, NULL, NULL);
   clEnqueueUnmapMemObject(commandQueue, bufferA, src_a, 0, NULL, NULL);
   clFinish(commandQueue);
   b_en = getTime();
   b_time = b_time + (b_en - b_st);

   end = getTime();
   creat_buffer += (end - start);

   /* Output buffer; created outside the timed region. */
   bufferC = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float) * ELE_NUM, NULL, &err);
   #endif
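
The snippet calls getTime() and accumulates per-phase totals, but the helper itself is not shown. A minimal sketch of what it might look like follows, assuming a POSIX system and that getTime() returns monotonic time in milliseconds; both the unit and the phase_timer helper are assumptions for illustration, not taken from the attached code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime under strict C modes */
#endif
#include <time.h>

/* Hypothetical stand-in for the getTime() used in the snippet:
   monotonic time in milliseconds. */
static double getTime(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Accumulates elapsed time over repeated runs, mirroring the
   a_time = a_time + (a_en - a_st) pattern in the snippet. */
typedef struct {
    double start;   /* timestamp taken at phase_begin */
    double total;   /* sum of all measured intervals so far */
} phase_timer;

static void phase_begin(phase_timer *t) { t->start = getTime(); }
static void phase_end(phase_timer *t)   { t->total += getTime() - t->start; }
```

A monotonic clock is used here so that the measured intervals are not affected by changes to the system time. Each timed phase would call phase_begin() before the OpenCL calls and phase_end() after the clFinish() that closes the phase.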

6176.zip
  • Hi veerannah,


    It is worth remembering that the Arndale is a development board and is designed to be pushed to its limits. What you do on this hardware may not work well on production devices, because each device has different thermal and power constraints.

    With that in mind, you can do things such as disabling DVFS or clocking the CPU/GPU to the highest supported frequencies. Since it is not a battery-powered device, power constraints are relaxed, but you still have thermal limitations to deal with. These can be even stricter than on a production device, since the SoC is exposed on a devboard and gets no help from the form factor of a production device.

    To protect itself, even when DVFS has been disabled, the SoC will start underclocking itself once it reaches a thermal limit that is deemed unsafe for normal operation, to try to deal with the excess heat.

    If what you are interested in is stress testing the theoretical limits of the hardware, you can help this along by using a heatsink, a fan, or even going to extremes with liquid cooling.

    Obviously, if you are interested in real-world performance, this is not really what you should be pursuing. Instead, you should look at ways to optimise your code to reduce power consumption, which in turn will reduce the thermal problem.

    One example is bandwidth: by reducing bandwidth, you not only reduce power consumption but also generate a lot less heat.

    If you have any further questions, please do let us know.

    Kind Regards,

    Michael McGeagh

  • Hi Michael,

    Thanks for the detailed reply.