
Buffer create taking 10 ms on Mali-G72

Hi,

I am working on a video processing solution where I provide a source image to the GPU, run a computation, and write the result to a destination buffer. I read that creating buffers inside the loop on every iteration adds GPU overhead, so I moved the buffer creation outside the loop as shown below, but I am still facing the same performance issue. Can someone help me?

// Create the source and destination buffers once, outside the loop
cl_int error_code;
cl_mem buffer_ptr     = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                       mem_size, NULL, &error_code);
cl_mem buffer_dst_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                       mem_size, NULL, &error_code);

while (recording) {
    // Upload the source frame (blocking write)
    clEnqueueWriteBuffer(queue_, buffer_ptr, CL_TRUE, 0, mem_size, src_ptr, 0, NULL, NULL);

    // One work-item per destination pixel
    size_t global_size[2] = { dst_w, dst_h };
    clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);

    // Read the result back into dst_y (blocking read)
    clEnqueueReadBuffer(queue_, buffer_dst_ptr, CL_TRUE, 0, mem_size, dst_y, 0, NULL, NULL);
}

My kernel is completely simple and has very minimal computation; you can think of it as just copying the image from source to destination. Because of power constraints, I have to do this on the GPU only. I know there is memory transfer overhead, but I am unable to find how to reduce it.
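For reference, you can think of the kernel as roughly equivalent to this (a simplified sketch, not the exact code; the real argument names differ):

__kernel void copy_image(__global const uchar *src,
                         __global uchar *dst,
                         const int width)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int idx = y * width + x;
    dst[idx] = src[idx];   // per-pixel copy, essentially no computation
}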

  • Hi Tarun, 

    Firstly, why do you think the GPU is going to be lower power than the CPU for a simple memory copy operation? That seems like a big assumption which is unlikely to be true in practice; the DRAM access energy is going to be the most expensive aspect of that and will dwarf any logic energy cost in the CPU. 

    The main problem with using the GPU for this is that you are likely using memory that is cached on the CPU to back the buffers. On devices without hardware CPU-to-GPU memory coherency (few have it), the drivers have to do manual cache maintenance when passing a buffer to the GPU (clean) and when reading back the result (invalidate). Manual set-way cache maintenance is never fast for large buffers.
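    If you do stay on the GPU, one thing that usually helps on Mali is to keep the CL_MEM_ALLOC_HOST_PTR allocation you already have and map the buffers with clEnqueueMapBuffer instead of going through clEnqueueWriteBuffer/clEnqueueReadBuffer; that removes one full copy of the frame in each direction, although the cache maintenance still happens. Roughly (a sketch only, reusing your buffer names, untested):

    // Map the source buffer, write the frame into it on the CPU, then unmap before the kernel runs
    void *mapped_src = clEnqueueMapBuffer(queue_, buffer_ptr, CL_TRUE, CL_MAP_WRITE,
                                          0, mem_size, 0, NULL, NULL, &error_code);
    memcpy(mapped_src, src_ptr, mem_size);   // better still: produce the frame directly into mapped_src
    clEnqueueUnmapMemObject(queue_, buffer_ptr, mapped_src, 0, NULL, NULL);

    clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, NULL);

    // Map the destination buffer to read the result without a separate copy
    void *mapped_dst = clEnqueueMapBuffer(queue_, buffer_dst_ptr, CL_TRUE, CL_MAP_READ,
                                          0, mem_size, 0, NULL, NULL, &error_code);
    // ... consume mapped_dst on the CPU ...
    clEnqueueUnmapMemObject(queue_, buffer_dst_ptr, mapped_dst, 0, NULL, NULL);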

    Cheers,
    Pete
