Buffer create taking 10 ms on mali G-72

Tarun Annapareddy over 4 years ago

Hi,

I am working on a video solution in which I have to pass a source image to the GPU, run a computation on it, and write the result to a destination. I read that creating buffers inside the loop on every iteration adds GPU overhead, so I implemented the following. However, I am still facing the same performance issue. Can someone help me?

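// Create the source and destination buffers once, outside the loop
buffer_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);
buffer_dst_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);

while (recording) {
    // Blocking copy of the source frame into the source buffer
    clEnqueueWriteBuffer(queue_, buffer_ptr, CL_TRUE, 0, mem_size, src_ptr, 0, NULL, NULL);

    // One work-item per destination pixel
    size_t global_size[2] = {dst_w, dst_h};
    clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);

    // Blocking copy of the result back to host memory
    clEnqueueReadBuffer(queue_, buffer_dst_ptr, CL_TRUE, 0, mem_size, dst_y, 0, NULL, NULL);
}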

My kernel is very simple and does minimal computation; you can think of it as just copying an image from source to destination. Because of power constraints, I have to do this on the GPU. I know there is memory transfer overhead, but I have not been able to find out how to reduce it.

Top replies

Peter Harris over 4 years ago +1 verified

Hi Tarun, Firstly, why do you think the GPU is going to be lower power than the CPU for a simple memory copy operation? That seems like a big assumption which is unlikely to be true in practice; the...

0 Peter Harris over 4 years ago in reply to Tarun Annapareddy

> is it the actual time or some kind of GPU overhead and how to overcome those?

As I said in my first post, it is likely that the majority of the time is going to be related to cache maintenance, synchronizing the CPU cache and main memory. This will have to happen for any memory that is cached on the CPU if your system-on-a-chip doesn't support hardware cache coherency.

0 Tarun Annapareddy over 4 years ago in reply to Peter Harris

Hi Peter,

I am working on an Exynos 9611. Is there a way to check whether it provides cache coherency? I want to confirm this before moving forward with that assumption. Can you also comment on my question regarding "cl_arm_import_memory"?
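
On the "cl_arm_import_memory" part of the question, one first step is to check whether the device advertises the extension at all. The sketch below assumes the space-separated extension string has already been fetched with clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...). The helper name has_extension is hypothetical and not part of OpenCL. It matches whole tokens only, so a longer name such as cl_arm_import_memory_dma_buf does not count as a match for cl_arm_import_memory. It says nothing about whether the SoC is cache coherent.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
 * as a whole, space-separated token in `ext_list`, which is the format of
 * the string returned for CL_DEVICE_EXTENSIONS. */
bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0) {
        return false;
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* A real match must start at the beginning or after a space ... */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ... and end at the end of the string or before a space. */
        bool ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token) {
            return true;
        }
        p += len; /* skip this partial match and keep searching */
    }
    return false;
}
```

A caller would pass the string obtained from clGetDeviceInfo as the first argument and "cl_arm_import_memory" as the second.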