Hi,
I am working on a video processing application in which I have to pass a source image to the GPU, run a computation, and write the result to a destination. I read that creating buffers inside the loop on every iteration adds GPU overhead, so I implemented the following. However, I still see the same performance issue. Can someone help me?
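// Create the source and destination buffers once, outside the loop
cl_mem buffer_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);
cl_mem buffer_dst_ptr = clCreateBuffer(context_, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, mem_size, NULL, &error_code);
while (recording) {
    // Blocking copy of the source frame into the source buffer
    clEnqueueWriteBuffer(queue_, buffer_ptr, CL_TRUE, 0, mem_size, src_ptr, 0, NULL, NULL);
    size_t global_size[2] = {dst_w, dst_h};  // one work-item per destination pixel
    clEnqueueNDRangeKernel(queue_, kernel, 2, NULL, global_size, NULL, 0, NULL, &event_kernel);
    // Blocking copy of the result back to host memory
    clEnqueueReadBuffer(queue_, buffer_dst_ptr, CL_TRUE, 0, mem_size, dst_y, 0, NULL, NULL);
}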
My kernel is very simple and does minimal computation; you can think of it as just copying an image from source to destination. Because of power constraints, I have to do this on the GPU. I know there is memory transfer overhead, but I have not been able to find out how to reduce it.
Hi Peter,
I understood your suggestion. I will test the CPU power consumption. I have one more question: does the Mali-G72 support cl_arm_import_memory efficiently? When I tried using it instead of clCreateBuffer, it took about the same time. As I understand it, cl_arm_import_memory maps the data instead of copying it to the device. But how can mapping and copying take the same time? Is it the actual time or some kind of GPU overhead, and how can I overcome it?
> is it the actual time or some kind of GPU overhead and how to overcome those?

As I said in my first post, it is likely that the majority of the time is going to be related to cache maintenance, synchronizing the CPU cache and main memory. This will have to happen for any memory that is cached on the CPU if your system-on-a-chip doesn't support hardware cache coherency.
I am working on an Exynos 9611. Is there a way to check whether it provides cache coherency? I would like to confirm that before moving forward on that assumption. Could you also comment on my question regarding cl_arm_import_memory?
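
On the cl_arm_import_memory part of the question, one thing that can be checked directly is whether the driver advertises the extension at all. A minimal sketch in C, assuming the extension string has already been fetched with clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...): the helper name has_extension is hypothetical and not part of OpenCL. It matches whole space-separated tokens, so a longer name such as cl_arm_import_memory_host is not mistaken for cl_arm_import_memory. This only shows that the extension is exposed; it says nothing about how fast the import path is or whether the SoC is hardware cache coherent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): returns true if `name` appears
 * as a whole space-separated token in `ext_list`, which is assumed to be
 * the string returned for CL_DEVICE_EXTENSIONS. */
bool has_extension(const char *ext_list, const char *name)
{
    size_t name_len = strlen(name);
    if (name_len == 0) {
        return false;
    }

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the list or after a space */
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ... and must end at a space or at the end of the list */
        bool ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += 1; /* keep searching after a partial (prefix/suffix) match */
    }
    return false;
}
```

With the device's extension string in a buffer called extensions (an assumed name), the check would be has_extension(extensions, "cl_arm_import_memory").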