
OpenCL on Mali, MapBuffers and lifetime of void* pointers

Hi,

I'm porting an existing OpenCL application to ARM/Mali. It already runs, but performance could be better because of unneeded buffer copies.

The ideal OpenCL workflow seems to be (a minimal sketch follows the list):

Init: create a cl_mem object with CL_MEM_ALLOC_HOST_PTR.

Loop:

1. Get a void* via clEnqueueMapBuffer.

2. Use the void* to fill in the data.

3. Unmap (clEnqueueUnmapMemObject).

4. Use the cl_mem object as a kernel argument.
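
For reference, a minimal sketch of that loop in host-side C. Error checking is stripped out, and ctx, queue, kernel, buf_size and the fill_input() producer are placeholders for the application's own objects:

```c
#include <CL/cl.h>

extern void fill_input(void *ptr, size_t size);   /* hypothetical producer */

void run_loop(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
              size_t buf_size, int num_frames)
{
    cl_int err;

    /* Init: let the driver allocate backing memory the host can map directly. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY,
                                buf_size, NULL, &err);

    for (int frame = 0; frame < num_frames; ++frame) {
        /* 1. Map the buffer to get a host-usable pointer. */
        void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                       0, buf_size, 0, NULL, NULL, &err);

        /* 2. Fill in the data through that pointer. */
        fill_input(ptr, buf_size);

        /* 3. Unmap so the GPU gets a coherent view of the data. */
        clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);

        /* 4. Use the cl_mem object as a kernel argument and run the kernel. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
        size_t global = buf_size;   /* illustrative work size */
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                               0, NULL, NULL);
        clFinish(queue);
    }

    clReleaseMemObject(buf);
}
```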

But I have a problem with step 1: do I always get the same void*, or can this pointer change over time? What I really need is the following (sketched in code after the list):

Init 1: create a cl_mem object with CL_MEM_ALLOC_HOST_PTR.

Init 2: get a void* via clEnqueueMapBuffer.

Init 3: pass the void* into a device driver (a very expensive operation).

Loop:

1. Wait for the device driver to fill the buffer.

2. Unmap.

3. Use the cl_mem object as a kernel argument.

4. clEnqueueMapBuffer again, but ignore the new void*, because repeating Init 3 is very slow.
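
For concreteness, this is roughly the pattern, again as a sketch: pass_to_driver() and wait_for_driver() stand in for the real driver interface, and the re-map in step 4 keeps the returned pointer only to check at run time that it has not moved:

```c
#include <assert.h>
#include <CL/cl.h>

extern void pass_to_driver(void *ptr, size_t size);   /* hypothetical, very expensive */
extern void wait_for_driver(void);                    /* hypothetical */

void driver_loop(cl_context ctx, cl_command_queue queue, cl_kernel kernel,
                 size_t buf_size, int num_frames)
{
    cl_int err;

    /* Init 1: create the buffer. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY,
                                buf_size, NULL, &err);

    /* Init 2: map it once to obtain a host pointer. */
    void *driver_ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                          0, buf_size, 0, NULL, NULL, &err);

    /* Init 3: hand the pointer to the device driver (done only once). */
    pass_to_driver(driver_ptr, buf_size);

    for (int frame = 0; frame < num_frames; ++frame) {
        /* 1. Wait for the driver to fill the buffer. */
        wait_for_driver();

        /* 2. Unmap so the GPU sees the data the driver wrote. */
        clEnqueueUnmapMemObject(queue, buf, driver_ptr, 0, NULL, NULL);

        /* 3. Run the kernel on the cl_mem object. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
        size_t global = buf_size;   /* illustrative work size */
        clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                               0, NULL, NULL);
        clFinish(queue);

        /* 4. Re-map for the next iteration. The whole scheme relies on the
         * mapped address never changing; this check at least catches it at
         * run time if an implementation ever returns a different pointer. */
        void *remapped = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                            0, buf_size, 0, NULL, NULL, &err);
        assert(remapped == driver_ptr);
    }
}
```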

Can I really ignore the pointer returned in step 4 and keep using the pointer obtained in Init 2 (and handed to the driver in Init 3) forever? Is this guaranteed for all Mali OpenCL implementations?

Thanks

  • Passing data from the host to the GPU is expensive primarily because of the need for CPU-side cache maintenance on platforms without some form of hardware cache coherency between the GPU and the CPU. This cost is going to be unavoidable even if the actual pointer is the same - if you don't flush the CPU caches on data exchange then you risk data corruption because the CPU and GPU views of the data are out of synchronization.

    The best workaround here is to pipeline the GPU processing and the CPU processing: have two (or more) buffers, and while the CPU is processing and setting up one, have the GPU process another (see the sketch below).

    HTH, 
    Pete
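
For illustration, one possible shape for the two-buffer pipeline suggested above (a sketch, not a definitive implementation): it assumes one in-order queue per buffer so that filling one buffer on the CPU can overlap with the kernel running on the other, and fill_input() is again a placeholder for the application's producer:

```c
#include <CL/cl.h>

extern void fill_input(void *ptr, size_t size);   /* hypothetical producer */

void pipelined_loop(cl_context ctx, cl_device_id dev, cl_kernel kernel,
                    size_t buf_size, int num_frames)
{
    cl_int err;
    cl_mem buf[2];
    cl_command_queue q[2];

    for (int i = 0; i < 2; ++i) {
        buf[i] = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY,
                                buf_size, NULL, &err);
        q[i] = clCreateCommandQueue(ctx, dev, 0, &err);
    }

    for (int frame = 0; frame < num_frames; ++frame) {
        int cur = frame & 1;   /* alternate between the two buffers */

        /* The blocking map also waits for the kernel that used this buffer
         * two frames ago (same queue, in-order), so no explicit events are needed. */
        void *ptr = clEnqueueMapBuffer(q[cur], buf[cur], CL_TRUE, CL_MAP_WRITE,
                                       0, buf_size, 0, NULL, NULL, &err);

        /* CPU fills this buffer while the GPU may still be working on the other. */
        fill_input(ptr, buf_size);
        clEnqueueUnmapMemObject(q[cur], buf[cur], ptr, 0, NULL, NULL);

        /* Kick off the kernel without blocking and move on to the other buffer. */
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf[cur]);
        size_t global = buf_size;   /* illustrative work size */
        clEnqueueNDRangeKernel(q[cur], kernel, 1, NULL, &global, NULL,
                               0, NULL, NULL);
        clFlush(q[cur]);
    }

    clFinish(q[0]);
    clFinish(q[1]);
}
```

Using one in-order queue per buffer makes the blocking map double as the synchronization point with the kernel that last used that buffer, which avoids juggling events by hand.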
