OpenCL on Mali, MapBuffers and lifetime of void* pointers

Hi,

I'm porting an existing OpenCL application to ARM/Mali. It already runs, but unnecessary buffer copies are hurting performance.

The ideal OpenCL workflow seems to be:

Init: Create a cl_mem object with CL_MEM_ALLOC_HOST_PTR.

Loop:

1. Get a void* via clEnqueueMapBuffer.

2. Use the void* to fill in data.

3. Unmap with clEnqueueUnmapMemObject.

4. Use the cl_mem object as a kernel argument.

But I have a problem with step 1: do I always get the same void*, or can this pointer change over time? I basically need this:

Init 1: Create a cl_mem object with CL_MEM_ALLOC_HOST_PTR.

Init 2: Get a void* via clEnqueueMapBuffer.

Init 3: Pass the void* to a device driver (a very expensive operation).

Loop:

1. Wait for the device driver to fill the buffer.

2. Unmap.

3. Use the cl_mem object as a kernel argument.

4. Call clEnqueueMapBuffer again, but ignore the new void*, because repeating Init 3 is very slow.

Can I really ignore the pointer returned in step 4 and keep using the first pointer (returned in Init 2 and passed to the driver in Init 3) forever? Is this guaranteed for all Mali OpenCL implementations?
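
One defensive way to handle step 4, as a minimal sketch rather than a statement about what Mali guarantees: as far as I know, the OpenCL specification only says that the mapped pointer is derived from host_ptr for buffers created with CL_MEM_USE_HOST_PTR. For CL_MEM_ALLOC_HOST_PTR, the sketch compares each pointer returned by clEnqueueMapBuffer with the one already handed to the driver and repeats the expensive registration only if it differs. The names map_guard, map_guard_update, register_fn and count_registrations are hypothetical; count_registrations is only a stand-in for the real driver call.

```c
#include <stddef.h>

/* Hypothetical callback: hands a buffer address to the device driver
 * (the expensive "Init 3" step). Returns 0 on success. */
typedef int (*register_fn)(void *addr, size_t size, void *user);

/* Remembers which host address the driver currently knows about. */
typedef struct {
    void       *registered; /* address last given to the driver, or NULL */
    size_t      size;       /* buffer size in bytes */
    register_fn reg;        /* expensive registration callback */
    void       *user;       /* opaque argument forwarded to reg */
} map_guard;

/* Call with the pointer returned by each clEnqueueMapBuffer.
 * Returns  0 if the pointer is unchanged (nothing to do),
 *          1 if it differed and the driver was (re-)registered,
 *         -1 if the pointer is NULL or registration failed. */
int map_guard_update(map_guard *g, void *mapped)
{
    if (mapped == NULL)
        return -1;
    if (mapped == g->registered)
        return 0;
    if (g->reg(mapped, g->size, g->user) != 0)
        return -1;
    g->registered = mapped;
    return 1;
}

/* Stand-in for the real driver call (assumption): it only counts how
 * often registration happens, via an int counter passed as user. */
int count_registrations(void *addr, size_t size, void *user)
{
    (void)addr;
    (void)size;
    ++*(int *)user;
    return 0;
}
```

In the loop, the return value of clEnqueueMapBuffer would be passed to map_guard_update after step 4. A result of 1 means the pointer moved and the driver had to be registered again; a result of 0 means the slow path was skipped.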

Thanks

  • If what you ultimately want to do is process an image from a camera driver, it should be possible to achieve zero-copy, provided the camera driver supports dma_buf (this includes Android Ion allocations). The Mali driver supports a proprietary extension (https://www.khronos.org/registry/OpenCL/extensions/arm/cl_arm_import_memory.txt) that lets you create a CL buffer that wraps a dma_buf allocation.

    The cache maintenance is done on map/unmap buffer, right?

    Correct. As Pete says, there is no way around the CPU cache maintenance cost as long as the buffer is touched by the CPU on a platform without CPU/GPU cache coherency. However, if your platform supports IO-Coherency (enabled by CL_MEM_ALLOC_HOST_PTR), some cache maintenance can be avoided. Similarly, if the platform supports full hardware coherency and OpenCL 2.0, fine-grain SVM allocations are likely to perform better, as no cache maintenance at all should be required.
