OpenCL Mapped Buffer Map (Unmap) Implementation Behavior

Hello, I'm trying to optimize some OpenCL code. We queue kernels that use two write-only buffers and two read-only buffers, all of them mapped to the host.

The Proposed Simplified Workflow is the following:

- Compile Programs (P1, P2)

- Allocate buffers (W1, W2, R1, R2), initialization, (...)

- Map Buffers to Host

- Fill W1

- Unmap W1, R1

------loop-------

- enqueue P1, with W1 and R1 as Arg into OpenCL Device CmdQueue

- Schedule() <P2 End>

- Map W2, R2

- Fill W2

- Read R2

- Unmap W2, R2

- enqueue P2, with W2 and R2 as Arg into OpenCL Device CmdQueue

- Schedule() <P1 End>

- Map W1, R1

- Fill W1

- Read R1

- Unmap W1, R1

----end loop ---

Is it necessary to unmap the buffers from the host (and then remap them) before a kernel starts using them? As far as I remember, the spec says this is implementation-defined. I already asked the board manufacturer (ODROID Forum, topic "OpenCL Mapped Buffer Map (Unmap) Implementation Behavior"), but they told me to ask here.

How much time would be gained by avoiding the map and unmap commands for each buffer? The point is that the kernels run very fast, so those calls get queued pretty often.

-- Platform Data --

Board: ODROID-XU3

Processor: Samsung Exynos5422 ARM® Cortex™-A15 Quad 2.0GHz/Cortex™-A7 Quad 1.4GHz

GPU: Mali™-T628 MP6 OpenGL ES 3.0 / 2.0 / 1.1 and OpenCL 1.1 Full profile

(Details: ODROID-XU3)

Best Regards!

  • Hi roosemberth,

    For Mali, it is necessary to map/unmap as in your pseudocode to flush the caches and make sure that the data being written and read is correct. If you mapped the buffer only once at the start, a CPU cache line holding data from the buffer could still be present after the GPU has written fresh data, and the host could then read the stale cache line instead of the new data.

    In light of this, it is not really possible to answer the second question about the time saved by avoiding the mapping and unmapping of buffers, since skipping those calls could invalidate the computed results.

    You can read more about shared memory systems in the Mali OpenCL SDK documentation at http://malideveloper.arm.com/downloads/deved/tutorial/SDK/opencl/memory_buffers_tutorial.html.

    Hope this helps,

    Rich
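
To make the stale-cache-line scenario from the reply concrete, here is a minimal toy model in plain C. It is not Mali driver code and makes no OpenCL calls: `ToyBuffer`, `toy_map`, `toy_unmap`, `host_write`, `host_read`, `gpu_kernel` and `run_iteration` are all hypothetical names invented for this sketch. It assumes that unmapping writes the host's cached data back to memory and that mapping invalidates the host's cached copy, which is a simplification of the cache maintenance the reply describes. In real host code the corresponding calls would be `clEnqueueUnmapMemObject` and `clEnqueueMapBuffer`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 4

/* Toy model only: one buffer in shared memory plus the CPU's cached copy. */
typedef struct {
    int  memory[BUF_LEN];  /* what the (simulated) GPU reads and writes */
    int  cached[BUF_LEN];  /* what the host sees through its cache */
    bool cached_valid;     /* the host currently holds cached lines */
    bool cached_dirty;     /* the host wrote data not yet in memory */
} ToyBuffer;

/* Load the buffer into the host cache on first access after invalidation. */
void fill_cache(ToyBuffer *b) {
    if (!b->cached_valid) {
        memcpy(b->cached, b->memory, sizeof b->cached);
        b->cached_valid = true;
        b->cached_dirty = false;
    }
}

/* Host accesses always go through the cached copy. */
int host_read(ToyBuffer *b, int i) {
    assert(i >= 0 && i < BUF_LEN);
    fill_cache(b);
    return b->cached[i];
}

void host_write(ToyBuffer *b, int i, int value) {
    assert(i >= 0 && i < BUF_LEN);
    fill_cache(b);
    b->cached[i] = value;
    b->cached_dirty = true;
}

/* Assumed unmap behavior: write host changes back so the GPU can see them. */
void toy_unmap(ToyBuffer *b) {
    if (b->cached_valid && b->cached_dirty) {
        memcpy(b->memory, b->cached, sizeof b->memory);
        b->cached_dirty = false;
    }
}

/* Assumed map behavior: drop cached lines so the next read hits memory. */
void toy_map(ToyBuffer *b) {
    b->cached_valid = false;
}

/* The simulated kernel doubles every element directly in memory,
   bypassing the host cache. */
void gpu_kernel(ToyBuffer *b) {
    for (int i = 0; i < BUF_LEN; ++i) {
        b->memory[i] *= 2;
    }
}

/* One loop iteration: fill, run the kernel, read the result back.
   When remap_around_kernel is false, the buffer stays mapped throughout. */
int run_iteration(bool remap_around_kernel) {
    ToyBuffer b;
    memset(&b, 0, sizeof b);

    toy_map(&b);
    host_write(&b, 0, 21);                  /* host fills the buffer */
    if (remap_around_kernel) toy_unmap(&b); /* publish data to the GPU */
    gpu_kernel(&b);
    if (remap_around_kernel) toy_map(&b);   /* discard stale host lines */
    return host_read(&b, 0);
}
```

In this model `run_iteration(true)` returns 42, while `run_iteration(false)` returns 21, which is the host's own stale value rather than the kernel's result. The model is deterministic; on real hardware cache lines are evicted at unpredictable times, so skipping the map/unmap pair would more likely show up as intermittent wrong results.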