Our board, an Odroid XU4, exposes two Mali devices. We wish to create a large image, with one Mali creating half of the image and the other Mali creating the other half. We also want the image to be memory mapped, as it is quite large. Can we map the image so that each Mali sees at least the part it is working on and, of course, the memory for the whole image is contiguous for the CPU?
A rephrasing of the question might be: can different devices share the same memory with the host?
One possible approach, which I would like to confirm before moving forward, uses buffers instead of images:
1. Create the entire buffer in the context with CL_MEM_ALLOC_HOST_PTR.
2. Create two disjoint sub-buffers. (I did not see a way to create sub-images.)
3. Map each sub-buffer on its own command queue.
However, this depends on when the memory gets allocated on the host:
Is the memory allocated in step 1 or in step 3? If in step 1, the memory will be contiguous on the host. If in step 3, ... ? I suspect step 3, since that is when the host pointer becomes available.
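The split in step 2 could look roughly like the plain C sketch below. It only computes the two byte ranges and does not call OpenCL. The names are assumptions made for illustration: region_t is a hypothetical stand-in for cl_buffer_region, split_in_two is a made-up helper, and align_bytes is assumed to be the larger of the two devices' CL_DEVICE_MEM_BASE_ADDR_ALIGN values, converted from bits to bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cl_buffer_region (origin and size in bytes),
   so that this sketch compiles without the OpenCL headers. */
typedef struct {
    size_t origin;
    size_t size;
} region_t;

/* Split a buffer of total_bytes into two disjoint, adjacent regions.
   The boundary is rounded down to a multiple of align_bytes, because
   clCreateSubBuffer rejects an origin that is not aligned to
   CL_DEVICE_MEM_BASE_ADDR_ALIGN (a value the device reports in bits). */
void split_in_two(size_t total_bytes, size_t align_bytes,
                  region_t *first, region_t *second)
{
    assert(align_bytes > 0 && total_bytes >= 2 * align_bytes);

    /* Largest aligned offset that does not exceed the midpoint. */
    size_t boundary = (total_bytes / 2 / align_bytes) * align_bytes;

    first->origin = 0;
    first->size = boundary;

    /* The second region takes everything that is left. */
    second->origin = boundary;
    second->size = total_bytes - boundary;
}
```

Each region would then be passed to clCreateSubBuffer with CL_BUFFER_CREATE_TYPE_REGION. Because both sub-buffers are carved out of the one parent buffer, they cover adjacent ranges of it.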
Message was edited by: Norman Goldstein
Thanks for the info and pointers. Here is an outline of what I understand from this:
-- A single context having the two devices, device0 and device1, and two queues, queue0 and queue1
// The single-channel float image that we want to generate
-- image = clCreateImage2D( context,
                            CL_MEM_WRITE_ONLY |
                            CL_MEM_ALLOC_HOST_PTR,
                            ...,
                            nullptr, // host ptr
                            ... );
// Map the entire image (the call returns void*, hence the cast)
-- float* ptr = (float*) clEnqueueMapImage( queue0,
                                            image,
                                            ... );
After running the kernels, ptr will point to the entire (contiguous) image, as created by the kernels of the two devices. We could have used "queue1" instead of "queue0" to do the mapping; it makes no difference, due to the Mali memory architecture.
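To have each device generate its own half of the single image, one option is to give each queue a different 2D range of rows. The sketch below is plain C and does not call OpenCL. The type ndrange2d_t and the helper half_image_range are hypothetical names; the assumption is that the resulting offset and size would be passed as global_work_offset and global_work_size to clEnqueueNDRangeKernel, with the kernel using get_global_id to pick its pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: the 2D offset and size that would be handed to
   clEnqueueNDRangeKernel as global_work_offset and global_work_size. */
typedef struct {
    size_t offset[2]; /* x, y */
    size_t size[2];   /* width, height of the range */
} ndrange2d_t;

/* Device 0 gets the top half of the rows, device 1 gets the remaining rows.
   When height is odd, the extra row goes to device 1. */
void half_image_range(size_t width, size_t height, int device_index,
                      ndrange2d_t *out)
{
    assert(device_index == 0 || device_index == 1);

    size_t top_rows = height / 2;

    /* Both devices cover the full width of the image. */
    out->offset[0] = 0;
    out->size[0] = width;

    /* Only the row range differs between the two devices. */
    out->offset[1] = (device_index == 0) ? 0 : top_rows;
    out->size[1] = (device_index == 0) ? top_rows : height - top_rows;
}
```

With this split the two row ranges never overlap, so the two kernels never write the same pixel.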
I should have added another sentence to the end of my previous post:
Do I have this correct: either queue0 or queue1 can be used to map the entire buffer for the CPU, and the CPU will then see the results generated by the kernels on both devices?
I'm a graphics guy, not a compute guy, but yes, I'm pretty sure that should work. I'll try to find a handy OpenCL driver dev to comment!
Cheers, Pete