Hi everyone,
I'd like to know what happens at the hardware level when I call clEnqueueMapBuffer.
Is the whole buffer invalidated in the CPU-side cache?
And when I call clEnqueueUnmapMemObject,
is the whole buffer invalidated in the GPU-side cache?
Thanks!
Hi,
No, unfortunately that's not how CPU caches work: the CPU has to loop through all the pages of the allocation and invalidate them in the cache (so it will have to go through the 1000 elements).
Note: Map / Unmap only affect CPU caches; the GPU ones are handled automatically by the Mali driver before and after the jobs are run on the GPU.
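To give a rough feel for why the cost grows with the size of the mapped region, here is a minimal sketch in C that counts how many cache lines a byte range overlaps. The function name cache_lines_spanned, the 64-byte line size and the 4-byte element size are assumptions made up for illustration; this only models the amount of work, it is not what the driver or the kernel actually runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of cache lines overlapped by the byte range
 * [addr, addr + size). line_size must be a non-zero power of two; the
 * value 64 used in the note below is an assumption, not a documented
 * property of any particular CPU. */
size_t cache_lines_spanned(uintptr_t addr, size_t size, size_t line_size)
{
    if (size == 0)
        return 0;

    /* Round the first and last byte addresses down to a line boundary. */
    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & mask;
    uintptr_t last  = (addr + size - 1) & mask;

    return (size_t)((last - first) / line_size) + 1;
}
```

Under those assumptions, 1000 elements of 4 bytes each (4000 bytes) starting on a line boundary overlap 63 lines, so cache_lines_spanned(0, 4000, 64) returns 63, and each of those lines has to be visited.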
Thanks,
Anthony
Anthony Barbier,
When should I use a buffer shared between the CPU and the GPU (CL_MEM_ALLOC_HOST_PTR)?
And when should I not use it?
I ran some tests and found that it was worth using in most of the cases I tested.
Is there any material that describes in more detail how clEnqueueMapBuffer() and clEnqueueUnmapMemObject() work at the architecture level?
I'm planning to insert these calls (clEnqueueMapBuffer() and clEnqueueUnmapMemObject()) into the code automatically at compile time, as an optimization.
But first I'm analysing whether it is worth it.
Thanks for helping me!!!
CL_MEM_ALLOC_HOST_PTR is just a hint to the driver: if you don't set it, the driver might assume you won't access the memory from the host side and might therefore use CPU-uncached memory.
In practice the driver might just ignore this flag and cache the memory anyway, which is why you might not always see a performance difference.
Map / Unmap are pretty much just calls into the DMA cache maintenance routines of the Linux kernel (I'm sure you can find some information about that in the Linux kernel documentation if you're interested).