Hi everyone,
I'd like to know what happens at the hardware level when I call clEnqueueMapBuffer.
Is the CPU-side cache invalidated for the whole buffer?
And when I call clEnqueueUnmapMemObject,
is the GPU-side cache invalidated for the whole buffer?
Thanks!
Anthony Barbier,
When should a buffer shared between the CPU and GPU (CL_MEM_ALLOC_HOST_PTR) be used?
And when should it not be used?
I ran some tests and found it worthwhile in most of the cases I tried.
Is there any material that describes in more detail how clEnqueueMapBuffer() and clEnqueueUnmapMemObject() work at the architecture level?
I'm planning to automatically insert these calls (clEnqueueMapBuffer() and clEnqueueUnmapMemObject()) into the code at compile time, as an optimization.
But first I'm analysing whether it is worth it.
Thanks for helping me!!!
CL_MEM_ALLOC_HOST_PTR is just a hint to the driver. If you don't set it, the driver might assume you won't access the memory from the host side and might therefore use CPU-uncached memory.
In practice the driver might just ignore this flag and cache the memory anyway, which is why you might not always see a performance difference.
Map / Unmap are pretty much just calls into the DMA cache maintenance routines of the Linux kernel (I'm sure you can find some information about that in the Linux kernel documentation if you're interested).