Hi everyone,
I'd like to know what happens at the hardware level when I use the command clEnqueueMapBuffer.
Is the entire buffer invalidated in the CPU-side cache?
And when I use the command clEnqueueUnmapMemObject,
is the entire buffer invalidated in the GPU-side cache?
Thanks!
Hi rafaelsousa,
In the Mali driver the memory gets mapped on allocation and remains mapped for the entire lifetime of the allocation, so you are right: when clEnqueueMapBuffer is executed the CPU caches get invalidated, and the same happens when unmapping.
Note: unless the map is a blocking map, the cache maintenance will actually happen when the command queue gets flushed, not when the command gets enqueued.
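To make the pattern concrete, here is a minimal sketch of the map/unmap sequence being discussed. It is illustrative only: the `queue` and `buf` handles and the buffer size are assumptions, and error checking is trimmed for brevity.

```c
/* Hypothetical sketch of the map/unmap pattern -- "queue" and "buf"
 * are assumed to be a valid cl_command_queue and cl_mem. */
#include <CL/cl.h>
#include <stddef.h>

void update_buffer(cl_command_queue queue, cl_mem buf, size_t nbytes)
{
    cl_int err;

    /* Blocking map (CL_TRUE): the CPU-side cache maintenance happens
     * before this call returns a usable pointer. */
    float *ptr = (float *)clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                             CL_MAP_WRITE, 0, nbytes,
                                             0, NULL, NULL, &err);

    /* ... CPU reads/writes through ptr ... */
    ptr[0] = 1.0f;

    /* Unmap: CPU cache lines covering the buffer are maintained so the
     * GPU sees the updates. With a non-blocking map, this maintenance
     * is deferred until the queue is flushed. */
    clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
    clFinish(queue);
}
```

With a non-blocking map you would instead pass CL_FALSE and wait on the returned event before touching the pointer.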
Hope this helps,
Thanks,
Anthony
Anthony Barbier,
When I call clEnqueueMapBuffer, is only the data that was modified on the GPU invalidated on the CPU?
Is there a list that keeps track of the addresses of the modified data?
For example, suppose a vector of 1000 elements: the CPU modifies the first 100 elements and then calls clEnqueueUnmapMemObject. Does that mean only those 100 positions will be invalidated in the GPU cache?
Thanks!!
Hi,
No, unfortunately that's not how CPU caches work: the CPU has to loop through all the pages of the allocation and invalidate them in the cache (so it will go through all 1000 elements).
Note: Map / Unmap only affect CPU caches, the GPU ones are automatically handled by the Mali driver before and after the jobs are run on the GPU.
When should I use a buffer shared between CPU and GPU (CL_MEM_ALLOC_HOST_PTR)?
And when should I not use it?
I did some tests and found that it was worth using in most of the cases I tested.
Is there any material that describes in more detail how clEnqueueMapBuffer() and clEnqueueUnmapMemObject() work at the architecture level?
I'm planning to automatically insert these functions (clEnqueueMapBuffer() and clEnqueueUnmapMemObject()) into the code at compile time as an optimization.
But first I'm analysing whether it's worth it.
Thanks for helping me!!!
CL_MEM_ALLOC_HOST_PTR is just a hint to the driver: if you don't set it, the driver might assume you won't access the memory from the host side and therefore might use CPU-uncached memory.
In practice the driver might just ignore this flag and cache the memory anyway, which is why you won't always see a performance difference.
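For reference, here is a hedged sketch of how such a buffer might be allocated and initialised via map/unmap rather than an explicit copy. The `context` and `queue` handles are assumptions, and error checking is trimmed.

```c
/* Sketch: ask the driver for host-accessible backing memory, then
 * initialise through a mapped pointer instead of clEnqueueWriteBuffer.
 * "context" and "queue" are assumed to be valid; errors unchecked. */
#include <CL/cl.h>
#include <stddef.h>

cl_mem make_shared_buffer(cl_context context, cl_command_queue queue,
                          size_t nbytes)
{
    cl_int err;

    /* CL_MEM_ALLOC_HOST_PTR is only a hint; the driver decides how
     * the allocation is actually backed. */
    cl_mem buf = clCreateBuffer(context,
                                CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                nbytes, NULL, &err);

    /* Zero-copy initialisation through a blocking map. */
    unsigned char *p = (unsigned char *)clEnqueueMapBuffer(
        queue, buf, CL_TRUE, CL_MAP_WRITE, 0, nbytes, 0, NULL, NULL, &err);
    for (size_t i = 0; i < nbytes; ++i)
        p[i] = 0;
    clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);

    return buf;
}
```

On an integrated-memory GPU like Mali this avoids the extra copy that clEnqueueWriteBuffer would imply, which is consistent with the gains you measured.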
Map / Unmap are pretty much just calls into the DMA cache maintenance routines of the Linux kernel (I'm sure you can find some information about that in the Linux kernel documentation if you're interested).