Hi all,
I am experimenting with a Mali T624 and OpenCL. By modifying the kernel-space Midgard driver, I am now able to access some I/O memory from an OpenCL kernel. However, the I/O memory I am accessing is volatile. For example, assume we have a kernel function with an input buffer of size 0x1000. The data at offset 0x10 of the buffer changes after each read (each read returns a different value), and I read from that offset 10 times, expecting to get 10 different values. The problem is that the GPU caches the data at that offset, so every read returns the same value. So, is there any way to manually flush the GPU cache from the OpenCL kernel code?
Thanks for any help and discussion!
Best Regards,
Zhenyu
No, it's not possible.
As per my other replies, this is not a supported use case.
Kind regards, Pete
Hello Peter,
Thanks for explaining! I understand that this is not a normal use case. Actually, I am not supposed to do this from user level. However, since there is not much documentation for the GPU and the kernel-level driver, I chose to do it with a user-level application and a modified kernel driver. So, basically, I assume that I have at least kernel privilege. In that case, should flushing the GPU cache be a valid operation? How can I invalidate the GPU cache with kernel privilege?
Thanks for your help!
> In this case, should flushing the GPU cache be a valid operation?
No, it's not possible to clean or clean+invalidate the GPU caches from a shader during execution.
Manual cache maintenance for the entire cache is triggered at job chain boundaries (e.g. when the GPU starts a task assigned by the driver, or when the GPU finishes a task assigned by the driver). The memory model for GPU workloads is not generally designed to cope with volatile I/O resources - it's a GPU not a CPU ...
Recent Mali GPUs can optionally support full coherency, which uses system-level hardware coherency protocols, if the silicon chip implements them. However, full coherency requires both sides of the link to implement it, and I doubt simple I/O peripherals will implement coherency protocols even if the CPU and GPU do. Mali-T620 doesn't implement full coherency in either case, so this won't be able to help in this specific case.
You might be able to set up the GPU memory mapping as uncached in the GPU page table, which bypasses the cache completely, but again this isn't something a GPU is really designed to do, so it might have weird side-effects, e.g. on performance.
Pete
Thanks again for your explanation! That helps a lot!