I am using the odroid XU3 board. It has the Samsung Exynos5422 SoC. The SoC has BigLittle CPUs and the Arm Mali T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array that I am processing. Currently to enforce coherency I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. Usage of these OpenCL functions gives a performance degradation and due to this running the CPU and GPU in parallel becomes pointless. I have the following questions.
1) Are the caches of the ARM CPU and the GPU on this SoC coherent ?
2) The GPU shows up as two devices in OpenCL. Are these GPUs cache coherent ?3) Is there anyway to enforce coherency other than using MapBuffer and UnmapMemObject CL functions ?
So what is the right way to clean/invalidate the gpu caches so that the two gpu devices read the updated data?
I don't think you need to do anything, other than express the dependencies between commands in the CL queues correctly; the driver handles the rest. i.e. GPU cache coherency should be totally transparent to the application.