I am using the ODROID-XU3 board, which has the Samsung Exynos 5422 SoC. The SoC has big.LITTLE CPUs and the ARM Mali-T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array that I am processing. Currently, to enforce coherency, I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. These OpenCL calls degrade performance enough that running the CPU and GPU in parallel becomes pointless. I have the following questions.
1) Are the caches of the ARM CPU and the GPU on this SoC coherent?
2) The GPU shows up as two devices in OpenCL. Are these two devices cache coherent?
3) Is there any way to enforce coherency other than using the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions?