I am using the ODROID-XU3 board, which has the Samsung Exynos5422 SoC. The SoC has big.LITTLE CPUs and an Arm Mali-T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array I am processing. Currently, to enforce coherency, I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. These calls degrade performance so much that running the CPU and GPU in parallel becomes pointless. I have the following questions:
1) Are the caches of the ARM CPU and the GPU on this SoC coherent?
2) The GPU shows up as two devices in OpenCL. Are these two GPU devices cache coherent with each other?
3) Is there any way to enforce coherency other than the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions?
So what is the right way to clean/invalidate the GPU caches so that the two GPU devices read the updated data?
I don't think you need to do anything other than express the dependencies between commands in the CL queues correctly (for example, with events); the driver handles the rest. In other words, GPU cache coherency should be entirely transparent to the application.
HTH,
Pete