Cache coherence between the big.LITTLE CPUs and the Arm Mali-T628 MP6 GPU on the ODROID-XU3 board

I am using the ODROID-XU3 board, which has the Samsung Exynos 5422 SoC. The SoC has big.LITTLE CPUs and an Arm Mali-T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array that I am processing. Currently, to enforce coherency, I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. These OpenCL calls degrade performance to the point that running the CPU and GPU in parallel becomes pointless. I have the following questions.

1) Are the caches of the Arm CPU and the GPU on this SoC coherent?

2) The GPU shows up as two devices in OpenCL. Are these two GPU devices cache coherent with each other?

3) Is there any way to enforce coherency other than using the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions?

  • Hello abarbier,

    Thanks for the informative quick reply.

    1) You say that the GPU driver will automatically clean the caches. Does this happen when clReleaseMemObject is called, or does it happen by default because the GPU caches are write-through?

    2) What is the right way to clean/invalidate the GPU caches so that the two GPU devices read the updated data?

    3) I am using OpenCL buffers created with the CL_MEM_ALLOC_HOST_PTR flag. I map a buffer so that the CPU can use it, and then use map and unmap to clean/invalidate the caches. How can the cache contents corresponding to an OpenCL buffer be updated without using the OpenCL map/unmap calls?

    --Kiran
