Cache coherence between the big.LITTLE CPUs and the Arm Mali-T628 MP6 GPU on the ODROID-XU3 board

I am using the ODROID-XU3 board, which has the Samsung Exynos5422 SoC. The SoC has big.LITTLE CPUs and the Arm Mali-T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array that I am processing. Currently, to enforce coherency, I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. These OpenCL calls degrade performance, and because of this, running the CPU and GPU in parallel becomes pointless. I have the following questions.

1) Are the caches of the Arm CPU and the GPU on this SoC coherent?

2) The GPU shows up as two devices in OpenCL. Are these two GPU devices cache coherent with each other?

3) Is there any way to enforce coherency other than using the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions?
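
To make the intended split concrete, here is a minimal sketch of how the array could be divided between the GPU and the CPU. It is only an illustration: the names `split_t`, `split_array` and `CACHE_LINE_BYTES` are hypothetical, the 64-byte cache-line size is an assumption that should be checked for the target, and the array base is assumed to be cache-line aligned. The boundary is rounded down to a whole cache line so that, under these assumptions, no single line is written by both the CPU and the GPU, which may matter when coherency is enforced by explicit cache maintenance.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 64-byte cache lines; verify this for your CPU and GPU. */
#define CACHE_LINE_BYTES 64u

/* Hypothetical helper type: how many elements each side processes. */
typedef struct {
    size_t gpu_count; /* elements [0, gpu_count) go to the GPU      */
    size_t cpu_count; /* elements [gpu_count, total) go to the CPU  */
} split_t;

/*
 * Split `total` elements of `elem_size` bytes so that roughly
 * `gpu_percent` percent go to the GPU, with the boundary rounded
 * down to a multiple of the (assumed) cache-line size.
 * Assumes the array base address is cache-line aligned and that
 * total * gpu_percent does not overflow size_t.
 */
split_t split_array(size_t total, size_t elem_size, unsigned gpu_percent)
{
    /* The element size must divide the cache line evenly. */
    assert(elem_size > 0 && CACHE_LINE_BYTES % elem_size == 0);
    assert(gpu_percent <= 100);

    size_t elems_per_line = CACHE_LINE_BYTES / elem_size;
    size_t gpu = (total * gpu_percent) / 100;

    /* Round the boundary down to a whole cache line. */
    gpu -= gpu % elems_per_line;

    split_t s = { gpu, total - gpu };
    return s;
}
```

With this sketch, the GPU kernel would be enqueued over `gpu_count` work-items, and the CPU would start at byte offset `gpu_count * elem_size` in the mapped buffer.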

  • Hi kiranchandramohan,

    1) The GPU driver automatically cleans and invalidates the caches on the GPU side as needed. The CPU caches, however, need to be updated manually: this is why you need to call map / unmap.

    2) The GPU shows up as two devices because the 6 cores of the Mali-T628 MP6 are not all cache coherent with each other. They appear as one cluster of 4 cores and a second cluster of 2 cores, and each cluster translates into a separate OpenCL device. This is specific to the Mali-T628: all the cores in the earlier and later models are cache coherent and therefore appear as a single OpenCL device.

    3) When using an OpenCL buffer or image backed by externally allocated memory, it is the application's responsibility to update the CPU caches (you basically can't call map / unmap on such buffers).

    On Android, for example, you can create an EGLImageKHR from a gralloc buffer and then create a cl_image from it using clCreateFromEGLImageKHR.

    Hope this helps,

    Thanks,

    Anthony
