Cache Coherence for big.LITTLE CPUs and the Arm Mali-T628 MP6 GPU on the ODROID-XU3 board

I am using the ODROID-XU3 board, which has the Samsung Exynos 5422 SoC. The SoC has big.LITTLE CPUs and the Arm Mali-T628 MP6 GPU. I would like to run the CPU and the GPU in parallel on different sections of the array that I am processing. Currently, to enforce coherency, I have to use clEnqueueMapBuffer and clEnqueueUnmapMemObject. Using these OpenCL functions degrades performance so much that running the CPU and GPU in parallel becomes pointless. I have the following questions.

1) Are the caches of the ARM CPU and the GPU on this SoC coherent?

2) The GPU shows up as two devices in OpenCL. Are these GPUs cache coherent?

3) Is there any way to enforce coherency other than using the clEnqueueMapBuffer and clEnqueueUnmapMemObject functions?

  • Hi kubussz,

    "This was a temporary solution for the demand of more cores" and was not intended to improve performance?

    I disagree with that statement. Having 2 clusters does increase performance.

    Graphics, the primary use-case of a GPU, can utilise both clusters without issue and thus having 2 vs 1 gives you increased performance.

    Compute, however, will only utilise one cluster by default, unless you tell it to utilise both 'cl devices' (clusters), in which case you will have increased performance.

    By "increased performance" I mean compared with a single-cluster GPU such as a T624. So in both cases you will get the same performance or better.
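    As a rough illustration of what using both 'cl devices' involves on the host side, the work has to be divided between the two clusters. The sketch below only shows that partitioning step. The function name `split_range` and the 4-core plus 2-core split are assumptions made for the example, not something reported by the driver here. Submitting each chunk to its own device's command queue is not shown.

```python
def split_range(total_items, cores_per_device):
    """Split [0, total_items) into contiguous (offset, count) chunks,
    one per device, sized in proportion to each device's core count.

    Hypothetical helper for illustration; it does not call OpenCL.
    """
    total_cores = sum(cores_per_device)
    chunks = []
    offset = 0
    cores_seen = 0
    for cores in cores_per_device:
        cores_seen += cores
        # Cumulative boundary, so rounding never loses or duplicates items.
        end = total_items * cores_seen // total_cores
        chunks.append((offset, end - offset))
        offset = end
    return chunks


# Assumed example: one cluster with 4 cores, one with 2 cores.
chunks = split_range(1200, [4, 2])
```

    Each (offset, count) pair would then describe the part of the array handed to one device, for example as a global work offset and global work size.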

    "I would like to ask you about naming. Why do you use the name MP6 (cores)? How do they differ from the cores of, e.g., the Adreno 330?"

    The Mali-T6xx family of GPUs used a naming convention in which the last digit denotes the maximum number of cores the Silicon Partner can configure the GPU to have.

    The T604 can have between 1 and 4 cores. The T622 can have 1 or 2 cores. The T624 can have between 1 and 4 cores. The T628 can have between 1 and 8 cores.

    The MPx suffix is the actual number of cores in that piece of silicon.

    A Silicon Partner may license the T628 and create several versions from that single license. They may create an MP2 version for their low-end SoCs and an MP8 version for their high-end SoCs.
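    To make the T6xx convention concrete, here is a minimal sketch that reads a name such as "Mali-T628 MP6" the way it is described above: the last digit of the model is the maximum core count, and the MPx suffix is the number of cores actually present. The function name `parse_mali_t6xx` and the accepted name format are assumptions for the example, and it applies to the T6xx family only.

```python
import re


def parse_mali_t6xx(name):
    """Split a name such as 'Mali-T628 MP6' into its parts.

    Returns (model, max_cores, actual_cores). actual_cores is None when
    the name carries no MPx suffix. Hypothetical helper for illustration.
    """
    match = re.fullmatch(r"(?:Mali-)?(T6\d(\d))(?:\s*MP(\d+))?", name.strip())
    if match is None:
        raise ValueError(f"not a Mali-T6xx name: {name!r}")
    model, last_digit, mp = match.groups()
    max_cores = int(last_digit)  # last digit = maximum configurable cores
    actual_cores = int(mp) if mp is not None else None  # MPx = cores in silicon
    if actual_cores is not None and not 1 <= actual_cores <= max_cores:
        raise ValueError(
            f"{name!r}: MP{actual_cores} is outside 1..{max_cores} for {model}"
        )
    return model, max_cores, actual_cores


xu3_gpu = parse_mali_t6xx("Mali-T628 MP6")
```

    For the GPU in this thread, the result says the T628 design allows up to 8 cores and this particular SoC has 6 of them.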

    The naming scheme changed with the T7xx and later families of GPUs, as we can now scale to more than 9 cores and we understood the confusion caused by the older scheme. We no longer have different maximum core configurations, but simply license the GPU as is. That is why the T7xx family has only two options, the T720 and the T760. As before, the MPx suffix denotes the actual number of cores in that SoC.

    Regarding your question about the comparison with Adreno: that is a more complex matter, which has been answered before. It comes down to the terminology used. Basically, one of our "cores" is not equivalent to one of Adreno's "cores".

    For more, feel free to read this: The Mali GPU: An Abstract Machine, Part 3 - The Midgard Shader Core

    And this: Multicore or Multi-pipe GPUs: Easy steps to becoming multi-frag-gasmic

    I hope this helps. Let me know if you have any further questions.

    Kind Regards,

    Michael McGeagh
