
L2 cache in Mali-T628

Hi

I am working on an Odroid board that has a Mali-T628 GPU. To understand the architecture better for my research, I am looking for answers to the following questions (any help will be highly appreciated):

Q. What is the size of the L1 cache? (Does this hold true: "Two 16KB L1 data caches per shader core; one for texture access and one for generic memory access."?)

Q. What does the configurability of the L2 cache mean? ("The size of this is variable and can be configured by the silicon integrator, but is typically between 32 and 64 KB per instantiated shader core.") Who configures it, i.e. is the size hard-coded, or hardwired and fixed?

  • Hi shingaridavesh,

    The Mali-T628 MP6 found in the Exynos 5422 has 2 core-groups, one with 4 cores, and a second with 2 cores. Each core group has a separate L2 cache accessible only by the respective cores within the core-group. They should be the same size too.

    The Mali-T628 is capable of scaling anywhere from 1 core to 8 cores. There is a maximum of 4 cores per core-group allowed. So if you are using a T628 MP4 or less for example, there would only be one core-group, and as such only one L2 cache.

    It is worth noting that when you are using OpenCL, you will see the Mali-T628 MP6 appear as two devices, one per core-group. By default only the first core-group, in this instance the one with 4 cores, will be utilised for OpenCL purposes.

    If you wish to target all 6 cores in this example, you will need to use both CL devices in your code and split your workload accordingly.
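As a rough illustration of splitting the workload, here is a minimal sketch that divides a 1-D range of work-items between the two CL devices in proportion to their core counts (4 and 2 on the MP6). The function name `split_work` and its parameters are hypothetical, not part of any Mali or OpenCL API, and the proportional split is only an assumption about what balances well. The resulting offsets and sizes would be the kind of values you pass as `global_work_offset` and `global_work_size` to `clEnqueueNDRangeKernel`, once per device queue.

```python
def split_work(total_items, cores_per_device, local_size=1):
    """Split a 1-D range of work-items across devices by core count.

    Returns one (offset, size) pair per device. Every size except the
    last is rounded down to a multiple of local_size; the last device
    takes whatever remains. (Hypothetical helper, for illustration only.)
    """
    total_cores = sum(cores_per_device)
    chunks = []
    offset = 0
    last = len(cores_per_device) - 1
    for i, cores in enumerate(cores_per_device):
        if i == last:
            # The last device picks up the remainder of the range.
            size = total_items - offset
        else:
            # Proportional share, rounded down to whole work-groups.
            share = total_items * cores // total_cores
            size = share - share % local_size
        chunks.append((offset, size))
        offset += size
    return chunks


# Two core-groups of 4 and 2 cores, work-group size of 64 (assumed values).
chunks = split_work(6144, [4, 2], local_size=64)
```

Note that if the total is not a multiple of the work-group size, the last chunk will not be either, so you may need to pad the range or guard against out-of-range indices in the kernel.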

    Regarding the question of the two L1 caches, what this means is there is one L1 cache for the Load/Store pipeline, and another L1 cache for the Texture pipeline.

    If you utilise both pipelines in your application, then indeed both L1 caches will be used. However, if you only use one, or neither, of those pipelines, then either one or no L1 cache will be used, respectively.

    We see a trend of most OpenCL applications making heavy use of the Arithmetic pipeline, and moderate use of the Load/Store pipeline, and very few seem to make use of the Texture pipeline... so in a typical case, only the L1 from the Load/Store pipeline would be utilised.

    I hope that clarifies things.

    Kind Regards,

    Michael McGeagh