Hi
I am working on an Odroid board which has a Mali-T628 GPU. To understand the architecture better for my research, I am looking for answers to the following questions (any help will be highly appreciated):
Q. What is the size of the L1 cache? (Does this hold true: "Two 16KB L1 data caches per shader core; one for texture access and one for generic memory access."?)
Q. What does the configurability of the L2 mean? ("The size of this is variable and can be configured by the silicon integrator, but is typically between 32 and 64 KB per instantiated shader core.") Who configures it, i.e. is it hardwired and fixed?
Hi shingaridavesh,
As mentioned in the Memory System section of The Mali GPU: An Abstract Machine, Part 3 - The Shader Core, the size of the L1 cache is indeed as described. This holds true for all of our Midgard GPUs to date and is not configurable by the vendor.
As for the L2, what we mean is that the silicon vendor can decide what size to put down for their particular implementation of the GPU. For the Mali-T628, it can be configured anywhere from 32 KB to 256 KB. You mentioned you are using a HardKernel ODROID device based on the Mali-T628. I will assume this is the ODROID-XU3, which uses the Samsung Exynos 5422. You will need to contact Samsung to find out what L2 cache size they decided upon. Note that this is decided when designing the silicon itself, and is not something one can modify in software.
If you have any further questions, feel free to ask.
Kind Regards,
Michael McGeagh
Hi mcgeagh
Thanks a lot for the prompt reply.
I have 2 more questions:
The Mali-T628 MP6 found in the Exynos 5422 has 2 core-groups, one with 4 cores, and a second with 2 cores. Each core group has a separate L2 cache accessible only by the respective cores within the core-group. They should be the same size too.
The Mali-T628 is capable of scaling anywhere from 1 core to 8 cores. There is a maximum of 4 cores per core-group allowed. So if you are using a T628 MP4 or less for example, there would only be one core-group, and as such only one L2 cache.
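To make the core-group rule concrete, here is a minimal sketch that derives a core-group layout from a core count. The function name `core_group_layout` is hypothetical, and the "fill the first group before starting the next" policy is an assumption: it matches the MP6 case described above (4 + 2), but the layout of other configurations is not confirmed here.

```python
def core_group_layout(num_cores, max_per_group=4):
    """Return the assumed number of cores in each core-group.

    Assumption: groups are filled up to max_per_group in order.
    Each core-group has its own L2 cache, so len(result) is also
    the assumed number of L2 caches.
    """
    if not 1 <= num_cores <= 8:
        raise ValueError("the Mali-T628 scales from 1 to 8 cores")
    groups = []
    remaining = num_cores
    while remaining > 0:
        size = min(max_per_group, remaining)
        groups.append(size)
        remaining -= size
    return groups


print(core_group_layout(6))  # T628 MP6, as in the Exynos 5422
print(core_group_layout(4))  # T628 MP4: a single core-group
```

Under this assumption an MP6 yields two groups and therefore two L2 caches, while an MP4 or smaller yields one.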
It is worth noting that when you are using OpenCL, you will see the Mali-T628 MP6 appear as two devices, and these are the core-groups. By default only the first core-group, in this instance 4 cores, will be utilised for OpenCL purposes.
If you wish to target all 6 cores in this example, you will need to use both CL devices in your code and split your workload accordingly.
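As a rough illustration of splitting the workload, the sketch below divides a one-dimensional range of work-items between the two CL devices in proportion to their core counts. The helper name `split_work` is hypothetical and no OpenCL calls are made; the assumption is that each resulting (offset, count) pair would be passed as the global work offset and global work size when enqueueing the kernel on that device's command queue. Rounding the counts to a multiple of the work-group size is left out.

```python
def split_work(total_items, cores_per_device):
    """Split total_items into contiguous (offset, count) ranges,
    one per device, proportional to each device's core count.

    The last device takes whatever remains, so the ranges always
    cover exactly total_items with no gaps or overlap.
    """
    total_cores = sum(cores_per_device)
    ranges = []
    offset = 0
    last = len(cores_per_device) - 1
    for i, cores in enumerate(cores_per_device):
        if i == last:
            count = total_items - offset
        else:
            count = total_items * cores // total_cores
        ranges.append((offset, count))
        offset += count
    return ranges


# Two core-groups of a T628 MP6: 4 cores and 2 cores.
for device_index, (offset, count) in enumerate(split_work(6000, [4, 2])):
    print(device_index, offset, count)
```

A proportional split is only a starting point; since each core-group has its own L2 cache, the best ratio for a given kernel may need to be measured.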
Regarding the question of the two L1 caches, what this means is there is one L1 cache for the Load/Store pipeline, and another L1 cache for the Texture pipeline.
If you utilise both pipelines in your application, then both L1 caches will indeed be used. However, if you use only one of those pipelines, or neither, then one or no L1 cache will be used, respectively.
We see a trend of most OpenCL applications making heavy use of the Arithmetic pipeline and moderate use of the Load/Store pipeline, while very few seem to make use of the Texture pipeline. So in a typical case, only the L1 cache of the Load/Store pipeline would be utilised.
I hope that clarifies things.
Thanks a lot. It helped a lot.