The Architecture of the Mali Midgard

I’m an optimization algorithm engineer, and I’m very interested in the hardware architectures of ARM Cortex-A and Mali Midgard. As far as I know, the memory system of an ARM Cortex-A CPU consists of the following:

L1 Cache: L1D/L1P

L2 Cache

I searched the Internet for the Mali Midgard hardware architecture and found this:

L1 Cache: Data/Texture Cache

L2 Cache (global memory and constant memory)

I wonder whether Midgard has any private memory (exclusive to a Shader Core) or local memory (shared within a cluster). Since the ARM CPU has an L1P cache for code, I would expect Midgard to have a similar cache or memory.

Please share any detailed information about the Midgard hardware architecture. Many thanks.

  • There isn't a huge amount of public information about the hardware architecture available; it's not something that a typical user needs to know, as it is mostly hidden by drivers. We have a public technical overview here:

    The Mali GPU: An Abstract Machine, Part 3 - The Midgard Shader Core

    I wonder if Midgard has any private memory (Shader Core exclusive) and local memory (sharing in cluster)

    No - only the generic L1/L2 data caches for compute applications (the framebuffer tile memory is local to a shader core for fragment shading graphics workloads).

    HTH,

    Pete

  • Dear Pete

    The figure above shows that one Shader Core has one Load/Store Pipe, one Texture Pipe, and two Arithmetic Pipes. My question is: when multiple Arithmetic Pipes operate in parallel (especially four Arithmetic Pipes), would data read and write performance limit the throughput of the Arithmetic Pipes?

    Also, is it possible for each Arithmetic Pipe in a Shader Core to run a different application at the same time?

    The figure also shows that each Arithmetic Pipe has a register file. Could you describe the organization and size of that file, please?

    For example, ARMv7 NEON has 32 D (64-bit) registers or 16 Q (128-bit) registers, but ARMv8 has a different register organization.
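
To make the NEON comparison concrete, here is a minimal Python sketch of how the ARMv7 NEON register bank can be viewed either as D registers or as Q registers. It assumes the standard ARMv7 layout (a 256-byte bank in which Q<n> aliases D<2n> and D<2n+1>); the helper names `register_count` and `q_to_d` are hypothetical and only for illustration. It says nothing about the Midgard register file, whose organization is the open question here.

```python
from typing import Tuple

# Assumption: the ARMv7 NEON register bank is 256 bytes in total.
NEON_BANK_BITS = 256 * 8


def register_count(width_bits: int) -> int:
    """Number of registers when the bank is viewed at the given width."""
    return NEON_BANK_BITS // width_bits


def q_to_d(q_index: int) -> Tuple[int, int]:
    """Return the two D register indices that alias quadword register Q<q_index>."""
    if not 0 <= q_index < register_count(128):
        raise ValueError("Q register index out of range")
    return (2 * q_index, 2 * q_index + 1)


if __name__ == "__main__":
    print("D registers (64-bit):", register_count(64))
    print("Q registers (128-bit):", register_count(128))
    print("Q3 aliases D registers:", q_to_d(3))
```

The same storage is counted twice under two widths, which is why the D and Q figures differ by a factor of two rather than adding up.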
