
The Architecture of the Mali Midgard

I’m an optimization algorithm engineer, and I’m very interested in the hardware architectures of ARM Cortex-A? and Mali Midgard. As far as I know, the memory system of the ARM Cortex-A? consists of the following:

L1 Cache: L1D/L1P

L2 Cache

I searched the Internet for the Mali Midgard hardware architecture, and found this:

L1 Cache: Data/Texture Cache

L2 Cache (global memory and constant memory)

I wonder whether Midgard has any private memory (exclusive to a Shader Core) and local memory (shared within a cluster). Since ARM has the L1P code cache, I would expect Midgard to have a similar cache/memory.

Please share any detailed information about the Midgard hardware architecture. Many thanks.

  • The figure above shows that one Shader Core has one Load/Store Pipe, one Texture Pipe, and two Arithmetic Pipes. My question is: when multiple Arithmetic Pipes operate in parallel (especially four Arithmetic Pipes), would the performance of data reads and writes limit the Arithmetic Pipes' throughput?

    It is going to depend on your algorithm and the ratio of memory access to arithmetic.
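To make the "ratio of memory access to arithmetic" concrete, here is a minimal roofline-style sketch. The function names and the peak figures (64 GFLOP/s, 8 GB/s) are hypothetical placeholders chosen for illustration, not Midgard specifications; substitute measured numbers for your own device.

```python
def arithmetic_intensity(flops_per_item, bytes_per_item):
    """Arithmetic operations performed per byte moved to or from memory."""
    return flops_per_item / bytes_per_item


def attainable_flops(flops_per_item, bytes_per_item, peak_flops, peak_bandwidth):
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    intensity = arithmetic_intensity(flops_per_item, bytes_per_item)
    return min(peak_flops, intensity * peak_bandwidth)


def limiting_resource(flops_per_item, bytes_per_item, peak_flops, peak_bandwidth):
    """Return 'memory' if the kernel sits below the ridge point, else 'arithmetic'."""
    ridge = peak_flops / peak_bandwidth  # FLOP per byte where the two limits meet
    intensity = arithmetic_intensity(flops_per_item, bytes_per_item)
    return "memory" if intensity < ridge else "arithmetic"


# Hypothetical device figures (assumptions, not real Mali numbers).
PEAK_FLOPS = 64e9      # 64 GFLOP/s
PEAK_BANDWIDTH = 8e9   # 8 GB/s

# Vector add on float32: 1 add per item, 2 loads + 1 store of 4 bytes each.
vector_add = limiting_resource(1, 12, PEAK_FLOPS, PEAK_BANDWIDTH)

# A heavier kernel: 200 operations per item over the same 8 bytes of traffic.
heavy_kernel = limiting_resource(200, 8, PEAK_FLOPS, PEAK_BANDWIDTH)
```

With these assumed figures the vector add is limited by memory and the heavier kernel by arithmetic, which is the sense in which adding more Arithmetic Pipes only helps when the algorithm does enough work per byte.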

    And, is it possible for each Arithmetic Pipe in a Shader Core to run different applications at the same time?

    If you are running a program where that matters, you are using the GPU wrong. GPGPU workloads need many parallel work items / threads (ideally at least high tens of thousands) to fill the GPU, so statistically you will be running one program at a time.
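As a rough illustration of why the work-item count matters, the sketch below compares a job's size with the number of threads a GPU can hold at once. The core count and threads-per-core values are assumed placeholders, not figures taken from this thread; check the documentation for your specific GPU.

```python
import math


def resident_threads(shader_cores, threads_per_core):
    """Threads the whole GPU can keep in flight at one time."""
    return shader_cores * threads_per_core


def fill_ratio(work_items, shader_cores, threads_per_core):
    """Fraction of thread slots a job can occupy (capped at 1.0)."""
    capacity = resident_threads(shader_cores, threads_per_core)
    return min(1.0, work_items / capacity)


def scheduling_rounds(work_items, shader_cores, threads_per_core):
    """How many times the GPU must be refilled to finish the job."""
    capacity = resident_threads(shader_cores, threads_per_core)
    return math.ceil(work_items / capacity)


# Assumed configuration for illustration only.
CORES = 4
THREADS_PER_CORE = 256

small_job_fill = fill_ratio(512, CORES, THREADS_PER_CORE)
large_job_rounds = scheduling_rounds(50_000, CORES, THREADS_PER_CORE)
```

Under these assumptions a 512-item job leaves half the thread slots empty, while a 50,000-item job refills the whole GPU dozens of times, so a single kernel occupies every Arithmetic Pipe and there is little to gain from running different applications side by side.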

    I wonder about the organization and size of the register file.

    For Midgard GPUs this would be a useful starting point:

    ARM Mali Compute Architecture Fundamentals

    HTH,
    Pete
