
Basic Graphics Questions: How a GPU Works and Achieves Parallelism

Dear Experts,

I have some basic graphics questions; my interest is mainly in the ARM Mali-400 MP2.

Please share your input.

  1. Do the OpenVG/OpenGL/OpenGL ES APIs generate commands for the GPU (VP and PP) to process? Are those commands defined by Khronos?
  2. Does every OpenVG/OpenGL/OpenGL ES API call generate commands? I assume a few of them just configure GPU registers to set the inputs and the GPU state.
  3. Does the GPU work like a state machine, processing the commands generated by the OpenGL/OpenGL ES APIs?
  4. Does the Mali-400 MP have an instruction set like ARM Cortex CPUs do? I think some GPGPU-capable parts, such as those from NVIDIA and ATI/AMD, have one.
  5. What happens to shader programs after they are compiled? Is the compiled output a set of instructions for the shader hardware, a set of commands, or something else?
  6. How is parallelism achieved in the ARM Mali-400 MP GPU? Is it because it has multiple pixel processors?

      I am not interested in GPGPU or GPU computing, only in the graphics rendering case.
      I am trying to understand how a GPU performs better than a CPU for graphics rendering.
      Does the GPU execute SIMD instructions like a DSP?

Thanks,

Ravinder Are

  • Some of these are easier to explain if I answer multiple points at once. It's fair to say these answers will barely scratch the surface of what's really going on, but you did just ask for the basics.


    1, 2 and 3:

    The OpenGL ES API and the other APIs act as interfaces to an underlying state machine. They let the actual workings of the GPU remain a black box as far as the user is concerned: the same sequence of commands has the same outcome, regardless of how different hardware from different vendors reaches it. The API and the expected outcome of its commands are specified by Khronos, but individual hardware vendors can implement the API any way they choose, so long as the resulting behavior matches the spec.

    Many of the commands set what can be thought of as states, although these states may not exist on the hardware itself. The Mali driver maintains the state in CPU space, then packages up the relevant parts when a call is made that requires the GPU, such as glDrawArrays. This is the reason we talk about 'draw call overhead' in terms of optimization.

    That said, because state-machine-like behavior is specified in the API, you can think of the GPU as a state machine, and at the user level it will behave like one.
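To make the state-tracking idea above concrete, here is a minimal sketch of a driver that records state on the CPU and only packages it into a job when a draw call arrives. `ToyDriver`, `set_state`, and `draw_arrays` are hypothetical names invented for this illustration; they do not describe the real Mali driver or its job format.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDriver:
    """Hypothetical CPU-side driver model; not the real Mali driver."""

    state: dict = field(default_factory=dict)      # current API state, kept on the CPU
    submitted: list = field(default_factory=list)  # jobs handed to the imaginary GPU

    def set_state(self, name, value):
        # State-setting calls only update CPU-side bookkeeping.
        self.state[name] = value

    def draw_arrays(self, first, count):
        # A draw call snapshots the current state and packages it as one job.
        job = {"state": dict(self.state), "first": first, "count": count}
        self.submitted.append(job)
        return job


driver = ToyDriver()
driver.set_state("blend", True)
driver.set_state("program", 3)
print("jobs before draw:", len(driver.submitted))
driver.draw_arrays(0, 36)
driver.set_state("blend", False)  # later changes do not alter the packaged job
print("jobs after draw:", len(driver.submitted), driver.submitted[0]["state"])
```

In this model the packaging work happens per draw call, which is one way to picture where draw call overhead comes from.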

    4:

    The Mali GPU does indeed have what you might think of as an instruction set, but the instructions are specialized for loading content into the graphics pipeline, rather than the more general instructions you would expect in a CPU. Unlike CPU instruction sets, however, the instruction set for Mali is not published anywhere, as developers can interface efficiently with the GPU using OpenGL ES and do not have to worry about the make or version of the underlying hardware.

    5:

    Shader programs are compiled by the driver into a set of binary instructions, which are then stored in GPU memory. The source is not needed on the GPU itself, as many lines of text in a source file compile down to a few bytes of GPU instructions. Ideally the compiled shader should be small enough to fit in a single cache line for efficiency. The part of the driver that performs this compilation is replicated in the Mali Offline Shader Compiler, so you can compile some binary shaders yourself and look at the statistics of how many instructions your shader code produces.
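As a rough picture of source text shrinking into a compact instruction stream, the sketch below compiles a small arithmetic expression into bytes for an imaginary stack machine. The opcodes and the `compile_expr` helper are invented for this example; they have nothing to do with Mali's real, unpublished instruction set or with how the driver's compiler works internally.

```python
import ast

# Invented one-byte opcodes for an imaginary stack machine (NOT Mali's ISA).
OPCODES = {"load": 0x01, "const": 0x02, ast.Add: 0x10, ast.Mult: 0x11}


def compile_expr(source, variables):
    """Compile an arithmetic expression into a compact byte string."""
    out = bytearray()

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Post-order traversal: operands first, then the operator.
            emit(node.left)
            emit(node.right)
            out.append(OPCODES[type(node.op)])
        elif isinstance(node, ast.Name):
            out.extend([OPCODES["load"], variables.index(node.id)])
        elif isinstance(node, ast.Constant) and isinstance(node.value, int):
            out.extend([OPCODES["const"], node.value])
        else:
            raise ValueError("unsupported syntax in toy compiler")

    emit(ast.parse(source, mode="eval").body)
    return bytes(out)


source = "colour * light + 2"
binary = compile_expr(source, ["colour", "light"])
print(len(source), "characters of source ->", len(binary), "bytes of instructions")
```

For real shaders, the Mali Offline Shader Compiler mentioned above is the tool that reports actual instruction counts.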

    6:

    Graphics is a highly parallel problem: every triangle is transformed independently of the others, and within each triangle every fragment is independent of the others. This allows a GPU to focus on throughput. Things don't necessarily have to be done in order, so long as they are all done by the end.
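The independence described above can be sketched in a few lines: because each fragment's result depends only on its own inputs, the work can be handed to several workers in any order and the final image is the same. The `shade` function and the worker pool standing in for pixel processors are assumptions for illustration only, and Python threads model the independence here, not the performance of real hardware.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4


def shade(x, y):
    """Hypothetical fragment shader: the result depends only on this fragment."""
    return (x * 31 + y * 17) % 256


def shade_row(y):
    return [shade(x, y) for x in range(WIDTH)]


def render_sequential():
    return [shade_row(y) for y in range(HEIGHT)]


def render_parallel(workers=2):
    # Each worker stands in for one pixel processor; rows may finish in any order,
    # and map() puts the results back in their original positions.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shade_row, range(HEIGHT)))


print("same image:", render_parallel() == render_sequential())
```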

    Technically you could implement the OpenGL ES API on a CPU, and people have, but as you have correctly assumed, such implementations are outclassed by a similarly scaled GPU. The GPU achieves this in several ways:

    The hardware in a GPU is assumed to run most commands in a known pipeline, with little to no branching, so the hardware blocks are specialized for this purpose.

    The work can easily be split across multiple cores because the threads are independent, so a lot of it can be done in parallel, as mentioned earlier.

    Mali is a tile-based renderer, which cuts down on reading and writing a large frame buffer on a per-fragment basis, as the cores have a small local cache called a tile buffer. This tile buffer allows fast processing of pixels in a small area of the screen, and also helps divide the rendering problem into smaller, more efficient tasks: if you know you're only rendering a small part of the screen, you can easily cull any triangles that don't overlap it on this pass.
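The culling step described above can be sketched as a binning pass: each triangle is assigned to the tiles its bounding box overlaps, so a tile's pass only visits triangles that could touch it. The 16-pixel tile size, the `bin_triangles` helper, and the use of integer pixel coordinates are assumptions for this example, not a description of Mali's actual binning hardware. A bounding-box test is conservative, so a triangle may be listed for a tile it does not actually cover.

```python
TILE = 16  # tile edge in pixels; an assumed value for illustration


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to indices of triangles whose bounding box overlaps it."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile coordinates.
        x0 = max(min(xs), 0) // tile
        x1 = min(max(xs), width - 1) // tile
        y0 = max(min(ys), 0) // tile
        y1 = min(max(ys), height - 1) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return bins


triangles = [
    [(2, 2), (10, 3), (5, 12)],      # fits inside the top-left tile
    [(20, 20), (30, 22), (25, 30)],  # fits inside the bottom-right tile
    [(10, 10), (20, 12), (12, 20)],  # bounding box straddles all four tiles
]
bins = bin_triangles(triangles, width=32, height=32)
for tile_xy, indices in sorted(bins.items()):
    print(tile_xy, indices)
```

When rendering tile (1, 0) in this example, only the third triangle needs to be considered; the other two are skipped for that pass.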

    A lot of this methodology is widely used and very well documented academically, so if you search online you will find far more detail on how tile-based rendering works. There are also a number of blogs and videos from ARM about advanced techniques for getting the most out of tile-based rendering.

    I hope this helps you find what you are looking for.

    -Stacy
