
Basic Graphics Questions: How a GPU Works and Achieves Parallelism

Dear Experts,

I have some basic graphics questions. My interest is mainly in the ARM Mali-400 MP2.

Please provide your input.

  1. Do the OpenVG/OpenGL/OpenGL ES APIs generate commands for the GPU (VP and PP) to process? Are these commands defined by Khronos?
  2. Does every OpenVG/OpenGL/OpenGL ES API call generate commands? A few of them may just configure GPU registers to set the inputs and the GPU state.
  3. Does the GPU work like a state machine, processing the commands generated by the OpenGL/OpenGL ES APIs?
  4. Does the Mali-400 MP have an instruction set like ARM Cortex CPUs do? I think a few GPGPUs, such as those from NVIDIA and ATI/AMD, have one.
  5. What happens to shader programs after they are compiled? Is the compiled output a set of instructions for the shader hardware, some commands, or something else?
  6. How is parallelism achieved in the ARM Mali-400 MP GPU? Is it because there are multiple pixel processors?

     I am not interested in GPGPU or GPU compute, only in rendering graphics content.
     I am trying to understand why a GPU performs better than a CPU for creating graphics content.
     Does a GPU do SIMD instruction execution like a DSP?

Thanks,

Ravinder Are

  • Some of these are easier to explain if I answer multiple points at once. It's fair to say these answers will barely scratch the surface of what's really going on, but you did just ask for the basics.


    1, 2 and 3:

    The OpenGL ES API and other APIs act as interfaces to an underlying state machine behaviour. They allow the actual functionality of the GPU to remain a black box as far as the user is concerned, because the same sequence of commands will have the same outcome regardless of how that outcome is reached on different hardware from different vendors. The API and the expected outcome of its commands are specified by Khronos, but individual hardware vendors can implement the API any way they choose, so long as the resulting behaviour matches the spec.

    Many of the commands set what can be thought of as states, although these states may not be present on the hardware itself. The Mali driver works by maintaining the state in CPU space, then packaging up the relevant parts when a call that requires the GPU is made, for example glDrawArrays. This is why we talk about 'draw call overhead' in terms of optimization.
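    As a rough illustration of "maintaining the state in CPU space, then packaging it up at the draw call", here is a toy model in Python. It is not how the Mali driver is actually written (its internals are not public); ToyDriver, ShadowState and GpuJob are hypothetical names, and the methods only loosely mirror glUseProgram, glBindBuffer, glEnable(GL_BLEND) and glDrawArrays.

```python
from dataclasses import dataclass, field, replace

# Hypothetical CPU-side shadow of a few pieces of GL state.
# A real driver tracks far more; these fields are illustrative only.
@dataclass(frozen=True)
class ShadowState:
    program: int = 0           # currently bound shader program
    array_buffer: int = 0      # currently bound vertex buffer
    blend_enabled: bool = False

# Hypothetical unit of GPU work: a snapshot of state plus draw parameters.
@dataclass(frozen=True)
class GpuJob:
    state: ShadowState
    first: int
    count: int

@dataclass
class ToyDriver:
    current: ShadowState = field(default_factory=ShadowState)
    queue: list = field(default_factory=list)

    # State-setting calls only update CPU memory; no GPU work is queued.
    def use_program(self, program):
        self.current = replace(self.current, program=program)

    def bind_array_buffer(self, buf):
        self.current = replace(self.current, array_buffer=buf)

    def set_blend(self, enabled):
        self.current = replace(self.current, blend_enabled=enabled)

    # A draw call is where the current state is packaged into a job.
    def draw_arrays(self, first, count):
        # ShadowState is immutable, so each job keeps a stable snapshot
        self.queue.append(GpuJob(self.current, first, count))

drv = ToyDriver()
drv.use_program(3)
drv.bind_array_buffer(7)
drv.set_blend(True)
print(len(drv.queue))   # nothing submitted yet
drv.draw_arrays(0, 36)
drv.set_blend(False)    # affects only the next draw
drv.draw_arrays(36, 6)
print(len(drv.queue), [job.state.blend_enabled for job in drv.queue])
```

    In this model the state-setting calls are cheap because they only touch CPU memory; the cost lands at each draw, where the state is snapshotted and queued, which is the intuition behind 'draw call overhead'.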

    That said, because the API specifies state-machine-like behaviour, you can think of it as a state machine, and at the user level it will behave like one.

    4:

    The Mali GPU does indeed have what you might think of as an instruction set, but the instructions are specialized for loading content into the graphics pipeline, rather than the more general instructions you would expect in a CPU. Unlike CPU instruction sets, however, the Mali instruction set is not published, because developers can interface efficiently with the GPU through OpenGL ES without having to worry about the make or version of the underlying hardware.

    5:

    Shader programs are compiled by the driver into a set of binary instructions, which are then saved in GPU memory. The source is not needed on the GPU itself: many lines of text in a source file compile down to a few bytes of GPU instructions, and ideally the result should be small enough to fit in a single cache line for efficiency. The part of the driver that performs this compilation is replicated in the Mali Offline Shader Compiler, so you can compile some binary shaders yourself and look at the statistics on how many instructions your shader code produces.

    6:

    Graphics is a highly parallel problem: every triangle is transformed independently of the others, and within each triangle every fragment is independent of the others. This allows a GPU to focus on throughput. Things don't necessarily have to be done in order, so long as they are all done by the end.
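    A small Python sketch of that independence, under the simplifying assumption that each fragment writes only its own pixel: the toy shade function (hypothetical) is run over the same fragments in two different orders, and the framebuffers come out identical. Real OpenGL ES does constrain ordering where fragments from different primitives land on the same pixel (for example with blending), so hardware still has to preserve primitive order in that case.

```python
import random

WIDTH, HEIGHT = 8, 4

def shade(x, y):
    # Toy fragment "shader": the colour depends only on this fragment's
    # own coordinates, never on any other fragment.
    return (x * 31 + y * 17) % 256

def render(order):
    fb = [[0] * WIDTH for _ in range(HEIGHT)]
    for x, y in order:
        fb[y][x] = shade(x, y)
    return fb

fragments = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
in_order = render(fragments)

shuffled = fragments[:]
random.Random(42).shuffle(shuffled)   # seeded so the run is repeatable
out_of_order = render(shuffled)

print(in_order == out_of_order)   # execution order does not change the result
```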

    Technically you could implement the OpenGL ES API on the CPU, and people have, but as you correctly assumed, such implementations are outclassed by a similarly scaled GPU. The GPU achieves this in several ways:

    The hardware in a GPU is assumed to run most commands in a known pipeline, with little to no branching, so the hardware blocks are specialized for this purpose.

    The work can easily be split across multiple cores because the threads are independent, so, as mentioned earlier, a lot of work can be done in parallel.
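    To sketch the multi-core split, assume two cores (as the MP2 name suggests, two pixel processors) and a hypothetical static scheme in which core i takes rows i, i + 2, and so on. Python threads stand in for the cores here; because of the GIL they won't actually speed this up, but they show that no core needs results from another and that the merged image matches a single-core render.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 32
NUM_CORES = 2  # assumption: two pixel processors, as in an MP2 configuration

def shade(x, y):
    # Toy fragment shader: depends only on this fragment's coordinates
    return (x ^ y) & 0xFF

def shade_rows(rows):
    # One "core" shades its own rows; it never reads another core's output
    return {(x, y): shade(x, y) for y in rows for x in range(WIDTH)}

def render(num_cores):
    # Hypothetical static split: core i takes rows i, i + num_cores, ...
    work = [range(i, HEIGHT, num_cores) for i in range(num_cores)]
    framebuffer = {}
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        for partial in pool.map(shade_rows, work):
            framebuffer.update(partial)   # partial results never overlap
    return framebuffer

one_core = render(1)
two_cores = render(NUM_CORES)
print(one_core == two_cores)
```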

    Mali is a tile-based renderer, which cuts down on per-fragment reads and writes to a large frame buffer, because the cores have a small local cache called a tile buffer. This tile buffer allows fast processing of pixels in a small area of the screen, and also helps divide the rendering problem into smaller, more efficient tasks: if you know you're only rendering a small part of the screen, you can easily cull any triangles that don't overlap it on this pass.
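    The binning step that makes this culling cheap can be sketched in Python as follows. The 16x16 tile size and the bounding-box overlap test are assumptions made for illustration; a real tiler may use a finer test, since a bounding box can overlap tiles the triangle itself never touches.

```python
import math

TILE = 16  # assumed tile size in pixels (illustrative)

def tiles_for_triangle(tri, width, height):
    """Tiles overlapped by the triangle's screen-clamped bounding box."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    x0 = max(0, math.floor(min(xs))) // TILE
    y0 = max(0, math.floor(min(ys))) // TILE
    x1 = min(width - 1, math.floor(max(xs))) // TILE
    y1 = min(height - 1, math.floor(max(ys))) // TILE
    # Empty ranges here mean the triangle is entirely off-screen
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]

def bin_triangles(triangles, width, height):
    """Map each tile to the indices of the triangles that may cover it."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    bins = {(tx, ty): [] for ty in range(tiles_y) for tx in range(tiles_x)}
    for i, tri in enumerate(triangles):
        for tile in tiles_for_triangle(tri, width, height):
            bins[tile].append(i)
    return bins

triangles = [
    [(2, 2), (10, 3), (5, 12)],            # small, inside tile (0, 0)
    [(20, 4), (40, 4), (30, 20)],          # spans tiles x = 1..2, y = 0..1
    [(100, 100), (120, 100), (110, 110)],  # entirely off a 64x32 screen
]
bins = bin_triangles(triangles, 64, 32)

# Each per-tile pass only looks at its own bin; empty tiles cost nothing
for tile, tris in sorted(bins.items()):
    print(tile, tris)
```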

    A lot of this methodology is widely used and very well documented academically, so if you search online you will find far more detail on how tile-based rendering works. There are also a number of blogs and videos from ARM about advanced techniques to get the most out of tile-based rendering.

    I hope this helps you find what you are looking for.

    -Stacy

  • Hello

    I'm not the one who asked the question, but this is very useful information for me!

    Thank you so much, Stacy Smith!

  • Thank you, Stacy, this brief overview was very helpful.

    Nevertheless, I'd like to discuss another topic related to the Mali GPU and OpenGL.

    The product page of the Mali-400 says that this GPU is OpenGL ES 2.0 "conformant". What exactly is meant by the term conformant? What does the GPU actively do to support OpenGL ES 2.0? How does it differ from non-conformant GPUs? Is the architecture somehow "tailored" to the needs of OpenGL? OpenGL ES, as stated before, is a kind of abstraction API for platform-independent development, right?

    Thank you in advance!

    \ben

  • The "conformance tests" are the official tests from Khronos which you have to pass to be able to say that a GPU supports OpenGL ES, etc. A conformant GPU passes all of the conformance test suites.

    HTH,
    Pete

  • OK, so hardware-wise there is no such thing as specially tailored API support? Or how much do the hardware designers build the GPU around the needs of OpenGL ES? I don't have any background knowledge on this, but I'd like to understand how the OpenGL ES standard itself is taken into account when such a design is made.

  • Not sure a forum post can give you a good answer here - your question is basically "how do you design a GPU?", which is literally a topic that fills many books. To get you started, I would highly recommend this book:

    It's one of the few books which includes any detail on GPU hardware (including a Mali-based case study).

    HTH,
    Pete

  • No, no, you don't get my point. I'm not interested in the inner workings of GPU design. I want to know whether the Mali GPU was tailored towards supporting OpenGL ES. What can a GPU do to pass the "conformance tests"? I thought any GPU could be interfaced using OpenGL ES independent of its design.

  • What can a GPU do to pass the "conformance tests"?

    Implement the specification correctly.

    I thought any GPU could be interfaced using OpenGL ES independent of its design.

    Most GPUs are designed to meet the specifications they need to support (OpenGL, OpenGL ES, OpenCL, DirectX, etc). There is no point having hardware exposing features the APIs don't support, and a GPU which doesn't do what the specifications require isn't usable.

    You can't really separate the GPU design and the API specification - one is built to meet the needs of the other.

  • Peter Harris wrote:

    You can't really separate the GPU design and the API specification - one is built to meet the needs of the other.

    Thank you, that is what I was looking for.

    I guess my assumption would have been valid in the early days of OpenGL, when the hardware was released before the first version of the API.

    Anyway, this topic is now clearer to me!

  • Peter Harris wrote:

    There is no point having hardware exposing features the APIs don't support

    I would just add that when the hardware supports functionality which is not exposed through the core GLES API, we and other vendors do sometimes expose new features via extensions, which are published in the Khronos OpenGL ES Registry. Some of these extensions are vendor-specific, while others are supported by multiple vendors.
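    Applications discover such extensions at run time. In OpenGL ES 2.0, glGetString(GL_EXTENSIONS) returns one space-separated string, and a common pitfall is a plain substring search, which can match a longer extension name that merely starts with the one you want. The Python sketch below does a whole-token match instead; the sample string is made up for illustration, although the extension names in it are real Khronos-registered ones.

```python
def has_extension(extensions, name):
    """True if `name` appears as a whole token in a GL_EXTENSIONS-style string."""
    return name in extensions.split()

# Hypothetical GL_EXTENSIONS string; in a real app this would come from
# glGetString(GL_EXTENSIONS) on a current OpenGL ES 2.0 context.
ext = "GL_OES_texture_npot GL_OES_texture_float_linear GL_OES_standard_derivatives"

print("GL_OES_texture_float" in ext)                       # naive substring match: wrongly True
print(has_extension(ext, "GL_OES_texture_float"))          # False: not actually listed
print(has_extension(ext, "GL_OES_standard_derivatives"))   # True
```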

    Hth,

    Chris

  • I'd also add that GPUs are increasingly used as compute engines as well as for graphics. The requirements of APIs like OpenCL mean that, for instance, floating point is now IEEE conformant, whereas when GPUs were used only for graphics, many did not bother rounding in accordance with the standard. This also provides extra impetus to the drive for memory coherence between the GPU and the CPU, as in HSA.