Mali Performance Optimization

Note: This was originally posted on 17th November 2010 at http://forums.arm.com

I thought we could use this forum to share ideas on how to optimize graphics rendering on Mali-based devices.  So here's my first post, on identifying bottlenecks in the rendering pipeline.

We usually measure graphics rendering performance by frame rate, or frames per second (FPS).  System FPS is the overall rendering performance when all system components (CPU, GPU, memory, display) are hooked up together.  What system FPS fails to reveal, however, is how the individual components in the rendering pipeline perform.  Knowing each component's performance and locating the bottlenecks is the necessary first step in optimizing graphics rendering on any Mali-based system.
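
To make the distinction concrete, here is a minimal Python sketch.  It assumes you already have the wall-clock time at which each frame completed and, for each processing unit, how long that unit was busy on each frame.  `system_fps` and `unit_fps` are hypothetical helper names, not part of PAT or the Mali driver, and PAT may compute its per-unit figures differently.

```python
from typing import Sequence


def system_fps(frame_done_times_s: Sequence[float]) -> float:
    """Overall FPS from the wall-clock times (in seconds) at which frames completed."""
    if len(frame_done_times_s) < 2:
        raise ValueError("need at least two frame timestamps")
    elapsed = frame_done_times_s[-1] - frame_done_times_s[0]
    # N completion timestamps delimit N - 1 frame intervals.
    return (len(frame_done_times_s) - 1) / elapsed


def unit_fps(per_frame_busy_s: Sequence[float]) -> float:
    """FPS one unit could sustain if nothing else held it back: frames / busy time."""
    return len(per_frame_busy_s) / sum(per_frame_busy_s)


# Example: frames complete every 40 ms, but the PP is busy only 15 ms per frame.
print(system_fps([0.00, 0.04, 0.08, 0.12]))  # about 25 FPS
print(unit_fps([0.015, 0.015, 0.015]))       # about 66.7 FPS
```

The gap between the two numbers is exactly what system FPS alone hides: the PP has headroom, so something else is limiting the frame rate.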

Graphics rendering on Mali is a frame-based, pipelined process that involves several processing units.  The process begins with the graphics application, running on the CPU, making API calls to the Mali driver.  The Mali driver then sets up the data in system memory that the Mali GPU needs to render a frame.  The GPU can't start rendering a frame until the CPU has finished setting up the data for that frame.

Within the Mali GPU, the geometry processor (GP) consumes the data set up by the CPU and passes its output on to the pixel processor (PP).  The PP can't start rendering a frame until the GP has finished setting up the data it needs for that frame in memory.

You get the picture.  Data dependencies between processing blocks mean that a bottleneck in any one of the processing cores throttles the whole system FPS.  In addition, the processing cores access system memory, so high memory latency can also be a bottleneck.
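
The throttling effect shows up in a small simulation of a linear CPU -> GP -> PP pipeline.  This is a simplified model, not how the Mali driver actually schedules work: it assumes a fixed per-frame cost for each stage, one frame per stage at a time, and unlimited buffering between stages.  The function names are hypothetical.

```python
from typing import List, Sequence


def pipeline_finish_times(stage_times_s: Sequence[float], n_frames: int) -> List[float]:
    """Time at which each frame leaves the last stage of a linear pipeline.

    Assumes every frame costs the same time per stage, each stage works on one
    frame at a time, and buffering between stages is unlimited.
    """
    done = [0.0] * len(stage_times_s)  # when each stage finished its latest frame
    finish = []
    for _ in range(n_frames):
        ready = 0.0  # when the previous stage handed this frame over
        for s, t in enumerate(stage_times_s):
            # A stage starts a frame only once the previous stage has finished it
            # (data dependency) and it has finished its own previous frame.
            start = max(ready, done[s])
            done[s] = start + t
            ready = done[s]
        finish.append(ready)
    return finish


def steady_state_fps(stage_times_s: Sequence[float], n_frames: int = 1000) -> float:
    finish = pipeline_finish_times(stage_times_s, n_frames)
    # Measure over the second half so the pipeline-fill latency is excluded.
    half = n_frames // 2
    return (n_frames - 1 - half) / (finish[-1] - finish[half])


# CPU 30 ms, GP 10 ms, PP 15 ms per frame.
print(round(steady_state_fps([0.030, 0.010, 0.015]), 1))  # 33.3
```

On their own the GP and PP could sustain 100 and about 67 FPS, yet the pipeline settles at about 33 FPS, the rate of its slowest stage (here the CPU).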

Using the Mali performance analysis tool (PAT) together with instrumented Mali drivers, one can usually spot a CPU-bound use case easily: the measured system FPS is significantly lower than both the GP FPS and the PP FPS measured by PAT.  See attached image.
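
The check above can be written as a rough heuristic.  The 0.8 margin is an arbitrary, hypothetical stand-in for "significantly lower"; when neither GPU unit explains the low system FPS, the CPU side (application plus driver) is the first suspect, though memory or display could also be involved.

```python
def classify_bottleneck(system_fps: float, gp_fps: float, pp_fps: float,
                        margin: float = 0.8) -> str:
    """Guess the limiting unit from system FPS and per-unit FPS (heuristic only)."""
    gpu_limit = min(gp_fps, pp_fps)
    if system_fps < margin * gpu_limit:
        # Both GPU units could go much faster: look outside the GPU, CPU first.
        return "CPU"
    # Otherwise the slower GPU unit is the likely limiter.
    return "GP" if gp_fps <= pp_fps else "PP"


print(classify_bottleneck(20.0, 60.0, 55.0))  # CPU
```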

Has anyone found any interesting bottlenecks, or tricky ones to spot?

  • Note: This was originally posted on 19th January 2011 at http://forums.arm.com

    Is it now possible to check the CPU load consumed by the application and by the driver separately?  Knowing how the load splits between the application and the driver would tell us which area to focus on for optimization.
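
One way to approach this, assuming a sampling profiler (Linux perf, for example) that records which binary each CPU sample landed in, is to bucket the samples by module.  The driver module names below are placeholders; substitute whatever the Mali driver ships as on your platform.  Everything outside those modules is counted as application load here, which also lumps in libc and other shared libraries.

```python
from collections import Counter
from typing import AbstractSet, Dict, Iterable

# Hypothetical driver module names: replace with the ones on your device.
DRIVER_MODULES: AbstractSet[str] = frozenset({"libMali.so", "mali.ko"})


def cpu_load_split(sample_modules: Iterable[str],
                   driver_modules: AbstractSet[str] = DRIVER_MODULES) -> Dict[str, float]:
    """Fraction of CPU samples in driver modules vs everything else."""
    counts = Counter("driver" if m in driver_modules else "application"
                     for m in sample_modules)
    total = sum(counts.values())
    if total == 0:
        return {"application": 0.0, "driver": 0.0}
    # Counter returns 0 for keys it never saw.
    return {k: counts[k] / total for k in ("application", "driver")}


print(cpu_load_split(["app", "libMali.so", "app", "app"]))
# {'application': 0.75, 'driver': 0.25}
```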