I can't interpret the GPU profiling results in DS-5 Streamline.

Note: This was originally posted on 8th December 2012 at http://forums.arm.com

Hi!
I now have a profiling environment for the Mali-400 MP GPU with DS-5 Streamline.
But I don't have any documentation for the GPU counters.
I found some documents:

- using_arm_streamline.pdf
- mali_optimization_guid.pdf
- mali_gpu_developer_tools_overview.pdf, etc.

I couldn't find any specific explanation of the Mali GPU counters, except for the GPU activity counter (and I am still confused about that one).
So I am asking whether you can point me to a detailed document about Mali GPU profiling.
I have some profiling results now, but I can't do anything with them.
Please help me make some progress.

Thank you
Daisy.
  • Note: This was originally posted on 14th December 2012 at http://forums.arm.com

    Hi Daisy,

    There is a new version of the Mali GPU Application Optimization Guide currently being created that will contain a section on using DS-5 Streamline to measure Mali hardware counters. It will explain the various hardware counters and how to use them to determine bottlenecks in your application.

    As for the counters you have pointed out, here are some explanations:

    Geometry Processor:

    • 1. Active cycles: This is the number of cycles per frame that the vertex processor was active.
    • 2. Active cycles, vertex shader: This is the number of cycles per frame that the vertex shader unit was active. This essentially measures the total cycles spent in your vertex shader, and should be roughly (number of vertices * vertex shader cycle count).
    • 3. Active cycles, PLBU geometry processing: This is the number of cycles per frame that the vertex processor PLBU (Polygon List Builder Unit) was active. This might be high if you are processing too many triangles, in which case you should consider lowering your triangle count.

    Generally, counter 2 is the most useful counter, as it gives you a metric for the total impact of vertex processing on a frame. This is directly affected by the number of vertices you pass and the complexity of the shader.
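
    As a rough illustration of the estimate for counter 2, the sketch below compares a measured "Active cycles, vertex shader" value against (number of vertices * vertex shader cycle count). The function names and the sample numbers are hypothetical; the per-vertex cycle count would have to come from your own shader analysis.

```python
def estimated_vertex_shader_cycles(vertex_count: int, cycles_per_vertex: int) -> int:
    """Rough per-frame estimate: number of vertices * vertex shader cycle count."""
    return vertex_count * cycles_per_vertex


def relative_deviation(measured: int, estimated: int) -> float:
    """Fraction by which the measured counter differs from the estimate."""
    if estimated == 0:
        raise ValueError("estimated cycle count must be non-zero")
    return (measured - estimated) / estimated


# Hypothetical frame: 50,000 vertices, a 12-cycle vertex shader,
# and a measured counter value of 660,000 cycles.
estimate = estimated_vertex_shader_cycles(50_000, 12)
print(estimate)  # 600000
print(f"{relative_deviation(660_000, estimate):+.1%}")
```

    A measured value far above the estimate would suggest that something other than raw vertex count and shader length is contributing, which is only a hint and not a diagnosis.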

    Fragment Processor:

    • 1. Active clock cycles: The number of clock cycles that were active between the start of rendering and the interrupt raised at the end of rendering. This can be a useful overall counter for the fragment processor, but it is more important to understand where the cycles are being spent, e.g. waiting for the texture cache or rasterizing a fragment that has already been rasterized once (overdraw).
    • 2. Stall cycles PolygonListReader: This is not generally useful in measuring performance.
    • 3. Pipeline bubbles cycle count: The number of unused cycles in the fragment shader while rendering is active. This can occur when using high numbers of very small triangles. In such cases, it is worth using a "Level Of Detail" system whereby you pass geometry that is always appropriate for the distance from the camera at which the object resides. For example, don't pass a 100,000 polygon mesh when the object only occupies 100 pixels; it is better to use a lower polygon model or consider a billboard impostor.
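
    To make the "Level Of Detail" idea concrete, here is a minimal sketch that picks a mesh variant from the number of pixels the object covers on screen. The function name pick_lod and the budget of one triangle per covered pixel are assumptions made for illustration, not a Mali recommendation; tune the budget against your own counter readings.

```python
def pick_lod(lod_triangle_counts, covered_pixels, triangles_per_pixel=1.0):
    """Return the index of the most detailed LOD that fits the triangle budget.

    lod_triangle_counts must be ordered from most to least detailed.
    The budget (covered_pixels * triangles_per_pixel) is an assumed heuristic.
    """
    budget = covered_pixels * triangles_per_pixel
    for index, triangles in enumerate(lod_triangle_counts):
        if triangles <= budget:
            return index
    # Nothing fits the budget: fall back to the coarsest mesh available.
    return len(lod_triangle_counts) - 1


# Hypothetical mesh variants, from 100,000 triangles down to 100.
lods = [100_000, 10_000, 1_000, 100]
print(pick_lod(lods, covered_pixels=100))     # 3
print(pick_lod(lods, covered_pixels=50_000))  # 1
```

    With these sample variants, an object covering only 100 pixels gets the 100-triangle mesh instead of the 100,000-triangle one.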

    Here, some of the most useful counters are actually:

    "Texture Cache Hit/Miss Ratio", which can be calculated by dividing "Texture Cache Hit Count" by "Texture Cache Miss Count". A good app will have somewhere in the region of 5-10:1, whereas a bad app will have lower than 5:1. If your ratio is that low, you should consider compressed and/or mip-mapped textures.
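
    The ratio can be computed from the two counter values as in the sketch below. The function names and sample counts are hypothetical, and treating any ratio of 5:1 or higher as acceptable is my reading of the guideline above, not an official threshold.

```python
import math


def texture_cache_hit_miss_ratio(hit_count: int, miss_count: int) -> float:
    """Divide "Texture Cache Hit Count" by "Texture Cache Miss Count"."""
    if miss_count == 0:
        return math.inf  # no misses recorded in this capture
    return hit_count / miss_count


def needs_texture_attention(ratio: float) -> bool:
    """True when the ratio is below 5:1 (assumed cut-off from the guideline)."""
    return ratio < 5.0


# Hypothetical counter values taken from one Streamline capture.
ratio = texture_cache_hit_miss_ratio(900_000, 100_000)
print(ratio)                           # 9.0
print(needs_texture_attention(ratio))  # False
```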

    "OverdrawFactor" which can be calculated by: ([Fragment Rasterized Count] * number offragment processors) / (Horizontal Resultion * Vertical Resolution). Typicallya particularly well written application will sit at 2.5 or below, and aparticularly overdraw heavy application will be over 5.

    Please let me know if you have any further questions.

    Chris