I can't interpret GPU profiling results in DS-5 Streamline
DS-5 Streamline
Mali Drivers
Mali-GPU
Mali-400
HyoJeong Lim
over 11 years ago
Note: This was originally posted on 8th December 2012 at
http://forums.arm.com
Hi!
I now have a profiling environment for a Mali-400 MP GPU with DS-5 Streamline.
But I don't have any documentation for the GPU.
The documents I found were:
- using_arm_streamline.pdf
- mali_optimization_guid.pdf
- mali_gpu_developer_tools_overview.pdf, etc.
I couldn't find any specific explanation of the Mali GPU counters, except for the GPU activity counter (and I am still confused by that one).
So I am asking for a detailed document about Mali GPU profiling, if you can provide one.
I have some profiling results now, but I can't do anything with them.
Please help me make some progress.
Thank you
Daisy.
Chris Varnsverry
over 11 years ago
Note: This was originally posted on 14th December 2012 at
http://forums.arm.com
Hi Daisy,
A new version of the Mali GPU Application Optimization Guide is currently being written. It will contain a section on using DS-5 Streamline to measure Mali hardware counters, explaining the various counters and how to use them to find bottlenecks in your application.
As for the counters you pointed out, here are some explanations:
Geometry Processor:
1. Active cycles: the number of cycles per frame that the vertex processor was active.
2. Active cycles, vertex shader: the number of cycles per frame that the vertex shader unit was active. This essentially measures the total cycles spent in your vertex shader, and should be roughly (number of vertices * vertex shader cycle count).
3. Active cycles, PLBU geometry processing: the number of cycles per frame that the vertex processor's PLBU (Polygon List Builder Unit) was active. This can be high if you are processing too many triangles, in which case you should consider lowering your triangle count.
Generally, counter 2 is the most useful, as it gives you a metric for the total impact of vertex processing on a frame. It is directly affected by the number of vertices you pass and by the complexity of the shader.
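To put the "vertices * vertex shader cycle count" rule of thumb into numbers, here is a minimal sketch. The function names are hypothetical, and the example values (20,000 vertices, a 12-cycle vertex shader, a 400 MHz GPU clock, a 60 FPS target) are illustrative assumptions, not measurements. The per-vertex cycle count would have to come from a shader cycle estimate, for example an offline shader compiler report (also an assumption about your workflow).

```python
def estimate_vertex_shader_cycles(vertex_count: int, cycles_per_vertex: float) -> float:
    """Rough per-frame estimate of 'Active cycles, vertex shader'.

    Rule of thumb from the reply: number of vertices * vertex shader cycle count.
    """
    if vertex_count < 0 or cycles_per_vertex < 0:
        raise ValueError("vertex_count and cycles_per_vertex must be non-negative")
    return vertex_count * cycles_per_vertex


def frame_budget_fraction(cycles: float, gpu_clock_hz: float, target_fps: float) -> float:
    """Share of one frame's cycle budget that `cycles` uses (1.0 means the whole frame)."""
    cycles_per_frame = gpu_clock_hz / target_fps
    return cycles / cycles_per_frame


# Hypothetical inputs: 20,000 vertices, 12-cycle shader, 400 MHz clock, 60 FPS target.
estimate = estimate_vertex_shader_cycles(20_000, 12)  # 240000 cycles
share = frame_budget_fraction(estimate, 400e6, 60)    # about 0.036, i.e. ~3.6% of a frame
```

Comparing the estimate with the "Active cycles, vertex shader" value Streamline reports can help check whether the vertex count and shader cost you assumed match what the GPU actually processed.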
Fragment Processor:
1. Active clock cycles: the number of clock cycles that were active between the start of rendering and the interrupt raised at the end of rendering. This can be a useful overall counter for the fragment processor, but it is more important to understand where the cycles are being spent, e.g. waiting for the texture cache, or rasterizing a fragment that has already been rasterized once (overdraw).
2. Stall cycles PolygonListReader: this is not generally useful for measuring performance.
3. Pipeline bubbles cycle count: the number of unused cycles in the fragment shader while rendering is active. This can occur when using large numbers of very small triangles. In such cases, it is worth using a "Level Of Detail" system, whereby you pass geometry that is appropriate for the object's distance from the camera. For example, don't pass a 100,000-polygon mesh when the object only occupies 100 pixels; it is better to use a lower-polygon model or consider a billboard impostor.
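A "Level Of Detail" system can be as simple as picking a mesh by how much of the screen the object covers. The sketch below is one hypothetical way to do that: `LodLevel`, `select_lod`, and every triangle count and pixel threshold in `LEVELS` are illustrative assumptions, and it assumes your engine already computes the object's on-screen coverage in pixels (for example from a projected bounding box).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LodLevel:
    """One level of detail for a mesh (hypothetical structure)."""
    name: str
    triangle_count: int
    min_screen_pixels: int  # use this level when the object covers at least this many pixels


# Hypothetical LOD chain: triangle counts and pixel thresholds are illustrative only.
LEVELS = [
    LodLevel("high", 100_000, 50_000),
    LodLevel("medium", 5_000, 2_000),
    LodLevel("low", 300, 200),
    LodLevel("billboard", 2, 0),  # a textured quad acting as a billboard impostor
]


def select_lod(levels: list[LodLevel], screen_pixels: int) -> LodLevel:
    """Return the most detailed level whose pixel threshold the object meets."""
    # Check from the largest threshold down, so the first match is the most detailed allowed.
    for level in sorted(levels, key=lambda lvl: lvl.min_screen_pixels, reverse=True):
        if screen_pixels >= level.min_screen_pixels:
            return level
    # Object is smaller than every threshold: fall back to the coarsest level.
    return min(levels, key=lambda lvl: lvl.min_screen_pixels)


# The case from the reply: an object covering only 100 pixels gets the billboard, not 100,000 triangles.
chosen = select_lod(LEVELS, 100)
```

Real engines usually add hysteresis so objects near a threshold don't flicker between levels; that is left out here to keep the sketch short.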
Here, some of the most useful counters are actually derived ones:
"Texture Cache Hit/Miss Ratio", calculated by dividing "Texture Cache Hit Count" by "Texture Cache Miss Count". A good app will be somewhere in the region of 5-10:1, whereas a bad app will be lower than 5:1. In that situation, you should consider compressed and/or mipmapped textures.
"Overdraw Factor", calculated as ([Fragment Rasterized Count] * number of fragment processors) / (Horizontal Resolution * Vertical Resolution). Typically, a particularly well-written application will sit at 2.5 or below, and a particularly overdraw-heavy application will be over 5.
Please let me know if you have any further questions.
Chris