I can't interpret GPU profiling results on DS-5 Streamline
DS-5 Streamline
Mali Drivers
Mali-GPU
Mali-400
HyoJeong Lim
over 7 years ago
Note: This was originally posted on 8th December 2012 at
http://forums.arm.com
Hi!
I now have a profiling environment for the Mali-400MP GPU with DS-5 Streamline, but I don't have any documentation for the GPU counters.
I found some documents:
- using_arm_streamline.pdf
- mali_optimization_guid.pdf
- mali_gpu_developer_tools_overview.pdf, etc.
I couldn't find any specific explanation of the Mali GPU counters, except for the GPU activity counter (and I am still confused about that one).
So I am asking whether you can point me to a detailed document about Mali GPU profiling.
I have some profiling results now, but I can't interpret them.
Please help me make some progress.
Thank you
Daisy.
Chris Varnsverry
over 7 years ago
Note: This was originally posted on 14th December 2012 at
http://forums.arm.com
Hi Daisy,
A new version of the Mali GPU Application Optimization Guide is currently being written. It will include a section on using DS-5 Streamline to measure the Mali hardware counters, explaining the various counters and how to use them to determine bottlenecks in your application.
As for the ones you have pointed out, here are some explanations:
Geometry Processor:
1. Active cycles: This is the number of cycles per frame that the vertex processor was active.
2. Active cycles, vertex shader: This is the number of cycles per frame that the vertex shader unit was active. This essentially measures the total cycles spent in your vertex shader, and should be roughly (number of vertices * vertex shader cycle count).
3. Active cycles, PLBU geometry processing: This is the number of cycles per frame that the vertex processor PLBU (Polygon List Builder Unit) was active. This might be high if you are processing too many triangles, in which case you should consider lowering your triangle count.
Generally counter 2 is the most useful counter, as it gives you a metric to measure the total impact of vertex processing for a frame. This is directly impacted by the number of vertices you pass, and the complexity of the shader.
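To make the rule of thumb for counter 2 concrete, here is a minimal Python sketch that compares a measured "Active cycles, vertex shader" value against the rough estimate (number of vertices * vertex shader cycle count). The function names and the example numbers are illustrative assumptions, not values taken from Streamline or from any Mali documentation.

```python
def estimated_vertex_shader_cycles(vertex_count, cycles_per_vertex):
    """Rough per-frame estimate: number of vertices * vertex shader cycle count."""
    return vertex_count * cycles_per_vertex


def relative_deviation(measured_cycles, estimated_cycles):
    """Fraction by which the measured counter differs from the estimate."""
    return (measured_cycles - estimated_cycles) / estimated_cycles


# Hypothetical frame: 50,000 vertices, a shader costing 12 cycles per vertex,
# and a measured counter value of 660,000 cycles for that frame.
estimate = estimated_vertex_shader_cycles(50_000, 12)
deviation = relative_deviation(660_000, estimate)
print(f"estimated cycles: {estimate}, deviation: {deviation:.0%}")
```

If the measured value is far above the estimate, that would suggest the vertex count or the per-vertex shader cost you assumed is not what the application is really submitting.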
Fragment Processor:
1. Active clock cycles: The number of clock cycles that were active between the start of rendering and the interrupt raised at the end of rendering. This can be a useful overall counter for the fragment processor, but it is more important to understand where the cycles are being spent, e.g. waiting for the texture cache or rasterizing a fragment that has already been rasterized once (overdraw).
2. Stall cycles PolygonListReader: This is not generally useful in measuring performance.
3. Pipeline bubbles cycle count: Number of unused cycles in the fragment shader while rendering is active. This can occur when using high numbers of very small triangles. In such cases, it is worth using a "Level Of Detail" system whereby you pass geometry that is always appropriate for the distance from the camera at which the object resides. For example, don't pass 100,000 polygon meshes when the object only occupies 100 pixels; it is better to use a lower polygon model or consider a billboard impostor.
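One possible way to sketch the "Level Of Detail" idea from item 3 is to pick the most detailed mesh whose triangle count fits the number of pixels the object covers on screen. This is only a sketch under assumptions: the `pick_lod` helper and the budget of one triangle per covered pixel are hypothetical choices for illustration, not a Mali recommendation.

```python
def pick_lod(lod_triangle_counts, covered_pixels, max_triangles_per_pixel=1.0):
    """Return the index of the most detailed LOD that fits the pixel budget.

    lod_triangle_counts must be ordered from most to least detailed.
    max_triangles_per_pixel is an assumed tuning value, not a hardware limit.
    """
    triangle_budget = covered_pixels * max_triangles_per_pixel
    for index, triangle_count in enumerate(lod_triangle_counts):
        if triangle_count <= triangle_budget:
            return index
    # Nothing fits the budget: fall back to the coarsest mesh available.
    return len(lod_triangle_counts) - 1


# Hypothetical mesh with four levels of detail.
lods = [100_000, 10_000, 1_000, 100]
print(pick_lod(lods, covered_pixels=100))     # object covers ~100 pixels
print(pick_lod(lods, covered_pixels=20_000))  # object covers ~20,000 pixels
```

With these assumed numbers, the 100,000 polygon mesh is never chosen for an object that covers only 100 pixels, which matches the example in item 3.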
Here, some of the most useful counters are actually:
"Texture Cache Hit/Miss Ratio", which can be calculated by dividing "Texture Cache Hit Count" by "Texture Cache Miss Count". A good app will have somewhere in the region of 5-10:1, whereas a bad app will have lower than 5:1. In the latter case, you should consider compressed and/or mip-mapped textures.
"Overdraw Factor", which can be calculated by: ([Fragment Rasterized Count] * number of fragment processors) / (Horizontal Resolution * Vertical Resolution). Typically a particularly well written application will sit at 2.5 or below, and a particularly overdraw-heavy application will be over 5.
Please let me know if you have any further questions.
Chris