Dear ARM forum,
I am using DS-5 Streamline to analyze my application's performance on an ARM Mali-400.
I am seeing 3 milliseconds of GPU vertex processor activity, followed by an inactive period of 13 milliseconds, then 34 milliseconds of GPU pixel processor activity.
Questions:
1. I am trying to understand why there is such a long inactive period. How can I analyze this period for its performance impact?
2. Streamline provides many performance events, but there is very little documentation on what each event captures and how to use it for GPU performance analysis.
3. I want to measure GPU vertex processor performance: how many triangles it processes in one frame, and how much time that takes.
And GPU pixel processor performance: how many pixels it processes in one frame, and how much time that takes.
4. Is there a document that explains how to use all of these events for performance analysis?
Thanks,
Ravinder Are
Thanks Michael and Peter for your replies.
Let me add more details on my system environment:
Target chip: Cortex-A53 + Mali-400
Target board OS: 64-bit Linux
Mali drivers: r5p1-01rel0-64bit
Target application: OpenGL ES 2.0, fbdev-based double buffering
ARM Streamline Performance Analyzer: Version 5.23, Build 20151109_152210
Host OS: Windows 7 64 bit
Streamline log file: attached.
1. I am using double buffering, but you mentioned triple buffering. Why do I need triple buffering? What is the additional advantage? I am using simple fbdev, and the only content on the display is my graphics.
2. I want to find out the GPU VP (vertex processor) and PP (pixel processor) performance:
a) number of triangles processed per second
b) number of pixels processed per second
c) bandwidth consumed per second
d) frame rate.
Please explain how to calculate these four metrics using the Streamline analyzer.
3. Please provide information on the Streamline events supported by Mali-400, their purpose, and how I can use them for GPU performance analysis.
> I am using double buffering, but you mentioned triple buffering? why need triple buffering?
Multiple buffering - Wikipedia, the free encyclopedia
Summary: in a system with vsync, double buffering locks your frame time to a multiple of the vsync period. If your system can't quite run at 60 FPS, it will snap down to 30 FPS; if it can't hit 30 FPS, it snaps down to 20 FPS, and so on.
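To make the snapping concrete, here is a minimal C sketch. It assumes a fixed render time per frame and a swap that blocks until the next vblank (strict double buffering with vsync); the function name `vsync_locked_fps` is hypothetical. A frame that misses a vblank waits for the next one, so its frame time rounds up to a whole number of vsync periods.

```c
#include <assert.h>

/* Effective frame rate under double buffering with vsync, assuming every
 * frame takes render_ms and the swap blocks until the next vblank. */
static double vsync_locked_fps(double refresh_hz, double render_ms)
{
    assert(refresh_hz > 0.0 && render_ms > 0.0);
    double period_ms = 1000.0 / refresh_hz;

    /* Round the frame time up to a whole number of vsync periods. */
    unsigned periods = (unsigned)(render_ms / period_ms);
    if ((double)periods * period_ms < render_ms)
        periods++;

    return refresh_hz / (double)periods;
}
```

With a 60 Hz display, a 17 ms frame lands on 30 FPS rather than roughly 59 FPS. Triple buffering avoids this by giving the GPU a third buffer to render into instead of waiting for the vblank.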
This might be a good place to start in terms of using DS-5 Streamline for graphics performance analysis:
Mali GPU Application Optimization Guide - Mali Developer Center
See Chapter 7 (Utgard Optimization Workflows) for the parts relevant to Mali-400.
HTH,
Pete
Thanks, Peter, for your reply.
I did not find any information in the Mali GPU Application Optimization Guide - Mali Developer Center document to get the
It would be great if you could provide the details on this.
Hi Ravinder,
Based on the counter names and descriptions that DS-5 Streamline gives you:
a) Mali-4xx Software Counters: Geometry Statistics: Triangles
"The total number of triangles passed to GLES per-frame."
b) Mali Fragment Processor: Mali-4xx FP: Fragment rasterized count
"Number of fragment rasterized. Fragments/(Quads*4) gives average actual fragments per quad."
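Because Streamline is time based, one way to turn these counters into the per-second figures asked for in a) and b) is to sum a counter over a known time range of the capture and divide by the length of that range. The sketch below assumes you have read or exported those totals yourself; the helper names are hypothetical. It also encodes the Fragments/(Quads*4) formula from the counter description, which assumes a rasterized-quad count is available in your counter list.

```c
#include <assert.h>
#include <stdint.h>

/* Per-second rate of a counter summed over a capture range of window_s
 * seconds, e.g. total triangles or total fragments rasterized. */
static double per_second(uint64_t counter_total, double window_s)
{
    assert(window_s > 0.0);
    return (double)counter_total / window_s;
}

/* Fragments / (Quads * 4) from the counter description: 1.0 means every
 * 2x2 quad was fully covered by rasterized fragments. */
static double quad_occupancy(uint64_t fragments, uint64_t quads)
{
    assert(quads > 0);
    return (double)fragments / ((double)quads * 4.0);
}
```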
c) For bandwidth you will need 4 counters:
Mali Fragment Processor: Mali-4xx FP: Total bus reads
"Total number of 64-bit words read from the bus."
Mali Fragment Processor: Mali-4xx FP: Total bus writes
"Total number of 64-bit words written to the bus."
Mali Vertex Processor: Mali-4xx VP: Words read, system bus
"Total number of 64 bit words read by the GP2 from the system bus per frame."
Mali Vertex Processor: Mali-4xx VP: Words written, system bus
"Total number of 64 bit words written by the GP2 to the system bus per frame."
Adding those 4 together gives you the total GPU bandwidth used, in 64-bit words. Multiply this number by 8 to get the value in bytes.
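As a worked version of that arithmetic, here is a C sketch that sums the four word counts, converts them to bytes, and divides by the length of the measured range to get a rate. The struct and function names are hypothetical, and it assumes all four counter totals are taken over the same time range.

```c
#include <assert.h>
#include <stdint.h>

/* Each Mali-4xx bus counter counts 64-bit (8-byte) words. */
#define MALI_BUS_WORD_BYTES 8u

struct mali4xx_bus_words {
    uint64_t fp_reads;   /* Mali-4xx FP: Total bus reads */
    uint64_t fp_writes;  /* Mali-4xx FP: Total bus writes */
    uint64_t vp_reads;   /* Mali-4xx VP: Words read, system bus */
    uint64_t vp_writes;  /* Mali-4xx VP: Words written, system bus */
};

/* Total GPU bus traffic in bytes. */
static uint64_t gpu_bus_bytes(const struct mali4xx_bus_words *w)
{
    uint64_t words = w->fp_reads + w->fp_writes + w->vp_reads + w->vp_writes;
    return words * MALI_BUS_WORD_BYTES;
}

/* Average bandwidth in decimal megabytes (1e6 bytes) per second. */
static double gpu_bandwidth_mb_per_s(const struct mali4xx_bus_words *w,
                                     double window_s)
{
    assert(window_s > 0.0);
    return (double)gpu_bus_bytes(w) / window_s / 1e6;
}
```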
d) This is actually non-trivial, because Streamline is a time-based profiling tool.
There is a counter that 'may' be enabled in your BSP:
Mali-4xx Filmstrip: 1:10
"captures every 10th frame"
If you can use this, you can visually see how many thumbnails are produced in your capture and multiply by 10. This is obviously only accurate to within 10 frames.
Another method, assuming you are not vertex limited, is to measure the time between Vertex Activity spikes. Each frame is 'likely' to issue only one vertex activity spike. Note that in some composition environments, like Android's triple buffering composition, there will be a second, smaller spike per frame.
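The spike-interval method boils down to simple arithmetic. A minimal sketch, assuming you have noted the start time in milliseconds of each vertex activity spike from the timeline and that there is exactly one spike per frame (the Android composition case above would break that assumption); `fps_from_spikes` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Estimate frame rate from the start times (in ms) of successive vertex
 * processor activity spikes, assuming exactly one spike per frame. */
static double fps_from_spikes(const double *spike_ms, size_t n)
{
    assert(spike_ms != NULL && n >= 2);
    double span_ms = spike_ms[n - 1] - spike_ms[0];
    assert(span_ms > 0.0);
    /* n spikes delimit n - 1 frame intervals. */
    return (double)(n - 1) * 1000.0 / span_ms;
}
```

For example, spikes 50 ms apart correspond to 20 FPS.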
Another is to use Streamline annotations to mark your eglSwapBuffers calls, so you can see when they happen on the timeline.
If you have source code access, however, you may find it best to simply measure within the app itself.
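Measuring inside the app can be as simple as timestamping every swap. The C sketch below uses POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` and reports the average frame rate once per window; `fps_counter`, `fps_on_frame` and the other names are hypothetical. It assumes you call `fps_on_frame(&c, now_ns())` right after `eglSwapBuffers()` returns (in a Qt app, after whatever call ends up performing the swap). It times the CPU side of the loop, so it measures the rate at which frames are submitted; once the swap starts blocking on the GPU, that should track the displayed rate.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Counts frames and computes the average FPS over each reporting window. */
struct fps_counter {
    uint64_t window_start_ns;
    uint64_t window_ns;   /* reporting window length, e.g. 1 s */
    unsigned frames;      /* frames seen in the current window */
    double last_fps;      /* FPS of the last completed window, 0 until then */
};

static void fps_init(struct fps_counter *c, uint64_t start_ns,
                     uint64_t window_ns)
{
    assert(window_ns > 0);
    c->window_start_ns = start_ns;
    c->window_ns = window_ns;
    c->frames = 0;
    c->last_fps = 0.0;
}

/* Call once per frame, right after the buffer swap returns.
 * Returns 1 when a window has just completed and last_fps was updated. */
static int fps_on_frame(struct fps_counter *c, uint64_t t_ns)
{
    c->frames++;
    uint64_t elapsed = t_ns - c->window_start_ns;
    if (elapsed < c->window_ns)
        return 0;
    c->last_fps = (double)c->frames * 1e9 / (double)elapsed;
    c->window_start_ns = t_ns;
    c->frames = 0;
    return 1;
}
```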
I hope that helps.
Kind Regards,
Michael McGeagh
Hi Michael and Peter,
Did you get a chance to look into the Streamline log I shared?
Within a frame, I see some VP activity, followed by some idle time, followed by some PP activity.
My questions are:
1. Why don't VP and PP activity run in parallel? Why do they run one after another?
Am I doing something wrong that prevents parallelism?
2. Why is there idle time in my application? Why can't PP start immediately?
I am not using vsync, and I am using double buffering.
3. I am running a Qt-based OpenGL ES 2.0 application with a simple vertex shader and a simple fragment shader.
I need your support in analyzing this.
As I said before, Streamline isn't going to help answer questions about idle time. It's a performance profiler: you can't profile "nothing running", because all of the counters are zero.
The causes of (1) and (2) are probably the same thing. Serialization means idle time and things not overlapping.
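To put numbers on why serialization costs so much, take the timings from the first post (3 ms VP, 13 ms idle, 34 ms PP). The C sketch below compares strictly back-to-back execution with an idealized pipeline in which the VP works on the next frame while the PP finishes the current one. The function names are hypothetical, and the pipelined figure is an upper bound that ignores any other overheads.

```c
#include <assert.h>

/* Frame time in ms when VP work, idle gap and PP work run strictly back to
 * back for every frame, with no overlap between frames. */
static double serialized_frame_ms(double vp_ms, double idle_ms, double pp_ms)
{
    assert(vp_ms >= 0.0 && idle_ms >= 0.0 && pp_ms >= 0.0);
    return vp_ms + idle_ms + pp_ms;
}

/* Idealized frame time when the VP processes frame N+1 while the PP
 * processes frame N: throughput is bounded by the slower stage. */
static double pipelined_frame_ms(double vp_ms, double pp_ms)
{
    assert(vp_ms >= 0.0 && pp_ms >= 0.0);
    return vp_ms > pp_ms ? vp_ms : pp_ms;
}

static double fps_from_frame_ms(double frame_ms)
{
    assert(frame_ms > 0.0);
    return 1000.0 / frame_ms;
}
```

Serialized, that is 50 ms per frame, or 20 FPS; if the VP and PP overlapped perfectly, the 34 ms PP stage would bound the frame time at roughly 29 FPS.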
Usual suspects:
Less likely suspects:
HTH, Pete
Thanks, Peter. It's useful information.