Dear ARM forum,
I am using DS-5 Streamline to analyze my application's performance on an ARM Mali-400.
I am seeing 3 milliseconds of GPU vertex processor activity, followed by an inactive period of 13 milliseconds, then 34 milliseconds of GPU pixel processor activity.
Questions:
1. I am trying to understand why there is so much inactive time. How can I analyze this period to see its performance impact?
2. Streamline provides many performance-measuring events, but the documentation on what each event captures, and how to use it for GPU performance analysis, is very poor.
3. I want to measure GPU vertex processor performance in terms of how many triangles it processes in one frame and how much time that takes,
and GPU pixel processor performance in terms of how many pixels it processes in one frame and how much time that takes.
4. Is there a document that explains how to use all of these events for performance analysis?
Thanks,
Ravinder Are
I am using double buffering, but you mentioned triple buffering. Why is triple buffering needed?
Multiple buffering - Wikipedia, the free encyclopedia
Summary: in a system with vsync, double buffering locks your frame time to a multiple of the vsync period. If your system can't quite run at 60 FPS, it will snap down to 30 FPS; if it can't hit 30 FPS, it snaps down to 20 FPS, and so on.
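A simplified model of that snapping can be sketched in Python. It assumes a 60 Hz display and that each frame's work, once finished, waits for the next vsync before rendering can continue (double buffering with nothing else overlapping); `vsync_locked_fps` is a hypothetical helper, not part of any Mali or Streamline tool.

```python
import math


def vsync_locked_fps(frame_time_ms: float, refresh_hz: float = 60.0) -> float:
    """Effective frame rate with double buffering and vsync (simplified model).

    Each frame's displayed interval is rounded up to a whole number of
    refresh periods, because the swap can only complete on a vsync.
    """
    if frame_time_ms <= 0:
        raise ValueError("frame_time_ms must be positive")
    period_ms = 1000.0 / refresh_hz
    # Number of whole vsync periods one frame occupies.
    periods = math.ceil(frame_time_ms / period_ms)
    return refresh_hz / periods


for t in (15.0, 20.0, 40.0, 55.0):
    print(f"{t:4.0f} ms of work per frame -> {vsync_locked_fps(t):.0f} FPS")
```

With these inputs, 20 ms of work (just over one 16.7 ms period) already costs a full second period, which is the "snap down to 30 FPS" effect. Triple buffering avoids this by giving the GPU a third buffer to render into while the other two wait for vsync.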
Please explain how to calculate the four performance figures above using Streamline.
This might be a good place to start in terms of using DS-5 Streamline for graphics performance analysis:
Mali GPU Application Optimization Guide - Mali Developer Center
See Chapter 7 (Utgard Optimization Workflows) for the parts relevant to Mali-400.
HTH,
Pete
Thanks, Peter, for your reply.
I could not find any information in the Mali GPU Application Optimization Guide - Mali Developer Center document on how to get the following:
a) number of triangles processed per second
b) number of pixels processed per second
c) bandwidth consumed per second
d) frame rate.
It would be great if you could provide details on this.
Hi Ravinder,
Based on the counter names and descriptions that DS-5 Streamline gives you:
a) Mali-4xx Software Counters: Geometry Statistics: Triangles
"The total number of triangles passed to GLES per-frame."
b) Mali Fragment Processor: Mali-4xx FP: Fragment rasterized count
"Number of fragment rasterized. Fragments/(Quads*4) gives average actual fragments per quad."
c) For bandwidth you will need 4 counters:
Mali Fragment Processor: Mali-4xx FP: Total bus reads
"Total number of 64-bit words read from the bus."
Mali Fragment Processor: Mali-4xx FP: Total bus writes
"Total number of 64-bit words written to the bus."
Mali Vertex Processor: Mali-4xx VP: Words read, system bus
"Total number of 64 bit words read by the GP2 from the system bus per frame."
Mali Vertex Processor: Mali-4xx VP: Words written, system bus
"Total number of 64 bit words written by the GP2 to the system bus per frame."
Adding those 4 together gives you the total GPU bandwidth used. Multiply this number by 8 to get the value in bytes.
d) This is actually non-trivial because Streamline is a time-based profiling tool.
There is a counter that 'may' be enabled in your BSP:
Mali-4xx Filmstrip: 1:10
"captures every 10th frame"
If you can use this, you can visually count how many thumbnails are produced in your capture and multiply by 10. This is obviously only accurate to within 10 frames.
Another method, assuming you are not vertex-limited, is to measure the time between vertex activity spikes. Each frame is 'likely' to issue only one vertex activity spike. Note that in some composition environments, such as Android's triple-buffering composition, there will be a second, smaller spike per frame.
Another is to use Streamline annotations to mark eglSwapBuffers calls so you can see when they happen on the timeline.
If you have source code access, however, you may find it best to just measure within the app itself.
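The arithmetic behind a) to d) can be sketched in Python. All function names are hypothetical helpers, and the sketch assumes you have already obtained the counter totals over a capture window of known length (and, for d), a list of eglSwapBuffers timestamps from annotations or in-app timing); the numbers in the usage lines are made up for illustration.

```python
# Hypothetical helpers: the inputs are values already read out of a
# Streamline capture or logged by the app; nothing here talks to Streamline.

BYTES_PER_WORD = 8  # the bus counters count 64-bit words


def gpu_bandwidth_bytes(fp_reads, fp_writes, vp_reads, vp_writes):
    """c) Total GPU bus traffic in bytes from the four word counters."""
    return (fp_reads + fp_writes + vp_reads + vp_writes) * BYTES_PER_WORD


def per_second(total, capture_s):
    """Turn a total accumulated over a capture window into a rate."""
    return total / capture_s


def fps_from_swap_times(swap_times_s):
    """d) Average frame rate from successive eglSwapBuffers timestamps."""
    if len(swap_times_s) < 2:
        raise ValueError("need at least two swap timestamps")
    elapsed = swap_times_s[-1] - swap_times_s[0]
    # N timestamps bound N - 1 complete frame intervals.
    return (len(swap_times_s) - 1) / elapsed


def fps_from_filmstrip(thumbnails, capture_s, every_nth=10):
    """d) Rough frame rate from a 1:N filmstrip, accurate to within N frames."""
    return thumbnails * every_nth / capture_s


# Usage with made-up numbers for a 2-second capture.
capture_s = 2.0
fps = fps_from_swap_times([0.0, 0.05, 0.10, 0.15, 0.20])  # first few swaps
triangles_per_s = 5_000 * fps  # a) per-frame Triangles counter times FPS
fragments_per_s = per_second(4_000_000, capture_s)  # b) fragments over window
bytes_per_s = per_second(
    gpu_bandwidth_bytes(3_000_000, 1_000_000, 400_000, 100_000), capture_s)
print(f"{fps:.1f} FPS, {triangles_per_s:.0f} triangles/s, "
      f"{fragments_per_s:.0f} fragments/s, {bytes_per_s / 1e6:.1f} MB/s")
```

Because the Triangles counter is described as per-frame, it is scaled by the frame rate, while the fragment and bus counters are treated here as totals over the window and divided by its length.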
I hope that helps.
Kind Regards,
Michael McGeagh
Hi Michael and Peter,
Did you get a chance to look into the Streamline log I shared?
Within a frame, I see VP activity, followed by some idle time, followed by PP activity.
My questions are:
1. Why do the VP and PP not run in parallel? Why does one run after the other?
Am I doing something wrong that prevents parallelism?
2. Why is there idle time in my application? Why can't the PP start immediately?
I am not using vsync, and I am using double buffering.
3. I am running a Qt-based OpenGL ES 2.0 application with a simple vertex shader and a simple fragment shader.
I need your support in analyzing this.
As I said before, Streamline isn't going to help answer questions about idle time. It's a performance profiler; you can't profile "nothing running", because all of the counters are zero.
The causes of (1) and (2) are probably the same thing. Serialization means idle time and work that doesn't overlap.
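Using the timings from the first post (3 ms VP, 13 ms idle, 34 ms PP), a simplified two-stage model shows what serialization costs. It assumes the vertex and pixel processors could work on different frames at the same time if nothing forced them to wait, and that the idle gap comes entirely from that waiting; both are assumptions, and `frame_interval_ms` is a hypothetical illustration, not a value Streamline reports.

```python
def frame_interval_ms(vp_ms, idle_ms, pp_ms, pipelined):
    """Simplified per-frame time for a two-stage (VP then PP) GPU pipeline.

    Serialized: vertex work, an idle gap, then pixel work, and the next
    frame's vertex work cannot start until this frame is done.
    Pipelined: vertex work for frame N+1 overlaps pixel work for frame N,
    so the slower stage sets the pace and the idle gap disappears.
    """
    if pipelined:
        return max(vp_ms, pp_ms)
    return vp_ms + idle_ms + pp_ms


for mode in (False, True):
    ms = frame_interval_ms(3.0, 13.0, 34.0, pipelined=mode)
    label = "pipelined " if mode else "serialized"
    print(f"{label}: {ms:.0f} ms/frame, {1000.0 / ms:.1f} FPS")
```

Under this model the serialized frame takes 50 ms (20 FPS), while the best case with full overlap is bounded by the 34 ms pixel stage (about 29 FPS).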
Usual suspects:
Less likely suspects:
HTH, Pete
Thanks, Peter. This is useful information.