I ran Streamline on the SimpleTriangle example from the Mali OpenGL ES SDK for Android v1.6.0. Each frame, it renders a single triangle that covers half of the frame, and it renders to the default framebuffer. What I observe is that most of the time is spent on something other than vertex or fragment processing. What is the GPU/driver actually doing during this time? Note that I don't mean the time between frames, but the time between vertex and fragment processing within a frame.
I have tried this example on two platforms, one with Mali-400 and the other with Mali-450. Both give the same result.
Below is an illustration of the behavior when rendering a single frame. As you can see, the middle part is a significant portion of the processing of the frame.
Below is a trace of the OpenGL API calls for a single frame.
glClearColor(red=0.0, green=0.0, blue=0.0, alpha=1.0)
glClear(mask=GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT)
glUseProgram(program=3)
glVertexAttribPointer(index=0, size=2, type=GL_FLOAT, normalized=GL_FALSE, stride=0, pointer=0x776e3af0)
glEnableVertexAttribArray(index=0)
glDrawArrays(mode=GL_TRIANGLES, first=0, count=3)
eglSwapBuffers(dpy=0x1, surface=0x77474988)
Please note that there is a difference between "Counters" and "Activity" (hence the difference in how the charts look).
Counters are collected at set times. Each read gives the value the counter has accumulated since the previous read; the counter then resets to zero and continues counting until the next time it is read.
Activity, however, is different and is not derived from hardware counters. It is a rough percentage of GPU utilisation, shown separately for the Vertex and Fragment processors.
Streamline is telling you that in your highlighted region, the GPU is active, and doing work.
The hardware counters tell you what specific part(s) inside the vertex and/or fragment core(s) were active between the time it was last checked and the current check.
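To illustrate the read-and-reset behaviour described above, here is a minimal sketch in Python. It is not Streamline or Mali driver code; the class name, the event numbers, and the sampling interval are all made up for illustration. It only shows that each sample reports what accumulated since the previous read, with no information about when inside that interval the work happened.

```python
class ReadAndResetCounter:
    """Hypothetical model of a counter that resets to zero on every read."""

    def __init__(self):
        self._value = 0

    def add(self, events):
        # Hardware-side increment (illustrative only).
        self._value += events

    def read(self):
        # Return the accumulated value, then reset to zero.
        value = self._value
        self._value = 0
        return value


counter = ReadAndResetCounter()

# Made-up event counts per time step; the counter is read every 2 steps.
events_per_step = [4, 1, 0, 0, 2, 6]
sample_every = 2

samples = []
for step, events in enumerate(events_per_step, start=1):
    counter.add(events)
    if step % sample_every == 0:
        samples.append(counter.read())

print(samples)  # [5, 0, 8]
```

Each entry in `samples` is the total for one sampling interval, so a non-zero sample says that work was counted somewhere in that interval, not at the instant of the read.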
I hope this helps explain things further.
Kind Regards,
Michael McGeagh
This seems reasonable, but one thing puzzles me. Data is sampled at a frequency of 1 kHz. Doesn't this mean that the Fragment Processor counters should show an increase at least once during this 2.8 ms window? I never observe this. The counters never increase at the beginning of fragment activity, only at the end, so it can't be a matter of an occasional drop in sampling frequency.
As you can see in the image, vertex activity always falls between Vertex Processor counter increases. This is what I expect to see for the Fragment activity as well.
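The sampling arithmetic behind this question can be checked with a short sketch. It assumes perfectly periodic sampling at the stated 1 kHz (a 1 ms period) and uses the 2.8 ms window from the post; the function name is hypothetical and nothing here reflects how Streamline actually schedules its reads.

```python
import math


def sample_count_bounds(window_ms, period_ms):
    """Min and max number of periodic sample points inside a window.

    Assumes ideal, evenly spaced sampling; the phase of the window
    relative to the sample clock decides which bound applies.
    """
    ratio = window_ms / period_ms
    return math.floor(ratio), math.ceil(ratio)


low, high = sample_count_bounds(window_ms=2.8, period_ms=1.0)
print(low, high)  # 2 3
```

Under these assumptions, at least two sample points fall inside a 2.8 ms window, which is why a counter increase would be expected somewhere before the end of the fragment activity.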
Could you provide us with an export of your capture (option within Streamline) and provide me with this for further investigation?
This could be an issue with how Streamline is presenting the information, or it could be correct behaviour... I can't quite tell from the screenshot.
The export command is grayed out for me, probably because this is the Community Edition. I am sending you the apc directory instead.