I ran Streamline on the SimpleTriangle example from the Mali OpenGL ES SDK for Android v1.6.0. Each frame it renders a single triangle that covers half of the frame, and it renders to the default framebuffer. What I observe is that most of the time is spent on something other than vertex or fragment processing. What are the GPU and driver actually doing during this time? Note that I don't mean the time between frames, but the time between vertex and fragment processing.
I have tried this example on two platforms, one with Mali-400 and the other with Mali-450. Both give the same result.
Below is an illustration of the behavior when rendering a single frame. As you can see, the middle part is a significant portion of the processing of the frame.
Below is a trace of the OpenGL API calls for a single frame.
glClearColor(red=0.0, green=0.0, blue=0.0, alpha=1.0)
glClear(mask=GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT)
glUseProgram(program=3)
glVertexAttribPointer(index=0, size=2, type=GL_FLOAT, normalized=GL_FALSE, stride=0, pointer=0x776e3af0)
glEnableVertexAttribArray(index=0)
glDrawArrays(mode=GL_TRIANGLES, first=0, count=3)
eglSwapBuffers(dpy=0x1, surface=0x77474988)
Hi Wasim,
To be honest, your reply did not make much sense to me.
If you collect the total bus reads and writes of the fragment processors alongside the number of rasterized fragments, you will always find that they coincide in time. In the above case there won't be much reading, because the contents of the buffer are not preserved before drawing, so nothing is uploaded to tile memory before fragment shading. The total bus writes, on the other hand, match the size of the buffer in memory. This means that the whole process of uploading to tile memory, running the fragment shader program, and writing the tile back to main memory happens only at the end of the GPU fragment activity. This is when the Mali-4xx FPs are active. The highlighted area in the image therefore can't be where each pixel is shaded.
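As a rough sanity check on "bus writes match the size of the buffer", here is a minimal sketch of the expected write-back for one full-frame resolve. The 1280x720 resolution and 4 bytes per pixel are assumptions chosen for illustration (the actual surface size is not stated in this thread), the 16x16 tile size is my understanding of Mali-4xx and is also an assumption, and the function name is hypothetical.

```python
def expected_frame_writeback(width, height, bytes_per_pixel=4, tile_size=16):
    """Estimate the data written to main memory when every tile is resolved once.

    All parameters are illustrative assumptions, not values taken from the capture.
    """
    # Ceiling division: partially covered tiles at the edges still count.
    tiles_x = -(-width // tile_size)
    tiles_y = -(-height // tile_size)
    return {
        "bytes": width * height * bytes_per_pixel,
        "tiles": tiles_x * tiles_y,
    }


# Hypothetical 1280x720 RGBA8888 surface.
estimate = expected_frame_writeback(1280, 720)
print(estimate)
```

Comparing a figure like this with the bus-write counter only works after converting to the unit the counter reports, which I have not assumed here.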
Please note that there is a difference between "Counters" and "Activity" (hence the difference in how the charts look).
Counters are collected at set times. Each read returns the value accumulated since the previous read, after which the counter resets to zero and continues counting until the next read.
Activity, however, is different and is not derived from hardware counters. It is a rough percentage of GPU utilisation, reported separately for the vertex and fragment processors.
Streamline is telling you that in your highlighted region, the GPU is active, and doing work.
The hardware counters tell you what specific part(s) inside the vertex and/or fragment core(s) were active between the time it was last checked and the current check.
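To make the read-and-reset behaviour of the counters concrete, here is a minimal simulation. It is only a sketch of the semantics described above, not Streamline or driver code, and the class and function names are hypothetical.

```python
class ResetOnReadCounter:
    """Toy model of a counter that resets to zero each time it is read."""

    def __init__(self):
        self._value = 0

    def count(self, n=1):
        # Hardware events accumulate between reads.
        self._value += n

    def read(self):
        # Return what accumulated since the last read, then start again from zero.
        value, self._value = self._value, 0
        return value


def sample_timeline(events_per_interval):
    """Read the counter once at the end of each sampling interval."""
    counter = ResetOnReadCounter()
    samples = []
    for events in events_per_interval:
        counter.count(events)
        samples.append(counter.read())
    return samples


# Hypothetical event counts: work is only counted in the last interval.
print(sample_timeline([0, 0, 0, 500]))
```

Each sample therefore covers only the interval since the previous read, so a nonzero value tells you in which interval the events were counted, not when the unit first became busy.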
I hope this helps explain things further.
Kind Regards,
Michael McGeagh
This seems reasonable, but one thing puzzles me. Data is sampled at 1 kHz. Doesn't this mean that the Fragment Processor counters should register an increase at least once during this 2.8 ms window? I never observe this. The counters never increase at the beginning of fragment activity, always at the end, so it can't be a matter of an occasional drop in sampling frequency.
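The arithmetic behind that expectation can be sketched as follows. It assumes perfectly periodic sampling and a half-open window, which is an idealisation, and the function name is hypothetical.

```python
def sample_count_bounds(window_us, period_us):
    """Fewest and most sampling instants that can fall inside a window.

    Assumes strictly periodic sampling and a half-open window
    [start, start + window_us) that may begin at any phase.
    """
    fewest = window_us // period_us          # floor(window / period)
    most = -(-window_us // period_us)        # ceil(window / period)
    return fewest, most


# 2.8 ms window sampled at 1 kHz (1000 us period).
print(sample_count_bounds(2800, 1000))
```

Under these assumptions at least two samples land inside the 2.8 ms window, whatever its alignment with the sampling clock.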
As you can see in the image, vertex activity always falls between increases of the Vertex Processor counters. This is what I expect to see for the fragment activity as well.
Could you export your capture (there is an option for this within Streamline) and send it to me for further investigation?
This could be an issue with how Streamline is presenting the information, or it could be correct behaviour... I can't quite tell from the screenshot.
The export command is grayed out for me, probably because I am using the community edition. I am sending you the apc directory instead.