I ran Streamline on the SimpleTriangle example from the Mali OpenGL ES SDK for Android v1.6.0. Each frame, it renders a single triangle that covers half of the frame, drawing to the default framebuffer. What I observe is that most of the time is spent on neither vertex nor fragment processing. What is the GPU/driver actually doing during this time? Note that I don't mean the time between frames, but the time between vertex and fragment processing.
I have tried this example on two platforms, one with Mali-400 and the other with Mali-450. Both give the same result.
Below is an illustration of the behavior when rendering a single frame. As you can see, the gap in the middle accounts for a significant portion of the frame's processing time.
Below is a trace of the OpenGL API calls for a single frame.
glClearColor(red=0.0, green=0.0, blue=0.0, alpha=1.0)
glClear(mask=GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT)
glUseProgram(program=3)
glVertexAttribPointer(index=0, size=2, type=GL_FLOAT, normalized=GL_FALSE, stride=0, pointer=0x776e3af0)
glEnableVertexAttribArray(index=0)
glDrawArrays(mode=GL_TRIANGLES, first=0, count=3)
eglSwapBuffers(dpy=0x1, surface=0x77474988)
Hi Sogartar
You are seeing two different things here:
i) In Mali-4x0, the hardware counter values are sampled once, at the end of each fragment or vertex job; they are not sampled continuously every 1 ms. You will therefore see a single spike representing the total value of a given hardware counter for a job, at the end of the period of activity. This simplifies the design without, at least in theory, losing useful information: you cannot meaningfully associate a 1 ms sample of a counter with any particular vertex, shader, line of code or whatever, because you don't know in what order things are happening internally. Contrast this with a CPU, which provides a program counter to correlate against.
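To make the difference concrete, here is a minimal sketch contrasting per-job reporting with hypothetical every-1 ms sampling of the same job. The Job type, the function names and the numbers are all made up for illustration, and the periodic model assumes events happen at a uniform rate over the job, which the hardware does not guarantee.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical GPU job: start/end times in ms and the total counter increment it causes."""
    start_ms: float
    end_ms: float
    count: int


def per_job_samples(jobs):
    """Mali-4x0-style model: one (timestamp_ms, value) sample per job, taken at the job's end."""
    return [(job.end_ms, job.count) for job in jobs]


def periodic_samples(job, period_ms=1.0):
    """Contrast model: the same job sampled on every period_ms tick.

    ASSUMPTION (illustration only): events are spread uniformly over the job.
    Each sample is labelled with the tick that closes its interval.
    """
    duration = job.end_ms - job.start_ms
    if duration <= 0:
        return [(job.end_ms, float(job.count))]
    samples = []
    tick = math.floor(job.start_ms / period_ms) * period_ms
    while tick < job.end_ms:
        lo = max(tick, job.start_ms)               # overlap of this tick interval with the job
        hi = min(tick + period_ms, job.end_ms)
        samples.append((tick + period_ms, job.count * (hi - lo) / duration))
        tick += period_ms
    return samples
```

For Job(10.5, 12.5, 4000), per-job reporting gives one spike of 4000 at 12.5 ms, while the periodic model would spread it as 1000, 2000 and 1000 across three ticks. That single end-of-job spike is why nothing appears while the job is running.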
ii) In the UI, the "Activity" information has a higher-resolution timestamp than the counter information. At high zoom levels you can therefore see a slight discrepancy between where the spike appears (on a 1 ms tick) and where the activity chart ends (at the higher resolution).
Finally, you need to be aware of one other detail:
iii) The counter values are only recorded in 1 ms samples, but they too initially carry a higher-resolution timestamp. At high enough zoom levels, a counter value will fall between two 1 ms samples. Streamline attempts to interpolate by dividing the count proportionately between those two 1 ms samples. This can make the single count I explained above look like two counts. You can either zoom out so they are totalled up into a single count, or stretch the cursor (click on the extreme left or right of the blue lozenge with the timestamp) to cover multiple samples.
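A minimal sketch of what "dividing the count proportionately" could look like. This is an assumption, not Streamline's actual code: here the count is split linearly by distance, so the nearer tick gets the larger share. The function name is hypothetical.

```python
import math


def split_between_ticks(timestamp_ms, count, period_ms=1.0):
    """Split one counter value across the two 1 ms ticks that bracket its timestamp.

    ASSUMPTION: linear split by distance; returns {tick_ms: share}.
    """
    lower = math.floor(timestamp_ms / period_ms) * period_ms
    upper = lower + period_ms
    frac = (timestamp_ms - lower) / period_ms  # 0.0 at the lower tick, approaching 1.0 at the upper
    if frac == 0.0:
        # Timestamp sits exactly on a tick: no split needed.
        return {lower: float(count)}
    return {lower: count * (1.0 - frac), upper: count * frac}
```

For a single count of 4000 timestamped at 12.25 ms, this gives 3000 at the 12 ms tick and 1000 at the 13 ms tick, i.e. what looks like two counts. Summing the shares, which is effectively what zooming out or widening the cursor does, recovers the original 4000.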
And finally, finally
iv) If you have two back-to-back periods of fragment activity, you may end up with a view where there appears to be one period of activity with more than one counter spike. The multiple spikes tell you there is in fact more than one period of activity, i.e. more than one job. Zooming in should reveal the gap.
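One way to picture why two jobs can look like one: if the gap between them is narrower than what the current zoom level can show, the activity bars merge, but each job still produces its own end-of-job spike. The sketch below is a hypothetical model of that effect, not Streamline's rendering algorithm; min_gap_ms stands in for something like the time width of one screen pixel.

```python
def visible_activity(jobs, min_gap_ms):
    """Merge (start_ms, end_ms) job intervals whose gap is below min_gap_ms.

    Returns (start_ms, end_ms, job_count) tuples. Under per-job counter sampling,
    job_count equals the number of counter spikes inside each merged region.
    """
    merged = []
    for start, end in sorted(jobs):
        if merged and start - merged[-1][1] < min_gap_ms:
            # Gap too small to see at this zoom level: extend the previous region.
            s, e, n = merged[-1]
            merged[-1] = (s, max(e, end), n + 1)
        else:
            merged.append((start, end, 1))
    return merged
```

With jobs [(10.0, 11.0), (11.05, 12.0)], a 0.5 ms threshold shows one region containing two jobs (two spikes), while a 0.01 ms threshold, i.e. zooming in, separates them again.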
BTW, Mali-T6xx/T7xx/T8xx have a different design. One key difference is that their counters are sampled every 1 ms.
Does that help?
R