I ran Streamline on the SimpleTriangle example from the Mali OpenGL ES SDK for Android v1.6.0. Each frame, it renders a single triangle that covers half of the frame, and it renders to the default framebuffer. What I observe is that most of the frame time is spent on neither vertex nor fragment processing. What is the GPU/driver actually doing during this time? Note that I don't mean the time between frames, but the time between vertex processing and fragment processing within a frame.
I have tried this example on two platforms, one with Mali-400 and the other with Mali-450. Both give the same result.
Below is an illustration of the behavior when rendering a single frame. As you can see, the middle part, between vertex processing and fragment processing, takes up a significant portion of the frame's processing time.
Below is a trace of the OpenGL API calls for a single frame.
glClearColor(red=0.0, green=0.0, blue=0.0, alpha=1.0)
glClear(mask=GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT)
glUseProgram(program=3)
glVertexAttribPointer(index=0, size=2, type=GL_FLOAT, normalized=GL_FALSE, stride=0, pointer=0x776e3af0)
glEnableVertexAttribArray(index=0)
glDrawArrays(mode=GL_TRIANGLES, first=0, count=3)
eglSwapBuffers(dpy=0x1, surface=0x77474988)
The export command is grayed out for me, probably because I am using the Community Edition. I am sending you the apc directory instead.