What is the GPU/driver doing if not shading?

I ran Streamline on the SimpleTriangle example from the Mali OpenGL ES SDK for Android v1.6.0. Each frame, it renders a single triangle that covers half of the frame, targeting the default framebuffer. What I observe is that most of the frame time is spent on neither vertex nor fragment processing. What are the GPU and driver actually doing during this time? Note that I don't mean the time between frames, but the gap between vertex and fragment processing within a frame.

I have tried this example on two platforms, one with Mali-400 and the other with Mali-450. Both give the same result.

Below is an illustration of the behavior when rendering a single frame. As you can see, the middle part is a significant portion of the processing of the frame.

what-is-the-gpu-doing-part.png

Below is a trace of the OpenGL API calls for a single frame.

glClearColor(red=0.0, green=0.0, blue=0.0, alpha=1.0)

glClear(mask=GL_DEPTH_BUFFER_BIT|GL_COLOR_BUFFER_BIT)

glUseProgram(program=3)

glVertexAttribPointer(index=0, size=2, type=GL_FLOAT, normalized=GL_FALSE, stride=0, pointer=0x776e3af0)

glEnableVertexAttribArray(index=0)

glDrawArrays(mode=GL_TRIANGLES, first=0, count=3)

eglSwapBuffers(dpy=0x1, surface=0x77474988)

  • Hi sogartar,

    As McGeagh mentioned, in that highlighted region the GPU fragment activity is your fragment shader running on each pixel. If I understand correctly, you are asking about the "Fragments rasterized count" counter? This counter counts the fragments rasterized from triangles. More details are available in the Wikipedia article on rasterisation.

    In that highlighted region for these counters, you see it idle because this needs to happen before any fragment shading can start. If you scroll to the left you should see these counters with some bigger numbers for that highlighted GPU fragment activity.

    HTH,

    Wasim

  • Hi Wasim,

    To be honest, your reply did not make much sense to me.

    If you collect the total bus reads/writes of the fragment processors alongside the number of rasterized fragments, you always find them coinciding in time. In the case above there won't be much reading, because the contents of the buffer are not preserved before drawing, so nothing is loaded into tile memory before fragment shading. The total bus writes, on the other hand, would match the size of the buffer in memory. This means that the whole process of loading tile memory, running the fragment shader program and writing the tile back to main memory happens only at the end of the GPU fragment activity. This is when the Mali-4xx FPs are active. So the highlighted area in the image can't be where each pixel is shaded.
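    As a rough sketch of the "bus writes match the buffer size" argument: the expected write volume for one frame is simply the framebuffer size in bytes. The resolution and pixel format below are hypothetical (the thread does not state them), and the hardware counter may report bus transactions rather than bytes, so treat this only as an order-of-magnitude check.

```python
def framebuffer_bytes(width_px, height_px, bytes_per_pixel):
    """Size of one colour buffer in bytes, ignoring any row padding."""
    return width_px * height_px * bytes_per_pixel


# Hypothetical values: a 1280x720 surface with a 32-bit RGBA8888 format.
WIDTH_PX = 1280
HEIGHT_PX = 720
BYTES_PER_PIXEL = 4

expected_writes = framebuffer_bytes(WIDTH_PX, HEIGHT_PX, BYTES_PER_PIXEL)
print(f"expected write-back per frame: {expected_writes} bytes")
```

    If the summed fragment-processor write counter for one frame lands near this figure, that is consistent with each tile being written back exactly once.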

  • Please note that there is a difference between "Counters" and "Activity" (hence the difference in how the charts look).

    Counters are collected at set times. Each read returns the value the counter has reached, resets it to zero, and the counter then continues counting until the next read.

    Activity, however, is different and is not derived from hardware counters. It is a rough percentage utilisation of the GPU, reported separately for Vertex and Fragment.

    Streamline is telling you that in your highlighted region, the GPU is active, and doing work.

    The hardware counters tell you what specific part(s) inside the vertex and/or fragment core(s) were active between the time it was last checked and the current check.
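    A minimal sketch of the read-and-reset behaviour described above, using a toy model rather than anything from Streamline or the Mali driver. The class, the function and the event times are all made up for illustration. It shows how work that happens late in an activity window only appears in the sample taken after it.

```python
class ReadAndResetCounter:
    """Toy model of a counter that is cleared every time it is read."""

    def __init__(self):
        self._value = 0

    def add(self, amount=1):
        self._value += amount

    def read(self):
        # Return the accumulated value and reset to zero in one step.
        value, self._value = self._value, 0
        return value


def sample(events, sample_times_ms):
    """Replay (time_ms, increment) events and read the counter at each sample time."""
    counter = ReadAndResetCounter()
    pending = sorted(events)
    next_event = 0
    samples = []
    for t in sample_times_ms:
        # Apply every event that happened up to and including this sample time.
        while next_event < len(pending) and pending[next_event][0] <= t:
            counter.add(pending[next_event][1])
            next_event += 1
        samples.append(counter.read())
    return samples


# Hypothetical workload: all fragment work lands between 2 ms and 3 ms.
events = [(2.6, 500), (2.9, 700)]
samples = sample(events, [1, 2, 3, 4])
print(samples)
```

    In this toy run the samples at 1 ms and 2 ms read zero even if the unit was flagged as "active" during that time, because the counter only reflects what was counted since the previous read.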

    I hope this helps explain things further.

    Kind Regards,

    Michael McGeagh

  • This seems reasonable, but one thing though. Data is sampled at 1 kHz. Doesn't this mean that the Fragment Processor counters should show an increase at least once during this 2.8 ms window? I never observe this. The counters never increase at the beginning of fragment activity, always at the end, so it can't be a matter of an occasional drop in sampling frequency.

    As you can see in the image, vertex activity always falls between increases of the Vertex Processor counters. This is what I expect to see for the fragment activity as well.
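    The sampling argument above can be checked with a little arithmetic. The sketch below assumes perfectly regular sampling at exactly 1 kHz (an idealisation; real sampling will jitter) and counts how many sample points fall inside a 2.8 ms window for every possible alignment. Times are in integer microseconds to avoid floating-point rounding; the function name is made up for illustration.

```python
def sample_ticks_in_window(start_us, length_us, period_us):
    """Count sample ticks (multiples of period_us) in [start_us, start_us + length_us)."""

    def ceil_div(a, b):
        return -(-a // b)

    return ceil_div(start_us + length_us, period_us) - ceil_div(start_us, period_us)


PERIOD_US = 1000   # 1 kHz sampling
WINDOW_US = 2800   # the 2.8 ms highlighted region

counts = [sample_ticks_in_window(start, WINDOW_US, PERIOD_US)
          for start in range(PERIOD_US)]
print(min(counts), max(counts))
```

    Under these assumptions every alignment of the window contains at least two sample points, so fragment work done early in the window should show up in a counter sample before the window ends.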

  • Could you provide us with an export of your capture (option within Streamline) and provide me with this for further investigation?

    This could be an issue with how Streamline is presenting the information, or it could be correct behaviour. I can't quite tell from the screenshot.

    Kind Regards,

    Michael McGeagh

  • The export command is grayed out for me, probably because this is the community edition. I am sending you the apc directory instead.