GL_EXT_disjoint_timer_query for performance

I have a question about the OpenGL ES GL_EXT_disjoint_timer_query extension. I am trying to get performance measurements out of my Android app, and GL_EXT_disjoint_timer_query does not seem to give me proper numbers. I tried both GL_TIME_ELAPSED_EXT and GL_TIMESTAMP_EXT queries, and both give similar results: not 0, and not a constant, but tiny, fluctuating numbers. I tried Mali Debugger but could not find any GPU profiling info. I also tried to install DS-5 Streamline, but it requires a rooted device, building gator, and an Eclipse setup, none of which I can use at the moment.

What are the units of time returned by GL_EXT_disjoint_timer_query? The extension documentation suggests nanoseconds. To my understanding, glBeginQuery() is issued at the top of the pipeline and glEndQuery() at the end of the pipeline. So how could GL_TIMESTAMP_EXT even work, since OpenGL ES does not indicate at which part of the pipeline it emits the result? Is it possible to get access to some kind of counter?

Any help would be greatly appreciated as this is becoming critical.

The device is a Samsung Galaxy Tab S2 with a Mali-T760 GPU.
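A minimal C sketch of handling a GL_TIME_ELAPSED_EXT result on the CPU side, assuming you already have the raw GLuint64 value from glGetQueryObjectui64vEXT and the value of GL_GPU_DISJOINT_EXT read at the same time. It only shows the unit conversion (the extension reports nanoseconds) and drops samples taken while a disjoint event was reported. Averaging over many frames is an assumed way of smoothing out noise, not something the extension prescribes, and the names pass_stats, ns_to_ms, pass_stats_add and pass_stats_mean_ms are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for one render pass's GPU timings. */
typedef struct {
    double   sum_ms;    /* sum of accepted samples, in milliseconds */
    unsigned count;     /* number of accepted samples */
    unsigned discarded; /* samples dropped because a disjoint event was reported */
} pass_stats;

/* GL_EXT_disjoint_timer_query reports elapsed time in nanoseconds. */
static double ns_to_ms(uint64_t ns) {
    return (double)ns / 1.0e6;
}

/* Record one GL_TIME_ELAPSED_EXT result. `disjoint` is assumed to be the
 * value glGetIntegerv(GL_GPU_DISJOINT_EXT, ...) returned when the result was
 * read; if it is non-zero the timing may be invalid, so the sample is dropped. */
static void pass_stats_add(pass_stats *s, uint64_t elapsed_ns, int disjoint) {
    assert(s);
    if (disjoint) {
        s->discarded++;
        return;
    }
    s->sum_ms += ns_to_ms(elapsed_ns);
    s->count++;
}

/* Mean of the accepted samples in milliseconds, or 0.0 if there are none. */
static double pass_stats_mean_ms(const pass_stats *s) {
    assert(s);
    return s->count ? s->sum_ms / s->count : 0.0;
}
```

For example, adding samples of 1,500,000 ns and 2,500,000 ns plus one sample flagged as disjoint leaves two accepted samples with a mean of 2.0 ms.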

  • What are the units of time returned by GL_EXT_disjoint_timer_query? Extension documentation suggests nanoseconds.

    Yes, it's nanoseconds.

    To my understanding, glBeginQuery() is issued at the top of the pipeline and glEndQuery() at the end of the pipeline. So how could GL_TIMESTAMP_EXT even work, since OpenGL ES does not indicate at which part of the pipeline it emits the result?

    It's far worse than that: OpenGL doesn't actually guarantee that the hardware looks like the paper pipeline in the specification. Tile-based GPUs like Mali don't implement it as a single pipeline at all; they use two separate, decoupled pipelines, one for vertex shading and one for fragment shading. See:

    The Mali GPU: An Abstract Machine, Part 1 - Frame Pipelining

    In practice, this means you can't use timer queries to time individual draw calls, because draw calls don't execute in isolation in any usable form. From a query's point of view, all draw calls in a render pass complete when the last tile of fragment shading completes. Timer queries can be used with some success to time whole render passes, but be aware that render passes are themselves pipelined, so the measurements will have non-trivial error bars.

    Is it possible to have access to some counter?

    DS-5 Streamline is the only public tool we have for accessing hardware counters. Just to be clear, these are low-level event counters in the hardware, such as cache hits and misses or the number of texture operations, rather than timing information.

    Cheers,
    Pete

  • Thanks for the response and sorry for the long delay.

    Can you point me to the proper setup for DS-5 Streamline? My device is an off-the-shelf one; I cannot root it, compile kernel modules, or change kernel images. Is it possible to launch the .apk through DS-5 Streamline without having to port my build process to Eclipse (not a trivial task)?

  • Out of curiosity, how many frames of latency are there? Do you process the geometry for a full frame at once? I assume not, since the memory that would need is unbounded. So how big is the intermediate buffer?

  • Can you point me to the proper setup for DS-5 Streamline?

    DS-5 Community Edition – DS-5 Development Studio – ARM Developer

    I cannot root it nor compile kernel modules or change kernel images.

    Unfortunately, the current versions of DS-5 Streamline require a kernel module to capture the data.

    Out of curiosity, how many frames of latency are there?

    It really depends on the application and operating system. If the application is hitting vsync, the pipeline length from application to screen is normally 3 frames (triple buffering). The pipeline length from geometry processing to fragment processing is 0-1 frames, depending on how close to 100% load the GPU needs to run to hit vsync at the current operating frequency.

    Do you process the geometry for a full frame at once?

    No, it's all per render pass (for example, per application FBO or for the default FBO).

    So how big is the intermediate buffer?

    No fixed size; Mali simply uses system memory for all GPU resources.

    HTH,
    Pete
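Because query results arrive some frames after the query is issued (the reply above puts application-to-screen latency at about 3 frames under vsync), asking for GL_QUERY_RESULT_EXT in the same frame blocks until the GPU catches up. A common workaround, sketched here in C as an assumption rather than anything Mali-specific, is to keep a small ring of query objects and read a slot's previous result only just before that slot is reused. The ring size of 4 and the names query_ring and query_ring_next are hypothetical; the code only does the slot bookkeeping and leaves the actual glBeginQueryEXT, glEndQueryEXT and glGetQueryObjectui64vEXT calls to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size: roughly 3 frames of latency plus one spare slot. */
#define QUERY_RING_SIZE 4

typedef struct {
    uint64_t frame_of_slot[QUERY_RING_SIZE]; /* frame that last issued a query in each slot */
    int      in_use[QUERY_RING_SIZE];        /* 1 if the slot holds a query not yet read back */
    uint64_t frame;                          /* number of the frame about to be issued */
} query_ring;

/* Picks the slot for this frame's query. If the slot still holds an older
 * query, *read_old is set to 1 and *old_frame to the frame that issued it;
 * the caller should read that result before reusing the query object.
 * The slot is then marked as holding this frame's query. */
static unsigned query_ring_next(query_ring *r, int *read_old, uint64_t *old_frame) {
    assert(r && read_old && old_frame);
    unsigned slot = (unsigned)(r->frame % QUERY_RING_SIZE);
    *read_old = r->in_use[slot];
    *old_frame = r->frame_of_slot[slot];
    r->frame_of_slot[slot] = r->frame;
    r->in_use[slot] = 1;
    r->frame++;
    return slot;
}
```

Each frame, call query_ring_next; if read_old is set, check GL_QUERY_RESULT_AVAILABLE_EXT for that slot's query before reading it, and either wait or drop the sample if it is still unavailable.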