I've got the Mali Debugger hooked up and am trying to profile my app. The issue is that MGD doesn't appear to recognise the end of the frame, which means that a capture also fails (or runs indefinitely). This is from a Unity GLES app. You can see from the screenshot that a single frame contains all the GLES calls once the application has started. Is this a configuration issue of any kind?
Thanks in advance.
It is a Unity VR application, so there are two framebuffers (it appears one for each eye). There are thousands of frames by the time I am able to load up the level I want to profile, and then get to the position in the level where performance issues occur. If there were a way to connect and not capture frames, that'd be good. All I can see right now is the ability to pause the entire game, not just the frame capture.
The problem with "not capturing frames" and not starting MGD with the application is that there is a lot of application setup at the start, for example when you load textures and compile shaders. If MGD is not running from the beginning, it has no way to get that information. As mentioned earlier, we are currently developing a method which will allow you to start MGD part way through your application.
I am still very curious about why your application is taking over 3GB of memory as that still seems a lot. What is the size of the .mgd save file that gets generated?
I just went through a trace, opting to load up a debug level instead of a real level. All I did was try to debug some particle effects, and MGD.exe is using 2.4GB of memory. The Trace File generated (after Save As) is 32.5MB. You can find that trace here: ElementalistX.mgd - Google Drive .
I'm having trouble pulling useful data out of the dumps as well. I'm trying to figure out why having a simple particle effect kills our frame rate. The per-frame draw call counts seem incorrect, and they disagree with what Unity's frame debugger usually gives us. There are many frames with only two draw calls, which doesn't seem right, and the render passes are all over the place. I was hoping to be able to see the draw calls that are contributing to rendering the particle effect, and see why they're taking so long, but drilling down to the point of seeing where the particle is being rendered seems like a hopeless endeavour with the amount of calls that are happening and no visual feedback about what the draw calls are actually drawing. The Unity Frame Debugger seems a lot better in this respect, but the problem is it doesn't give us draw timings, which is what we really need.
Any suggestions would be helpful.
MGD considers a frame to be the sequence of function calls executed between two frame-end markers. When you capture a frame of a VR application, the end of the capture is delayed past the frame-end marker until all the ongoing render passes are finished. The additional function calls traced after the frame end are added to the next (incomplete) frame.
All the draw calls traced in a frame according to the previous definition are counted towards the draw call count. I do not know what Unity's frame debugger considers for its draw call count, but a different definition of a frame could explain the difference in numbers.
In VR applications the time warp compositor thread periodically draws the two eye buffers (see Developer Center: Documentation and SDKs | Oculus) regardless of whether new eye buffer content is available. If for whatever reason the new scene has not been rendered yet, the compositor will still strive to draw the eye buffers according to the refresh rate, and you will see frames where just two draw calls are present in the outline view.
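To make the frame definition and the two-draw-call frames above more concrete, here is a minimal Python sketch, not MGD's actual implementation. It assumes `eglSwapBuffers` acts as the frame-end marker and that the trace is just a list of function names; the trace contents and the helper names `split_into_frames` and `draw_call_counts` are made up for illustration.

```python
# Assumption: eglSwapBuffers is the frame-end marker.
FRAME_END = "eglSwapBuffers"
# Only two common GLES draw functions are counted in this sketch.
DRAW_CALLS = {"glDrawArrays", "glDrawElements"}


def split_into_frames(trace):
    """Group calls into frames; each frame ends at a frame-end marker."""
    frames, current = [], []
    for call in trace:
        current.append(call)
        if call == FRAME_END:
            frames.append(current)
            current = []
    if current:
        # Calls after the last marker form a trailing, incomplete frame.
        frames.append(current)
    return frames


def draw_call_counts(trace):
    """Return the number of draw calls in each frame of the trace."""
    return [
        sum(1 for call in frame if call in DRAW_CALLS)
        for frame in split_into_frames(trace)
    ]


# Hypothetical trace: a full scene frame, a compositor-only frame
# (one draw per eye buffer), then calls belonging to an incomplete frame.
trace = (
    ["glClear", "glDrawElements", "glDrawElements", "glDrawArrays", "eglSwapBuffers"]
    + ["glDrawArrays", "glDrawArrays", "eglSwapBuffers"]
    + ["glClear", "glDrawElements"]
)

print(draw_call_counts(trace))  # [3, 2, 1]
```

A tool that groups draw calls by "everything that composites one scene" rather than by frame-end marker would report different numbers for the same trace, which is one possible source of the mismatch with Unity's frame debugger.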
If by draw timing you mean the time spent in the GPU doing the work for draw calls, MGD cannot give this information because Mali is a deferred renderer. MGD only reports how much time is spent in the driver for each function call, and for draw calls on Mali these will return almost instantly since they only queue work up for the GPU to do later. If you are interested in seeing the times at which calls were made, you can do that in Trace View by adding the 'Time Started' column to the view.
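As a rough illustration of why per-call driver time says little about GPU cost on a deferred renderer, here is a toy Python model. It is purely a sketch: `ToyDeferredGpu`, its methods, and the 8 ms cost are hypothetical and have nothing to do with the real Mali driver. The draw call only queues work, and the simulated GPU cost is accounted for later.

```python
import time
from collections import deque


class ToyDeferredGpu:
    """Toy model: draw() only queues work; the cost is paid later."""

    def __init__(self):
        self.pending = deque()

    def draw(self, label, simulated_gpu_seconds):
        # Measure only the CPU time spent inside the call itself.
        start = time.perf_counter()
        self.pending.append((label, simulated_gpu_seconds))
        return time.perf_counter() - start

    def run_queued_work(self):
        # Sum the simulated GPU cost of everything queued so far.
        total = 0.0
        while self.pending:
            _, cost = self.pending.popleft()
            total += cost
        return total


gpu = ToyDeferredGpu()
cpu_seconds = gpu.draw("particles", simulated_gpu_seconds=0.008)
gpu_seconds = gpu.run_queued_work()
print(f"time in call: {cpu_seconds:.6f}s, simulated GPU cost: {gpu_seconds:.3f}s")
```

The time measured inside `draw` stays tiny no matter how expensive the queued work is, which mirrors why a per-call timing column cannot show which draw is slow on the GPU.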
Identifying draw calls
If you want to see what each draw call is doing, consider using the MGD geometry view, which lets you see a wireframe representation of the mesh that a draw call is using without having to capture the attachments (although this currently only works for draws using GL_TRIANGLES and GL_TRIANGLE_STRIP).
Hope that helps,
Thanks for the detailed response Jonathan,
Sorry about the late reply. I've recently found a bug in the Oculus SDK that has been the cause of my performance issues, so using MGD is not a high priority for me anymore. That said, I'd like to respond and ask one more question for future reference:
The Unity Frame Debugger counts all draw calls made that composite the scene. Since they submit all of the draw commands, they must know which ones make up the frame and probably don't rely on a GPU sync point. They wouldn't count a TimeWarp as a draw call.
That's disappointing about the draw timing. What is the best way to check the load on the GPU? I know you guys have a shader compiler that will tell you instruction counts, is this currently the best way to detect how heavy of a load your application will put on the GPU?
jodon wrote: That's disappointing about the draw timing. What is the best way to check the load on the GPU? I know you guys have a shader compiler that will tell you instruction counts, is this currently the best way to detect how heavy of a load your application will put on the GPU?
You can use the shader compiler (it's also built into MGD in the shaders tab). This will tell you roughly how expensive each draw call is compared to another, although it can't take into account things like cache misses.
We also have a system-wide profiling tool called Streamline which includes support for GPU profiling. You can use this to see which parts of the GPU and CPU are loaded at what times, and if you use it with the MGD interceptor installed (but without mgddaemon running) you will get draw calls, render passes, and frames (from MGD's point of view) annotated directly into Streamline. There will, however, still be some disconnect between your application making calls and the GPU doing the work because of the deferred nature I mentioned earlier.
There are some caveats to getting Streamline set up though, as it depends on the device/BSP you are using. I'm afraid Streamline is not my area of expertise, but if you have any questions about it I'm sure I can find someone to help you.