Every optimization matters in gaming. Optimizations can lead to better frame rates, higher-quality models, more beautiful pixels, and better battery life, which means better-looking games and longer playing sessions. Making full use of the GPU is paramount to delivering the best quality at the highest frame rates. Arm has implemented support for updateable drivers and Android GPU Inspector to supercharge gaming on devices with Mali GPUs.
Currently, Android devices receive GPU drivers via over-the-air firmware images. Because the firmware needs to be extremely stable, full system over-the-air updates may occur only once or twice a year. The opportunity to fix bugs and optimize the driver is therefore limited to those update slots. If a game developer finds a bug in a driver, they have to wait until the next over-the-air update for the fix to reach an Android device. Arm is continuously optimizing the driver for Mali GPUs, but those improvements can only be delivered with an over-the-air update.
A mainstay of the PC gaming experience is receiving new GPU drivers that enable optimizations, new features, and more stable experiences. Android updateable drivers bring the same experience to users of devices with Mali GPUs. Updates are delivered via the Google Play store, so users do not need a full over-the-air update to their devices. The installation process is easy and familiar.
A bug reported by a game developer can be fixed and the fix pushed to the updateable driver beta channel for testing before it is delivered through the Google Play store. Once the fix has been verified, Arm can promote the update to the stable channel. Gamers can then benefit from the extra stability of the driver. As Arm implements optimizations in the driver, they can be continuously delivered to gamers, improving the gaming experience.
Future Mali GPU drivers will contain support for Android GPU Inspector, which was recently announced by Google. This open-source, cross-vendor tool gives game developers insight into how their content runs on Mali GPUs. Using the profile information, game developers can optimize game content for increased frame rates on devices that use Mali GPUs. Not only can Arm deliver optimizations in the driver via updateable drivers, but game developers can also optimize their content with the profiler support in the driver. This represents a real win-win.
To demonstrate the profiling capability of the Android GPU Inspector, we can use the excellent Khronos Group Vulkan Samples. This project, which demonstrates best practices for the Vulkan API, contains a suite of performance examples. These provide toggles to show the difference that API usage makes to frame rates and memory bandwidth. One of the examples shows the difference between using two and three swapchain images, commonly referred to as double and triple buffering. At the cost of a deeper pipeline and higher input latency, triple buffering allows the GPU to start work on the next frame earlier, because it does not need to wait for the buffer currently on screen to finish being displayed. This example provides buttons for switching between double and triple buffering and measures the time to draw each frame. With vsync enabled, switching to triple buffering can lead to a jump from 30fps (33.3ms) to 60fps (16.7ms) on an Arm Mali-G76 GPU. Using Android GPU Inspector, we can record a trace of the GPU operation during this example, which clearly shows the utilization of the GPU increasing when triple buffering is turned on.
The previous screenshot is of the example running with double buffering enabled. Notice that the frame time is 33.4ms, which represents 30fps. The toggles at the bottom allow easy switching between double and triple buffering.
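The frame times and frame rates quoted here are reciprocals of each other. As a minimal sketch (the function names are illustrative and not part of Android GPU Inspector or the Vulkan Samples), the conversion can be written as:

```python
def frame_time_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def frames_per_second(frame_ms: float) -> float:
    """Return the frame rate that corresponds to a per-frame time in milliseconds."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # Frame budgets for the two frame rates discussed in the text.
    print(f"{frame_time_ms(30):.1f} ms per frame at 30fps")
    print(f"{frame_time_ms(60):.1f} ms per frame at 60fps")
    # The measured double-buffered frame time from the screenshot.
    print(f"{frames_per_second(33.4):.1f} fps at 33.4 ms per frame")
```

The measured 33.4ms therefore sits just under 30fps, which is what a display locked to every second refresh of a 60Hz panel would be expected to show.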
Android GPU Inspector Trace
The previous image shows an example Android GPU Inspector trace. The greyed-out left section of the trace was captured with double buffering enabled and the right section with triple buffering enabled. The greyed-out section clearly shows that the GPU is stalling while it waits for a buffer to draw into. This is shown by the gaps between the submission of work and the GPU hardware queues. These stalls greatly reduce the number of active GPU cycles.
Once triple buffering is turned on, the submission of work is far more regular as the GPU is not waiting on a buffer to draw into. The utilization of the GPU is much higher, shown by the more densely populated GPU active cycles counter.
With triple buffering enabled, the time to draw a frame drops to 16.7ms, which is a jump to 60fps. This is due to the GPU being able to start work on the next frame in the third buffer straight away.
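The stall described above can be illustrated with a deliberately idealized model. The sketch below assumes a 60Hz display and a single render time per frame, and it ignores the Android compositor and any CPU/GPU pipelining, so it is not a description of how the Vulkan sample or the driver actually behaves. The function names and the 20ms render time are hypothetical.

```python
import math

REFRESH_HZ = 60.0  # assumed display refresh rate
REFRESH_PERIOD_MS = 1000.0 / REFRESH_HZ


def double_buffered_interval_ms(render_ms: float) -> float:
    """Idealized frame interval with two swapchain images and vsync.

    The GPU cannot start the next frame until a swap at a vsync frees a
    buffer, so every frame occupies a whole number of refresh periods.
    """
    periods = max(1, math.ceil(render_ms / REFRESH_PERIOD_MS))
    return periods * REFRESH_PERIOD_MS


def triple_buffered_interval_ms(render_ms: float) -> float:
    """Idealized average frame interval with three swapchain images and vsync.

    A spare image lets the GPU keep rendering, so the average interval is
    set by whichever is slower: the GPU or the display refresh.
    """
    return max(render_ms, REFRESH_PERIOD_MS)


if __name__ == "__main__":
    render_ms = 20.0  # hypothetical render time, slightly over one refresh period
    for name, interval in (
        ("double", double_buffered_interval_ms(render_ms)),
        ("triple", triple_buffered_interval_ms(render_ms)),
    ):
        print(f"{name} buffering: {interval:.1f} ms per frame, {1000.0 / interval:.0f} fps")
```

In this model a frame that narrowly misses a refresh costs a full extra refresh period under double buffering, while the third image lets the GPU carry on. Real devices are more complicated, which is why a trace is the more reliable guide.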
While the Vulkan Samples can exhibit clear improvements in GPU utilization, as shown in Android GPU Inspector, game content is often far more complex. Game development and technology company Crytek has a hardware-agnostic PC demo named Neon Noir, which provides amazing graphics fidelity on modern hardware. Crytek worked with Arm and Google to port the demo to mobile and make use of Arm Mali GPUs. This is an incredible achievement, as moving such an intense graphics load to a mobile GPU is a challenging process, and profiling is required to determine how to load the GPU effectively. The following traces were captured on beta code that does not represent the final demo result but does show remarkable profile-guided optimization improvements.
Android GPU Inspector Trace for Crytek Neon Noir - Before
The previous image is of a trace showing the work required for a frame. The profiling data clearly shows an opportunity to load both GPU hardware queues more. There are dependencies between the vertex and fragment work that can be untangled to allow more work to run in parallel.
A frame takes 78ms to render on a Mali-G76, resulting in 12fps. While 12fps is low, moving such a heavy graphics load to a mobile GPU is a noteworthy accomplishment. This profiling data is a candid insight into the engineering process of moving game content between platforms.
Android GPU Inspector Trace for Neon Noir - After
After careful analysis of the profiling data, Crytek was able to optimize the GPU work. The previous image of the trace shows better loading of the GPU hardware queues and a greatly reduced frame rendering time. The 33ms reduction in frame time is an improvement of 43% and raises the frame rate to 22fps.
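The before and after figures can be checked with a little arithmetic. The sketch below uses only the rounded numbers quoted in this post (78ms before, 33ms saved); the function name is illustrative. With these rounded inputs the reduction works out at about 42%, slightly below the 43% quoted, which was presumably calculated from the unrounded measurements (an assumption on our part).

```python
def summarize_frame_improvement(before_ms: float, saved_ms: float) -> dict:
    """Derive frame time, frame rate, and percentage reduction from rounded figures."""
    after_ms = before_ms - saved_ms
    return {
        "after_ms": after_ms,
        "fps_before": 1000.0 / before_ms,
        "fps_after": 1000.0 / after_ms,
        "reduction_pct": saved_ms / before_ms * 100.0,
    }


if __name__ == "__main__":
    result = summarize_frame_improvement(before_ms=78.0, saved_ms=33.0)
    print(f"frame time after: {result['after_ms']:.0f} ms")
    print(f"frame rate: {result['fps_before']:.1f} fps -> {result['fps_after']:.1f} fps")
    print(f"frame time reduction: {result['reduction_pct']:.1f}%")
```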
What is remarkable about this iteration of the demo is that Crytek improved the usage of the GPU while also adding extra graphical features, such as volumetric fog and post-processing anti-aliasing.
The traces provided in this blog post can be opened in the latest developer releases of the Android GPU Inspector. They are provided as zstd-compressed traces, so make sure to unpack the .perfetto trace file before opening it for visualization. Android GPU Inspector is still in development, so the traces and user interface are not representative of the final experience.
Arm, Google, and Samsung have been collaborating closely to support the Samsung Galaxy S10, Galaxy Note 10, and Galaxy S20, with further devices in the pipeline. Moreover, Arm has launched the Arm Mali-G78 and Arm Mali-G68 GPUs, which bring further performance and efficiency improvements for higher-quality gaming experiences on mobile. Finally, Crytek has continued to use Android GPU Inspector to optimize its content, so expect further technical detail in the near future.
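For the zstd-compressed traces mentioned above, unpacking can be scripted. The sketch below shells out to the standard `zstd` command-line tool, which is assumed to be installed and on the PATH. It also assumes the downloaded file ends in `.zst`; the file name used here is hypothetical, so substitute the name of the trace you actually downloaded.

```python
import subprocess
from pathlib import Path


def unpacked_path(compressed: Path) -> Path:
    """Return the output path obtained by dropping the (assumed) .zst suffix."""
    if compressed.suffix != ".zst":
        raise ValueError(f"expected a .zst file, got {compressed.name}")
    return compressed.with_suffix("")


def zstd_command(compressed: Path) -> list[str]:
    """Build the zstd invocation: -d decompresses, -o names the output file."""
    return ["zstd", "-d", str(compressed), "-o", str(unpacked_path(compressed))]


def unpack_trace(compressed: Path) -> Path:
    """Decompress the trace with the zstd CLI and return the .perfetto path."""
    subprocess.run(zstd_command(compressed), check=True)
    return unpacked_path(compressed)


if __name__ == "__main__":
    # Hypothetical file name, used for illustration only.
    trace = unpack_trace(Path("example_trace.perfetto.zst"))
    print(f"Open {trace} in Android GPU Inspector")
```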