Automated Performance Advice for Android Games

Peter Harris
March 9, 2020

Mobile gaming is advancing rapidly in terms of sophistication, with increasing numbers of successful high-fidelity 3D games exploiting the performance capabilities of the latest smartphones. As the complexity of content rises, the traditional approach of using manual play testing to verify performance is increasingly risky, especially if this is done late in the development process.

For studios releasing a title across a wide range of devices, which is necessary to reach the largest possible player base, the cost of manual testing grows with every performance profile that is added. To make this process easier, we are releasing a new tool to automate on-device performance analysis as part of your continuous integration development process. The tool provides easy-to-read performance reports, and dashboard-friendly JSON export so that you can slice the data however you like.

Why CI?

The main objective of continuous integration in software development is to empower the development team to produce high quality products. Running functional and performance tests automatically as developers make changes to the application gives them timely feedback on their work. Rapid feedback ensures that issues are fixed quickly – ideally before the change is even committed into the main product code line – which reduces the impact of any bug and the cost of fixing it.

A typical CI workflow

The same principle applies to game development, but with one major difference – in most game studios, coders are outnumbered by the creative teams that produce the user interfaces, characters, and game environments. Artists may not have the same deep technical background as the developers building the core rendering technology, but the art assets they create can strongly impact how well the game performs. To fully empower game teams, it is imperative that any automated tool flow provides feedback that all members of the team can understand.

Introducing Performance Advisor

Performance Advisor, released as part of Arm Mobile Studio 2020.0, is a new tool that can generate easy-to-read performance reports based on the rich technical performance data that can be gathered from platforms with Arm CPUs and Mali GPUs. These reports aim to fulfil the role of both health check:

“Is my game hitting its performance targets?”

… and triage nurse:

“Where can I make improvements?”

… with an overall objective that it should be possible for someone who is not an optimization expert to read the reports and understand them.

The Performance Advisor reports are designed to supplement the more advanced data visualizations provided by our Streamline profiler. Streamline is a fantastic tool for deep-diving problems, but it can cause data overload if all you want is a quick status check. Performance Advisor takes the same hardware performance counter data as Streamline, combined with annotation data generated by a light-weight API interceptor or the application itself. It produces an easy-to-read overview which can provide some initial guidance about potential problem areas to investigate and advice on optimization approaches.

You may still need to reach for the full Streamline or Graphics Analyzer tools to explore any identified problems in more detail, but at least Performance Advisor can quickly tell you whether you need to do that, and where to start looking.  

A typical Streamline timeline view

Health check 

The first part of any monitoring workflow is a regular health check. Most of the time during development there won’t be any problems, so you don’t want to spend more than a few minutes each day checking that everything is still on track. The first part of a Performance Advisor report is designed to provide an at-a-glance overview of the application. It shows the average frame rate, the average CPU and GPU load, and a bottleneck graph showing why particular parts of the game are slowing down.

The Performance Advisor performance summary.

In the example above we can see that the average performance of the test scene – GFXBench Manhattan – is 23 FPS. The loading screen is CPU bound, as you might expect, and two parts near the start do manage to hit the 30 FPS target for this device. However, most of the scenario is limited by GPU performance so we need to dig deeper. 

Region analysis 

Nearly all games contain multiple distinct parts: loading screens, scene transitions, level selection, open game play, and cut scenes are all common elements. The workload for each of these sections is likely to be very different, so Performance Advisor allows developers to define regions within their test scenario and will report separate performance results for each.

Per-frame cost metrics 

The easiest way to understand application performance during game development is to look at the workload cost per frame. If you have a fixed amount of processing capacity, and a target frame rate in mind, then it is inevitable that there is a maximum amount of work per frame possible before you start dropping frames. Streamline doesn’t provide this view naturally – as a system profiler its views are built around performance over time – but Performance Advisor transforms the data during analysis into the more intuitive per-frame workloads. 
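As a rough illustration of that transformation, the sketch below sums time-sampled counter values into per-frame totals. It is not how Performance Advisor is implemented; the function name, the sample layout, and the use of frame end timestamps are all assumptions made for this example.

```python
from bisect import bisect_left


def cycles_per_frame(samples, frame_end_ms):
    """Sum time-sampled GPU cycle counts into per-frame totals.

    samples: list of (timestamp_ms, cycles) pairs, where cycles is the count
             accumulated in the sample window ending at timestamp_ms.
    frame_end_ms: sorted list of timestamps at which each frame finished.

    Both inputs are hypothetical shapes chosen for this sketch.
    """
    totals = [0] * len(frame_end_ms)
    for timestamp, cycles in samples:
        # Find the first frame that ends at or after this sample.
        index = bisect_left(frame_end_ms, timestamp)
        if index < len(totals):
            totals[index] += cycles
        # Samples after the last frame boundary are ignored.
    return totals


# Four counter samples taken every 20 ms, with frames ending at 40 ms and 80 ms.
samples = [(20, 10), (40, 12), (60, 30), (80, 28)]
per_frame = cycles_per_frame(samples, [40, 80])
```

Here the first frame costs 22 cycles and the second 58, which makes the jump in per-frame workload obvious in a way that the raw time series does not.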

A Performance Advisor per-frame cycles graph

In the example above we can see that there is a clear correlation between the increase in the number of shader cycles and the drops in frame rate. Shader complexity would therefore be a good place to start looking for optimizations. In addition to the charts, the tool will also provide useful links to our developer portal which provides some initial optimization advice. In this case the analysis has shown that the content is suffering from high shader arithmetic complexity, so the linked page will give some ideas to try that can reduce arithmetic load in shaders.

The available charts include key performance metrics: 

  • Overall GPU cycles per frame, reported by workload type. 
  • GPU shader cycles per frame, reported per pipeline. 
  • GPU bandwidth per frame, reported for read and write access. 

And the major content metrics:

  • Draw calls per frame. 
  • Primitives per frame, reported for total and visible primitives. 
  • Pixels per frame. 
  • Average overdraw rate. 

Frame budgeting  

One useful technique to use with the per-frame charts is to set GPU budgets based on the expected GPU performance of the target device. For example, in the case above the device has a GPU with a top frequency of 940MHz. If we want a minimum frame rate of 20 FPS (it’s a complex benchmark!) then we know that the absolute limit of GPU cost per frame is:  

940M / 20 = 47M cycles 

By comparing the actual performance against our budget, we can clearly see which areas are simply incapable of hitting the target performance due to content complexity.  
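The budget arithmetic is simple enough to script. The sketch below reproduces the 940 MHz / 20 FPS example and flags frames that exceed the budget; the function names and the sample per-frame values are made up for illustration.

```python
def frame_budget_cycles(gpu_frequency_hz, target_fps):
    """Upper bound on GPU cycles available per frame at the target frame rate."""
    return gpu_frequency_hz / target_fps


def frames_over_budget(per_frame_cycles, budget):
    """Return the indices of frames whose GPU cost exceeds the budget."""
    return [i for i, cycles in enumerate(per_frame_cycles) if cycles > budget]


# The example from the text: 940 MHz GPU, 20 FPS minimum frame rate.
budget = frame_budget_cycles(940e6, 20)

# Hypothetical per-frame GPU cycle counts, not taken from a real capture.
measured = [30e6, 46e6, 52e6, 61e6, 40e6]
too_expensive = frames_over_budget(measured, budget)
```

With these made-up numbers, frames 2 and 3 are over the 47M cycle budget.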

A Performance Advisor graph with a budget

A similar process can also be followed for content metrics which are not directly tied to GPU performance, such as allowable overdraw levels, and target visibility rates for primitives.  

What is especially useful about the budgeting workflow is that you can set budgets for devices that you don’t have access to. With some simple maths to adjust for frequencies and shader core count, you can approximate performance on another device. All GPU performance data in the reports is normalized by the GPU shader core count, so you will need to correct the budget based on the shader core ratio between the target device and the test device you are using. For example, consider setting a budget for a mass-market device, based on a relatively high-end test configuration:

  • Test device configuration: Mali-G72 MP10 
  • Target device configuration: Mali-G72 MP3, 900MHz, 60 FPS 

For the target device we would expect a budget of 15M cycles per frame (900M / 60). However, our test device has many more shader cores than the target, so 15M cycles on our test device would be far too much work. We can adjust the budget by scaling by 3/10, the shader core ratio, so the target budget becomes 4.5M cycles.

Note that these budgets are idealistic – you’ll never get 100% utilization in real world usage – but they help to set some boundaries for developers to stick to. 
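The cross-device adjustment can be written as a small helper. This is a minimal sketch of the calculation described above, with a hypothetical function name and parameters; it assumes, as the text does, that work scales linearly with shader core count.

```python
def scaled_budget_cycles(target_frequency_hz, target_fps, target_cores, test_cores):
    """Per-frame cycle budget for a target device, expressed on a test device.

    The ideal budget for the target is frequency / frame rate. It is then
    scaled by the shader core ratio (target cores / test cores), because the
    test device has a different number of shader cores than the target.
    """
    ideal_budget = target_frequency_hz / target_fps
    return ideal_budget * target_cores / test_cores


# Target: Mali-G72 MP3 at 900 MHz and 60 FPS. Test device: Mali-G72 MP10.
budget_on_test_device = scaled_budget_cycles(900e6, 60, target_cores=3, test_cores=10)
```

This gives the 4.5M cycle budget worked through above. Treat the result as a boundary to stay inside, not as a prediction of real performance on the target.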

Identifying slow frames 

When using the Performance Advisor light-weight interceptor to monitor API calls, you can capture screenshots when FPS drops below a threshold, giving some visual feedback about what was happening on screen when performance dropped. 

A Performance Advisor chart showing slow frame thumbnails

This provides valuable context when debugging and can allow some common elements to be spotted if repeated slowdowns occur. In the example above the common visual element in all three screenshots is the helicopters with search lights, so this provides some clue about where to start looking. We know from the GPU cycle charts shown earlier that we are looking for an arithmetic heavy shader, so with those two pieces of information we can use Graphics Analyzer – Mobile Studio’s API debugger – to check out the content.

In this case, we quickly confirmed that the spotlight lighting shader for the helicopters - present in all three slow frames - was indeed doing a computationally heavy check to determine if an object at each pixel coordinate intersected with the spotlight light cone. See our tutorial on slow frame capture to learn more. 
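The threshold check behind slow frame capture can be pictured with a few lines of code. This sketch only illustrates the idea of comparing each frame's duration against an FPS threshold; it is not the interceptor's implementation, and the function name and timestamp list are assumptions.

```python
def slow_frame_indices(frame_end_ms, fps_threshold):
    """Return indices of frames that took longer than the FPS threshold allows.

    frame_end_ms: sorted timestamps, in milliseconds, at which each frame
                  finished. The first entry is only used as a reference point.
    """
    max_frame_ms = 1000.0 / fps_threshold
    slow = []
    for i in range(1, len(frame_end_ms)):
        duration_ms = frame_end_ms[i] - frame_end_ms[i - 1]
        if duration_ms > max_frame_ms:
            slow.append(i)
    return slow


# Hypothetical timestamps: most frames take 33 ms, one takes 60 ms.
slow = slow_frame_indices([0, 33, 66, 126, 159], fps_threshold=20)
```

With a 20 FPS threshold each frame may take at most 50 ms, so only frame 3 is flagged here. Those flagged frames are the ones you would want a screenshot for.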

Usage models 

Performance Advisor in Mobile Studio Starter Edition can be used out of the box with Streamline and its light-weight interceptor to provide an initial analysis of application performance. If a problem is identified, the same data capture can be opened in Streamline for a more detailed investigation using all the available CPU and GPU counters that were captured.  

For a more targeted report, the next step in integrating Performance Advisor would be to add annotations to your game, breaking the test scenario down into its component regions. This would enhance the generated report, allowing us to provide per-region breakdowns of the performance data.  

Mobile Studio Professional Edition – due later in 2020 – will enable Mobile Studio tools to be used in automated CI workflows. In this usage model, Streamline is run in headless mode so that it can capture data from devices without a human driving the tool using a GUI, and Performance Advisor reports are built from these headless data captures. In addition, Performance Advisor will support exporting key metrics as JSON reports, so that you can integrate detailed data from on-device testing into your existing performance monitoring systems. 
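To give a feel for how exported metrics might feed a CI gate, here is a minimal sketch. The JSON layout, field names, and thresholds are invented for this example and are not the Performance Advisor schema; check the report your version actually exports and adapt the keys.

```python
import json

# Hypothetical report layout, invented for this sketch only.
EXAMPLE_REPORT = """
{
  "regions": [
    {"name": "loading", "average_fps": 31.0, "gpu_cycles_per_frame": 12000000},
    {"name": "gameplay", "average_fps": 23.0, "gpu_cycles_per_frame": 52000000}
  ]
}
"""


def find_regressions(report_text, min_fps, max_gpu_cycles_per_frame):
    """Return (region name, metric name) pairs that miss their targets."""
    report = json.loads(report_text)
    failures = []
    for region in report["regions"]:
        if region["average_fps"] < min_fps:
            failures.append((region["name"], "average_fps"))
        if region["gpu_cycles_per_frame"] > max_gpu_cycles_per_frame:
            failures.append((region["name"], "gpu_cycles_per_frame"))
    return failures


failures = find_regressions(
    EXAMPLE_REPORT, min_fps=20, max_gpu_cycles_per_frame=47_000_000
)
```

A CI job could fail the build when the returned list is non-empty, or push the same values to a dashboard to track trends over time.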

Give us your feedback 

We would like to take the opportunity to say thank you to all of the developers who participated in our beta program; the feedback – good and bad – has helped to shape the product and make it what it is today.  

There is more to do – we still have many ideas for improving the reports, and for additional metrics we would like to be able to show. It's important that our reports work for developers, giving the right information in a form which is understandable. You can download the Performance Advisor report used in this blog, and if you have any feedback or wish list items – good or bad – please let us know in the comments.  

Download Mobile Studio 2020.0

    blue_way over 4 years ago

    Using performance_advisor to capture PATrace replayer
    Error retrieving images from the target

    /data/data/com.arm.pa.paretrace/com.arm.pa.paretrace : Unable to retrieve image, please verify the logon username/password credentials, that the file exists or that it is a regular file
