Mobile gaming is advancing rapidly in terms of sophistication, with increasing numbers of successful high-fidelity 3D games exploiting the performance capabilities of the latest smartphones. As the complexity of content rises, the traditional approach of using manual play testing to verify performance is increasingly risky, especially if this is done late in the development process.
For the studios that are releasing a title across a wide range of devices, which is needed in order to reach the largest possible player base, the cost of manual testing effort increases as more and more performance profiles are added. To make this process easier, we are releasing a new tool to automate on-device performance analysis as part of your continuous integration development process. The tool provides easy-to-read performance reports, and dashboard-friendly JSON export so that you can slice the data however you like.
The main objective of continuous integration in software development is to empower the development team to produce high quality products. Running functional and performance testing automatically as they make changes to the application provides developers with timely feedback on their work. Rapid feedback ensures that issues are fixed quickly – ideally before the change is even committed into the main product code line – which reduces the impact of any bug and the cost of fixing it.
The same principle applies to game development, but with one major difference – in most game studios, coders are outnumbered by the creative teams that produce the user interfaces, characters, and game environments. Artists may not have the same deep technical background as the developers building the core rendering technology, but the art assets they create can strongly impact how well the game performs. To fully empower game teams, it is imperative that any automated tool flow provides feedback that all members of the team can understand.
Performance Advisor, released as part of Arm Mobile Studio 2020.0, is a new tool that can generate easy-to-read performance reports based on the rich technical performance data that can be gathered from platforms with Arm CPUs and Mali GPUs. These reports aim to fulfil the role of both health check:
“Is my game hitting its performance targets?”
… and triage nurse:
“Where can I make improvements?”
… with an overall objective that it should be possible for someone who is not an optimization expert to read the reports and understand them.
The Performance Advisor reports are designed to supplement the more advanced data visualizations provided by our Streamline profiler. Streamline is a fantastic tool for deep-diving problems, but it can cause data overload if all you want is a quick status check. Performance Advisor takes the same hardware performance counter data as Streamline, combined with annotation data generated by a light-weight API interceptor or the application itself. It produces an easy-to-read overview which can provide initial guidance about potential problem areas to investigate, and advice on optimization approaches.
You may still need to reach for the full Streamline or Graphics Analyzer tools to explore any identified problems in more detail, but at least Performance Advisor can quickly tell you whether you need to do that, and where to start looking.
The first part of any monitoring workflow is a regular health check. Most of the time during development there won’t be any problems, so you don’t want to spend more than a few minutes each day checking that everything is still on track. The first part of a Performance Advisor report is designed to provide an at-a-glance overview of the application. It shows the average frame rate, the average CPU and GPU load, and a bottleneck graph showing why particular parts of the game are slowing down.
In the example above we can see that the average performance of the test scene – GFXBench Manhattan – is 23 FPS. The loading screen is CPU bound, as you might expect, and two parts near the start do manage to hit the 30 FPS target for this device. However, most of the scenario is limited by GPU performance so we need to dig deeper.
Nearly all games contain multiple distinct parts: loading screens, scene transitions, level selection, open game play, and cut scenes all being commonly encountered elements. The workload for each of these sections is likely to be very different, so Performance Advisor allows developers to define regions within their test scenario and will report separate performance results for each.
The easiest way to understand application performance during game development is to look at the workload cost per frame. If you have a fixed amount of processing capacity, and a target frame rate in mind, then it is inevitable that there is a maximum amount of work per frame possible before you start dropping frames. Streamline doesn’t provide this view naturally – as a system profiler its views are built around performance over time – but Performance Advisor transforms the data during analysis into the more intuitive per-frame workloads.
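As a rough illustration of that transformation, the sketch below attributes time-series counter samples to individual frames. The data shapes here are invented for the example and are not the tool's actual capture format.

```python
# Hypothetical sketch: turning time-series GPU counter samples into
# per-frame costs, in the spirit of Performance Advisor's analysis.
# The input format below is illustrative only.

def cycles_per_frame(frame_boundaries, samples):
    """frame_boundaries: sorted timestamps (seconds) marking each frame's end.
    samples: (timestamp, cycles_since_last_sample) tuples from the counter feed.
    Returns one summed cycle count per frame."""
    per_frame = [0] * len(frame_boundaries)
    frame = 0
    for ts, cycles in sorted(samples):
        # Advance to the frame this sample falls into
        while frame < len(frame_boundaries) - 1 and ts > frame_boundaries[frame]:
            frame += 1
        per_frame[frame] += cycles
    return per_frame

boundaries = [0.016, 0.033, 0.050]  # three frames at ~60 FPS
samples = [(0.010, 9e6), (0.020, 8e6), (0.030, 7e6), (0.045, 20e6)]
print(cycles_per_frame(boundaries, samples))
```

A per-frame view like this makes spikes directly comparable against a fixed cycle budget, which a time axis obscures.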
In the example above we can see that there is a clear correlation between the increase in the number of shader cycles and the drops in frame rate. Shader complexity would therefore be a good place to start looking for optimizations. In addition to the charts, the tool will also provide useful links to our developer portal which provides some initial optimization advice. In this case the analysis has shown that the content is suffering from high shader arithmetic complexity, so the linked page will give some ideas to try that can reduce arithmetic load in shaders.
The available charts include key performance metrics:
… and the major content metrics:
One useful technique to use with the per-frame charts is to set GPU budgets based on the expected GPU performance of the target device. For example, in the case above the device has a GPU with a top frequency of 940MHz. If we want a minimum frame rate of 20 FPS (it’s a complex benchmark!) then we know that the absolute limit of GPU cost per frame is:
940M / 20 = 47M cycles
By comparing the actual performance against our budget, we can clearly see which areas are simply incapable of hitting the target performance due to content complexity.
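The budget arithmetic above is simple enough to capture in a small helper. The frequency and frame-rate values below are the example figures from the text; nothing here is a real Performance Advisor API.

```python
# Per-frame GPU cycle budget: total cycles available at the GPU's top
# frequency, divided across the target number of frames per second.

def gpu_cycle_budget(top_frequency_hz, target_fps):
    """Maximum GPU cycles available per frame at the target frame rate."""
    return top_frequency_hz / target_fps

budget = gpu_cycle_budget(940e6, 20)       # 940MHz GPU, 20 FPS target
print(f"{budget / 1e6:.0f}M cycles per frame")  # 47M cycles per frame
```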
A similar process can also be followed for content metrics which are not directly tied to GPU performance, such as allowable overdraw levels, and target visibility rates for primitives.
What is especially useful about the budgeting workflow is that you can set budgets for devices that you don’t have access to. With some simple maths to adjust for frequencies and shader core count, you can approximate performance on another device. All GPU performance data in the reports is normalized by the GPU shader core count, so you will need to correct the budget based on the shader core ratio between the target device and the test device you are using. For example, consider setting a budget for a mass-market device, based on a relatively high-end test configuration:
For the target device we would expect a budget of 15M cycles per frame (900 / 60). However, our test device has many more shader cores than the target, so 15M cycles on our test device would be far too much work. We can adjust the budget by scaling by 3/10, the shader core ratio, so the target budget becomes 4.5M cycles.
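The same scaling can be written as a one-line adjustment. The core counts (3 for the target, 10 for the test device) are the example values implied by the 3/10 ratio above; they are illustrative, not a real device specification.

```python
# Cross-device budget scaling: compute the budget from the *target*
# device's specs, then scale by the shader core ratio because the
# report data is normalized per shader core on the *test* device.

def scaled_budget(target_freq_hz, target_fps, target_cores, test_cores):
    raw = target_freq_hz / target_fps        # e.g. 900MHz / 60 FPS = 15M cycles
    return raw * target_cores / test_cores   # adjust for the test device

print(scaled_budget(900e6, 60, 3, 10) / 1e6)  # 4.5 (million cycles per frame)
```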
Note that these budgets are idealistic – you’ll never get 100% utilization in real world usage – but they help to set some boundaries for developers to stick to.
When using the Performance Advisor light-weight interceptor to monitor API calls, you can capture screenshots when FPS drops below a threshold, giving some visual feedback about what was happening on screen when performance dropped.
This provides valuable context when debugging and can allow some common elements to be spotted if repeated slowdowns occur. In the example above the common visual element in all three screenshots is the helicopters with search lights, so this provides some clue about where to start looking. We know from the GPU cycle charts shown earlier that we are looking for an arithmetic heavy shader, so with those two pieces of information we can use Graphics Analyzer – Mobile Studio’s API debugger – to check out the content.
In this case, we quickly confirmed that the spotlight lighting shader for the helicopters - present in all three slow frames - was indeed doing a computationally heavy check to determine if an object at each pixel coordinate intersected with the spotlight light cone. See our tutorial on slow frame capture to learn more.
Performance Advisor in Mobile Studio Starter Edition can be used out of the box with Streamline and its light-weight interceptor to provide an initial analysis of application performance. If a problem is identified, the same data capture can be opened in Streamline for a more detailed investigation using all the available CPU and GPU counters that were captured.
For a more targeted report, the next step in integrating Performance Advisor would be to add annotations to your game, breaking the test scenario down into its component regions. This would enhance the generated report, allowing us to provide per-region breakdowns of the performance data.
Mobile Studio Professional Edition – due later in 2020 – will enable Mobile Studio tools to be used in automated CI workflows. In this usage model, Streamline is run in headless mode so that it can capture data from devices without a human driving the tool using a GUI, and Performance Advisor reports are built from these headless data captures. In addition, Performance Advisor will support exporting key metrics as JSON reports, so that you can integrate detailed data from on-device testing into your existing performance monitoring systems.
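As a sketch of how a CI job might consume that JSON export, the snippet below gates a build on per-region metrics. The report schema here is invented purely for illustration; check the shipped documentation for the real field names.

```python
# Hypothetical CI gate built on a Performance Advisor JSON export.
# The schema (region names, field names) is assumed for this example.
import json

REPORT = '''{"regions": [
  {"name": "loading",  "avg_fps": 42.1, "gpu_cycles_per_frame": 12.0e6},
  {"name": "gameplay", "avg_fps": 23.0, "gpu_cycles_per_frame": 55.3e6}
]}'''

BUDGET_CYCLES = 47e6   # per-frame GPU budget from the earlier exercise
TARGET_FPS = 30

failures = []
for region in json.loads(REPORT)["regions"]:
    if region["gpu_cycles_per_frame"] > BUDGET_CYCLES:
        failures.append(f"{region['name']}: over GPU budget")
    if region["avg_fps"] < TARGET_FPS:
        failures.append(f"{region['name']}: below {TARGET_FPS} FPS")

print("\n".join(failures) or "all regions within budget")
```

A CI runner would typically fail the build when `failures` is non-empty, surfacing the offending regions in the build log.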
We would like to take the opportunity to say thank you to all of the developers who participated in our beta program; the feedback – good and bad – has helped to shape the product and make it what it is today.
There is more to do – we still have many ideas for improving the reports, and for additional metrics we would like to be able to show. It's important that our reports work for developers, giving the right information in a form which is understandable. You can download the Performance Advisor report used in this blog, and if you have any feedback or wish list items – good or bad – please let us know in the comments.
Download Mobile Studio 2020.0