After another busy summer, our profiling tools team released Arm Mobile Studio 2023.4 in October. We did not publish a blog for the 2023.3 release, so this blog is a double issue looking at what’s new in our performance analysis tools.
Arm Mobile Studio is completely free to use. You can download it today from the Arm Developer website. Just create an Arm account and sign in to access the download.
Alongside feature updates, we aim to continuously improve the usability of our tools, so that you spend more time profiling your application and less time working out how to navigate the tools. Over the last two releases, we simplified the connection and configuration scheme for Android GPU profiling in Streamline. It is now possible to capture and visualize a useful range of performance counters at the click of a button.
When you select Capture Arm GPU profile, as the screenshot shows, Streamline automatically detects the GPU present in the connected device. The tool automatically captures the recommended counters for the GPU and visualizes them using our recommended counter template. You no longer need to manually configure the counter selection to collect the best range of counters for the device. Just tick the box and click Start Capture.
To create your own counter configuration, or change other capture settings, select the Use Advanced mode option, as the screenshot shows. This provides full manual control over counter selection and capture settings, so you can customize as much as necessary.
When any capture completes, the selected counter templates automatically visualize the data. Manual selection is no longer required, so there is one fewer step between you and a useful profile.
Watch this video to see how easy it is to capture a performance profile with Streamline.
One common workflow is sharing a custom template with your colleagues. In previous releases, you could only import a template in the GUI from the Timeline view. We have now extended the Counter configuration template menu to include an Install and Load... option, making it easy to load a template file before capturing, as the screenshot shows.
We make regular changes to the counter templates. This includes improving the counter naming, and adding or removing counters. These changes ensure that users with the latest tools get the best data. However, this means that templates are not always backwards compatible. If you try to open an old capture and apply a template, you usually get a report with some missing data series.
To keep old captures usable across tool versions, the templates used during capture are now embedded into the Streamline capture data. You select these capture-time templates from the Timeline view template menu, as the screenshot shows. This ensures that the data is always correctly visualized, even when a future Streamline release makes a built-in template incompatible.
We have supported profiling debuggable applications from the Streamline GUI since the first Arm Mobile Studio release. However, developers working on Android system software on pre-release devices had to use command-line workflows. Now, for OEM devices running “eng” or “userdebug” builds of the OS, you can select and profile any application on the system from the Streamline GUI.
See the Android documentation for details about these build variants.
If you use the streamline_me.py script to make Android headless captures, you can now specify the activity to start, along with activity command-line options. To do this, use the following command-line arguments when running the script:
streamline_me.py --package-activity <name> --package-arguments <args>
For more details, see Generate a headless capture in the Streamline Target Setup Guide for Android.
We also plan to add GUI support in Arm Mobile Studio 2023.5 later this year.
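If you drive headless captures from your own automation, the two new arguments described above can be assembled programmatically. The following is a minimal sketch, not part of Arm Mobile Studio: the helper name build_activity_args and the example activity and argument values are hypothetical, and it assumes the activity options are passed as a single string. Any other arguments your capture needs are omitted here; see the Streamline Target Setup Guide for Android for the full list.

```python
import shlex


def build_activity_args(activity=None, activity_args=None):
    """Return the extra streamline_me.py arguments for launching an activity.

    Only the two options named in this blog are emitted. Both are optional,
    so an empty list is returned when neither is given.
    """
    args = []
    if activity:
        args += ["--package-activity", activity]
    if activity_args:
        # Assumption: the activity options are passed as one string.
        args += ["--package-arguments", activity_args]
    return args


# Hypothetical values, for illustration only.
extra = build_activity_args("com.example.MainActivity", "--level 3")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(["streamline_me.py", *extra]))
```

Building the arguments as a list, rather than by concatenating strings, keeps values that contain spaces intact if you later pass the list to subprocess.run.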
Mali Offline Compiler is continually updated with the latest compiler versions. This release updates to the r44p0 DDK for all products based on the Bifrost architecture or newer. Also, Mali Offline Compiler now supports compiling for the Immortalis-G720, Mali-G720, and Mali-G620 GPUs. These are the first of the 5th Generation architecture GPUs.
We have improved Vulkan compatibility for shaders built from HLSL with reflection enabled, which is common for some desktop game builds. Uses of the SPIR-V SPV_GOOGLE_decorate_string and SPV_GOOGLE_user_type semantic extensions are now stripped before compilation. Note that these extensions are not supported on production drivers, so release binaries must not include them.
To make compilation failures easier to diagnose, we have simplified OpenGL ES and OpenCL error messages. Some messages show a few lines of source context around each error.
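Because release binaries must not include the two extensions named above, it can be useful to check your shipped SPIR-V modules yourself. The following is a minimal sketch of such a check, not a feature of Mali Offline Compiler. It assumes a little-endian SPIR-V binary and relies on the standard SPIR-V layout: a five-word header followed by instructions whose first word packs the word count in the upper 16 bits and the opcode in the lower 16 bits, with OpExtension as opcode 10. The function names are hypothetical.

```python
import struct

SPIRV_MAGIC = 0x07230203
OP_EXTENSION = 10  # OpExtension opcode in the SPIR-V specification
UNSUPPORTED = {"SPV_GOOGLE_decorate_string", "SPV_GOOGLE_user_type"}


def list_extensions(blob):
    """Return the extension names declared by OpExtension in a SPIR-V module."""
    if len(blob) < 20 or len(blob) % 4 != 0:
        raise ValueError("not a SPIR-V module: bad length")
    words = struct.unpack(f"<{len(blob) // 4}I", blob)
    if words[0] != SPIRV_MAGIC:
        raise ValueError("not a little-endian SPIR-V module: bad magic number")

    names = []
    i = 5  # skip the five-word module header
    while i < len(words):
        word_count = words[i] >> 16
        opcode = words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            raise ValueError("malformed instruction stream")
        if opcode == OP_EXTENSION:
            # The operand is a nul-terminated UTF-8 string padded to a word.
            operands = words[i + 1 : i + word_count]
            raw = struct.pack(f"<{len(operands)}I", *operands)
            names.append(raw.split(b"\0", 1)[0].decode("utf-8"))
        i += word_count
    return names


def unsupported_extensions(blob):
    """Return the extensions in the module that production drivers reject."""
    return sorted(UNSUPPORTED.intersection(list_extensions(blob)))
```

To use it, read a compiled shader with open(path, "rb").read() and pass the bytes to unsupported_extensions. An empty list means neither extension is declared in the module.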
If you have not tried Mali Offline Compiler, watch our 7-minute training video to learn how to use it.
Enhancements across Arm Mobile Studio include:
Streamline now supports the full set of performance counters for the Cortex-X4, Cortex-A720, and Cortex-A520 CPUs.
Arm GPU shader core performance counters that measure absolute values are now always presented as the sum over all shader cores, instead of the average over all shader cores. This makes it easier to visualize the absolute application workload size in a device-agnostic way, which is a common developer request.
This change is applied at capture time and only impacts new captures made with the latest versions of Streamline. Old captures opened in the latest tools are not affected by this change.
The Arm GPU counter template now automatically enables the Arm GPU scheduling timeline data source, if supported on the target device. We have also simplified labelling of the boxes in the scheduling timeline view to improve readability.
Profiling with ArmNN machine learning library instrumentation is now supported for debuggable Android applications on devices running “user” builds of the OS.
The Performance Advisor light-weight interceptor now reports the number of compute dispatches and trace rays dispatches per frame as software counters. This is alongside existing workload counters such as the number of render passes per frame.
To try the Performance Advisor for the first time, use the getting started tutorial.
Energy profiling using Arm Energy Probe, or an NI DAQ probe, is a deprecated feature. It will be removed in a future release.
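To see why the switch from per-core averages to sums for absolute shader core counters, described above, makes workload size device-agnostic, consider the arithmetic. The following is a small illustrative sketch: the workload figure and core counts are invented, and the helper name average_to_sum is hypothetical. It also shows how you might rescale a value read from an older capture, assuming you know the shader core count of the device that produced it.

```python
def average_to_sum(per_core_average, shader_core_count):
    """Convert a per-core average counter value to a sum over all cores."""
    return per_core_average * shader_core_count


# Invented example: the same absolute workload on two hypothetical GPUs.
total_workload = 1_200_000  # for example, cycles summed over all shader cores

for cores in (4, 10):
    # Old presentation: the average shrinks as the core count grows.
    average = total_workload / cores
    # New presentation: the sum stays the same for the same workload.
    print(f"{cores} cores: average={average:.0f} sum={average_to_sum(average, cores):.0f}")
```

The average differs between the two devices even though the application submitted the same work, while the sum is identical, which is what makes captures from different devices directly comparable.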
This concludes another round of Arm Mobile Studio improvements. Look out for future blogs outlining further improvements. Also coming soon is a new tool that will be added to the Studio to help analyze a problem frame. You will be able to visualize the render graph and frame data flow to highlight inefficiencies, and easily find issues with geometry or object rendering that affect performance.
I would also like to take a moment to thank all the developers who get in touch with improvement requests and bug reports. We review them all and try to schedule as much as we can into the roadmap. If you have comments or questions, you can email us at mobilestudio@arm.com. Please keep the feedback coming!
Download Arm Mobile Studio