Arm Community
Arm Community blogs
Mobile, Graphics, and Gaming blog
Better Together: Integrating Arm Mobile Studio with Unity

Geraint North
March 20, 2019
18 minute read time.

Japanese version - Download

Chinese version

Korean version


The developers of mobile games strive to ensure that their content works well across a broad range of devices, from the latest high-end premium smartphones to mass-market or older devices. As the complexity of mobile game content increases, developers rely on good-quality tools that give them the insight they need to keep their frame rates stable and their power consumption down.

In this blog, I'm going to talk about how the new Arm Mobile Studio collection of tools can help with Android performance analysis, and how they can work together with game engines to give you even deeper performance insight. Arm Mobile Studio’s Starter Edition is available for free, and the source code for the sample project used in this blog is available on GitHub.

Specifically, I'm going to focus on Streamline, the most general-purpose and most detailed performance analysis component of Arm Mobile Studio. I'll describe how it can be integrated with Unity to really show you how different aspects of your game make use of the Arm Cortex-A CPU or Arm Mali GPU resources on a mobile device.

About Streamline

Streamline collects sample-based and event-based performance data from a number of sources on an Android device and displays the aggregated results in several different views. In this blog we concentrate on the Timeline view. The top half of the screen shows the collected system performance counters, and the lower half can show a variety of other information on the same timeline. Here, it shows the Heat Map, which indicates how computation activity is distributed across the threads of the profiled application:

Streamline, showing various performance counters on a timeline view

What are we going to analyze?

We’re going to analyze some very simple Unity content: a flythrough of procedurally generated terrain. The camera twists and turns, and new terrain tiles are generated on the fly as they are needed. Tiles that get too far from the camera are deleted, so once the scene has filled out, its complexity stays roughly constant over time. Sometimes the camera moves slowly and new terrain is needed only occasionally; at other times it speeds up, and the rate of terrain generation has to increase to keep pace. The edges of each tile are rendered in a darker colour, so you can see the size of each tile:


Because generating a terrain tile is computationally intensive, we use the Unity Job System, which lets us dispatch work to background worker threads that won't hold up the main Unity thread. This ensures that the user experiences a steady frame rate, rather than a jerky pause whenever new terrain is generated.

The demo is configured to run through four different scenes, which look identical but generate the terrain tiles differently. As the following diagram shows, the terrain is composed of many tiles, each of a fixed size. Each tile comprises several submeshes (one for the green terrain, one for the yellow terrain and one for the water), which have a fixed resolution. The Render Distance controls how many tiles around the player are generated.

 The properties of the game that vary from scene-to-scene

The four scenes are configured as follows:

Scene | Render Distance | Terrain Tile Size | Terrain Resolution | Number of parallel terrain generations
1     | 3               | 20x20             | 32x32              | 8
2     | 3               | 20x20             | 32x32              | 1
3     | 6               | 10x10             | 16x16              | 8
4     | 6               | 10x10             | 16x16              | 1
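Reading the table as arithmetic makes the design of the scenes clearer. Write R for the Render Distance (in tiles), S for the tile edge length and N for the terrain resolution along one tile edge. Assuming, for illustration only, that the terrain reaches roughly R × S from the camera and that the cost of generating a tile scales with its N × N grid, scenes 1/2 and scenes 3/4 compare as follows:

```latex
\begin{aligned}
R_{1,2}\,S_{1,2} &= 3 \times 20 = 60 = 6 \times 10 = R_{3,4}\,S_{3,4} \\
\frac{N_{1,2}}{S_{1,2}} &= \frac{32}{20} = 1.6 = \frac{16}{10} = \frac{N_{3,4}}{S_{3,4}} \\
\frac{N_{1,2}^{2}}{N_{3,4}^{2}} &= \frac{32^{2}}{16^{2}} = \frac{1024}{256} = 4
\end{aligned}
```

Under those assumptions, all four scenes reach about the same distance from the camera with the same grid density per unit of tile length, which is consistent with them looking identical, while each tile in scenes 3 and 4 carries roughly a quarter of the work of a tile in scenes 1 and 2.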

I’ll be performing the profiling activity on a Huawei P10 phone, which was released in 2017. It contains a HiSilicon Kirin 960 chip, which comprises four high-performance Arm Cortex-A73 CPU cores, four high-efficiency Arm Cortex-A53 CPU cores, and the Arm Mali-G71 MP8 GPU.

Profiling in Unity

Unity itself contains a profiler, and it works just great on Android devices:

Unity Profiler

Unity's profiler does a great job of showing us when jobs are scheduled, but it doesn't show details of the platform's physical resources (CPUs and GPUs for the purposes of this blog) and how they are being used. We might be hitting our 60 FPS, but are we maxing out all of our CPU cores and burning battery to do so? This is where Streamline comes in. In the rest of this blog, we'll show you what data Streamline captures and presents, and how we can use Streamline's annotation features to pass some high-level context from the Unity game down into Streamline, making the data easier to interpret.

Profiling Unity with Streamline – the capabilities

Before we talk about how annotations can be inserted into your Unity game, let’s see what the end result looks like for our example content when we’ve modified the game to make use of three of Streamline’s annotation features:

  • Markers are the simplest form of annotation: a single point in time with a label that appears at the top of Streamline’s Timeline view.
  • Channels provide a bit more structure by adding a separate row of information alongside each thread. Annotations can be placed into a channel, and unlike a marker, each annotation spans a range of time.
  • Custom Activity Maps are the most advanced form of annotation and are a mechanism for showing global (cross-thread) activities that may have complex dependencies. Each Custom Activity Map appears as its own view in the lower half of the Streamline UI.

Once we have collected a profile (more on that later), it opens in Streamline and we can start to discover what’s going on.

When examining our content, the first thing that draws our attention is the set of Markers (in green at the top of the timeline), which here indicate where each frame begins:

Streamline screenshot, showing frame rate markers

We can see that the frame rate isn’t as regular as we’d like, and there is considerable bursty activity across all the CPU cores. The frame rate starts slow and then seems to pick up, with occasional pauses. That’s pretty consistent with what we’d expect from Terrain generation, but can we look any deeper?

The Timeline view in Streamline is divided into two parts: the top half shows the metric graphs, and the bottom half can show a variety of different things, including the Heat Map. The Heat Map shows how the work was distributed across the system, and lets us filter the top half to show only the work attributed to specific processes or threads. By selecting first the UnityMain thread and then all of the Worker Thread threads in the Heat Map, we can see how the CPU activity was split between the main Unity thread and the threads in the job scheduler:

CPU profile for the UnityMain thread, showing large bursts of activity on the Cortex-A73 CPUs.

CPU profile for all the Worker Thread threads, showing smaller bursts of activity across both the Cortex-A73 and Cortex-A53 CPUs.

Let's take a look at the main thread. If you look closely at the left-hand screenshot, you'll see an “A” marker next to the UnityMain thread that we’re examining. This means that Streamline Annotation Channels are present. We’ll zoom into the timeline a bit, and expand the UnityMain thread to see what’s going on:

Streamline screenshot, showing Scene and Terrain Controller annotations

The Scene and TerrainController rows are Streamline channels, generated by annotations placed in the game. The Scene channel shows us which scene is currently executing: we can see that this is the 20x20, 32x32 version with a render distance of 3 and up to 8 terrain generations allowed to run in parallel.

The TerrainController channel is used to indicate when particularly interesting pieces of code are running on the main Unity thread. The blue blocks mark up the code that runs when a Terrain job completes. The green blocks mark up where new Terrains are scheduled for generation. We can see here that all the main thread activity is essentially due to the work that needs to be done when a job completes and the final mesh needs to be generated and inserted into the scene.

As well as focusing on particular threads, we can also constrain our analysis to particular periods of time. Streamline’s calipers allow us to mark up a particular time region for analysis – here, we have selected the start and end of the intense period of activity associated with Terrain completion (calipers are set at the top of the Timeline view):

Streamline screenshot, demonstrating use of calipers

If we flip now to the Call Paths view, we can get a fair idea of where time is being spent during the region of time selected by the calipers. Because we used the IL2CPP scripting backend for Unity, we get a lot more information than if we'd used the default Mono runtime. I'm not going to delve into the detail of what's going on here, but there's clearly a lot going on that warrants a deeper dive:

Streamline screenshot, showing Call Path view

What about the worker threads?

When we filter to show only the Worker Thread threads, there are no surprises here, given that we have asked for a maximum of eight jobs to run in parallel. In this screenshot, we’ve expanded the Cortex-A53 cluster so we can see the utilization of individual cores.

Streamline screenshot, showing activity on worker threads.

We see some green blocks in the TerrainController channel that indicate new Terrains being scheduled, then some intense activity across all cores, and then some blue activity in the TerrainController to process those Terrains in the main thread once they’ve been generated (we don’t see that main thread activity in the graphs because we don’t have the UnityMain thread selected).

It is interesting to compare this to the activity in the second scene, where the terrain tiles are of the same complexity, but we only allow one to be scheduled at a time:

Streamline screenshot, showing worker thread activity in the second scene.

There are a couple of things to note here:

  1. The CPU activity is a lot less intense – most of the cores are idle or close-to-idle for much of the time.
  2. The frame rate isn’t perfect, but it is much smoother. Because only one terrain job completes at any time, we avoid the large bursts of main-thread activity that were holding us up.

We can also compare the profile with the third scene, which uses smaller tiles:

Streamline screenshot, showing worker thread activity in the third scene.

As you can see, the CPU activity is much less intense and the blocks of blue completion work in the main thread are much shorter, resulting in a smoother frame rate relative to the first scene (but of course there are more jobs overall, so we have to make sure that Terrain generation still keeps up with the rate at which the camera flies over the terrain).

The fourth scene, with small tiles and only one Terrain generation running at a time, shows the smoothest frame rate overall, but we have to be very careful to ensure that Terrain generation happens fast enough to keep up with the camera. As you’ll see in the original video, this isn’t always the case when the camera is moving fast over the fourth scene:

Streamline screenshot, showing worker thread activity in the fourth scene.

 Screenshot from game content, showing graphical glitch where terrain wasn't rendered in time.

Finally, we can use a Custom Activity Map to get even more insight into how the worker threads are performing Terrain generation. Each Custom Activity Map appears as an option in the bottom-left menu that up until now we’ve been using to display the Heat Map:

Streamline screenshot, showing where Custom Activity Maps appear

When we select the Terrain Generation view, we’ll see a colored box for each Terrain generation activity, showing when it started and stopped, with a mouseover showing the world coordinates of that Terrain tile, when it was initiated and how long it took to complete. Also in this screenshot, we’re graphing the compute work that took place on the Mali GPU – as we’d expect, there is a steady increase in GPU activity as the terrain gets filled out. This screenshot was taken while focusing on the beginning of the first scene, where we are generating large tiles, up to 8 concurrently. The pauses while the main thread prepares all the new geometry are causing the GPU to be idle for long periods of time:

Streamline screenshot, showing Custom Activity Map.

Moving to the fourth scene, where we are generating smaller tiles serially, we see a much smoother ramp in GPU activity, and we can clearly see that only one Terrain job was running at a time (and each job is shorter, due to the smaller tile size):

Streamline screenshot, showing single-thread Custom Activity Map.

This has been a quick walk-through of some of the additional insight that we can get in Streamline if we use annotations from the game itself to provide us with some more high-level context. We used:

  • Markers to show us when each new frame started.
  • Channels to show us which scene was running and which activities the main thread was performing.
  • Custom Activity Maps to show us the behavior of asynchronously-scheduled terrain generation jobs.

That’s all very cool, so how does it work?

About Streamline annotations

Let's take a deeper look into how Streamline works. When you analyze an Android application, a separate process called gator (running as the same user as the application) runs on the device, collecting profiling information from various hardware sources (such as Mali GPUs and Arm Cortex-A CPUs) and transmitting the aggregated stream of metrics back to your computer. Streamline annotations are a mechanism by which the application itself can insert its own markers and metrics into that stream.

Arm's gator daemon collecting annotations

How to use Streamline Annotations in Unity

Streamline annotations use a specific protocol, and an open-source C implementation is provided as part of Arm Mobile Studio. To make it easy to generate Streamline annotations from within Unity content, you need some C# wrappers around the C implementation. The wrappers used in this walkthrough, along with the required C implementation, are available as a Unity Asset Package. Download it and import it into your project as a custom Asset package. The package adds new methods in an Arm namespace that let you easily use Streamline annotations in your own project. API documentation can be found in the README.md file inside the package.

Setting up your Unity project for the best experience

If you want to get the fastest and easiest-to-analyze Android builds out of Unity, there are some specific Android Player settings that you should configure:

Make sure that you are using IL2CPP as the Scripting Backend and set the C++ Compiler Configuration to Debug. IL2CPP compiles your scripts to native code for better performance, and the Debug configuration means that Streamline can see the debug information it needs to map performance data back to your functions in the Call Path view.

Set the Target Architecture to ARM64 (ARMv7 is the default). Most mobile devices today are 64-bit, and you’ll get higher-quality code generation as a result.
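The Player settings above can also be applied from an Editor script, which is handy when switching a project between profiling and release builds. This is a sketch, not part of the Mobile Studio package: the class name, menu path and file location are hypothetical, and it assumes a Unity version that provides these PlayerSettings methods taking a BuildTargetGroup (newer Unity versions also offer NamedBuildTarget-based equivalents).

```csharp
// Assets/Editor/ProfilingBuildSettings.cs (hypothetical file, class and menu names).
// Must live in an Editor folder because it uses the UnityEditor API.
using UnityEditor;

public static class ProfilingBuildSettings
{
    [MenuItem("Tools/Apply Streamline-friendly Android settings")]
    public static void Apply()
    {
        // Compile C# scripts to native code with IL2CPP instead of Mono.
        PlayerSettings.SetScriptingBackend(BuildTargetGroup.Android,
                                           ScriptingImplementation.IL2CPP);

        // Debug C++ compiler configuration, so debug information is available
        // for Streamline to map samples back to functions.
        PlayerSettings.SetIl2CppCompilerConfiguration(BuildTargetGroup.Android,
                                                      Il2CppCompilerConfiguration.Debug);

        // Build for 64-bit only, matching the arm64 gatord binary.
        PlayerSettings.Android.targetArchitectures = AndroidArchitecture.ARM64;
    }
}
```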

Adding markers

Markers are the easiest annotation to use. The provided method takes a string and an optional color. For example, to emit the green per-frame markers, the following code was used in one of the GameObjects (if you're not familiar with Unity's architecture, Unity calls a script's Update() method automatically once per frame):

void Update ()
{
  Arm.Annotations.marker("Frame " + Time.frameCount, Color.green);
}

Adding channels

Using channels isn’t much harder. First, you have to create a channel, specifying its name. You can then log annotations into the channel using methods on the Channel object, for example:

channel = new Arm.Annotations.Channel("Scene");

channel.annotate(sceneDescription, color);

Remember that annotations in channels span a period of time. If you want to end the annotation before starting your next one, you can use the end() method. For example, the part of TerrainController that performs Terrain completion in the main thread is wrapped as follows:

// Begin annotation
channel.annotate("Completing", Color.blue);

Mesh mesh = obj.GetComponent<MeshFilter>().mesh;

mesh.vertices = job.vertices.ToArray();
mesh.uv = job.uv.ToArray();
mesh.uv2 = job.uv2.ToArray();

mesh.SetTriangles(job.grassTriangles.ToArray(), 0);
mesh.SetTriangles(job.sandTriangles.ToArray(), 1);
mesh.SetTriangles(job.waterTriangles.ToArray(), 2);
mesh.RecalculateNormals();

// End annotation
channel.end();
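If any of the mesh updates above throws, the channel.end() call is skipped and the "Completing" annotation stays open. A minimal sketch of one way to guard against that is a disposable scope. AnnotationScope is a hypothetical helper, not part of the Mobile Studio package; it takes the begin and end calls as delegates, so it assumes nothing about the wrappers beyond the calls already used in the snippet above.

```csharp
using System;

// Hypothetical helper (not part of the Mobile Studio package): runs a "begin"
// action on construction and an "end" action on Dispose(), so an annotation is
// closed even if the code inside the using block throws.
public readonly struct AnnotationScope : IDisposable
{
    private readonly Action end;

    public AnnotationScope(Action begin, Action end)
    {
        this.end = end;
        begin?.Invoke();
    }

    // Called automatically when the using block exits, normally or via an exception.
    public void Dispose()
    {
        end?.Invoke();
    }
}
```

With it, the completion code in TerrainController could be wrapped as using (new AnnotationScope(() => channel.annotate("Completing", Color.blue), () => channel.end())) { ... }. The capturing lambdas allocate on each call, so in per-frame code you may prefer to create the two delegates once and reuse them.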

Using Custom Activity Maps

Custom Activity Maps (CAMs) can be thought of as just another layer on top of channels. You must first name the CAM, before creating tracks within it. You can then add annotations to those tracks, much as you would add them to channels.

In the example, the Terrain Generation CAM was created as follows:

terrainCAM = new Arm.Annotations.CustomActivityMap("Terrain Generation");

terrainTracks = new Arm.Annotations.CustomActivityMap.Track[16];

for (int i = 0; i < 16; i++)
{
    terrainTracks[i] = terrainCAM.createTrack("TerrainJob " + i);
}

However, there is one complication for our use case: when a job is running in the Unity Job System, it can barely interact with the rest of your game’s object model (which helps to keep things thread-safe). All we can do in the job is record its start and stop times; only later, when the main thread cleans up the job, can we register the job's activity in the CAM.

The C# wrappers provide a function, safe to call from within jobs, that returns the current time in the format that Streamline annotations need:

UInt64 startTime = Arm.Annotations.getTime();

Once we’re back in the main thread, we pick a track to use and register the job onto that track. We manage the tracks as a pool so that jobs on the same track never overlap, which keeps the view easy to read. Here, job.timings is a two-entry array, filled out by the job, containing the start time and stop time of the job.

track.registerJob(obj.name, Color.grey, job.timings[0], job.timings[1]);
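The post doesn't show the pool itself. One possible sketch, assuming the track is chosen at registration time from the recorded timings, keeps the latest end time seen on each track and gives the job to the first track that is already free at the job's start time. TrackPool and Assign are hypothetical names; the returned index would select an entry in terrainTracks.

```csharp
// Hypothetical helper: picks a Custom Activity Map track for a finished job so
// that no two jobs drawn on the same track overlap in time.
public class TrackPool
{
    // Latest end time registered on each track (same units as getTime()).
    private readonly ulong[] lastEnd;

    public TrackPool(int trackCount)
    {
        lastEnd = new ulong[trackCount];
    }

    // Returns the index of the first track whose previous jobs have all ended
    // by 'start', or -1 if every track is still busy at that time.
    public int Assign(ulong start, ulong end)
    {
        for (int i = 0; i < lastEnd.Length; i++)
        {
            if (lastEnd[i] <= start)
            {
                // end >= start >= lastEnd[i], so this is the track's new latest end.
                lastEnd[i] = end;
                return i;
            }
        }
        return -1;
    }
}
```

On the main thread, int i = pool.Assign(job.timings[0], job.timings[1]) would be followed by terrainTracks[i].registerJob(...) whenever i is not -1. Because completed jobs can be registered out of start-time order, this greedy scheme may need more tracks than the peak number of concurrent jobs, so some headroom over the eight-job maximum is useful; -1 signals that the pool ran out.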

And that’s it! I expect that we’ll refine this Unity Package over time to add more functionality; your feedback is always welcome!

Collecting a basic profile in Streamline

There are a few steps that you need to go through in order to collect your first profile in Streamline, but once you are set up things are quite straightforward.

First, you need to download and install the free Windows, Mac or Linux version of Arm Mobile Studio Starter Edition.

Recall from the earlier description of Streamline's architecture that there are a few things you need to put in place:

  • You need to have gator running on your mobile device, in a way that provides it with access to your application.
  • You need a way of getting data off your device and into the tools.

Streamline provides you with a few ways of achieving this, but we have found that the simplest method that is robust across a range of devices starts with making sure that you know a few key pieces of information:

  • Whether your application is 32-bit or 64-bit. This will be 64-bit if you followed the instructions above and built your Unity game with the ARM64 option.
  • The Package Name of your application (as you specified in Unity's Android Player settings). For our example application, this is com.Arm.InfiniteTerrain.
  • Where to find the gator binaries: they are in the streamline/bin/arm (32-bit) or streamline/bin/arm64 (64-bit) folders of your Arm Mobile Studio installation.

Once you know these pieces of information, the steps to perform analysis are as follows:

  • Ensure that the application is installed on your device.
  • Use the Android adb tool to forward a network port from your mobile device to a local port on the system that you are running Arm Mobile Studio on.
  • Use adb to push gator (either the 32-bit or 64-bit version, to match your application) to your device, and start it running as the same user as your application, identified by its Package Name.
  • Start Streamline and connect it to the local port you used adb to forward traffic to. This is the point where you can choose which performance counters you are interested in collecting.
  • Launch your application on your mobile device. Streamline will start to collect the analysis data as it comes in.

Once gator is running, you can install new versions of your application, start and stop Streamline and perform more analyses without having to restart gator.

To make the process easier, you can download the gatorme script, which configures and runs gator and adb for you. All you need to provide is the path to the gator binary that you want to run, the Package Name of your application, and which Mali GPU your device has (this helps if gator can't work out which GPU you have by probing the device). The script also performs several other steps to make this method work well on the broadest range of mobile devices, and ensures that gator is shut down properly once you have finished profiling. (Yes, we will be folding the gatorme functionality directly into Streamline in the near future!)

The gatorme documentation explains the detail, but as a worked example, here's how the InfiniteTerrain content could be profiled, once the APK is installed on your device.

First, run gatorme from the command line:

 

$ ./gatorme.sh com.Arm.InfiniteTerrain G71 ./mobilestudio-macosx/streamline/bin/arm64/gatord

You can now launch Streamline and get ready to capture. There are a couple of settings that you want to make sure you get right:

Streamline screenshot, showing important settings for a successful capture.

Once you get this set up, repeated deploy/analyse/fix steps are easy - you can leave gatorme running while you shut down your application, Build-and-Run direct from Unity and capture more Streamline information.

EDIT: The section below was updated after the Mobile Studio package for Unity was published on 14/04/2021.

I hope you've found this blog interesting and useful. All of the source code for the Mobile Studio package for Unity can be found on GitHub (BSD 3-Clause license). 

And finally, if you've got any questions about Arm Mobile Studio or Arm in graphics and gaming in general, please join us in the Graphics and Multimedia Forum or read more about our tools on the Arm Mobile Studio developer site below!

Arm Mobile Studio resources
