Use Perfetto to analyze graphic issues

Oscar Zhang
December 21, 2023
5 minute read time.

GPU vendors provide different tools to help users analyze performance, such as Streamline for Mali GPUs and Snapdragon Profiler for Adreno GPUs. Perfetto, a performance tool developed by Google for Android, helps you collect GPU performance information, profile the system, and record system traces.

What is Perfetto?

Perfetto is a next-generation system tracing tool that lets you collect performance information from an Android device through Android Debug Bridge (ADB). When your application performs poorly, this tool can help you analyze and debug graphic issues.

System tracing is a mechanism that records all activity on a device over a short period of time. It produces a trace file from which a report about the system can be generated. Perfetto UI, an advanced Chrome-based application, makes it convenient to inspect system traces.

This screenshot shows what an Android trace timeline looks like in Perfetto.

The screenshot of example_android_trace

Use Perfetto to capture a trace

Android system tracing offers several ways to record traces, such as command-line tools and UI tools. For more details, see: https://perfetto.dev/docs/quickstart/android-tracing.

The following procedure describes one of these methods. This method uses the on-device System Tracing Application to capture the trace.

Step 1: Connect the device that is used to record the trace to your development machine.

Step 2: Run adb devices on the host. The connected device is listed like this:

$ adb devices
List of devices attached
0123456789ABCDEF        device

Step 3: Run these commands in a Terminal window:

$ adb root
$ adb remount

Step 4: To capture the trace, complete these steps on your device:

  1. Enable Developer options. You must complete this configuration before you proceed.
  2. Open Developer options.
  3. In the Debugging section, select System Tracing.
  4. Tap Trace debuggable applications to enable it.
  5. Tap Record trace. The system starts to capture a trace. (Note: Make sure Trace debuggable applications is enabled before you tap Record trace.)
  6. Run the application that you want to trace.
  7. Tap Record trace again to stop tracing.

Step 5: Extract the system trace from the device using adb. The trace file is saved in /data/local/traces. Run the following commands in a Terminal window:

$ adb shell ls /data/local/traces
xxx.perfetto-trace
$ adb pull /data/local/traces/xxx.perfetto-trace

Step 6: On your development machine, open https://ui.perfetto.dev/ and click Open trace file. The trace timeline appears on the screen, like the screenshot in the section What is Perfetto?

Extended use of Perfetto to dig into API usage

The Mali DDK can enable Android systrace support for collecting and inspecting timing information. To enable systrace support, add ANDROID_SYSTRACE=y ANDROID_SYSTRACE_API=y to the DDK configuration.

This example shows how to add the option in the configuration:

$ cd vendor/arm/gpu/product
$ setup_android juno-android.config ANDROID_SYSTRACE=y ANDROID_SYSTRACE_API=y

Then, build the DDK. The generated library libGLES_mali.so now supports API call tracing.

This screenshot shows the API functions from the trace file. The trace clearly reveals a performance problem: glGetIntegerv() takes much longer than the other API calls.

The Perfetto interface after adding systrace to API

To find out why glGetIntegerv() takes such a long time, you can add systrace labels to the underlying functions for further analysis.

Add the systrace label to the underlying function:

+ ATRACE_BEGIN("function name");
function();
+ ATRACE_END();

Add more systrace labels to the underlying function as shown in this example:

+   ATRACE_BEGIN("gles2_query_query_disjoint_count");
    gles2_query_get_query_objectuiv(ctx, query_state->disjoint_query_id, GL_QUERY_RESULT_EXT, counter);
+   ATRACE_END();
……
+   ATRACE_BEGIN("osu_sem_wait");
    osu_sem_wait(&query_object->result_sem);
+   ATRACE_END();

A newly generated Perfetto trace then shows more slices nested under glGetIntegerv(). It shows that the underlying function osu_sem_wait() is the one consuming the time.

The Perfetto interface after adding systrace labels
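
When many labels are added by hand, it is easy to leave out an ATRACE_END() or its parentheses. The following is a minimal sketch of one way to keep each pair together, assuming a wrapper macro is acceptable in your code base. TRACE_CALL, USE_ANDROID_ATRACE, and the fake_* functions are hypothetical names used for illustration only. The host-side stand-ins for ATRACE_BEGIN() and ATRACE_END() are not the Android implementation; they exist only so that the sketch can be compiled and checked off-device.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_ANDROID_ATRACE            /* hypothetical build switch */
#define ATRACE_TAG ATRACE_TAG_GRAPHICS
#include <cutils/trace.h>            /* real ATRACE_BEGIN / ATRACE_END */
#else
/* Host-side stand-ins (assumption, not the Android implementation):
 * they print slice boundaries and track how many slices are open. */
static int open_slices = 0;
static int max_open_slices = 0;
#define ATRACE_BEGIN(name)                                     \
    do {                                                       \
        printf("%*sbegin %s\n", open_slices * 2, "", (name));  \
        open_slices++;                                         \
        if (open_slices > max_open_slices)                     \
            max_open_slices = open_slices;                     \
    } while (0)
#define ATRACE_END()                                           \
    do {                                                       \
        assert(open_slices > 0); /* catches an unmatched end */ \
        open_slices--;                                         \
        printf("%*send\n", open_slices * 2, "");               \
    } while (0)
#endif

/* Wrap one statement so that begin and end always appear as a pair. */
#define TRACE_CALL(label, stmt) \
    do {                        \
        ATRACE_BEGIN(label);    \
        stmt;                   \
        ATRACE_END();           \
    } while (0)

/* Hypothetical stand-ins for the driver functions being traced. */
static int waits_done = 0;

static void fake_sem_wait(void)
{
    waits_done++;
}

static void fake_query_result(void)
{
    TRACE_CALL("fake_sem_wait", fake_sem_wait());
}

static void fake_get_integerv(void)
{
    TRACE_CALL("fake_query_result", fake_query_result());
}
```

Calling fake_get_integerv() produces two nested slices, which mirrors how osu_sem_wait() appears under glGetIntegerv() in the trace. The wrapped statement must not leave the block early (for example with return or break), because the closing ATRACE_END() would then be skipped.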

Use timestamps to measure execution time

Perfetto gives you an intuitive view of the execution time consumed by each function.

When you optimize a specific function, it is important, but cumbersome, to capture a trace after every change and load it into Perfetto for analysis. A practical alternative is to measure the execution time in code and print it to the log. You can cut down on noisy log output with a conditional check.

Use mali_tpi_get_time_ns() to get a timestamp before and after the function of interest. mali_tpi_get_time_ns() returns the current timestamp in nanoseconds. To convert the difference to approximate milliseconds, shift the value 20 bits to the right. The shift divides by 2^20 (1,048,576) rather than 1,000,000, so the result is about 5% lower than the true millisecond value:

+    uint64_t start = mali_tpi_get_time_ns();
     osu_sem_wait(&query_object->result_sem);
+    uint64_t stop = mali_tpi_get_time_ns();
+    uint64_t reference_time = (stop - start)>>20; // ns -> approx. ms (divides by 2^20)
+    if(reference_time >= 1) // at least about 1ms
+        ALOGD("osu_sem_wait: 0x%" PRIx64, reference_time);
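
The right shift in the patch above divides by 2^20 (1,048,576), not by 1,000,000, so it slightly under-reports milliseconds. The following sketch compares the shift with an exact division. The helper names ns_to_ms_shift, ns_to_ms_exact, and compare_conversions are hypothetical and are not part of the DDK.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation used in the patch above: divides by 2^20 = 1,048,576. */
static uint64_t ns_to_ms_shift(uint64_t ns)
{
    return ns >> 20;
}

/* Exact conversion: divides by 1,000,000. */
static uint64_t ns_to_ms_exact(uint64_t ns)
{
    return ns / UINT64_C(1000000);
}

/* Usage: compare both conversions on a few durations. */
static void compare_conversions(void)
{
    /* Exactly 1 ms is below 2^20 ns, so the shift reports 0. */
    assert(ns_to_ms_shift(UINT64_C(1000000)) == 0);
    assert(ns_to_ms_exact(UINT64_C(1000000)) == 1);

    /* 100 ms: the shift reports 95, the division reports 100. */
    assert(ns_to_ms_shift(UINT64_C(100000000)) == 95);
    assert(ns_to_ms_exact(UINT64_C(100000000)) == 100);
}
```

The shift is cheap and is usually good enough for spotting outliers. When the logged numbers will be compared against Perfetto slice durations, the exact division is the safer choice.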

After you apply the timing patch to osu_sem_wait(), logcat shows the calls whose execution time exceeded 1ms. The values are printed in hexadecimal:

$ adb logcat|grep osu_sem_wait
05-18 09:47:41.824  3624  3697 D ser:gpu_proces: osu_sem_wait: 0x1 
05-18 09:47:42.000  3624  3697 D ser:gpu_proces: osu_sem_wait: 0x2 
05-18 09:47:42.096  3624  3697 D ser:gpu_proces: osu_sem_wait: 0x1

This makes it easier to narrow down which function is the bottleneck.
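
The idea of collecting timing statistics and limiting log output with a condition can be sketched as follows. This is an illustration under stated assumptions, not code from the DDK: struct call_stats and call_stats_add are hypothetical names, and printf stands in for ALOGD() so that the sketch compiles off-device.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Running statistics for one traced function (hypothetical helper). */
struct call_stats {
    uint64_t count;     /* number of samples recorded */
    uint64_t total_ns;  /* sum of all elapsed times */
    uint64_t max_ns;    /* longest single call */
};

/*
 * Record one elapsed time. A summary line is printed only on every
 * report_every-th call, which keeps the log small. Returns 1 when a
 * line was printed and 0 otherwise.
 */
static int call_stats_add(struct call_stats *s, uint64_t elapsed_ns,
                          uint64_t report_every)
{
    assert(s != NULL);

    s->count++;
    s->total_ns += elapsed_ns;
    if (elapsed_ns > s->max_ns)
        s->max_ns = elapsed_ns;

    if (report_every == 0 || s->count % report_every != 0)
        return 0;

    /* In the driver this would be an ALOGD() call; printf is a stand-in. */
    printf("calls=%" PRIu64 " avg_ns=%" PRIu64 " max_ns=%" PRIu64 "\n",
           s->count, s->total_ns / s->count, s->max_ns);
    return 1;
}
```

In the driver, elapsed_ns would be the difference between two mali_tpi_get_time_ns() values taken around the function of interest. Reporting the average and the maximum together helps distinguish a function that is consistently slow from one that occasionally stalls.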

Use Perfetto to analyze an endless loop

Perfetto can also help you resolve graphic issues in your driver. The preceding sections describe how to capture a trace with Perfetto. After you capture the trace, you can analyze it further.

This example is about a hang that occurs in an application. To analyze the root cause, capture a Perfetto trace. After step 6 of the capture procedure, the interface looks like this. In this example, the trace shows that the API call glDrawArrays does not finish.

The screenshot of the hang-up example

With some knowledge of Perfetto, it is easy to see that this type of issue lies in the driver. To analyze it further, enable the debug log in the DDK, add some logging to glDrawArrays, start capturing a trace, and then reproduce the issue.

Once the issue is reproduced, save the log and the Perfetto trace. Check the Perfetto trace before you read the log to find which thread encounters the issue. You can then go straight to the log output of that thread, which is more efficient.

This approach helps you narrow down the problem scope to quickly find out the root cause.

Conclusion

Perfetto helps you collect system-wide traces from Android devices from a variety of data sources. We use it not only to analyze performance, but also to investigate hangs. We may be able to make even better use of this tool to resolve more graphic issues in the future.
