GPU vendors provide different tools to help users analyze performance, such as Streamline for Mali GPUs and Snapdragon Profiler for Adreno GPUs. Perfetto, a performance tool for Android developed by Google, helps you collect GPU performance information, profile the system, and record system traces.
Perfetto is a next-generation system tracing tool that lets users collect performance information through Android Debug Bridge (ADB). When your application performs poorly, this tool can help you analyze and debug graphics issues.
System tracing is a mechanism that records all activity on a device over a short period of time. It produces a trace file that is used to generate a report about the system. Perfetto UI, an advanced Chrome-based application, gives you a convenient way to inspect system traces.
This screenshot shows what an Android trace timeline looks like in Perfetto.
Android system tracing offers several ways to record traces, such as command-line tools and UI tools. For more details, see: https://perfetto.dev/docs/quickstart/android-tracing.
The following procedure describes one of these methods, which uses the on-device System Tracing application to capture the trace.
Step 1: Connect the device that is used to record the trace to your development machine.
Step 2: Run adb devices on the host. The device is listed like this:
$ adb devices
List of devices attached
0123456789ABCDEF device
Step 3: Run these commands in a Terminal window:
$ adb root
$ adb remount
Step 4: To capture the trace, complete these steps on your device:
Step 5: Download the tracing report by extracting the system trace from the device with adb. The trace file is saved in /data/local/traces. Run the following commands in a Terminal window:
$ adb shell ls /data/local/traces
xxx.perfetto-trace
$ adb pull /data/local/traces/xxx.perfetto-trace
Step 6: Click Open trace file at https://ui.perfetto.dev/ on your development machine and select the trace file you pulled. The trace timeline appears on the screen, like the screenshot in the What is Perfetto? section.
The Mali DDK can enable Android systrace support for collecting and inspecting timing information. To enable systrace support, you must add ANDROID_SYSTRACE=y ANDROID_SYSTRACE_API=y to the DDK configuration.
This example shows how to add the option in the configuration:
$ cd vendor/arm/gpu/product
$ setup_android juno-android.config ANDROID_SYSTRACE=y ANDROID_SYSTRACE_API=y
Then, build the DDK. The generated library libGLES_mali.so now supports API call tracing.
This screenshot shows the API functions from the trace file. The trace clearly reveals a performance problem: glGetIntegerv() takes far longer than the other API calls.
To find out why glGetIntegerv() takes so long, you can add systrace labels to the underlying functions for further analysis.
Add the systrace label to the underlying function:
+ ATRACE_BEGIN("function name");
  function();
+ ATRACE_END();
Add more systrace labels to the underlying function as shown in this example:
+ ATRACE_BEGIN("gles2_query_query_disjoint_count");
  gles2_query_get_query_objectuiv(ctx, query_state->disjoint_query_id, GL_QUERY_RESULT_EXT, counter);
+ ATRACE_END();
  ...
+ ATRACE_BEGIN("osu_sem_wait");
  osu_sem_wait(&query_object->result_sem);
+ ATRACE_END();
A newly generated Perfetto trace then shows more rows under glGetIntegerv(). It shows that the underlying function osu_sem_wait() is where the time is spent.
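Each ATRACE_BEGIN needs a matching ATRACE_END() on every exit path, otherwise the slice stays open in the trace. The following is a minimal sketch of one way to keep the pairs balanced in a function with an early exit. The two macros here are stand-ins that only track nesting depth so the sketch compiles off-device (on Android the real ones come from cutils/trace.h), and wait_for_result() is a hypothetical function, not part of the DDK.

```c
#include <stdio.h>

/* Stand-ins for the Android ATRACE macros so this compiles off-device.
 * They print an indented begin/end marker and track nesting depth. */
static int trace_depth = 0;
#define ATRACE_BEGIN(name) ((void)printf("%*sB|%s\n", trace_depth++ * 2, "", (name)))
#define ATRACE_END()       ((void)printf("%*sE\n", --trace_depth * 2, ""))

/* Hypothetical function with an early-exit path. */
static int wait_for_result(int ready)
{
    int status = 0;

    ATRACE_BEGIN("wait_for_result");
    if (!ready) {
        status = -1;
        goto out;               /* single exit so ATRACE_END() always runs */
    }

    ATRACE_BEGIN("inner_work"); /* nested label appears as a child slice */
    /* ... the work being measured ... */
    ATRACE_END();

out:
    ATRACE_END();
    return status;
}
```

Calling wait_for_result(0) and wait_for_result(1) should both leave trace_depth at 0, which is a quick way to check that no label was left open.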
Perfetto gives you an intuitive view of the execution time consumed by each function.
When you want to optimize one specific function, it is impractical to capture a trace every time and load it into Perfetto for analysis. A practical alternative is to measure the elapsed time and print it to the logs. You can cut down on unneeded log lines by logging only when a condition is met.
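As an illustration of collecting timing statistics with conditional logging, here is a minimal sketch. The struct, the function names, and the two tuning parameters (slow_threshold_ns and report_every) are all hypothetical and are not part of the DDK. The caller supplies the elapsed time in nanoseconds and logs only when the function returns true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical running statistics for one instrumented function. */
struct timing_stats {
    uint64_t count;     /* number of samples recorded */
    uint64_t total_ns;  /* sum of all samples */
    uint64_t max_ns;    /* slowest sample seen so far */
};

/* Record one sample. Returns true when the caller should emit a log line:
 * either this sample is slow, or it is every report_every-th sample.
 * Pass report_every = 0 to log slow samples only. */
static bool timing_stats_add(struct timing_stats *s, uint64_t elapsed_ns,
                             uint64_t slow_threshold_ns, uint64_t report_every)
{
    s->count++;
    s->total_ns += elapsed_ns;
    if (elapsed_ns > s->max_ns)
        s->max_ns = elapsed_ns;

    if (elapsed_ns >= slow_threshold_ns)
        return true;
    return report_every != 0 && s->count % report_every == 0;
}

/* Mean of all samples so far, or 0 if nothing was recorded. */
static uint64_t timing_stats_mean_ns(const struct timing_stats *s)
{
    return s->count ? s->total_ns / s->count : 0;
}
```

Keeping the count, total, and maximum lets a single periodic log line summarize many calls, while the threshold still surfaces individual slow calls immediately.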
Use mali_tpi_get_time_ns() to take a timestamp before and after the function of interest. mali_tpi_get_time_ns() returns the current timestamp in nanoseconds. To convert the difference to approximate milliseconds, shift the value 20 bits to the right (this divides by 1,048,576 rather than exactly 1,000,000), like this:
+ uint64_t start = mali_tpi_get_time_ns();
  osu_sem_wait(&query_object->result_sem);
+ uint64_t stop = mali_tpi_get_time_ns();
+ uint64_t reference_time = (stop - start) >> 20; // ns -> approx. ms
+ if (reference_time >= 1) // at least 1 (approx.) ms elapsed
+     ALOGD("osu_sem_wait: 0x%" PRIx64, reference_time);
After you apply the patch, logcat shows the calls whose execution time exceeds 1 ms.
$ adb logcat | grep osu_sem_wait
05-18 09:47:41.824 3624 3697 D ser:gpu_proces: osu_sem_wait: 0x1
05-18 09:47:42.000 3624 3697 D ser:gpu_proces: osu_sem_wait: 0x2
05-18 09:47:42.096 3624 3697 D ser:gpu_proces: osu_sem_wait: 0x1
This makes it easier to narrow down which function is the bottleneck.
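The 20-bit shift is a cheap approximation: it divides by 2^20 = 1,048,576 instead of 1,000,000, so the result reads roughly 5% low. The sketch below places the shift next to an exact division so the difference is visible. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Approximate conversion: a right shift by 20 divides by 1,048,576. */
static uint64_t ns_to_ms_shift(uint64_t ns)
{
    return ns >> 20;
}

/* Exact conversion: divides by 1,000,000 and truncates. */
static uint64_t ns_to_ms_exact(uint64_t ns)
{
    return ns / 1000000u;
}
```

For example, 100,000,000 ns is exactly 100 ms, but the shifted value is 95. For a coarse "slower than about 1 ms" filter the shift is usually good enough. When the logged number itself matters, the exact division is the safer choice.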
Perfetto can help you resolve graphics issues in your driver. The preceding sections describe how to capture a trace with Perfetto. After you capture the trace, you can carry out further analysis.
This example is about a hang-up issue that occurs in an application. To analyze the root cause, you can capture a Perfetto trace. After step 6 of the preceding procedure, you see a view like this. In this example, it shows that the glDrawArrays API call does not finish.
With this knowledge from Perfetto, it is easy to see that this type of issue lies in the driver. To analyze the issue further, you can enable the debug log in the DDK, add some logging to glDrawArrays, start capturing the trace, and then reproduce the issue.
Once the issue is reproduced, save the log and the Perfetto trace. To find which thread encounters the issue, check the Perfetto trace before you read the log. You can then go straight to the log lines of the affected thread, which is more efficient.
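Once the trace gives you the thread ID, filtering the saved log for that thread can be automated. The following is a minimal sketch that assumes the log lines use the same layout as the logcat output shown earlier (date, time, PID, TID, level, tag). The function names are hypothetical.

```c
#include <stdio.h>

/* Parse PID and TID from a logcat line laid out as:
 *   MM-DD HH:MM:SS.mmm PID TID LEVEL TAG: message
 * Returns 1 on success, 0 if the line does not match this layout. */
static int logcat_line_ids(const char *line, int *pid, int *tid)
{
    char level;
    return sscanf(line, "%*d-%*d %*d:%*d:%*d.%*d %d %d %c",
                  pid, tid, &level) == 3;
}

/* Returns 1 if the line was logged by the thread with ID wanted_tid. */
static int logcat_line_is_from_thread(const char *line, int wanted_tid)
{
    int pid, tid;
    return logcat_line_ids(line, &pid, &tid) && tid == wanted_tid;
}
```

Reading the saved log line by line and keeping only the lines for which logcat_line_is_from_thread() returns 1 leaves just the output of the thread identified in the trace. Lines in other layouts, such as logcat's buffer separators, are skipped.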
This approach helps you narrow down the problem scope to quickly find out the root cause.
Perfetto helps you collect system-wide traces from Android devices using a variety of data sources. We use it not only to analyze performance, but also to investigate hang-up issues. We may be able to make better use of this tool to resolve more graphics issues in the future.