Profiling with Instruction Trace on HiKey960

HiKey960 is a powerful 96Boards board that lets you work in areas such as machine learning, computer vision, mobile application development, and other fields. The Arm DS-5 toolkit already provides strong support for HiKey960.

Today I will show you a powerful feature, added in DS-5 v5.28, that lets you view raw instruction trace data in the Streamline GUI.

Using DS-5, I have connected to HiKey960 successfully, which means I have full control over the execution of the HiKey960 cores. For now I am only interested in Cortex-A53 core 0, so I disabled the other seven cores after boot. I also enabled CPU instruction trace generated by the Cortex-A53 core 0 ETM.

Example of Enabling CPU Instruction Trace on the Cortex-A53

Then I use DS-5 with DSTREAM-ST to load a bare-metal application onto HiKey960, run it, and stop at the point I am interested in. Once the CPU stops, DS-5 collects the ETM instruction trace data, decodes it, and shows it in the DS-5 Trace view.

Trace view within DS-5

Then I use the built-in DS-5 command 'trace dump' to dump the raw trace data. Before that, you need to find the names of the trace sources by executing the command 'trace info'. Note that it is very important to specify the trace source on the command line; otherwise Streamline will refuse to decode the data, or it might mistake it for STM trace data.

STM Trace Data

So the complete command is: trace dump c:\temp\dump_2017_1207 CSETM_Cortex-A53_0
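If you want to script this step, the command can be assembled from the output directory and the trace source names reported by 'trace info'. The following Python sketch only builds the command string in the form shown above. The helper name build_trace_dump_command is hypothetical and not part of DS-5, and the argument order is assumed from this single example.

```python
def build_trace_dump_command(output_dir, trace_sources):
    """Build a 'trace dump' command line in the form used in this article.

    Hypothetical helper: it only formats a string and does not talk to DS-5.
    """
    if not trace_sources:
        # The trace source must be named explicitly, so refuse an empty list.
        raise ValueError("at least one trace source is required")
    return " ".join(["trace", "dump", output_dir, *trace_sources])


# Values taken from the command shown in this article.
command = build_trace_dump_command(r"c:\temp\dump_2017_1207", ["CSETM_Cortex-A53_0"])
print(command)
```

The resulting string can then be pasted into the DS-5 command view.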

Dumped Trace files including two ETM instruction trace configuration files

The dumped trace files include two ETM instruction trace configuration files and one ETM instruction trace raw data file, which Streamline consumes.

Now it is time to start Streamline and enjoy the exciting journey. Let's go.

First, open the import trace dump dialog box and import the generated trace files. You can select any of them, but remember to choose the option 'Trace Dump' in the drop-down box at the bottom right. The supported forms of instruction trace are PTM 1.0-1.1, ETM 3.0-3.5, and ETM 4.0-4.2.

Selecting an Import Trace dump file

Assigning a file type during the import

Streamline then determines what kind of raw trace data you are importing and displays a dialog box showing that you are importing "Instruction trace" raw data.

Streamline determining the raw trace data being imported

Click Next. Streamline needs to generate the timing information and asks you to provide the clock source. In this test, I enabled "Cycle Accurate" in the DTSL configuration, so I choose the option "Cycle Counter" and enter "1s" as the capture duration.

Streamline generating timing information
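As a rough illustration of how a cycle count and a capture duration can be related, the following Python sketch maps a cycle number onto a time within the capture by linear scaling. This is an assumption about the general idea, not a description of Streamline's internals. The function name and the total cycle count are hypothetical; only the 1 s duration comes from this walkthrough.

```python
def cycle_to_time_s(cycle, total_cycles, capture_duration_s):
    """Map a cycle number to seconds, assuming cycles are spread evenly
    over the capture duration (an illustrative assumption)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return capture_duration_s * cycle / total_cycles


TOTAL_CYCLES = 2_000_000      # hypothetical cycle count for the whole capture
CAPTURE_DURATION_S = 1.0      # the "1s" capture duration entered in the dialog

# A point one quarter of the way through the cycle count.
print(cycle_to_time_s(500_000, TOTAL_CYCLES, CAPTURE_DURATION_S))  # 0.25
```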

Click Next. Streamline asks you to provide the application/ELF image running on HiKey960.

Add application/ELF image running on HiKey960

Streamline then lets you decide which sample mode to use, how often samples are collected, and how often they are presented. Here I choose "Every instruction" and "Sample every 1us".

Configuring the import options. Choosing the sample mode
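To get a feel for what these settings imply, the following Python sketch works out how many samples a capture contains when one sample is presented per fixed interval. It is plain arithmetic, not Streamline's implementation. The function name is hypothetical, and the 1 s duration is the capture duration entered in the timing step.

```python
def samples_in_capture(duration_ns, period_ns):
    """Number of whole sample intervals that fit in the capture duration."""
    if period_ns <= 0:
        raise ValueError("sample period must be positive")
    return duration_ns // period_ns


ONE_SECOND_NS = 1_000_000_000   # capture duration of 1 s, in nanoseconds
ONE_MICROSECOND_NS = 1_000      # "Sample every 1us", in nanoseconds

print(samples_in_capture(ONE_SECOND_NS, ONE_MICROSECOND_NS))  # 1000000
```

One million sample points over a one-second capture is a fine-grained view of the trace.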

Click Finish. Streamline generates the configuration needed to decode the raw data, and then analyzes the raw instruction data based on your settings.

Analyzing the raw instruction data using Streamline

Because I chose to have samples presented every 1us, I select "High" as the resolution mode.

Choosing the resolution mode for sample data

After Streamline finishes the analysis, it shows the usual five Streamline views: Timeline, Call Paths, Functions, Code, and Log.

Finished view following the analysis in Streamline showing Timeline, Call Paths, Functions, Code and Log view

Here I selected the whole duration of the capture. The charts on the left give information about branches, instructions, loads/stores, and exceptions. If you compare the function percentages against the DS-5 Trace view shown earlier, they are the same, as expected. For example, the function "Barman_delay" accounts for 12.41%, which is exactly the percentage shown in the DS-5 Trace view. However, with the DS-5 Trace view alone we cannot correlate a function with its source code, whereas Streamline v6.5 supports this.

Streamline v6.5 showing the correlation of function and source code
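The per-function percentages compared above can be pictured as each function's share of the traced instructions. The following Python sketch shows that calculation under this assumption. The function name and the instruction counts are hypothetical, chosen only so that "Barman_delay" comes out at the 12.41% quoted in this article; they are not measured data.

```python
def function_percentages(instruction_counts):
    """Return each function's share of all traced instructions, in percent."""
    total = sum(instruction_counts.values())
    if total == 0:
        raise ValueError("no instructions were traced")
    return {name: 100.0 * count / total
            for name, count in instruction_counts.items()}


# Hypothetical instruction counts per function (not real measurements).
counts = {"Barman_delay": 1241, "main": 5000, "other": 3759}
shares = function_percentages(counts)

# Print functions from the largest share to the smallest.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.2f}%")
```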

Next Steps

Here I have shown all the steps needed to generate instruction trace in DS-5 and to import and decode the raw instruction trace in Streamline. I hope this helps your daily debugging work during development. Please comment with your experiences or any questions you may have.

Please note that this feature is only available from Streamline v6.5/DS-5 v5.28 onwards. Please download the latest version, v5.28, from the link below.

DS-5 Development Studio