Trace capture/decode with CoreSight driver and Perf on Linux

Kaiyou Wang
August 31, 2017

Linaro provides a solution for capturing instruction trace without an external debugger, provided that the SoC has embedded CoreSight components.

This article describes the build, setup, and command steps involved.

The test environment is Juno-busybox: Linux (none) 4.9.0-dirty #9 SMP PREEMPT Tue Mar 28 10:39:46 CST 2017 aarch64 GNU/Linux

Code download and build

1 Build the CoreSight driver

If you download the Juno kernel code, the CoreSight driver code is in the Linux kernel directory drivers/hwtracing/coresight/. Alternatively, you can download the workspace for the Juno platform by following the instructions at https://community.arm.com/dev-platforms/b/documents/posts/using-linaros-deliverables-on-juno

1.1 Add the following configuration options to the linux/linaro/configs/vexpress64.conf file

CONFIG_CORESIGHT=y
CONFIG_CORESIGHT_LINKS_AND_SINKS=y
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=y
CONFIG_CORESIGHT_SINK_TPIU=y
CONFIG_CORESIGHT_SINK_ETBV10=y
CONFIG_CORESIGHT_SOURCE_ETM4X=y
CONFIG_CORESIGHT_QCOM_REPLICATOR=y
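
Before rebuilding the kernel it can help to confirm that all of the options above are actually enabled in the configuration file. The following is a minimal sketch, not part of the Linaro tooling: the helper name missing_options is hypothetical, and it assumes the file uses the usual NAME=y syntax.

```python
# Options listed in step 1.1 of this article.
REQUIRED_OPTIONS = [
    "CONFIG_CORESIGHT",
    "CONFIG_CORESIGHT_LINKS_AND_SINKS",
    "CONFIG_CORESIGHT_LINK_AND_SINK_TMC",
    "CONFIG_CORESIGHT_SINK_TPIU",
    "CONFIG_CORESIGHT_SINK_ETBV10",
    "CONFIG_CORESIGHT_SOURCE_ETM4X",
    "CONFIG_CORESIGHT_QCOM_REPLICATOR",
]


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' in config_text.

    Hypothetical helper for illustration; config_text is the content of a
    kernel configuration fragment such as vexpress64.conf.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and value.strip() == "y":
            enabled.add(name.strip())
    return [opt for opt in required if opt not in enabled]


# Example: a fragment that only enables the first option.
print(missing_options("CONFIG_CORESIGHT=y\n"))
```

An empty list means every option from step 1.1 is enabled.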

1.2 Make sure the CoreSight configuration is present in arch/arm64/boot/dts/arm/juno-base.dtsi

        /*
         * Juno TRMs specify the size for these coresight components as 64K.
         * The actual size is just 4K though 64K is reserved. Access to the
         * unmapped reserved region results in a DECERR response.
         */
        etf@20010000 {
                compatible = "arm,coresight-tmc", "arm,primecell";
                reg = <0 0x20010000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        /* input port */
                        port@0 {
                                reg = <0>;
                                etf_in_port: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&main_funnel_out_port>;
                                };
                        };

                        /* output port */
                        port@1 {
                                reg = <0>;
                                etf_out_port: endpoint {
                                        remote-endpoint = <&replicator_in_port0>;
                                };
                        };
                };
        };

        tpiu@20030000 {
                compatible = "arm,coresight-tpiu", "arm,primecell";
                reg = <0 0x20030000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        tpiu_in_port: endpoint {
                                slave-mode;
                                remote-endpoint = <&replicator_out_port0>;
                        };
                };
        };

        main-funnel@20040000 {
                compatible = "arm,coresight-funnel", "arm,primecell";
                reg = <0 0x20040000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        port@0 {
                                reg = <0>;
                                main_funnel_out_port: endpoint {
                                        remote-endpoint = <&etf_in_port>;
                                };
                        };

                        port@1 {
                                reg = <0>;
                                main_funnel_in_port0: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster0_funnel_out_port>;
                                };
                        };

                        port@2 {
                                reg = <1>;
                                main_funnel_in_port1: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster1_funnel_out_port>;
                                };
                        };

                };
        };

        etr@20070000 {
                compatible = "arm,coresight-tmc", "arm,primecell";
                reg = <0 0x20070000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        etr_in_port: endpoint {
                                slave-mode;
                                remote-endpoint = <&replicator_out_port1>;
                        };
                };
        };

        etm0: etm@22040000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x22040000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster0_etm0_out_port: endpoint {
                                remote-endpoint = <&cluster0_funnel_in_port0>;
                        };
                };
        };

        cluster0-funnel@220c0000 {
                compatible = "arm,coresight-funnel", "arm,primecell";
                reg = <0 0x220c0000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        port@0 {
                                reg = <0>;
                                cluster0_funnel_out_port: endpoint {
                                        remote-endpoint = <&main_funnel_in_port0>;
                                };
                        };

                        port@1 {
                                reg = <0>;
                                cluster0_funnel_in_port0: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster0_etm0_out_port>;
                                };
                        };

                        port@2 {
                                reg = <1>;
                                cluster0_funnel_in_port1: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster0_etm1_out_port>;
                                };
                        };
                };
        };

        etm1: etm@22140000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x22140000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster0_etm1_out_port: endpoint {
                                remote-endpoint = <&cluster0_funnel_in_port1>;
                        };
                };
        };

        etm2: etm@23040000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x23040000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster1_etm0_out_port: endpoint {
                                remote-endpoint = <&cluster1_funnel_in_port0>;
                        };
                };
        };

        cluster1-funnel@230c0000 {
                compatible = "arm,coresight-funnel", "arm,primecell";
                reg = <0 0x230c0000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        port@0 {
                                reg = <0>;
                                cluster1_funnel_out_port: endpoint {
                                        remote-endpoint = <&main_funnel_in_port1>;
                                };
                        };

                        port@1 {
                                reg = <0>;
                                cluster1_funnel_in_port0: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster1_etm0_out_port>;
                                };
                        };

                        port@2 {
                                reg = <1>;
                                cluster1_funnel_in_port1: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster1_etm1_out_port>;
                                };
                        };
                        port@3 {
                                reg = <2>;
                                cluster1_funnel_in_port2: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster1_etm2_out_port>;
                                };
                        };
                        port@4 {
                                reg = <3>;
                                cluster1_funnel_in_port3: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&cluster1_etm3_out_port>;
                                };
                        };
                };
        };

        etm3: etm@23140000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x23140000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster1_etm1_out_port: endpoint {
                                remote-endpoint = <&cluster1_funnel_in_port1>;
                        };
                };
        };

        etm4: etm@23240000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x23240000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster1_etm2_out_port: endpoint {
                                remote-endpoint = <&cluster1_funnel_in_port2>;
                        };
                };
        };

        etm5: etm@23340000 {
                compatible = "arm,coresight-etm4x", "arm,primecell";
                reg = <0 0x23340000 0 0x1000>;

                clocks = <&soc_smc50mhz>;
                clock-names = "apb_pclk";
                power-domains = <&scpi_devpd 0>;
                port {
                        cluster1_etm3_out_port: endpoint {
                                remote-endpoint = <&cluster1_funnel_in_port3>;
                        };
                };
        };

        coresight-replicator {
                /*
                 * Non-configurable replicators don't show up on the
                 * AMBA bus.  As such no need to add "arm,primecell".
                 */
                compatible = "arm,coresight-replicator";

                ports {
                        #address-cells = <1>;
                        #size-cells = <0>;

                        /* replicator output ports */
                        port@0 {
                                reg = <0>;
                                replicator_out_port0: endpoint {
                                        remote-endpoint = <&tpiu_in_port>;
                                };
                        };

                        port@1 {
                                reg = <1>;
                                replicator_out_port1: endpoint {
                                        remote-endpoint = <&etr_in_port>;
                                };
                        };

                        /* replicator input port */
                        port@2 {
                                reg = <0>;
                                replicator_in_port0: endpoint {
                                        slave-mode;
                                        remote-endpoint = <&etf_out_port>;
                                };
                        };
                };
        };
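
The remote-endpoint links in the device tree above describe the path that trace data takes from each ETM to a sink. As a rough illustration, the sketch below transcribes those links into a Python dictionary and walks them. The short component names and the trace_paths helper are assumptions made for this example and are not used by the kernel or by perf.

```python
# Trace links transcribed from the remote-endpoint properties above:
# component -> list of downstream components.
LINKS = {
    "etm0": ["cluster0-funnel"],
    "etm1": ["cluster0-funnel"],
    "etm2": ["cluster1-funnel"],
    "etm3": ["cluster1-funnel"],
    "etm4": ["cluster1-funnel"],
    "etm5": ["cluster1-funnel"],
    "cluster0-funnel": ["main-funnel"],
    "cluster1-funnel": ["main-funnel"],
    "main-funnel": ["etf"],
    "etf": ["replicator"],
    "replicator": ["tpiu", "etr"],
}


def trace_paths(source, links=LINKS):
    """Return every path from source to a component with no output port."""
    downstream = links.get(source, [])
    if not downstream:
        return [[source]]
    paths = []
    for nxt in downstream:
        for tail in trace_paths(nxt, links):
            paths.append([source] + tail)
    return paths


for path in trace_paths("etm0"):
    print(" -> ".join(path))
```

Every path passes through the ETF at 0x20010000, which is the component selected as the sink in the perf record commands later in this article.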

2 Build perf and OpenCSD
2.1 Download OpenCSD code 

git clone -b master https://github.com/Linaro/OpenCSD.git

cd OpenCSD/decoder/build/linux/
make LINUX64=1 DEBUG=1
make LINUX64=1 DEBUG=0

The OpenCSD libraries are generated in the decoder/lib/linux64/dbg (debug build) or decoder/lib/linux64/rel (release build) directory.

2.2 Download perf code

git clone -b perf-opencsd-4.8 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.8

cd perf-opencsd-4.8/tools/perf

export CSTRACE_PATH=xxx/OpenCSD/decoder

# Build the perf binary that runs on the HOST
make

# Build the perf binary that runs on the TARGET
make ARCH=arm64 CROSS_COMPILE=gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-

If you want to build a static perf binary, specify the option "LDFLAGS=-static" in Makefile.perf.

The target perf binary is generated in the current directory.

3 Build the demo application code

 aarch64-linux-gnu-gcc -static -o test main.c

#include <stdio.h>
#include <string.h>

int array[32][32];

int main(void)
{
    int loop = 0;
    int i = 0, j = 0;

    //while(1)
    {
        /* Clear the array, then fill each element with i + j. */
        memset(array, 0, sizeof(array));
        for (i = 0; i < 32; i++)
        {
            for (j = 0; j < 32; j++)
            {
                array[i][j] = i + j;
            }
        }
    }
    return 0;
}

Disassemble the binary:

0000000000400658 <main>:
400658: a9be7bfd stp x29, x30, [sp,#-32]!
40065c: 910003fd mov x29, sp
400660: b9001fbf str wzr, [x29,#28]
400664: b90017bf str wzr, [x29,#20]
400668: b9001bbf str wzr, [x29,#24]
40066c: 90000480 adrp x0, 490000 <tzfile_mtime>
400670: 91110000 add x0, x0, #0x440
400674: d2820002 mov x2, #0x1000 // #4096
400678: 52800001 mov w1, #0x0 // #0
40067c: 940052e1 bl 415200 <__memset>
400680: b90017bf str wzr, [x29,#20]
400684: 14000016 b 4006dc <main+0x84>
400688: b9001bbf str wzr, [x29,#24]
40068c: 1400000e b 4006c4 <main+0x6c>
400690: b94017a1 ldr w1, [x29,#20]
400694: b9401ba0 ldr w0, [x29,#24]
400698: 0b000022 add w2, w1, w0
40069c: 90000480 adrp x0, 490000 <tzfile_mtime>
4006a0: 91110000 add x0, x0, #0x440
4006a4: b9801ba1 ldrsw x1, [x29,#24]
4006a8: b98017a3 ldrsw x3, [x29,#20]
4006ac: d37be863 lsl x3, x3, #5
4006b0: 8b010061 add x1, x3, x1
4006b4: b8217802 str w2, [x0,x1,lsl #2]
4006b8: b9401ba0 ldr w0, [x29,#24]
4006bc: 11000400 add w0, w0, #0x1
4006c0: b9001ba0 str w0, [x29,#24]
4006c4: b9401ba0 ldr w0, [x29,#24]
4006c8: 71007c1f cmp w0, #0x1f
4006cc: 54fffe2d b.le 400690 <main+0x38>
4006d0: b94017a0 ldr w0, [x29,#20]
4006d4: 11000400 add w0, w0, #0x1
4006d8: b90017a0 str w0, [x29,#20]
4006dc: b94017a0 ldr w0, [x29,#20]
4006e0: 71007c1f cmp w0, #0x1f
4006e4: 54fffd2d b.le 400688 <main+0x30>
4006e8: 52800000 mov w0, #0x0 // #0
4006ec: a8c27bfd ldp x29, x30, [sp],#32
4006f0: d65f03c0 ret
4006f4: 00000000 .inst 0x00000000 ; undefined
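
To relate the disassembly back to the C source, the sketch below reproduces the address arithmetic that the compiler emitted for array[i][j]. The base address is read from the adrp/add pair in this particular binary, so it is specific to this build; the names ARRAY_BASE, ARRAY_BYTES and element_address are introduced here for illustration only, and a 4-byte int is assumed (as on AArch64 Linux).

```python
# adrp x0, 490000 ; add x0, x0, #0x440  -> base address of array in this build
ARRAY_BASE = 0x490000 + 0x440

# sizeof(array) for int array[32][32] with 4-byte int; this is the
# length passed to memset (mov x2, #0x1000).
ARRAY_BYTES = 32 * 32 * 4


def element_address(i, j):
    """Address written by 'str w2, [x0,x1,lsl #2]' for array[i][j]."""
    index = (i << 5) + j              # lsl x3, x3, #5 ; add x1, x3, x1
    return ARRAY_BASE + (index << 2)  # element index scaled by 4 bytes


print(hex(ARRAY_BYTES))
print(hex(element_address(0, 0)), hex(element_address(31, 31)))
```

The array size works out to 0x1000 bytes, which matches the immediate loaded into x2 before the call to __memset.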

Demo trace on Juno-busybox board

1 Upload the perf and application binaries to the Juno-busybox board
2 Trace the demo application with perf
2.1 Trace the userspace application test from the start address 0x400658 to 0x4006f0

./perf record -e cs_etm/@20010000.etf/u --filter 'start 0x400658@/test, stop 0x4006f0@/test' --per-thread ./test


2.2 Compress the trace data files

tar czf cs_example.tgz .debug/ perf.data

2.3 Push the cs_example.tgz file to HOST

Decode the trace data on HOST

1 Uncompress the cs_example.tgz

   tar xzf cs_example.tgz

   rm -rf ~/.debug
   mv .debug ~/.debug

2 Decode the trace data with perf

perf report --stdio --dump

# To display the perf.data header info, please use --header/--header-only options.
#

0x178 [0x1d8]: event: 70
.
. ... raw event: size 472 bytes
. 0000: 46 00 00 00 00 00 d8 01 03 00 00 00 00 00 00 00 F...............
. 0010: 00 00 00 00 00 00 00 00 06 00 00 00 08 00 00 00 ................
. 0020: 00 00 00 00 00 00 00 00 40 40 40 40 40 40 40 40 ........@@@@@@@@
. 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0040: 10 00 00 00 00 00 00 00 a1 0e 00 28 00 00 00 00 ...........(....
. 0050: 03 f4 00 41 00 00 00 00 88 04 00 00 00 00 00 00 ...A............
. 0060: 00 00 00 00 00 00 00 00 cc 00 00 00 00 00 00 00 ................
. 0070: 40 40 40 40 40 40 40 40 01 00 00 00 00 00 00 00 @@@@@@@@........
. 0080: 00 00 00 00 00 00 00 00 12 00 00 00 00 00 00 00 ................
. 0090: a1 0e 00 28 00 00 00 00 00 f4 00 41 00 00 00 00 ...(.......A....
. 00a0: 88 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 00b0: cc 00 00 00 00 00 00 00 40 40 40 40 40 40 40 40 ........@@@@@@@@
. 00c0: 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 00d0: 14 00 00 00 00 00 00 00 a1 0e 00 28 00 00 00 00 ...........(....
. 00e0: 00 f4 00 41 00 00 00 00 88 04 00 00 00 00 00 00 ...A............
. 00f0: 00 00 00 00 00 00 00 00 cc 00 00 00 00 00 00 00 ................
. 0100: 40 40 40 40 40 40 40 40 03 00 00 00 00 00 00 00 @@@@@@@@........
. 0110: 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 ................
. 0120: a1 0e 00 28 00 00 00 00 03 f4 00 41 00 00 00 00 ...(.......A....
. 0130: 88 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0140: cc 00 00 00 00 00 00 00 40 40 40 40 40 40 40 40 ........@@@@@@@@
. 0150: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0160: 18 00 00 00 00 00 00 00 a1 0e 00 28 00 00 00 00 ...........(....
. 0170: 03 f4 00 41 00 00 00 00 88 04 00 00 00 00 00 00 ...A............
. 0180: 00 00 00 00 00 00 00 00 cc 00 00 00 00 00 00 00 ................
. 0190: 40 40 40 40 40 40 40 40 05 00 00 00 00 00 00 00 @@@@@@@@........
. 01a0: 00 00 00 00 00 00 00 00 1a 00 00 00 00 00 00 00 ................
. 01b0: a1 0e 00 28 00 00 00 00 03 f4 00 41 00 00 00 00 ...(.......A....
. 01c0: 88 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 01d0: cc 00 00 00 00 00 00 00 ........

0x178 [0x1d8]: PERF_RECORD_AUXTRACE_INFO type: 3

0x350 [0x50]: event: 1
.
. ... raw event: size 80 bytes
. 0000: 01 00 00 00 01 00 50 00 ff ff ff ff 00 00 00 00 ......P.........
. 0010: 00 10 08 08 80 ff ff ff ff ef f7 f7 7f 00 00 00 ................
. 0020: 00 10 08 08 80 ff ff ff 5b 6b 65 72 6e 65 6c 2e ........[kernel.
. 0030: 6b 61 6c 6c 73 79 6d 73 5d 5f 73 74 65 78 74 00 kallsyms]_stext.
. 0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

0x350 [0x50]: PERF_RECORD_MMAP -1/0: [0xffffff8008081000(0x7ff7f7efff) @ 0xffffff8008081000]: x [kernel.kallsyms]_stext

0x3a0 [0x28]: event: 3
.
. ... raw event: size 40 bytes
. 0000: 03 00 00 00 00 00 28 00 ba 00 00 00 ba 00 00 00 ......(.........
. 0010: 70 65 72 66 00 00 00 00 00 00 00 00 00 00 00 00 perf............
. 0020: 00 00 00 00 00 00 00 00 ........

0x3a0 [0x28]: PERF_RECORD_COMM: perf:186/186

0x3c8 [0x30]: event: 11
.
. ... raw event: size 48 bytes
. 0000: 0b 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00 ......0.........
. 0010: 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0...............
. 0020: ba 00 00 00 ba 00 00 00 5a 00 00 00 00 00 00 00 ........Z.......

0x3c8 [0x30]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []

0x3f8 [0x28]: event: 3
.
. ... raw event: size 40 bytes
. 0000: 03 00 00 00 00 20 28 00 ba 00 00 00 ba 00 00 00 ..... (.........
. 0010: 74 65 73 74 00 00 00 00 ba 00 00 00 ba 00 00 00 test............
. 0020: 5b 00 00 00 00 00 00 00 [.......

0x3f8 [0x28]: PERF_RECORD_COMM exec: test:186/186

0x420 [0x60]: event: 10
.
. ... raw event: size 96 bytes
. 0000: 0a 00 00 00 02 00 60 00 ba 00 00 00 ba 00 00 00 ......`.........
. 0010: 00 00 40 00 00 00 00 00 00 d0 07 00 00 00 00 00 ..@.............
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
. 0030: 14 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0040: 05 00 00 00 02 18 00 00 2f 74 65 73 74 00 00 00 ......../test...
. 0050: ba 00 00 00 ba 00 00 00 5b 00 00 00 00 00 00 00 ........[.......

0x420 [0x60]: PERF_RECORD_MMAP2 186/186: [0x400000(0x7d000) @ 0 00:01 1044 0]: r-xp /test

0x480 [0x60]: event: 10
.
. ... raw event: size 96 bytes
. 0000: 0a 00 00 00 02 00 60 00 ba 00 00 00 ba 00 00 00 ......`.........
. 0010: 00 20 1f 7c 7f 00 00 00 00 10 00 00 00 00 00 00 . .|............
. 0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0040: 00 00 00 00 00 00 00 00 5b 76 64 73 6f 5d 00 00 ........[vdso]..
. 0050: ba 00 00 00 ba 00 00 00 5b 00 00 00 00 00 00 00 ........[.......

0x480 [0x60]: PERF_RECORD_MMAP2 186/186: [0x7f7c1f2000(0x1000) @ 0 00:00 0 0]: ---p [vdso]

0x4e0 [0x30]: event: 11
.
. ... raw event: size 48 bytes
. 0000: 0b 00 00 00 00 00 30 00 30 00 00 00 00 00 00 00 ......0.0.......
. 0010: 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
. 0020: ba 00 00 00 ba 00 00 00 5a 00 00 00 00 00 00 00 ........Z.......

0x4e0 [0x30]: PERF_RECORD_AUX offset: 0x30 size: 0x90 flags: 0 []

0x510 [0x30]: event: 4
.
. ... raw event: size 48 bytes
. 0000: 04 00 00 00 00 00 30 00 ba 00 00 00 ba 00 00 00 ......0.........
. 0010: ba 00 00 00 ba 00 00 00 f4 a9 49 cc 77 06 00 00 ..........I.w...
. 0020: ba 00 00 00 ba 00 00 00 5b 00 00 00 00 00 00 00 ........[.......

0x510 [0x30]: PERF_RECORD_EXIT(186:186):(186:186)

0x540 [0x30]: event: 71
.
. ... raw event: size 48 bytes
. 0000: 47 00 00 00 00 00 30 00 c0 00 00 00 00 00 00 00 G.....0.........
. 0010: 00 00 00 00 00 00 00 00 b4 75 4c 6f a0 a5 b0 23 .........uLo...#
. 0020: 00 00 00 00 ba 00 00 00 ff ff ff ff 00 00 00 00 ................

0x540 [0x30]: PERF_RECORD_AUXTRACE size: 0xc0 offset: 0 ref: 0x23b0a5a06f4c75b4 idx: 0 tid: 186 cpu: -1

. ... CoreSight ETM Trace data: size 192 bytes
0: I_ASYNC : Alignment Synchronisation.
12: I_TRACE_INFO : Trace Info.; PCTL=0x0
48: I_ASYNC : Alignment Synchronisation.
60: I_TRACE_INFO : Trace Info.; PCTL=0x0
65: I_TRACE_ON : Trace On.
66: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000000000400658; Ctxt: AArch64,EL0, NS;
76: I_ATOM_F3 : Atom format 3.; EEN
77: I_ATOM_F3 : Atom format 3.; ENN
78: I_ATOM_F5 : Atom format 5.; NEEEE
80: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
81: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
82: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEN
83: I_ATOM_F2 : Atom format 2.; NE
84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000400680;
93: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
94: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEN
96: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
97: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
98: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
99: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
100: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
101: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
102: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
103: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
104: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
105: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
106: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
107: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
108: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
109: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
110: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
112: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
113: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
114: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
115: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
116: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
117: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
118: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
119: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
120: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
121: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
122: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
123: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
124: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
125: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
126: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
128: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
129: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
130: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
131: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
132: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
133: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
134: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
135: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
136: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
137: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
138: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
139: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
140: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
141: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
142: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
144: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
145: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
146: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
147: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
148: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
149: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
150: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
151: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
152: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
153: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
154: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
155: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
156: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
157: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
158: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
160: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
161: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEN
162: I_ATOM_F2 : Atom format 2.; NE
163: I_ADDR_S_IS0 : Address, Short, IS0.; Addr=0x000000000040093C ~[0x93C]

0x630 [0x8]: event: 68
.
. ... raw event: size 8 bytes
. 0000: 44 00 00 00 00 00 08 00 D.......

0x630 [0x8]: PERF_RECORD_FINISHED_ROUND

Aggregated stats: (excludes AUX area (e.g. instruction trace) decoded / synthesized events)
TOTAL events: 11
MMAP events: 1
COMM events: 2
EXIT events: 1
MMAP2 events: 2
AUX events: 2
FINISHED_ROUND events: 1
AUXTRACE_INFO events: 1
AUXTRACE events: 1
cs_etm/@20010000.etf/u stats:
dummy:u stats:
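
The atom packets in the ETM dump above can be reconciled with the loop in main. The sketch below is a simplified model, assuming that the ETM emits one atom for every executed branch instruction: E when the branch is taken and N when it is not. It only covers the code from address 0x400684 onwards (after __memset returns), and the function name loop_atoms is hypothetical.

```python
def loop_atoms(n=32):
    """Model the E/N atoms produced by the nested loops in main."""
    atoms = ["E"]              # 400684: b 4006dc (unconditional, taken)
    for _ in range(n):
        atoms.append("E")      # 4006e4: b.le 400688 taken, outer loop continues
        atoms.append("E")      # 40068c: b 4006c4 (unconditional, taken)
        atoms.extend("E" * n)  # 4006cc: b.le 400690 taken once per inner iteration
        atoms.append("N")      # 4006cc: b.le not taken, inner loop exits
    atoms.append("N")          # 4006e4: b.le not taken, outer loop exits
    atoms.append("E")          # 4006f0: ret
    return "".join(atoms)


runs = loop_atoms().split("N")
print(len(runs[0]), len(runs[1]))
```

Under this model the first run contains 35 E atoms and each later run contains 34, which corresponds to the packet pairs in the dump: 24 + 11 E atoms followed by N at offsets 93 and 94, then 24 + 10 E atoms followed by N for each remaining outer iteration, and a final NE at offset 162.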

3 Decode the trace data with python script

~/code/OpenCSD/trace-data$ cat disapp.sh

#!/bin/bash

export PERF_EXEC_PATH=xxx/perf-opencsd-4.8/tools/perf/
export EXEC_PATH=xxx/perf-opencsd-4.8/tools/perf/
export SCRIPT_PATH=$EXEC_PATH/scripts/python/
export XTOOL_PATH=xxx/gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/

perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump

./disapp.sh 

FILE: /test CPU: 1 FNAME: ~/.debug/test/054e3cdda6a6a75dd8376bfe48fbab64e8e0b81e/elf
400658: a9be7bfd stp x29, x30, [sp,#-32]!
40065c: 910003fd mov x29, sp
400660: b9001fbf str wzr, [x29,#28]
400664: b90017bf str wzr, [x29,#20]
400668: b9001bbf str wzr, [x29,#24]
40066c: 90000480 adrp x0, 490000 <tzfile_mtime>
400670: 91110000 add x0, x0, #0x440
400674: d2820002 mov x2, #0x1000 // #4096
400678: 52800001 mov w1, #0x0 // #0
40067c: 940052e1 bl 415200 <__memset>
FILE: /test CPU: 1 FNAME: ~/.debug/test/054e3cdda6a6a75dd8376bfe48fbab64e8e0b81e/elf
415200: aa0003e8 mov x8, x0
415204: 72001c27 ands w7, w1, #0xff
415208: 54000640 b.eq 4152d0 <__memset+0xd0>
...

Trace kernel

I also tested kernel tracing with perf and the CoreSight driver.

On board:

/ # ./perf record -e cs_etm/@20010000.etf/k --filter 'start 0xffffff80080cd5d0,stop 0xffffff80080cd694' --per-thread uname

On Host:

cat disker.sh

#!/bin/bash

export PERF_EXEC_PATH=xxx/perf-opencsd-4.8/tools/perf/
export EXEC_PATH=xxx/perf-opencsd-4.8/tools/perf/
export SCRIPT_PATH=$EXEC_PATH/scripts/python/
export XTOOL_PATH=xxx/gcc-linaro-4.9-2015.05-x86_64_aarch64-linux-gnu/bin/

perf --exec-path=${EXEC_PATH} script --vmlinux=./vmlinux --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump -k ./vmlinux

./disker.sh

 

FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff80080cd5d0: a9bc7bfd stp x29, x30, [sp,#-64]!
ffffff80080cd5d4: 910003fd mov x29, sp
ffffff80080cd5d8: a90153f3 stp x19, x20, [sp,#16]
ffffff80080cd5dc: a9025bf5 stp x21, x22, [sp,#32]
ffffff80080cd5e0: f9001bf7 str x23, [sp,#48]
ffffff80080cd5e4: aa1e03e0 mov x0, x30
ffffff80080cd5e8: 900055f5 adrp x21, ffffff8008b89000 <nop_trace+0x10>
ffffff80080cd5ec: f00054b4 adrp x20, ffffff8008b64000 <cpu_worker_pools+0x180>
ffffff80080cd5f0: 911e0293 add x19, x20, #0x780
ffffff80080cd5f4: 97ff153f bl ffffff8008092af0 <_mcount>
FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff8008092af0: d65f03c0 ret
FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff8008092af4: d503201f nop
ffffff8008092af8: d503201f nop
ffffff8008092afc: d503201f nop
ffffff8008092b00: a9bf7bfd stp x29, x30, [sp,#-16]!
ffffff8008092b04: 910003fd mov x29, sp
ffffff8008092b08: d10013c0 sub x0, x30, #0x4
ffffff8008092b0c: f94003a1 ldr x1, [x29]
ffffff8008092b10: f9400421 ldr x1, [x1,#8]
ffffff8008092b14: d1001021 sub x1, x1, #0x4
ffffff8008092b18: d503201f nop
ffffff8008092b1c: d503201f nop
ffffff8008092b20: a8c17bfd ldp x29, x30, [sp],#16
ffffff8008092b24: d65f03c0 ret
FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff80080cd618: b9485260 ldr w0, [x19,#2128]
ffffff80080cd61c: 360803e0 tbz w0, #1, ffffff80080cd698 <scheduler_tick+0xc8>
FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff80080cd698: aa1303e0 mov x0, x19
ffffff80080cd69c: 97ffee43 bl ffffff80080c8fa8 <update_rq_clock.part.24>
FILE: [kernel.kallsyms] CPU: 2 FNAME: ./vmlinux
ffffff80080c8fa8: a9be7bfd stp x29, x30, [sp,#-32]!
ffffff80080c8fac: 910003fd mov x29, sp
ffffff80080c8fb0: f9000bf3 str x19, [sp,#16]
ffffff80080c8fb4: aa0003f3 mov x19, x0
ffffff80080c8fb8: aa1e03e0 mov x0, x30
ffffff80080c8fbc: 97ff26cd bl ffffff8008092af0 <_mcount>

Perf filter

The perf --filter option supports address-range tracing. The only limitations on address filters are the number of address comparators available on a given implementation and the mutual exclusion between range filters and start/stop filters.

Filter format is: filter|start|stop|tracestop <start symbol or address> [/ <end symbol or size>] [@<file name>]
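
As an illustration of this format, the sketch below assembles filter strings such as the one passed to perf record earlier in this article. The address_filter helper is hypothetical and is not part of perf; it only concatenates the fields in the order given by the format line.

```python
ACTIONS = ("filter", "start", "stop", "tracestop")


def _fmt(value):
    """Render integers as hex addresses; pass symbol names through unchanged."""
    return "0x%x" % value if isinstance(value, int) else str(value)


def address_filter(action, start, end=None, filename=None):
    """Build one filter term: <action> <start>[/<end or size>][@<file name>]."""
    if action not in ACTIONS:
        raise ValueError("unknown filter action: %r" % (action,))
    text = "%s %s" % (action, _fmt(start))
    if end is not None:
        text += "/" + _fmt(end)
    if filename:
        text += "@" + filename
    return text


# Rebuild the start/stop filter used for the demo application.
demo = ", ".join([
    address_filter("start", 0x400658, filename="/test"),
    address_filter("stop", 0x4006f0, filename="/test"),
])
print(demo)
```

Because range filters and start/stop filters are mutually exclusive, a single --filter argument would use either filter terms or start/stop terms, not both.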

The demo application disassembly file and the decoded application and kernel instruction traces are available below:

https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-21-12/test.asm

https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-21-12/test.trace

https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-21-12/kernel.trace

For more information, refer to the slides from Linaro on Hardware Assisted Tracing on Arm with CoreSight and OpenCSD.

Information can also be found on GitHub at HOWTO - using the library with perf.

Comments

Kaiyou Wang, over 5 years ago, in reply to jeremy_ng:
Hi Jeremy, thanks for your kind reminder, I think now it is here https://github.com/Linaro/perf-opencsd

jeremy_ng, over 6 years ago:
Hi, this is a great tutorial. However, the command for git clone -b perf-opencsd-4.8 github.com/.../OpenCSD.git perf-opencsd-4.8 is no longer working