According to the manual, we can select PMU events as an external input resource of the ETM. However, the manual does not describe this in detail, and my attempts have failed.
To perform the experiment, I configured the following registers:
TRCCONFIGR -> 0x18c1 // Enable the return stack, global timestamping, Context ID, and Virtual context identifier tracing
TRCSYNCPR -> 0x8 // Enable trace synchronization every 256 bytes of trace.
TRCTSCTLR -> 0x0 // Disable the timestamp event
TRCTRACEIDR -> some id // set trace id
TRCVICTLR -> 0x6b201 // trace only non-secure EL0
TRCVIIECTLR -> 0x0 // no address range filter
TRCVIPCSSCTLR -> 0x0 // no start or stop points for ViewInst
TRCEXTINSELR -> 0x8 // select PMU event "INST_RETIRED" as external input 1
TRCRSCTLR2 -> 0x1 // select external input 1 for resource selector 2
TRCEVENTCTL0R -> 0x2 // select resource selector 2 to fire event 0
TRCEVENTCTL1R -> 0x1 // enable event 0 to generate event elements in the trace
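As a sanity check on the comments in the list above, the TRCCONFIGR and TRCSYNCPR values can be decoded in a few lines of C. This is a minimal sketch. The bit positions are my reading of the ETMv4 specification and should be treated as assumptions to verify against the revision your ETM implements. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* TRCCONFIGR bit positions (assumption: ETMv4 layout, verify for your ETM). */
#define TRCCONFIGR_CID  (1u << 6)   /* Context ID tracing */
#define TRCCONFIGR_VMID (1u << 7)   /* Virtual context identifier tracing */
#define TRCCONFIGR_TS   (1u << 11)  /* Global timestamping */
#define TRCCONFIGR_RS   (1u << 12)  /* Return stack */

/* TRCSYNCPR.PERIOD is bits [4:0]. A non-zero PERIOD requests a trace
   synchronization every 2^PERIOD bytes; 0 disables periodic sync. */
static uint32_t sync_period_bytes(uint32_t trcsyncpr)
{
    uint32_t period = trcsyncpr & 0x1fu;
    return period ? (1u << period) : 0u;
}
```

With the values from the question, 0x18c1 has the CID, VMID, TS and RS bits set, and TRCSYNCPR = 0x8 gives a 256-byte period. Both agree with the comments.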
However, the trace output does not contain any event elements. Did I misconfigure any register? Moreover, according to the Cortex-A57 manual, the PMU event "INST_RETIRED" has four different event numbers, i.e. PMUEVENT[11:8]. Does that mean I can select any of them as the external input for the ETM?
Thanks for any help!
Regards,
Zhenyu
Regarding the PMUEVENT[11:8] bits: this is literally just a binary-encoded number of instructions retired. The core runs much faster than the PMU event export can sample (and export), so the value is encoded as a 4-bit number and updated less frequently (as infrequently as once every 15 instructions). Later and faster processors might use larger fields (Cortex-A73 has 7 bits). This makes it of limited use in trace. The Event packet can only contain 4 event bits, and each event bit is a single signal to the trace unit, so a single packet may not give you a complete picture of how many instructions were retired.
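To make the encoding concrete, here is a minimal sketch of how four sampled pin levels would combine into the 0-15 count. It assumes that PMUEVENT[8] is the least significant bit, as the [11:8] notation suggests. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reassemble the 4-bit retired-instruction count carried on PMUEVENT[11:8].
   Assumption: PMUEVENT[8] is the LSB and PMUEVENT[11] the MSB.
   Each argument is the level of one pin sampled on the same cycle. */
static uint8_t inst_retired_count(bool pmuevent8, bool pmuevent9,
                                  bool pmuevent10, bool pmuevent11)
{
    return (uint8_t)((pmuevent8  ? 1u : 0u) |
                     (pmuevent9  ? 2u : 0u) |
                     (pmuevent10 ? 4u : 0u) |
                     (pmuevent11 ? 8u : 0u));
}
```

Tracing a single pin such as PMUEVENT[8] therefore only shows one bit of this number, not the instruction count itself.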
Hi Matt,
Actually, I don't quite understand PMUEVENT[11:8]. Do you mean that the four events PMUEVENT[8], PMUEVENT[9], PMUEVENT[10] and PMUEVENT[11] form a 4-bit number that indicates the number of retired instructions?
The experimental results look weird. If I trace only PMUEVENT[8], executing 3-4 instructions can produce tens of events in the trace. If I trace PMUEVENT[8], PMUEVENT[9], PMUEVENT[10] and PMUEVENT[11] as different events in the ETM (i.e. enabling 4 events at the same time), there are far fewer events. But I still don't know how to relate the number of retired instructions to the events. Is there a manual that describes this relationship?
Thanks for your help! After I set PMCR_EL0.X, I can now get the events in the ETM trace.
Hi Zhenyu,
Yes, it means PMUEVENT[11:8] is literally a count from 0 to 15 of "instructions retired". What the ETM counts is transitions on *each* pin, but it can't sample the pins as fast as the PMU can drive them. As a result, the events you actually get out of the trace are bordering on useless. The ETM is not designed for high-frequency events like instruction counts (it even has to weigh tracing only program flow changes against tracing each individual instruction). The routing of those PMU event outputs to the ETM is also not ideal, and it is hard to imagine a good way of counting with it. That said, by monitoring each of the 4 inputs and using a decrementing counter resource in the ETM, it might be possible to plot a histogram of how efficiently the CPU is executing over time.
The real use case for this, though, is monitoring cache events, TLB refills and suchlike. There are 110 pins on PMUEVENT for Cortex-A57, which gives a lot of possibilities even when you discount the multi-bit events. You could monitor a memory copy loop, and the ETM would show each iteration of the loop, mispredicted branches, TLB refills, and cache refills at L1 and L2. This can give you a good indication of resource problems such as cache thrashing, the effectiveness of preload instructions, and so on. The second use case is that, provided you use the ETM resources correctly, you can extend the PMU counters with the ETM's 2x16-bit counters. This lets you reach higher counts by using the ETM for low-frequency events, without the overflow/IRQ overhead of managing the PMU in software.
Ta,
Matt
I actually looked into this last week.
The trick is that the PMUEVENT pin numbering and the ETM input selection numbering are not the same! Values 0, 1, 2 and 3 in TRCEXTINSELR correspond to the ETM external input bus (EXTIN[3:0]), which is usually wired to the CTIs. The PMU events are what used to be called the Extended Input in ETMv3, where they were a wholly separate resource selector. In ETMv4 the two are combined into a single group. We recommend this layout in section G.1, "Recommended connection layout", of the ETMv4 Architecture Specification.
So, add four. For example, to track PMUEVENT[8], set TRCEXTINSELR.SELn = 12, and to track PMUEVENT[25], set TRCEXTINSELR.SELn = 29.
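The "add four" rule can be sketched in C, together with a helper that places a selector into one of the four SELn fields. Two assumptions apply here. First, the offset of 4 is taken from the explanation above. Second, the field layout assumes four 8-bit fields (SEL0 in bits [7:0] up to SEL3 in bits [31:24]), as in the ETMv4 revisions used by Cortex-A57-era cores. Later revisions of the architecture changed this register layout, so check yours. Function names are hypothetical.

```c
#include <stdint.h>

/* Selector values 0-3 address EXTIN[3:0] (usually CTI outputs);
   PMU event bus pins follow, so PMUEVENT[n] is selector n + 4. */
#define PMUEVENT_SEL_OFFSET 4u

static uint32_t pmuevent_to_extin_sel(uint32_t pmuevent_bit)
{
    return pmuevent_bit + PMUEVENT_SEL_OFFSET;
}

/* Write selector value `sel` into field SEL<idx> (idx 0-3) of a
   TRCEXTINSELR image, leaving the other three fields untouched.
   Assumption: four 8-bit fields, SEL0 at bits [7:0]. */
static uint32_t trcextinselr_set(uint32_t reg, unsigned idx, uint32_t sel)
{
    unsigned shift = (idx & 3u) * 8u;   /* field index masked to 0-3 */
    reg &= ~(0xffu << shift);           /* clear the old selector */
    reg |= (sel & 0xffu) << shift;      /* insert the new one */
    return reg;
}
```

For example, trcextinselr_set(0, 0, pmuevent_to_extin_sel(8)) yields 0xC, i.e. PMUEVENT[8] routed to external input selector 0, where the question's value of 0x8 would select PMUEVENT[4].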
You can only select 4 inputs (let us not call them events!) at a time through TRCEXTINSELR. You then map them to resources (TRCRSCTLRn with GROUP 0b0000 and SELECT identifying the external input selector). To make them visible in trace, you also use the TRCEVENTCTL0R.EVENTn field (which maps a resource to EVENTn) and the TRCEVENTCTL1R.INSTEN field (enabled by setting (1 << n)), one bit for each (see section 6.4.10, "Event tracing instruction trace packet"). Even if you don't enable the packet with TRCEVENTCTL1R.INSTEN or DATAEN, the events can still decrement counters, toggle the ViewInst function, and insert timestamps (TRCTSCTLR).
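The resource and event chain described above can be sketched as three small encoders. These are hedged assumptions based on my reading of the ETMv4 specification, so verify the bit positions for your ETM. In TRCRSCTLRn, GROUP is bits [19:16] and SELECT is bits [15:0], with one SELECT bit per resource in the group. In TRCEVENTCTL0R, EVENTn occupies bits [8n+7:8n], and a clear TYPE bit means a single resource selector. In TRCEVENTCTL1R, INSTEN is bits [3:0]. All function names are hypothetical.

```c
#include <stdint.h>

/* TRCRSCTLRn image selecting external input selector `extin_sel` (0-3).
   Assumption: GROUP 0b0000 = external input selectors, SELECT is a
   bitmask with one bit per selector. */
static uint32_t trcrsctlr_extin(unsigned extin_sel)
{
    const uint32_t group = 0x0u;                 /* external input group */
    uint32_t select = 1u << (extin_sel & 3u);    /* bit n = selector n */
    return (group << 16) | select;
}

/* Map resource selector `rs` to EVENT<event> (0-3) in a TRCEVENTCTL0R
   image. TYPE (bit 7 of the field) is left 0: single resource. */
static uint32_t trceventctl0r_set(uint32_t reg, unsigned event, unsigned rs)
{
    unsigned shift = (event & 3u) * 8u;
    reg &= ~(0xffu << shift);                    /* clear EVENTn field */
    reg |= (uint32_t)(rs & 0x1fu) << shift;      /* resource number */
    return reg;
}

/* Set INSTEN bit n in a TRCEVENTCTL1R image so EVENTn is emitted in
   instruction trace packets. */
static uint32_t trceventctl1r_insten(uint32_t reg, unsigned event)
{
    return reg | (1u << (event & 3u));
}
```

Under these assumptions, external input selector 0 mapped through resource selector 2 to EVENT0 gives TRCRSCTLR2 = 0x1, TRCEVENTCTL0R = 0x2 and TRCEVENTCTL1R = 0x1, the same values as in the question. Resource selectors 0 and 1 are fixed in ETMv4 (always FALSE and always TRUE), so 2 is the first programmable one.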
Don't forget that the PMUEVENT bus doesn't export signals unless it is enabled. Setting PMCR_EL0.X = 1 (offset 0xE04 in the memory-mapped view) enables the bus so that the selected event signals actually do something.
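Here is a minimal sketch of setting PMCR.X (bit 4) through the memory-mapped view at offset 0xE04. The `pmu_base` pointer is a hypothetical parameter. How you obtain it (device tree, a fixed platform address, a debugger) is platform specific, and any other prerequisites of the memory-mapped interface, such as unlocking it, are not shown.

```c
#include <stdint.h>

#define PMCR_X        (1u << 4)   /* PMCR.X: export events on PMUEVENT */
#define PMCR_MMIO_OFF 0xE04u      /* PMCR offset in the memory-mapped view */

/* Read-modify-write PMCR so that only the X bit changes.
   `pmu_base` (hypothetical) points at the start of the PMU register block. */
static void pmu_enable_event_export(volatile uint32_t *pmu_base)
{
    volatile uint32_t *pmcr = pmu_base + PMCR_MMIO_OFF / sizeof(uint32_t);
    *pmcr |= PMCR_X;
}
```

From software running at EL1 or above, the equivalent is a read-modify-write of the PMCR_EL0 system register with MRS/MSR, ORing in bit 4.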
Thanks so much for your explanation, I get it now!