I'm trying to measure the performance of my code using the PMU. The code runs at EL1. To test the PMU I wrote a simple loop of a few operations and ran it under a spinlock with interrupts disabled to prevent any preemption. Then I printed the cycle counter to see how many cycles the test code takes. But I see very different values on each print, e.g. 100, 200, 10000, 50,
...
My question is: why is the output so different? What causes this?
PS: in contrast to the cycle counter, the PMU's instruction counter is stable, and I observe the same output every time.
I also tried the ARM timer, but it shows varying values too, similar to the PMU's cycle counter.
You have so many things running that you are unlikely to get good figures for such a short loop.
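One common way to cope with this kind of jitter is to warm the code up, repeat the measurement many times, and look at the minimum and median rather than a single reading. Below is a minimal sketch of that idea, not the poster's actual code. The names `read_counter`, `code_under_test`, `measure` and the `USE_PMCCNTR` macro are made up for illustration. The PMCCNTR_EL0 path assumes the PMU is already enabled and the code runs at a level allowed to read it; without that macro the sketch falls back to `clock_gettime` so it builds on a hosted system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical counter source. With USE_PMCCNTR defined on an AArch64
 * target this reads the PMU cycle counter between ISBs; otherwise it
 * falls back to a nanosecond clock so the sketch runs anywhere. */
static uint64_t read_counter(void)
{
#if defined(__aarch64__) && defined(USE_PMCCNTR)
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0\n\tisb" : "=r"(v) : : "memory");
    return v;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Stand-in for the short loop being measured (hypothetical workload). */
static volatile uint32_t sink;
static void code_under_test(void)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 64; i++)
        acc += i * 3u;
    sink = acc;
}

struct sample_stats {
    uint64_t min, median, max;
};

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Run fn() `warmup` times untimed (to fill caches and train predictors),
 * then time it n times and return min/median/max of the samples.
 * The samples buffer is sorted in place. */
static struct sample_stats measure(void (*fn)(void), uint64_t *samples,
                                   size_t n, size_t warmup)
{
    assert(n > 0);
    for (size_t i = 0; i < warmup; i++)
        fn();
    for (size_t i = 0; i < n; i++) {
        uint64_t t0 = read_counter();
        fn();
        samples[i] = read_counter() - t0;
    }
    qsort(samples, n, sizeof samples[0], cmp_u64);
    struct sample_stats s = { samples[0], samples[n / 2], samples[n - 1] };
    return s;
}
```

Calling `measure(code_under_test, buf, 1000, 16)` with a 1000-entry `buf` gives three numbers. If the minimum and median are close and only the maximum is far off, the outliers are probably one-off disturbances (cold caches, for example) rather than a problem with the counter itself.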
Would you please elaborate on why I can't get good figures for such a short loop? The loop executes in atomic context on one core, so I expected good figures.
You need to check whether reading CNTPCT_EL0 is trapped into the secure OS.
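As a rough illustration of such a check: as far as I understand the architecture, a read of CNTPCT_EL0 from EL0 traps to EL1 when CNTKCTL_EL1.EL0PCTEN (bit 0) is clear, and a read from EL0 or EL1 traps to EL2 when EL2 is enabled and CNTHCTL_EL2.EL1PCTEN (bit 0 in the HCR_EL2.E2H == 0 layout) is clear. The sketch below only decodes register values you have already read; the struct, enum and function names are made up, and it assumes the E2H == 0 layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I read them from the Arm ARM; E2H == 0 layout assumed. */
#define CNTKCTL_EL1_EL0PCTEN (1ull << 0)
#define CNTHCTL_EL2_EL1PCTEN (1ull << 0)

enum caller_level { CALLER_EL0 = 0, CALLER_EL1 = 1 };

/* Hypothetical snapshot of the relevant control registers. */
struct counter_trap_cfg {
    bool     el2_enabled;  /* EL2 implemented and enabled in this security state */
    uint64_t cntkctl_el1;  /* value read from CNTKCTL_EL1 */
    uint64_t cnthctl_el2;  /* value read from CNTHCTL_EL2 */
};

/* Returns true if a CNTPCT_EL0 read from `from` would be trapped
 * instead of being executed directly. */
static bool cntpct_read_traps(enum caller_level from,
                              const struct counter_trap_cfg *cfg)
{
    /* EL0 access is gated by EL1 first. */
    if (from == CALLER_EL0 && !(cfg->cntkctl_el1 & CNTKCTL_EL1_EL0PCTEN))
        return true;
    /* EL0 and EL1 accesses are gated by EL2 when it is enabled. */
    if (cfg->el2_enabled && !(cfg->cnthctl_el2 & CNTHCTL_EL2_EL1PCTEN))
        return true;
    return false;
}
```

If the read is trapped, every counter access takes an exception round trip, which by itself would add a large and variable overhead to a short measurement.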
Actually, I did the measurements at Secure-EL1, in Secure OS kernel mode.
But are the guest (non-secure) interrupts still active? Maybe these cause the jitter. If not, I am out of ideas.