I'm trying to measure the performance of my code using the PMU. The code runs in EL1. To test the PMU, I created a simple loop of a couple of operations. I ran it under a spinlock with interrupts disabled to prevent any preemption. Then I printed the cycle counter to check how many cycles my test code takes. But I see very different values at each print, e.g.: 100, 200, 10000, 50,
...
My question is: why is the output so different? What causes this?
PS: in contrast to the cycle counter, the PMU's instruction counter is stable, and I observe the same output every time.
I also tried using the ARM timer, but it also shows varying values, similar to the PMU's cycle counter.
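Note that the ARM generic timer (CNTPCT_EL0) counts at the fixed system-counter frequency reported by CNTFRQ_EL0, not at the core clock, so its ticks are not CPU cycles. A minimal sketch for turning tick deltas into nanoseconds, assuming firmware has programmed CNTFRQ_EL0 correctly; `ticks_to_ns` and `read_cntfrq` are hypothetical helper names, not part of any existing API:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a generic-timer tick delta to nanoseconds.
 * Splitting into whole seconds and a remainder avoids overflowing
 * ticks * 1e9 for long intervals (assumes freq_hz < 2^34). */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz != 0);
    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;
    return secs * 1000000000ULL + rem * 1000000000ULL / freq_hz;
}

#if defined(__aarch64__)
/* Hypothetical helper: read the counter frequency programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t freq;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(freq));
    return freq;
}
#endif
```

For example, with a hypothetical 19.2 MHz counter, `ticks_to_ns(2287, 19200000)` gives 119114 ns, i.e. roughly 119 µs.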
Both Android and a secure OS run on the CPU.
I tested it in S-EL1 on an ARMv8 big.LITTLE SoC with 8 CPU cores.
CPU caches are enabled, and the other cores are running.
Here is a code snippet showing how I measured performance using the ARM timer:
    unsigned long long ticks_start, ticks_end;
    int i = 0, j;
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    while (i++ < 1000) {
        j = 0;
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_start));
        while (j++ < 10000) {
            asm volatile ("nop");
        }
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_end));
        printf("ticks %d are: %llu\n", i, ticks_end - ticks_start);
    }
    spin_unlock_irqrestore(&lock, flags);

And the output is:

    ...
    ticks 31 are: 2287
    ticks 32 are: 2287
    ticks 33 are: 2287
    ticks 34 are: 1984
    ticks 35 are: 457
    ticks 36 are: 1604
    ticks 37 are: 2287
    ...

Can such behavior be the result of CPU throttling?
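Because single readings of such a short loop are this noisy, one common approach (a sketch, not the poster's method) is to store each `ticks_end - ticks_start` delta in an array while the lock is held, print only after `spin_unlock_irqrestore`, and report the minimum and median instead of individual values. The names `sample_stats`, `sort_u64` and `summarize` are hypothetical; the code avoids libc sorting so it could plausibly be built into a secure-OS kernel, assuming `<stdint.h>`, `<stddef.h>` and `<assert.h>` are available there:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of repeated measurements of the same code region. */
struct sample_stats {
    uint64_t min;
    uint64_t median;
    uint64_t max;
};

/* Simple in-place insertion sort; fine for a few thousand samples. */
static void sort_u64(uint64_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint64_t key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/* Sorts `samples` in place and returns min/median/max.
 * For an even count, the upper of the two middle values is used. */
static struct sample_stats summarize(uint64_t *samples, size_t n)
{
    struct sample_stats s;
    assert(n > 0);
    sort_u64(samples, n);
    s.min = samples[0];
    s.median = samples[n / 2];
    s.max = samples[n - 1];
    return s;
}
```

For the seven values printed above (2287, 2287, 2287, 1984, 457, 1604, 2287), this gives min 457, median 2287 and max 2287. The minimum approximates the undisturbed cost of the loop, while the spread between min and max shows how much external interference there is.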
With so many things running, you are unlikely to get good figures for such a short loop.
Could you please elaborate on why I can't get good figures for such a short loop? This loop executes in atomic context on one core, so I expected good figures.
You need to check that reads of CNTPCT_EL0 are not trapped into the secure OS.
Actually, I did the measurements in Secure-EL1, in the secure OS kernel mode.
But are the guest (non-secure) interrupts still active? Maybe they cause the jitter. If not, I am out of ideas.
Android is still a Linux-like environment. Depending on the GIC configuration, the EL3 firmware may still respond to GIC interrupts, and interrupts routed to EL3 are taken even while interrupts are masked at EL1, so timing measured in non-secure EL1 or secure EL1 may not be very accurate.
I suggest running in a bare-metal environment if you want to benchmark your loop code.