I'm trying to measure the performance of my code using the PMU. The code runs at EL1. To test the PMU I wrote a simple loop of a few operations and ran it under a spinlock with interrupts disabled to prevent any preemption. Then I printed the cycle counter to see how many cycles the test code takes. But I see very different values on each print, e.g. 100, 200, 10000, 50,
...
My question is: why is the output so different? What causes this?
PS: in contrast to the cycle counter, the PMU's instruction counter is stable and I observe the same output every time.
I also tried the ARM timer, but it shows varying values too, much like the PMU's cycle counter.
Bare metal? Other cores stopped? Try running it with caches disabled.
Android and a secure OS are running on the CPU.
I tested it in S-EL1 on an ARMv8 big.LITTLE system with 8 CPU cores.
The CPU caches are enabled and the other cores are running.
Here is the code snippet I used to measure performance with the ARM timer:
    unsigned long long ticks_start, ticks_end;
    int i = 0, j;
    unsigned long flags;

    spin_lock_irqsave(&lock, flags);
    while (i++ < 1000) {
        j = 0;
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_start));
        while (j++ < 10000) {
            asm volatile ("nop");
        }
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_end));
        printf("ticks %d are: %llu\n", i, ticks_end - ticks_start);
    }
    spin_unlock_irqrestore(&lock, flags);

And the output is:

    ...
    ticks 31 are: 2287
    ticks 32 are: 2287
    ticks 33 are: 2287
    ticks 34 are: 1984
    ticks 35 are: 457
    ticks 36 are: 1604
    ticks 37 are: 2287
    ...

Can such behavior be the result of CPU throttling?
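A side note on reading numbers like these: it can help to store the deltas in a buffer and summarize them after the lock is released, instead of eyeballing individual prints. Below is a minimal sketch in standard C. The names `tick_summary` and `summarize_ticks` are made up for illustration, and it assumes `qsort` is available (in a kernel without it, any simple sort would do).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: summary of one batch of tick deltas. */
struct tick_summary {
    uint64_t min;
    uint64_t median;
    uint64_t max;
};

/* Three-way comparison of two uint64_t values for qsort. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Sorts the samples in place and reports min, median and max.
 * For an even count the upper of the two middle values is used.
 * Returns 0 on success, -1 if there is nothing to summarize.
 */
static int summarize_ticks(uint64_t *samples, size_t n,
                           struct tick_summary *out)
{
    if (samples == NULL || out == NULL || n == 0)
        return -1;

    qsort(samples, n, sizeof(samples[0]), cmp_u64);
    out->min = samples[0];
    out->median = samples[n / 2];
    out->max = samples[n - 1];
    return 0;
}
```

With the seven values quoted above, the minimum is 457 and both the median and the maximum are 2287, which makes it easier to see that 2287 is the typical result and the lower values are the outliers.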
You have so many things running that you are unlikely to get good figures for such a short loop.
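To judge how short the loop is in wall-clock terms, the CNTPCT_EL0 deltas can be converted to nanoseconds. The generic timer counts at the fixed frequency reported in CNTFRQ_EL0, not at the CPU clock. The sketch below is only an illustration: `ticks_to_ns` is a made-up name, and the frequency is passed in as a parameter because the thread does not say what this board reports.

```c
#include <stdint.h>

/*
 * Convert generic-timer ticks to nanoseconds.
 * freq_hz is assumed to be the value read from CNTFRQ_EL0 on the target.
 * Assumes freq_hz is below roughly 18 GHz so the remainder term
 * cannot overflow 64 bits.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    const uint64_t ns_per_sec = 1000000000ULL;

    if (freq_hz == 0)
        return 0; /* frequency unknown: nothing sensible to report */

    /* Split into whole seconds and remainder so that
     * ticks * 1e9 is never computed directly. */
    uint64_t whole = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;

    return whole * ns_per_sec + (rem * ns_per_sec) / freq_hz;
}
```

Because the timer frequency is fixed, a change in the tick count for the same instruction sequence means the loop took a different amount of real time, whatever the cause.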
Could you please elaborate on why I can't get good figures for such a short loop? The loop executes in atomic context on one core, so I expected good figures.
You need to check that reading CNTPCT_EL0 is not being trapped to the secure OS.
Actually, I did the measurements in Secure EL1, in Secure OS kernel mode.
But the guest (non-secure) interrupts are still active? Maybe these cause the jitter? If not, I am out of ideas.
Android is still a Linux-like environment. The EL3 firmware may still respond to GIC interrupts, depending on the GIC configuration, so measurements taken in non-secure EL1 or secure EL1 may not be very accurate.
I suggest running in a bare-metal environment if you want to benchmark your loop code.