PMU cycle counter shows unstable values

I'm trying to measure the performance of my code using the PMU. The code runs at EL1. To test the PMU, I wrote a simple loop of a few operations and ran it under a spinlock with interrupts disabled to prevent preemption. I then printed the cycle counter to see how many cycles the test code takes. But I get very different values on each print, e.g. 100, 200, 10000, 50, ...

My question is: why does the output vary so much? What causes this?

PS: in contrast to the cycle counter, the PMU's instruction counter is stable, and I see the same output every time.

I also tried the ARM timer, but it shows varying values too, much like the PMU's cycle counter.

  • Android and a secure OS run on the CPU.

    I tested this at S-EL1 on an Armv8 big.LITTLE system with 8 CPU cores.

    CPU caches are enabled and the other cores are running.

    Here is the code snippet I used to measure performance with the ARM timer:

    unsigned long long ticks_start, ticks_end;
    int i = 0, j;
    unsigned long flags;

    /* Mask interrupts and take the lock so the loop is not preempted. */
    spin_lock_irqsave(&lock, flags);
    while (i++ < 1000) {
        j = 0;
        /* isb keeps the counter read from being executed early. */
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_start));
        while (j++ < 10000) {
            asm volatile ("nop");
        }
        asm volatile("isb; mrs %0, CNTPCT_EL0" : "=r" (ticks_end));
        printf("ticks %d are: %llu\n", i, ticks_end - ticks_start);
    }
    spin_unlock_irqrestore(&lock, flags);

    The output is:

    ...
    ticks 31 are: 2287
    ticks 32 are: 2287
    ticks 33 are: 2287
    ticks 34 are: 1984
    ticks 35 are: 457
    ticks 36 are: 1604
    ticks 37 are: 2287
    ...


    Could this behavior be the result of CPU throttling?