All,
When I use the cycle counter on AArch64, I do not get the cycle counts I expect. I have enabled user-space reads of pmccntr_el0 with a small kernel module. My sample code looks like:
    asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(prev));
    sleep(1);
    asm volatile("isb;mrs %0, pmccntr_el0" : "=r"(curr));
    delta = curr-prev;
I expected delta to be around 1400000000, since the A57 in our design runs at 1400MHz.
But I am getting around 32100000, which means the cycle counter is effectively advancing at only ~32.1MHz.
The control register reads pmcr=41013001, indicating that the divider is off.
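To double-check the divider reading, the PMCR value can be decoded field by field. This is a minimal sketch, assuming the value quoted above is hexadecimal (0x41013001) and using the ARMv8-A PMCR_EL0 bit positions as I understand them. The struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: a few PMCR_EL0 fields of interest. */
struct pmcr_fields {
    unsigned enable;       /* E,   bit 0: counters enabled */
    unsigned divider;      /* D,   bit 3: 1 = PMCCNTR counts once every 64 cycles */
    unsigned num_counters; /* N,   bits [15:11]: number of event counters */
    unsigned idcode;       /* bits [23:16]: identification code */
    unsigned implementer;  /* IMP, bits [31:24]: 0x41 ('A') = Arm */
};

/* Split a raw PMCR_EL0 value into the fields above. */
static inline struct pmcr_fields decode_pmcr(uint32_t pmcr)
{
    struct pmcr_fields f;
    f.enable       = (pmcr >> 0)  & 0x1u;
    f.divider      = (pmcr >> 3)  & 0x1u;
    f.num_counters = (pmcr >> 11) & 0x1fu;
    f.idcode       = (pmcr >> 16) & 0xffu;
    f.implementer  = (pmcr >> 24) & 0xffu;
    return f;
}

/* Print the decoded fields on one line. */
static inline void print_pmcr(uint32_t pmcr)
{
    struct pmcr_fields f = decode_pmcr(pmcr);
    printf("E=%u D=%u N=%u IDCODE=0x%02x IMP=0x%02x\n",
           f.enable, f.divider, f.num_counters, f.idcode, f.implementer);
}
```

For 0x41013001 this decodes to E=1 and D=0, which agrees with the reading that the counters are enabled and the divide-by-64 is off.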
With the Generic Timer counter registers, I get the values I expect. With the code below:
    asm volatile ("isb; mrs %0, cntvct_el0" : "=r" (ts));
    sleep (2);
    asm volatile ("isb; mrs %0, cntvct_el0" : "=r" (te));
    asm volatile ("isb; mrs %0, cntfrq_el0" : "=r" (freq));
    printf ("Aarch64 %20llu cycles\n", (unsigned long long)(te - ts));
    printf (" Frequency = %llu\n", (unsigned long long)freq);
I get a count of 512021629 cycles for 2 sec, as expected for the 256MHz frequency that I read from cntfrq_el0.
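The implied counter rate in both experiments is just the tick delta divided by the elapsed time. Here is a small sketch of that arithmetic, assuming the sleep durations are exact (sleep() only guarantees a minimum, so a careful measurement would time the interval separately). The name counter_rate_hz is made up for illustration.

```c
#include <stdint.h>

/* Effective counter frequency in Hz, given a tick delta and the elapsed
 * wall-clock time in seconds. */
static inline double counter_rate_hz(uint64_t delta_ticks, double elapsed_s)
{
    return (double)delta_ticks / elapsed_s;
}
```

With the numbers above, counter_rate_hz(512021629, 2.0) comes out at about 256.01e6, in line with cntfrq_el0, while counter_rate_hz(32100000, 1.0) gives 32.1e6, far below the 1400MHz core clock.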
Is there something basic I am missing for PMCCNTR_EL0?
thanks and regards,
Ravi
Matt,
Thanks for the explanation. I will stick to values from the generic timer for now. In one of our networking applications, we need to do some work periodically, every 100us or less. Rather than depend on the timer interrupt, with the latency that involves, we poll the "cycle counter" in a loop and do the periodic job once we hit the required count. We were using rdtsc on x86 (the TSC there increments at a constant rate based on the max frequency of the core), and on ARM we wanted a similar mechanism, hence we tried PMCCNTR_EL0. As is now clear from your and Martin's explanations, it is not a real "time stamp" counter, so I will use the timer counters. At 50MHz, each tick of the generic timer counter will be 20ns; I hope that will be OK for now.
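The polling scheme described here could look roughly like the sketch below. It is not the application's actual code: us_to_ticks, read_cntvct, read_cntfrq, poll_periodic and the job callback are all hypothetical names, and it assumes EL0 is allowed to read cntvct_el0 and cntfrq_el0 (as in the test earlier in this thread). The AArch64-only part is guarded so that the conversion helper still compiles elsewhere.

```c
#include <stdint.h>

/* Convert a period in microseconds to counter ticks for a counter running
 * at freq_hz. Assumes freq_hz * us fits in 64 bits, which holds easily for
 * timer frequencies and sub-second periods. */
static inline uint64_t us_to_ticks(uint64_t freq_hz, uint64_t us)
{
    return (freq_hz * us) / 1000000u;
}

#if defined(__aarch64__)
/* Read the virtual counter; the ISB stops the read from being speculated early. */
static inline uint64_t read_cntvct(void)
{
    uint64_t v;
    __asm__ volatile("isb; mrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
}

/* Read the counter frequency in Hz as programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}

/* Busy-poll the counter and call job() once per period, `iterations` times.
 * Deadlines advance by a fixed step so that a late job does not shift the
 * following ones; the signed difference keeps the compare wrap-safe. */
static inline void poll_periodic(uint64_t period_us, unsigned iterations,
                                 void (*job)(void))
{
    uint64_t period = us_to_ticks(read_cntfrq(), period_us);
    uint64_t next = read_cntvct() + period;

    while (iterations--) {
        while ((int64_t)(read_cntvct() - next) < 0)
            ; /* spin until the deadline */
        job();
        next += period;
    }
}
#endif
```

As a sanity check on the numbers, a 100us period is 5000 ticks at 50MHz and 25600 ticks at 256MHz, so either frequency leaves plenty of resolution for this use.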
regards,
Hi Ravi,
You mentioned the counter seemed to run at 256MHz, so you should be getting a resolution of 4ns or thereabouts. Note that it will probably take far longer than 4ns to actually poll the counter. At the CPU frequency you state, you would only see the Timer's 4ns resolution if you could read the Generic Timer virtual counter (with the ISB to ensure that it is not speculatively read) and do the comparison and branch in ~6 cycles, which doesn't seem feasible for a polling loop. Since you only need 100us, it should make no difference whatsoever.
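The figures in this paragraph follow from two divisions. Here is a short sketch of the arithmetic, using the 256MHz counter and 1400MHz core clock mentioned earlier in the thread; the function names are made up for illustration.

```c
/* Duration of one counter tick in nanoseconds. */
static inline double tick_period_ns(double counter_hz)
{
    return 1e9 / counter_hz;
}

/* Number of CPU cycles that elapse during one counter tick. */
static inline double cpu_cycles_per_tick(double cpu_hz, double counter_hz)
{
    return cpu_hz / counter_hz;
}
```

tick_period_ns(256e6) is roughly 3.9ns, and cpu_cycles_per_tick(1400e6, 256e6) is roughly 5.5, which is where the "about 4ns" and "~6 cycles" figures come from.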
Again, the curious question is: why do you need that kind of polling and timer resolution? One would assume you have an event that needs to run at least every 100us, and that the processing required takes very close to 100us; otherwise you would not be so concerned about meeting the timing or about the few thousand cycles of interrupt latency. But the constant ISB-before-read is going to really hurt system performance. Is it possible that some other system effect of the processing is causing higher interrupt latency (copious use of STM/LDM instructions, or other uses of ISB, which add cycles to latency)?
Ta,
Matt