We have some code that sets up various event counters and reads them, bracketed by reads of the cycle counter. We have noticed that, depending on which event we configure, we see widely different cycle counts when running the same piece of code. Here is some pseudo-code for what we are attempting:
Configure event X on PMU counter 0
Read cycle counter
Call test_code
Read cycle counter
Read PMU counter 0
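For concreteness, here is a minimal C sketch of the sequence above. It assumes GCC-style inline assembly and privileged mode, and uses the ARMv7 PMU CP15 encodings as I understand them from the ARM documentation. The helper names (pmu_measure, pmu_setup_counter, empty_test_code) are made up for illustration and are not our exact code. The #else branch is only a stand-in so the sketch also compiles on a non-ARM host; its values are fake.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__) && defined(__GNUC__)
/* CP15 accesses to the PMU registers (ARMv7 encodings). */
static inline void pmu_write_pmnc(uint32_t v)    { __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(v)); }
static inline void pmu_write_cntens(uint32_t v)  { __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(v)); }
static inline void pmu_write_pmnxsel(uint32_t v) { __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" :: "r"(v)); }
static inline void pmu_write_evtsel(uint32_t v)  { __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" :: "r"(v)); }
static inline void pmu_isb(void)                 { __asm__ volatile("isb" ::: "memory"); }
static inline uint32_t pmu_read_ccnt(void)  { uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v)); return v; }
static inline uint32_t pmu_read_pmcnt(void) { uint32_t v; __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(v)); return v; }
#else
/* Host stand-in (assumption, not real hardware): the fake cycle counter
 * advances by an arbitrary 100 per read and the event counter reads 0. */
static uint32_t fake_ccnt;
static inline void pmu_write_pmnc(uint32_t v)    { (void)v; }
static inline void pmu_write_cntens(uint32_t v)  { (void)v; }
static inline void pmu_write_pmnxsel(uint32_t v) { (void)v; }
static inline void pmu_write_evtsel(uint32_t v)  { (void)v; }
static inline void pmu_isb(void)                 { }
static inline uint32_t pmu_read_ccnt(void)  { return fake_ccnt += 100u; }
static inline uint32_t pmu_read_pmcnt(void) { return 0u; }
#endif

typedef struct {
    uint32_t cycles;  /* cycle-counter delta across test_code */
    uint32_t events;  /* value of the selected event counter afterwards */
} pmu_sample;

/* Configure `event` on event counter `counter`, then enable and reset. */
static void pmu_setup_counter(uint32_t counter, uint32_t event)
{
    assert(counter < 4u);              /* Cortex-A8 has four event counters */
    pmu_write_pmnxsel(counter);        /* select which counter to program */
    pmu_isb();                         /* make the selection take effect */
    pmu_write_evtsel(event);           /* event number for that counter */
    pmu_write_cntens((1u << 31) | (1u << counter)); /* CCNT + this counter */
    /* E (bit 0) enable, P (bit 1) reset event counters, C (bit 2) reset
     * cycle counter. This overwrites the other PMNC bits with zero. */
    pmu_write_pmnc(0x7u);
}

/* Hypothetical placeholder for the code under test. */
static void empty_test_code(void) { }

/* The bracketed measurement from the pseudo-code, using counter 0. */
static pmu_sample pmu_measure(uint32_t event, void (*test_code)(void))
{
    pmu_sample s;
    pmu_setup_counter(0u, event);
    uint32_t start = pmu_read_ccnt();
    test_code();
    uint32_t end = pmu_read_ccnt();
    s.events = pmu_read_pmcnt();       /* counter 0 is still selected */
    s.cycles = end - start;            /* unsigned math tolerates one wrap */
    return s;
}
```

A call such as pmu_measure(0x4, empty_test_code) versus pmu_measure(0x5, empty_test_code) corresponds to the comparison described here. Only the event number passed to the setup step differs between the two runs.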
The cycle counter shows that it takes more cycles to execute our test_code when we configure event 4 than when we configure, say, event 5.
Is this to be expected? What causes this?
This is running on a Cortex-A8 on bare hardware (no OS).
Also, the TRM is unclear about how expensive reading these counters is. At one point it says MRC/MCR takes a minimum of 60 cycles, but it isn't clear whether that refers to all of those instructions or only a subset.
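One way the read cost could be estimated empirically, rather than from the TRM, is to read the cycle counter twice back to back and keep the smallest delta over several tries. The sketch below assumes GCC-style inline assembly, privileged mode, and a cycle counter that is already enabled. The function name ccnt_read_overhead is made up for illustration, and the #else branch is a host stand-in with an arbitrary fake step of 7, not a real measurement.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__) && defined(__GNUC__)
/* Read the cycle counter through CP15 (c9, c13, 0). */
static inline uint32_t read_ccnt(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v));
    return v;
}
#else
/* Host stand-in (assumption): each read advances a fake counter by 7. */
static uint32_t fake_ccnt;
static inline uint32_t read_ccnt(void) { return fake_ccnt += 7u; }
#endif

/* Smallest observed delta between two back-to-back cycle-counter reads.
 * Taking the minimum filters out runs disturbed by cache misses or
 * interrupts, so the result approximates the cost of one read. */
static uint32_t ccnt_read_overhead(unsigned iterations)
{
    assert(iterations > 0u);
    uint32_t best = UINT32_MAX;
    for (unsigned i = 0; i < iterations; ++i) {
        uint32_t a = read_ccnt();
        uint32_t b = read_ccnt();
        uint32_t delta = b - a;        /* unsigned math tolerates one wrap */
        if (delta < best) {
            best = delta;
        }
    }
    return best;
}
```

Subtracting this value from a bracketed measurement would give a rough net cycle count for test_code. It says nothing about the cost of the other PMU register accesses, which would need the same treatment separately.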
Yes, there is a range. When event 5 is configured, the delta between the highest and lowest cycle counts is 500 cycles. For event 4 it is 130 cycles, and for event 0xD it is 175 cycles. This is not what I am questioning.
In absolute terms, there is a difference of almost 700 cycles between event 5 and event 4. I would have expected the cost of reading the PMU count register to vary less with the event number.
Don't forget I also asked about the cycle count for the basic operation of reading these registers.
Thanks