I asked a question about the PMU skid problem about half a year ago (https://community.arm.com/tools/dev-platforms/f/discussions/6986/about-the-interrupt-handling-in-gic), and I understand that the skid exists because the interrupt is an asynchronous exception. However, I found something interesting about the skid that I cannot explain.
Assume that we have a sequence of instructions I1 to I10, and a PMU overflow interrupt is signaled after instruction I1 is executed. By the time the interrupt actually reaches the core from the GIC, I10 may be the instruction being executed, so instructions I2 to I9 are in the skid shadow. In my understanding, the number of instructions in the shadow should be almost fixed, because the time the GIC needs to handle an interrupt request is almost fixed. I tried several different PMU events to trigger the overflow, and the results support my guess.
However, when I used the PMU event CPU_CYCLES to trigger the overflow interrupt, things changed. Firstly, the overflow interrupt may fire several times while a single instruction is executing. I understand that the execution of an instruction may last for several CPU cycles, but how can the interrupt arrive at the CPU several times during the execution of a single instruction? Where is the skid of these interrupts? Secondly, I found that the skid still exists, but the average number of instructions in the skid shadow is reduced. For example, during the execution of instruction I1, we may receive several interrupts in the interrupt handler. Then, when the next interrupt arrives, the executing instruction is I5. In my test on a Juno board, the average skid is about 18 instructions, but it suddenly drops to about 2 instructions when I use the CPU_CYCLES event. This result confuses me. Does anyone know what is happening?
Thanks very much for any help!
Hi Zhenyu,

Can you describe your code, which PMU counter numbers overflow, and how they are configured? Are you enabling the dedicated cycle counter as well as the programmable counters? Is it possible that the 'duplicate' interrupt you are seeing is the cycle counter and your programmable counter both overflowing at around the same time?

Are you reading PMOVSCLR_EL0 to find out which counter overflowed, or just using the interrupt number? The interrupt number only tells you that *a counter* overflowed. You need to read the status register to see which individual counter actually overflowed and base your decision on that. If you have PMCCNTR or any other counters enabled, 'frequency of interrupt N' is not a metric you can gain any information from.

One note, though: the PMU is not designed to measure the things you're trying to measure. It is a statistical sampling method and will give you a count of events which, if averaged over several runs, will give you a good indication of particular behaviours. You cannot count the cycles taken to decode or execute a particular instruction, and there is no guarantee of synchronization within the PE pipeline. It also makes no sense whatsoever to configure a programmable counter (PMXEVSEL/CNTR<n>) to count the CPU_CYCLES event, since PMCCNTR already does this for you without configuration and is already a 64-bit count (so it will not overflow as often).

Interrupts from any source are also, architecturally, asynchronous events. Daith is correct that they are taken at the instruction retire stage of the pipeline, but they are not attributable to a particular instruction retiring (except by way of the one after the interrupt is signalled via nIRQ; this allows the core to fill ELR_ELx with an appropriate preferred return address). The PMU overflow is not handled any differently: it is a peripheral in the ARM architecture, with the same rules as any other peripheral.
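
To illustrate the point about reading the overflow status register instead of relying on the interrupt number, here is a minimal C sketch of decoding a PMOVSCLR_EL0 value. It relies on the architectural layout in which bit 31 is the cycle counter (PMCCNTR) overflow flag and the low bits are the per-event-counter flags. The struct, the function names and the `num_event_counters` parameter are hypothetical names chosen for this example, and the raw value is passed in as an argument so the decoding can be tried off-target. On real hardware it would come from an `mrs` read of PMOVSCLR_EL0 at a suitable exception level, and the handled bits would be written back to clear them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of PMOVSCLR_EL0 is the cycle counter (PMCCNTR_EL0) overflow flag. */
#define PMOVS_CYCLE_BIT (UINT32_C(1) << 31)

/* Hypothetical result type for this sketch, not an Arm-defined structure. */
struct pmu_overflow {
    bool     cycle_counter;   /* true if PMCCNTR overflowed */
    uint32_t event_counters;  /* bit n set if programmable counter n overflowed */
};

/*
 * Decode a raw overflow status value.
 * num_event_counters is an assumed parameter: the number of programmable
 * counters implemented (at most 31).
 */
static struct pmu_overflow pmu_decode_overflow(uint32_t pmovs,
                                               unsigned num_event_counters)
{
    assert(num_event_counters <= 31);

    /* Mask covering only the implemented programmable counters. */
    uint32_t event_mask = (num_event_counters == 0)
        ? 0
        : ((UINT32_C(1) << num_event_counters) - 1);

    struct pmu_overflow result;
    result.cycle_counter  = (pmovs & PMOVS_CYCLE_BIT) != 0;
    result.event_counters = pmovs & event_mask;
    return result;
}

/* Count how many distinct counters are flagged in one decoded status. */
static unsigned pmu_overflow_sources(struct pmu_overflow o)
{
    unsigned count = o.cycle_counter ? 1u : 0u;
    for (uint32_t bits = o.event_counters; bits != 0; bits &= bits - 1)
        count++;  /* clears the lowest set bit on each iteration */
    return count;
}
```

In an interrupt handler, a result where `pmu_overflow_sources` is greater than 1 would mean that a single PMU interrupt covered several counters, which is one way 'duplicate' interrupts can appear when only the interrupt number is inspected.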
Thanks,
Matt
I'm no expert on this but it sounds to me like the problems here are because of where the interrupt occurs in the pipeline. The overflow one will be triggered when instructions are retired - so there are a lot of instructions still in the pipeline. The cycles one can be triggered straight away as the instructions are being decoded. Don't know what the multiple interrupts business is about.