If you are running Linux or some other "big OS", your application is very likely being interrupted while it runs (timer threads, interrupts, background processes, etc.). The counters are global to the core, so if you are pre-empted between resetting the counter and sampling it, whatever "other stuff" ran on that core in the meantime gets counted too.
Secondly, it is always worth checking the disassembly to make sure the instruction ordering looks sensible. The "volatile" in "asm volatile" ensures the compiler doesn't optimize the statement away, but it doesn't provide any compiler barrier or memory barrier semantics, so the compiler may still reorder instructions around it. At -O0 this is unlikely, but it is always worth checking. Adding "memory" to the clobber list should stop the compiler moving memory accesses across the asm statement.