Hello,
I've been struggling to understand an issue related to cycle counting on both Cortex-M and Cortex-A profile processors.
For example, on a Cortex-A53 processor, a simple MOV or ADD instruction takes around 17 cycles according to the Performance Monitor Counters (PMC), even though instruction caches are enabled and the instruction is already cached. Could this be due to pipeline behavior?
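For context, a single-instruction measurement like this is usually bracketed by two counter reads, so the raw difference may include the cost of the reads and any barriers around them. The sketch below shows one way to separate the two. It is an illustration under stated assumptions, not the exact code behind the numbers above: the helper names (`read_cycle_counter`, `net_cycles`, `cycles_per_instruction`) and the `USE_PMCCNTR` macro are hypothetical, and the AArch64 read assumes the cycle counter is already enabled and accessible at the current exception level.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(USE_PMCCNTR)
/* Read the AArch64 cycle counter (PMCCNTR_EL0). The ISBs keep the read
   from being reordered around the code under test. Assumes the counter
   has been enabled and access from this exception level is permitted. */
static inline uint64_t read_cycle_counter(void)
{
    uint64_t value;
    __asm__ volatile("isb\n\tmrs %0, pmccntr_el0\n\tisb"
                     : "=r"(value) : : "memory");
    return value;
}
#endif

/* Cycles left after subtracting the cost of the counter reads themselves.
   read_overhead is obtained by taking two back-to-back reads with nothing
   in between. Clamps at zero if the overhead exceeds the raw difference. */
static inline uint64_t net_cycles(uint64_t start, uint64_t end,
                                  uint64_t read_overhead)
{
    uint64_t raw = end - start;
    return raw > read_overhead ? raw - read_overhead : 0;
}

/* Average cost when the instruction is repeated n times between the two
   reads, which amortizes any remaining fixed overhead. */
static inline double cycles_per_instruction(uint64_t net, uint64_t n)
{
    assert(n > 0);
    return (double)net / (double)n;
}
```

With made-up values, a raw difference of 17 cycles and a calibrated read overhead of 15 cycles would leave `net_cycles(100, 117, 15) == 2` for the instruction itself. Repeating the instruction many times between the reads and dividing by the count gives a steadier per-instruction figure than a single bracketed instruction.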
Additionally, when I execute a sequence of data-dependent MOV instructions in single-step (debug) mode, the block takes approximately 200 cycles to complete (an illustrative figure rather than a measured one, but it conveys the scale). However, when the processor is allowed to run freely (outside debug mode), the same block executes in just 13 cycles.
I've observed similar behavior on M-profile processors as well.
Could you please help me understand how the pipeline behaves during single-stepping, and why a single instruction might appear to consume so many cycles?
Thank you in advance!
Sahin