How many clock cycles do SVC/PUSH/POP/SRS/RFE instructions take to execute on a Cortex-A8 processor?

I'm trying to count the cycle timing of my program by hand.

I read the ARM Cortex-A8 R3P1 Technical Reference Manual, Chapter 16 "Instruction Cycle Timing", but I couldn't find the cycle timings for SVC, PUSH, POP, SRS, or RFE.

I want to know how many clock cycles these instructions take to execute.

  • Hi jjy,

    SVC is a supervisor call, so it takes as long as it takes to switch to SVC mode and synchronize processor context (i.e. discard the pipeline contents, copy CPSR to SPSR_svc, save the return address in LR_svc, and branch to the SVC vector). The time from fetch to completion of an event like this is variable, just as a branch's execution time varies with instruction cache effects. On top of that, your SVC vector has to be either a direct branch or a load-based branch to the handler. If the vector misses and has to be fetched from L3, it will take a long time, and if the SVC handler it points to is not within the same cache line, it may miss again.
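    To make the sources of that variability concrete, here is a rough C model of SVC entry cost. It is only a sketch under stated assumptions: `struct svc_costs`, `svc_entry_estimate` and all the cost parameters are hypothetical names, and you would have to measure the numbers yourself. Nothing here comes from the Cortex-A8 TRM. The model adds a context-synchronization cost, one fetch for the vector, and a second fetch for the handler when it is not in the vector's cache line. Each fetch costs more on an L1 miss.

```c
#include <stdbool.h>

/* Hypothetical per-event costs in cycles. These are parameters you would
 * measure yourself; none of them are figures from the Cortex-A8 TRM. */
struct svc_costs {
    unsigned sync;       /* pipeline flush + mode switch (CPSR -> SPSR_svc, return address -> LR_svc) */
    unsigned fetch_hit;  /* fetching a cache line that hits in the L1 instruction cache */
    unsigned miss_extra; /* extra cycles when that fetch misses L1 */
};

/* Cost of one instruction-line fetch, depending on whether it hits in L1. */
static unsigned fetch_cost(const struct svc_costs *c, bool hit)
{
    return hit ? c->fetch_hit : c->fetch_hit + c->miss_extra;
}

/* Rough model of SVC entry: synchronize context, fetch the SVC vector, then
 * fetch the handler it branches to. The handler costs a second fetch only
 * when it sits in a different cache line from the vector. */
unsigned svc_entry_estimate(const struct svc_costs *c, bool vector_hits,
                            bool handler_in_vector_line, bool handler_hits)
{
    unsigned cycles = c->sync + fetch_cost(c, vector_hits);
    if (!handler_in_vector_line)
        cycles += fetch_cost(c, handler_hits);
    return cycles;
}
```

    For example, with placeholder values sync = 10, fetch_hit = 1 and miss_extra = 100, the model gives 11 cycles when everything hits and 212 when both fetches miss. That is the kind of spread being described. The real figures depend on your memory system.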

    PUSH and POP will vary depending on how much data you move to or from the stack and where that data is (L1, L2, L3 cache or memory). In ARM state both are just aliases for STMFD/LDMFD with SP writeback, and the Thumb-2 versions have the same effect, so they should have identical timings: roughly num_registers/2 <= cycles < num_registers/2 + load_time, i.e. about two registers per cycle in the best case, plus the memory latency when the data is not cached. SRS and RFE store and retrieve only a small, fixed amount of data on the stack (LR and SPSR for SRS, PC and CPSR for RFE), and they show the same variance. RFE is also a context-synchronizing event, so it takes a variable time to finish.
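    The PUSH/POP rule of thumb above can be written as a small helper. This is a sketch, not a TRM formula. `push_pop_estimate`, `struct cycle_range` and `load_time` are hypothetical names. `load_time` stands for whatever memory latency you measure when the stack data misses L1. The helper assumes the best case of two registers per cycle, so it does not model unaligned stacks or multiple misses.

```c
#include <assert.h>

/* Rough cycle range for PUSH/POP (STMFD/LDMFD), following the rule of thumb
 * num_registers/2 <= cycles < num_registers/2 + load_time.
 * load_time is a hypothetical, self-measured memory latency, not a TRM figure. */
struct cycle_range {
    unsigned best;  /* all accesses hit in L1: two registers per cycle */
    unsigned worst; /* best case plus one memory latency */
};

struct cycle_range push_pop_estimate(unsigned num_registers, unsigned load_time)
{
    /* PUSH/POP transfer between 1 and 16 registers. */
    assert(num_registers >= 1 && num_registers <= 16);
    unsigned beats = (num_registers + 1) / 2; /* two registers per cycle, rounded up */
    struct cycle_range r = { beats, beats + load_time };
    return r;
}
```

    Since SRS and RFE each move two words, `push_pop_estimate(2, load_time)` gives a similar rough range for them, although RFE's context synchronization adds time that this helper does not model.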

    The point here is that published cycle timings for most instructions that do more than operate on a couple of registers and produce a result don't have much bearing on reality. Outside of data processing, the time it takes to "execute" an instruction depends entirely on the current state of the pipeline and the load/store subsystem.

    You might say that they all take one cycle to execute, but they have additional, variable latency caused by the memory subsystem. The easiest way to find out how long they take is to capture ETM trace with Cycle Accurate Trace enabled and look at the cycle counts at each instruction boundary. Because memory is involved, you may have to run several rounds and average the results.
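    Once you have cycle counts from several runs, a small helper like the one below can summarize them. The counts might come, for example, from a cycle-accurate ETM trace read between the same two instruction boundaries. `summarize_cycles` and `struct cycle_stats` are hypothetical names, and collecting the samples is not shown. The helper reports the minimum and maximum alongside the mean, because a single average mixes cold-cache and warm-cache runs.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated cycle measurements of the same code sequence.
 * Samples are assumed to come from something like cycle-accurate ETM trace;
 * how they are collected is outside this sketch. */
struct cycle_stats {
    unsigned min; /* best case, typically a warm-cache run */
    unsigned max; /* worst case, often the first (cold-cache) run */
    double mean;  /* average over all runs */
};

struct cycle_stats summarize_cycles(const unsigned *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    struct cycle_stats s = { samples[0], samples[0], 0.0 };
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
        total += samples[i];
    }
    s.mean = total / (double)n;
    return s;
}
```

    Reporting the minimum separately is a design choice. It approximates the best case with warm caches, while the maximum often reflects the first, cold run.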

    Ta,

    Matt