How many clock cycles does a "for" loop take?

Hello,

I am working with a Cortex-M3 microcontroller (LPC1768) and want to know how many clock cycles the following for loop takes:

for(i=0;i<1;i++);

Thanks

  • Former Member

    Using Godbolt's online Compiler Explorer (set to gcc 9.2.1; the clang output is similar enough), you will find the following with no optimization:

            movs    r3, #0
            str     r3, [r7, #4]
    .loop:
            ldr     r3, [r7, #4]
            cmp     r3, #0
            bgt     .loopexit
            ldr     r3, [r7, #4]
            adds    r3, r3, #1
            str     r3, [r7, #4]
            b       .loop
    .loopexit:
            ...

    With any optimization enabled, the loop is removed, because it has no effect other than consuming time.

    So then, you'd need to look up the cycle count for each instruction, which is in the technical reference manual (TRM).

    • MOV/CMP/ADD are 1 cycle
    • STR/LDR are 2 cycles
    • Branches are 1 cycle, plus a pipeline reload if the branch is taken (adding 2 cycles, so 3 in total)
      The TRM lists unconditional branches both with and without stalls, and I don't really know which case applies here: 1 cycle or 3. One professor's figure is 3, which is what I use below.

    The loop body runs once, so the total is:
    1 + 2 + 2 + 1 + 1 (branch not taken) + 2 + 1 + 2 + 3 (branch taken) + 2 + 1 + 3 (branch taken)
    In other words, 21 cycles without optimization, if I'm not missing anything (such as alignment effects). Just glancing at the clang output, it looks like it has one more taken branch and would thus take 24 cycles.

    Still, Andy has the right answer: it depends. Without an instruction that has an observable effect (like changing a pin output), the optimizer will erase the loop. Even with useful code in the loop, you'd probably need more than one iteration to see meaningful differences between the optimization levels.
