
Cortex-M7 "zero overhead loop"

Hi.

Page 22 of the document below states that the Cortex-M7 has a "zero overhead loop" capability. I would like to know how this is done. Is there a special instruction for it?

http://community.arm.com/servlet/JiveServlet/downloadBody/9595-102-4-18606/ARM_Cortex_M7_MCU_Johnson.pdf

Ari.

  • Hello,


    A "zero-overhead loop" usually refers to a dedicated loop instruction.
    However, there is no such instruction in the Thumb-2 ISA.
    I think the Cortex-M7 "zero-overhead loop" just means branch prediction by the BTAC (Branch Target Address Cache).

    Best regards,
    Yasuhiko Koumoto.

  • Yasuhiko,

    If branch prediction by the BTAC is not present in the Cortex-M4, you are probably right.

    According to the document above, this enhancement is only present in the Cortex-M7, not in the Cortex-M4.

     

      Thanks.

       Ari.

  • The BTAC is only present in the Cortex-M7.

    Unfortunately I do not have any hands-on experience regarding this, but I do have a few suggestions.

    My suggestion is based upon experience with the Cortex-M4 and other architectures, some of which have out-of-order execution.

    I think if jyiu is reading this, he can probably provide you with a much better answer.

    Try generating the loop condition early in the loop, then place the loop body after it, and finally the branch at the end.

    For instance:

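    copy_l:                     @ r1 = source, r2 = destination, r3 = end of destination
        cmp     r2,r3
        ittt    lo
        ldrlo   r0,[r1],#4
        strlo   r0,[r2],#4
        blo     copy_l          @ the branch is the last instruction of the IT block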

    Here I've included the branch in the IT block; it might help in getting a zero-overhead loop. Try comparing it with ...

    copy_l:
        cmp     r2,r3
        itt     lo
        ldrlo   r0,[r1],#4
        strlo   r0,[r2],#4
        blo     copy_l          @ outside the IT block; ldr/str do not change the flags

    ... and see if there is any difference.
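    As a functional check only (it says nothing about timing), the loop above can be modelled in a few lines of Python. This is a minimal sketch, assuming word-aligned addresses and memory modelled as a dict keyed by byte address; `run_pointer_loop` is a hypothetical helper, not part of any Arm tool. It shows two things: because `ldr`/`str` do not update the flags, `blo` sees the `cmp` result whether it sits inside or outside the IT block, so both variants copy the same words; and because the compare comes first, an empty range (r2 >= r3 on entry) copies nothing.

```python
MASK32 = 0xFFFFFFFF


def run_pointer_loop(mem, r1, r2, r3):
    """Hypothetical model of: cmp r2,r3 / i(t)tt lo / ldrlo / strlo / blo copy_l.

    r1 = source address, r2 = destination address, r3 = destination end.
    Returns the number of 32-bit words copied.
    """
    copied = 0
    while True:
        lo = r2 < r3              # cmp r2,r3: LO means unsigned r2 < r3
        if not lo:                # ldrlo/strlo skipped, blo not taken
            return copied
        r0 = mem[r1]              # ldrlo r0,[r1],#4 (post-increment)
        r1 = (r1 + 4) & MASK32
        mem[r2] = r0              # strlo r0,[r2],#4 (post-increment)
        r2 = (r2 + 4) & MASK32
        copied += 1               # flags untouched, so blo reuses the cmp result


# Example: copy 3 words from 0x100 to 0x200.
mem = {0x100: 11, 0x104: 22, 0x108: 33}
n = run_pointer_loop(mem, 0x100, 0x200, 0x20C)
print(n, [mem[0x200 + 4 * i] for i in range(3)])  # 3 [11, 22, 33]
```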

    For a counter-type loop, also keep the condition generation early, like this:

    copy_l:
        subs    r3,r3,#1        @ decrement the counter and set the flags
        ldr     r0,[r1],#4
        str     r0,[r2],#4
        bhs     copy_l          @ loop while the subtraction did not borrow
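    The counter loop has one detail worth checking: on Arm, `subs` sets the carry flag when the subtraction does not borrow, so `bhs` (carry set) keeps looping until r3 drops below zero. The Python sketch below models only that flag behaviour, not cycle timing; `counter_loop_iterations` is a hypothetical helper. It shows the body runs r3 + 1 times and always at least once, because the load and store are unconditional. If r3 holds the word count itself rather than count - 1, `bne` would be the matching branch, assuming the count is non-zero.

```python
MASK32 = 0xFFFFFFFF


def counter_loop_iterations(r3, branch="hs"):
    """Hypothetical model counting body executions of:
        copy_l: subs r3,r3,#1 / ldr / str / b<cond> copy_l

    branch="hs": taken when C == 1 (no borrow, i.e. r3 >= 1 before subs).
    branch="ne": taken when Z == 0 (r3 != 0 after subs); with r3 == 0 on
    entry this would wrap and run 2**32 times, so only use it for r3 >= 1.
    """
    iterations = 0
    while True:
        carry = r3 >= 1           # C flag after subs: 1 means no borrow
        r3 = (r3 - 1) & MASK32    # subs r3,r3,#1 with 32-bit wraparound
        zero = r3 == 0            # Z flag after subs
        iterations += 1           # ldr/str execute unconditionally
        taken = carry if branch == "hs" else not zero
        if not taken:
            return iterations


print(counter_loop_iterations(4))        # 5 with bhs
print(counter_loop_iterations(0))        # 1: the body always runs once
print(counter_loop_iterations(4, "ne"))  # 4 with bne
```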


  • Hi,

    There is no new/special instruction for loops in Cortex-M7.

    The design helps reduce loop overhead in a number of ways:

    - The BTAC enables accurate branch prediction, so in most cases there is no branch penalty (of course, you still get a branch penalty if the prediction is wrong).

    - A branch instruction can execute in the same cycle as another data processing instruction.

    - Moving the condition generation instruction earlier helps in some cases too (but in general the design of the Cortex-M7 enables high performance without requiring much compiler optimization).
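    To illustrate why a well-predicted loop branch can look "free", here is a toy Python model of a branch target address cache. It is a generic textbook-style sketch, not the Cortex-M7's actual predictor, whose organization is not described in this thread: each entry maps a branch address to its target plus a 2-bit saturating counter, an entry is allocated the first time the branch is taken, and a hit with counter >= 2 predicts taken. `ToyBTAC`, `count_mispredicts` and the addresses are hypothetical. For a loop that runs N times, the first run mispredicts only on first encounter and at the exit; a later run of the same loop mispredicts only at the exit.

```python
class ToyBTAC:
    """Toy branch target address cache: branch PC -> [target, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def predict(self, pc):
        """Return the predicted target, or None for 'miss / predict not taken'."""
        entry = self.entries.get(pc)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, pc, taken, target):
        entry = self.entries.get(pc)
        if entry is None:
            if taken:                             # allocate on first taken branch
                self.entries[pc] = [target, 2]
            return
        if taken:
            entry[0] = target
            entry[1] = min(entry[1] + 1, 3)       # saturate at strongly taken
        else:
            entry[1] = max(entry[1] - 1, 0)       # saturate at strongly not taken


def count_mispredicts(btac, pc, target, iterations):
    """Run a loop whose closing branch at `pc` is taken iterations - 1 times."""
    mispredicts = 0
    for i in range(iterations):
        taken = i < iterations - 1                # the last pass falls through
        predicted_taken = btac.predict(pc) == target
        if predicted_taken != taken:
            mispredicts += 1
        btac.update(pc, taken, target)
    return mispredicts


btac = ToyBTAC()
print(count_mispredicts(btac, pc=0x8010, target=0x8000, iterations=100))  # 2
print(count_mispredicts(btac, pc=0x8010, target=0x8000, iterations=100))  # 1
```

    A real predictor has a limited number of entries, so a loop branch that gets evicted pays the miss again; that effect is outside this sketch.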

    So in strict computer "geek" terms, I wouldn't call it a zero-overhead loop. But the result is essentially the same as in some zero-overhead-loop designs, so in "PR" language the description is "correct".

    And Ari is right that there is no BTAC in the Cortex-M4.

    regards,

    Joseph

    (Disclaimer: this message was written before my 2nd cup of coffee this morning... it may not be suitable for human consumption.)