This discussion has been locked.

Cortex-M7 "zero overhead loop"

Hi.

Page 22 of the document below states that the Cortex-M7 has a "zero overhead loops" capability. I would like to know how this is done. Is there a special instruction for it?

http://community.arm.com/servlet/JiveServlet/downloadBody/9595-102-4-18606/ARM_Cortex_M7_MCU_Johnson.pdf

Ari.

Yasuhiko Koumoto over 10 years ago

Hello,

strictly speaking, a zero-overhead loop usually means a dedicated loop instruction.
However, there is no such instruction in the Thumb-2 ISA.
I think that the Cortex-M7 "zero-overhead loop" simply refers to branch prediction by the BTAC (Branch Target Address Cache).
Best regards,
Yasuhiko Koumoto.

Ari Mendes over 10 years ago in reply to Yasuhiko Koumoto

Yasuhiko,
If "branch prediction by BTAC" is not present in the Cortex-M4, you are probably right.
According to the document above, this enhancement is only present in the Cortex-M7, not in the Cortex-M4.

Thanks.
Ari.

Jens Bauer over 10 years ago in reply to Ari Mendes

The BTAC is only present in the Cortex-M7.
Unfortunately I do not have any hands-on experience with it, but I do have a few suggestions.
They are based on my experience with the Cortex-M4 and other architectures, some of which have out-of-order execution.
If jyiu is reading this, he can probably give you a much better answer.
Try generating the loop condition early in the loop, then doing the work that depends on it, and finally branching at the end.
For instance:
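copy_l:
    cmp     r2,r3           @ keep going while dest (r2) is below end (r3)
    ittt    lo
    ldrlo   r0,[r1],#4      @ load a word from source, r1 += 4
    strlo   r0,[r2],#4      @ store it to dest, r2 += 4
    blo     copy_l          @ branch is the last instruction of the IT block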
Here I've included the branch in the IT block; that might help achieve a zero-overhead loop. Try comparing it with ...
copy_l:
    cmp     r2,r3
    itt     lo
    ldrlo   r0,[r1],#4
    strlo   r0,[r2],#4
    blo     copy_l          @ branch is now outside the IT block
... and see if there is any difference.
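For a counter-type loop, also keep the condition generation early, like this:
copy_l:
    subs    r3,r3,#1        @ generate the condition first: C=0 only when r3 was 0
    ldr     r0,[r1],#4
    str     r0,[r2],#4
    bhs     copy_l          @ loop while no borrow, so the body runs r3+1 times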

Joseph Yiu over 10 years ago in reply to Jens Bauer

Hi,
There is no new or special loop instruction in the Cortex-M7.
The design helps reduce loop overhead in a number of ways:
- The BTAC enables good branch prediction accuracy, so in most cases there is no branch penalty (of course, you still pay a branch penalty if the prediction is wrong).
- A branch instruction can execute in the same cycle as another data processing instruction.
- Moving the condition-generating instruction earlier helps in some cases too (but in general the Cortex-M7 design enables high performance without much compiler optimization).
So, in strict computer "geek" language, I wouldn't call it a zero-overhead loop. But the result is essentially the same as in some zero-overhead-loop designs, so in "PR" language the description is "correct".
And Ari is right that there is no BTAC in the Cortex-M4.
regards,
Joseph
(Disclaimer: this message was written before my second cup of coffee this morning... it may not be suitable for human consumption.)