Loop unrolling on Cortex-M3 vs. Cortex-M0

In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for Cortex-M3/M4 than for Cortex-M0: loop unrolling is used on M3/M4 but not on M0.

Please tell me the reason behind this. Is there any advantage to using loop unrolling on M3/M4?

Thanks

Indu

Jens Bauer over 11 years ago

Loop unrolling is used to gain speed.
It works by reducing how often the branch back to the top of the loop is executed, because those branches use clock cycles without doing any useful work.
The cost is program space.
I believe the reason the loops are not normally unrolled on the Cortex-M0 is that most Cortex-M0 microcontrollers have very little program space.
But if you are using an LPC43xx, you will have plenty of program space, so you can change the settings to enable loop unrolling.
Loop unrolling will gain speed on all Cortex-M0, Cortex-M0+, Cortex-M3 and Cortex-M4 devices.
However, I think that loop unrolling will not gain any speed at all on Cortex-M7, because this core has a branch predictor and a Branch Target Address Cache (BTAC), which I believe would make branches execute in zero clock cycles.
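
To make the speed versus program-space trade-off concrete, here is a minimal C sketch of the same multiply-accumulate loop written rolled and unrolled by four. The function names `dot_rolled` and `dot_unrolled4` are made up for illustration and are not taken from the DSP library; integer data is used only so the two results can be compared exactly.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled version: one multiply-accumulate per iteration, so the loop
   overhead (index update, compare, branch back) is paid n times. */
int32_t dot_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Unrolled by 4: four multiply-accumulates per iteration, so the
   branch back is taken roughly n/4 times. The loop body is about four
   times larger, which is the program-space cost. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    for (size_t blk = n / 4; blk > 0; blk--) {
        acc += (int32_t)a[i]     * b[i];
        acc += (int32_t)a[i + 1] * b[i + 1];
        acc += (int32_t)a[i + 2] * b[i + 2];
        acc += (int32_t)a[i + 3] * b[i + 3];
        i += 4;
    }

    /* Tail loop for the 0 to 3 samples left when n is not a multiple of 4. */
    for (; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Both functions return the same value for the same input; only the number of branches taken and the code size differ. An optimizing compiler may unroll the rolled version on its own, so the actual gain depends on the toolchain settings.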

daith over 11 years ago in reply to Jens Bauer

The main advantage of loop unrolling is that it lets you schedule the memory accesses better. There can also be savings in register moves and branches, but they're normally secondary. Sometimes one can also merge memory accesses and save a bit that way.
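
A rough sketch of what scheduling the memory accesses can look like in C: inside an unrolled body, all loads are grouped ahead of the arithmetic, which gives the compiler the chance to issue them back to back. The name `dot_f32_grouped` is hypothetical, and the compiler remains free to reorder the statements, so this is a hint rather than a guarantee of the generated instruction order.

```c
#include <stddef.h>

/* Unrolled by 4 with the loads grouped before the multiply-accumulates.
   On cores where consecutive loads can be pipelined, grouping them like
   this may hide part of the load latency. */
float dot_f32_grouped(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;

    for (size_t blk = n / 4; blk > 0; blk--) {
        /* All eight loads first... */
        float a0 = a[i],     b0 = b[i];
        float a1 = a[i + 1], b1 = b[i + 1];
        float a2 = a[i + 2], b2 = b[i + 2];
        float a3 = a[i + 3], b3 = b[i + 3];

        /* ...then the arithmetic, in the same order as a rolled loop. */
        acc += a0 * b0;
        acc += a1 * b1;
        acc += a2 * b2;
        acc += a3 * b3;
        i += 4;
    }

    /* Tail loop for the remaining 0 to 3 samples. */
    for (; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

A rolled loop cannot be arranged this way, because each iteration only has one pair of loads to work with.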

Jens Bauer over 11 years ago in reply to daith

daith wrote:

The main advantage of loop unrolling is to schedule the memory accesses better.

Yes, this is true; I didn't think about that, because the question was about Cortex-M0, where scheduling of LDR instructions won't matter.
Still, it's possible to merge memory accesses on the Cortex-M0, which in some cases can turn an impossible task into a possible one.
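
As an illustration of merging memory accesses, the sketch below reads two adjacent 16-bit samples with a single 32-bit load and then splits the word into halves. The names `load_pair` and `sum_merged` are hypothetical. It assumes a little-endian target, and on a real Cortex-M0 the buffer would also need to be word-aligned for the compiler to emit one word load; `memcpy` is used here only to keep the C code portable and free of aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch two adjacent int16_t samples as one 32-bit word. */
static inline uint32_t load_pair(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Sum n samples, reading them two at a time: one 32-bit access
   replaces two 16-bit accesses in the main loop. */
int32_t sum_merged(const int16_t *x, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        uint32_t w = load_pair(&x[i]);
        /* On a little-endian target the low half is x[i] and the high
           half is x[i + 1]; for a plain sum the order does not matter.
           The casts rely on the usual two's-complement wraparound. */
        acc += (int16_t)(w & 0xFFFFu);
        acc += (int16_t)(w >> 16);
    }

    /* One sample is left over when n is odd. */
    if (i < n) {
        acc += x[i];
    }
    return acc;
}
```

The same idea extends to byte data, where one word load can replace four byte loads.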