Loop unrolling on Cortex-M3 vs. Cortex-M0

In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for Cortex-M3/M4 and for Cortex-M0: loop unrolling is used on M3/M4 but not on M0.

Please tell me the reason behind this. Is there any advantage to using loop unrolling on M3/M4?

Thanks

Indu

0 daith over 11 years ago in reply to Jens Bauer

The main advantage of loop unrolling is that it lets you schedule the memory accesses better. There can also be savings in register moves and branches, but they're normally secondary. Sometimes one can also merge memory accesses and save a bit that way.
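
To make the point above concrete, here is a rough sketch of a dot-product style inner loop, plain and unrolled by four. This is not the library's actual code; the function names `dot_simple` and `dot_unrolled4` are made up for illustration. In the unrolled version the loads are grouped ahead of the arithmetic, and the loop counter update and branch are paid once per four samples instead of once per sample.

```c
#include <stddef.h>

/* Plain loop: two loads, one multiply-accumulate and one branch per sample. */
float dot_simple(const float *x, const float *h, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * h[i];
    }
    return acc;
}

/* Unrolled by 4 (hypothetical helper, not the library routine). */
float dot_unrolled4(const float *x, const float *h, size_t n)
{
    float acc = 0.0f;
    size_t blocks = n / 4;   /* number of full groups of four samples */
    size_t rem = n % 4;      /* leftover samples handled one at a time */

    while (blocks--) {
        /* Group the loads first so they can be scheduled back to back. */
        float x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
        float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

        /* Same accumulation order as the plain loop. */
        acc += x0 * h0;
        acc += x1 * h1;
        acc += x2 * h2;
        acc += x3 * h3;

        x += 4;
        h += 4;
    }

    while (rem--) {
        acc += *x++ * *h++;
    }
    return acc;
}
```

Whether the unrolled form is actually faster depends on the core and the compiler, so it is worth checking the generated code and the cycle counts on the target.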

0 Jens Bauer over 11 years ago in reply to daith

daith wrote:

The main advantage of loop unrolling is to schedule the memory accesses better.

Yes, this is true. I didn't think about that, because the question was about Cortex-M0, where scheduling of LDR instructions won't matter.
Still, it's possible to merge memory accesses on the Cortex-M0, which in some cases can make an otherwise impossible task possible.
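
As a rough illustration of merging memory accesses, the sketch below sums 16-bit samples by fetching two of them per 32-bit load instead of one per halfword load. The function names are hypothetical, and it assumes the buffer is 4-byte aligned; whether the `memcpy` really becomes a single word load depends on the compiler and on what it knows about alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: one halfword load per sample. */
int32_t sum_q15_halfword(const int16_t *src, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += src[i];
    }
    return acc;
}

/* Merged: one 32-bit load fetches two samples (hypothetical helper). */
int32_t sum_q15_merged(const int16_t *src, size_t n)
{
    int32_t acc = 0;
    size_t pairs = n / 2;

    while (pairs--) {
        uint32_t w;
        /* memcpy avoids aliasing problems; with an aligned buffer the
           compiler may turn it into a single word load (assumption). */
        memcpy(&w, src, sizeof w);

        /* Split the word into its two 16-bit halves and sign-extend.
           The sum does not depend on which half came first in memory. */
        acc += (int16_t)(uint16_t)(w & 0xFFFFu);
        acc += (int16_t)(uint16_t)(w >> 16);

        src += 2;
    }

    if (n & 1u) {
        acc += *src;   /* odd sample left over */
    }
    return acc;
}
```

The same idea extends to loading several registers at once with a load-multiple instruction in hand-written assembly.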