Loop unrolling on Cortex-M3 vs. Cortex-M0

In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for Cortex-M3/M4 and for Cortex-M0: loop unrolling is used on M3/M4 but not on M0.

Please tell me the reason behind this. Is there any advantage to using loop unrolling on M3/M4?

Thanks

Indu

  • Loop-unrolling is used in order to gain speed.

    It does this by reducing how often the branch back to the top of the loop, and the loop-counter update that goes with it, is executed, because those instructions use clock cycles without doing any useful work.

    The cost is program space.

    I believe the loops are not normally unrolled on the Cortex-M0 because most Cortex-M0 microcontrollers have very little program space.

    But if you are using an LPC43xx, you will have plenty of program space, so you can change the build settings so that loop-unrolling is enabled.

    Loop-unrolling will gain speed on all Cortex-M0, Cortex-M0+, Cortex-M3 and Cortex-M4 devices.

    However, I think that loop-unrolling will gain much less on Cortex-M7, because that core has a branch predictor and a Branch Target Address Cache (BTAC), which I believe let correctly predicted branches execute in close to zero clock cycles.
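
To make the trade-off concrete, here is a minimal sketch of the same multiply-accumulate loop written plainly and unrolled by four. It is not the library's code: the function names dot_plain and dot_unrolled4 and the unroll factor of 4 are assumptions chosen for illustration. The unrolled version executes one counter update and one backward branch per four elements instead of per element, at the price of a larger loop body plus a clean-up loop for the leftover elements.

```c
#include <stddef.h>

/* Plain loop: one counter update and one backward branch per element. */
float dot_plain(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Unrolled by 4 (hypothetical factor): one counter update and one
 * backward branch per four elements, but roughly four times the
 * loop-body code size. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t blocks = n >> 2;   /* number of full groups of four */
    size_t rest = n & 3u;     /* 0..3 leftover elements */

    while (blocks > 0u) {
        acc += a[0] * b[0];
        acc += a[1] * b[1];
        acc += a[2] * b[2];
        acc += a[3] * b[3];
        a += 4;
        b += 4;
        blocks--;
    }

    /* Clean-up loop for the elements that did not fill a group of four. */
    while (rest > 0u) {
        acc += (*a++) * (*b++);
        rest--;
    }
    return acc;
}
```

Both functions add the products in the same order, so they should return the same result; only the number of branches executed and the code size differ.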
