In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for the Cortex-M3/M4 than for the Cortex-M0: loop unrolling is used on the M3/M4 but not on the M0.
Please tell me the reason behind this. Is there any advantage to using loop unrolling on the M3/M4?
Thanks
Indu
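For reference, here is a minimal sketch of the two loop shapes. It uses a plain float dot product (the inner loop of an FIR or convolution) as a stand-in. This is not the CMSIS source. `dot_rolled` and `dot_unrolled4` are hypothetical names, and the unroll factor of 4 is an assumption chosen for illustration.

```c
#include <stdint.h>

/* Plain loop: one multiply-accumulate per iteration, plus the loop
   overhead (counter decrement, compare, branch) on every iteration. */
float dot_rolled(const float *x, const float *h, uint32_t n)
{
    float acc = 0.0f;
    while (n > 0u) {
        acc += (*x++) * (*h++);
        n--;
    }
    return acc;
}

/* Unrolled by 4 (hypothetical example): the loop overhead is paid once
   per four products, and the leftover n % 4 products are handled by a
   short tail loop. */
float dot_unrolled4(const float *x, const float *h, uint32_t n)
{
    float acc = 0.0f;
    uint32_t blk = n >> 2u;      /* number of full 4-sample blocks */
    while (blk > 0u) {
        acc += (*x++) * (*h++);
        acc += (*x++) * (*h++);
        acc += (*x++) * (*h++);
        acc += (*x++) * (*h++);
        blk--;
    }
    blk = n & 3u;                /* remaining 0..3 samples */
    while (blk > 0u) {
        acc += (*x++) * (*h++);
        blk--;
    }
    return acc;
}
```

Both functions add the products in the same order, so they return identical results. The unrolled version runs the counter update, compare, and branch once per four products instead of once per product, at the cost of larger code.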
daith wrote: The main advantage of loop unrolling is to schedule the memory accesses better.
daith wrote:
The main advantage of loop unrolling is to schedule the memory accesses better.
Yes, this is true. I didn't think of that because the question was about the Cortex-M0, where the scheduling of LDR instructions doesn't matter.
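Here is a sketch of what "scheduling the memory accesses better" can look like in C. It uses the same kind of float dot product as a stand-in. Once the loop is unrolled, all the loads for a block can be written ahead of the arithmetic. `dot_grouped4` is a hypothetical name, and the final instruction order is still up to the compiler.

```c
#include <stdint.h>

/* Unrolled by 4 with the loads grouped ahead of the arithmetic
   (hypothetical example). This gives the compiler a run of independent
   loads it can issue back to back, instead of alternating load,
   multiply, load, multiply. */
float dot_grouped4(const float *x, const float *h, uint32_t n)
{
    float acc = 0.0f;
    uint32_t blk = n >> 2u;
    while (blk > 0u) {
        /* Four sample loads and four coefficient loads, no arithmetic yet. */
        float x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];
        float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];
        x += 4;
        h += 4;
        /* Multiply-accumulates only after all eight loads. */
        acc += x0 * h0;
        acc += x1 * h1;
        acc += x2 * h2;
        acc += x3 * h3;
        blk--;
    }
    blk = n & 3u;                /* remaining 0..3 samples */
    while (blk > 0u) {
        acc += (*x++) * (*h++);
        blk--;
    }
    return acc;
}
```

On a core where consecutive loads can be pipelined, such as the M3/M4, this layout can help. On the Cortex-M0, as noted above, the order of the loads by itself makes little difference.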
Still, it is possible to merge memory accesses on the Cortex-M0, which in some cases can make the difference between a task being impossible and being possible.
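Here is one way to merge memory accesses that can be written in portable C. It assumes a hypothetical data layout in which each aligned 32-bit word packs two 16-bit (q15-style) samples. A single word load then replaces two halfword loads, and the samples are split out in registers. `dot_q15_pairs` and the packing scheme are illustrative assumptions, not a CMSIS API. Letting the compiler combine several word loads into one LDM is another form of merging, which is not shown here.

```c
#include <stdint.h>

/* Hypothetical layout: each 32-bit word holds two 16-bit samples, the
   first sample in the low half. One word load replaces two halfword
   loads; the halves are then split with mask/shift in registers.
   The conversion to int16_t of values above 0x7FFF is
   implementation-defined in C; GCC and Arm compilers wrap it as
   two's complement, which is what is assumed here. */
int32_t dot_q15_pairs(const uint32_t *xw, const uint32_t *hw, uint32_t npairs)
{
    int32_t acc = 0;
    while (npairs > 0u) {
        uint32_t xv = *xw++;     /* one load: samples 2k and 2k+1 */
        uint32_t hv = *hw++;     /* one load: coefficients 2k and 2k+1 */
        int16_t x_lo = (int16_t)(xv & 0xFFFFu);
        int16_t x_hi = (int16_t)(xv >> 16);
        int16_t h_lo = (int16_t)(hv & 0xFFFFu);
        int16_t h_hi = (int16_t)(hv >> 16);
        acc += (int32_t)x_lo * h_lo;
        acc += (int32_t)x_hi * h_hi;
        npairs--;
    }
    return acc;
}
```

For example, a word of 0x00020001 unpacks to the samples 1 and 2, and 0xFFFF0001 unpacks to 1 and -1.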