In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for Cortex-M3/M4 and for Cortex-M0: loop unrolling is used for M3/M4 but not for M0.
Please tell me the reason behind this. Is there any advantage to using loop unrolling on M3/M4?
Thanks
Indu
daith wrote: The main advantage of loop unrolling is to schedule the memory accesses better.
Yes, this is true. I didn't think about that, because the question was about Cortex-M0, where scheduling of LDR instructions won't matter.
Still, it's possible to merge memory accesses on the Cortex-M0, which in some cases can make an otherwise impossible task possible.
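To make the point about scheduling memory accesses concrete, here is a rough sketch in C of a dot-product loop written both ways. It is not the CMSIS-DSP source; the function names dot_simple and dot_unrolled4 are made up for illustration, and the unroll factor of 4 is an assumption. The unrolled version groups its loads ahead of the arithmetic, which gives the compiler the chance to issue them back to back.

```c
#include <stddef.h>

/* Plain loop: each iteration does two loads, one multiply-accumulate,
 * and pays the loop overhead (index update, compare, branch). */
float dot_simple(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Unrolled by 4 (hypothetical factor): the eight loads are grouped
 * before the arithmetic, and the loop overhead is paid once per four
 * samples instead of once per sample. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t blocks = n / 4;   /* number of full groups of four */
    size_t rest = n % 4;     /* leftover samples handled one at a time */

    while (blocks--) {
        float a0 = a[0], a1 = a[1], a2 = a[2], a3 = a[3];
        float b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];

        /* Same accumulation order as the plain loop. */
        acc += a0 * b0;
        acc += a1 * b1;
        acc += a2 * b2;
        acc += a3 * b3;

        a += 4;
        b += 4;
    }

    while (rest--) {
        acc += (*a++) * (*b++);
    }
    return acc;
}
```

Whether the grouped loads actually turn into back-to-back LDR instructions (or a single multi-register load) depends on the compiler and its optimization settings, so it is worth checking the generated assembly before relying on this.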