In DSP library files such as arm_conv_f32 and arm_fir_f32, the algorithm is implemented differently for Cortex-M3/M4 and for Cortex-M0: loop unrolling is used on M3/M4 but not on M0.
Please tell me the reason behind this. Is there any advantage to using loop unrolling on M3/M4?
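To make the question concrete, here is a minimal sketch of the two loop styles applied to a simple multiply-accumulate (the inner operation of FIR and convolution). The function names dot_rolled and dot_unrolled4 are hypothetical, and this is not the library's actual source; it only mimics the "process 4 samples per iteration, then handle the 0 to 3 leftovers" pattern. The unrolled version pays the counter update, compare and branch once per four multiply-accumulates instead of once per sample, at the cost of larger code. Both versions add the products in the same order, so they give identical results.

```c
#include <stdint.h>

/* Hypothetical helper: plain loop, one multiply-accumulate per iteration.
   Every iteration pays for the counter update, compare and branch. */
static float dot_rolled(const float *a, const float *b, uint32_t n)
{
    float sum = 0.0f;
    for (uint32_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hypothetical helper: same computation unrolled by 4. Loop overhead is
   paid once per four multiply-accumulates; the tail loop handles the
   remaining 0..3 elements when n is not a multiple of 4. */
static float dot_unrolled4(const float *a, const float *b, uint32_t n)
{
    float sum = 0.0f;
    uint32_t blkCnt = n >> 2;      /* number of complete groups of 4 */

    while (blkCnt > 0u) {
        sum += (*a++) * (*b++);
        sum += (*a++) * (*b++);
        sum += (*a++) * (*b++);
        sum += (*a++) * (*b++);
        blkCnt--;
    }

    blkCnt = n % 4u;               /* leftover elements */
    while (blkCnt > 0u) {
        sum += (*a++) * (*b++);
        blkCnt--;
    }
    return sum;
}
```

For example, with x = {1, 2, 3, 4, 5, 6, 7} and all-ones coefficients, both functions return 28.0 (one unrolled pass of 4 plus a tail of 3 in the second case).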
Thanks
Indu