
Using NEON instructions to speed up cascaded biquads: how does it work?

Former Member

I am trying to understand how the cascaded biquad filtering is optimized for Arm processors in CMSIS using Neon extensions.
The code is ifdefed under `#if defined(ARM_MATH_NEON)` here: https://github.com/ARM-software/CMSIS_5/blob/develop/CMSIS/DSP/Source/FilteringFunctions/arm_biquad_cascade_df2T_f32.c

Documentation: arm-software.github.io/.../group__BiquadCascadeDF2T.html
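For reference, here is a minimal scalar sketch of a DF2T cascade. It assumes the per-stage equations from the CMSIS documentation (y = b0·x + d1, d1 = b1·x + a1·y + d2, d2 = b2·x + a2·y) and five coefficients {b0, b1, b2, a1, a2} plus two state words per stage. The names `df2t_step` and `cascade_df2t_ref` and their signatures are hypothetical, not the CMSIS API. The sketch shows the serial dependency directly: each stage consumes the block the previous stage just produced.

```c
#include <assert.h>
#include <stddef.h>

/* One DF2T biquad stage, coefficients {b0, b1, b2, a1, a2}.
 * Assumed sign convention (as in the CMSIS docs): the feedback terms
 * a1*y and a2*y are ADDED, i.e. a1, a2 are negated versus MATLAB's. */
static float df2t_step(const float c[5], float d[2], float x)
{
    float y = c[0] * x + d[0];          /* y[n] = b0*x[n] + d1       */
    d[0] = c[1] * x + c[3] * y + d[1];  /* d1 = b1*x[n] + a1*y[n] + d2 */
    d[1] = c[2] * x + c[4] * y;         /* d2 = b2*x[n] + a2*y[n]      */
    return y;
}

/* Hypothetical scalar reference: 5 coefficients and 2 state words per stage.
 * Stage k reads the block that stage k-1 just wrote into `out`, so the
 * stages must run strictly one after another. */
void cascade_df2t_ref(const float *coeffs, float *state, size_t numStages,
                      const float *in, float *out, size_t blockSize)
{
    assert(coeffs != NULL && state != NULL);
    for (size_t n = 0; n < blockSize; n++)
        out[n] = in[n];
    for (size_t k = 0; k < numStages; k++)      /* stages in order */
        for (size_t n = 0; n < blockSize; n++)  /* recursive within a stage */
            out[n] = df2t_step(&coeffs[5 * k], &state[2 * k], out[n]);
}
```

Note that within a single stage the feedback also makes every sample depend on the previous one, so neither loop can simply be split across SIMD lanes as written.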

The NEON intrinsics are used when there are more than 4 biquads cascaded. I am puzzled how any kind of parallel execution is possible when the output of one biquad is fed as the input to the next one. Could anyone explain what is done in parallel in that piece of code?
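One technique that does allow parallelism despite the chain is to pipeline the stages (a "wavefront"): instead of running every stage on the same sample, stage k works on sample n - k at the same moment. Stage k then only needs what stage k-1 produced one step earlier, so the updates of a group of 4 stages within one step are independent of each other and map naturally onto the 4 lanes of a `float32x4_t`. I have not verified that the CMSIS code does exactly this. The sketch below is plain C rather than Neon intrinsics, handles one group of 4 stages, assumes the CMSIS DF2T coefficient order {b0, b1, b2, a1, a2} with feedback terms added, and `cascade4_wavefront`, `LANES` and the `latch` buffer are hypothetical names, not CMSIS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: 4 stages per group, one stage per lane of a float32x4_t. */
#define LANES 4

/* Hypothetical wavefront (software-pipelined) cascade of exactly LANES DF2T
 * stages, coefficients {b0, b1, b2, a1, a2} per stage, feedback terms added.
 * At time step t, lane k works on sample n = t - k. Its input is lane k-1's
 * output from step t-1, saved in `latch`, so the LANES updates inside one
 * step are independent: that inner loop is what a 4-wide SIMD unit could
 * perform in one pass. */
void cascade4_wavefront(const float coeffs[5 * LANES], float state[2 * LANES],
                        const float *in, float *out, size_t blockSize)
{
    assert(coeffs != NULL && state != NULL);
    float latch[LANES] = {0};             /* each lane's output from the previous step */
    size_t steps = blockSize + LANES - 1; /* LANES-1 extra steps drain the pipeline */

    for (size_t t = 0; t < steps; t++) {
        /* Gather this step's inputs BEFORE any lane overwrites latch. */
        float x[LANES];
        x[0] = (t < blockSize) ? in[t] : 0.0f;
        for (size_t k = 1; k < LANES; k++)
            x[k] = latch[k - 1];

        /* Independent per-lane DF2T updates (the vectorisable part). */
        for (size_t k = 0; k < LANES; k++) {
            /* Lanes outside the valid range (pipeline fill/drain) must not
             * touch their state, or the next block would start corrupted. */
            if (t < k || t - k >= blockSize)
                continue;
            const float *c = &coeffs[5 * k];
            float *d = &state[2 * k];
            float y = c[0] * x[k] + d[0];
            d[0] = c[1] * x[k] + c[3] * y + d[1];
            d[1] = c[2] * x[k] + c[4] * y;
            latch[k] = y;
        }

        /* The last lane finishes sample t - (LANES - 1) on this step. */
        if (t >= LANES - 1)
            out[t - (LANES - 1)] = latch[LANES - 1];
    }
}
```

Because each stage's state only advances on real samples, splitting a block across several calls gives the same output as a single call. A real Neon version would presumably replace the inner loop with `float32x4_t` arithmetic and move the fill and drain steps out of the main loop instead of testing each lane; how the linked CMSIS source handles that, and how it lays out its coefficients, is best checked there.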