I am currently porting my 32-bit C++ DSP audio processing project from an Analog Devices SHARC DSP processor to 64-bit processing on a Cortex-A53 (AArch64). My target platform is the Raspberry Pi 3B+, or maybe the Pi 4. To implement IIR filtering I want to use arm_biquad_cascade_df2T_f64() together with its accompanying init function, which sets up the state array needed to process data in a block-based manner. It seems to work in 64-bit. However, I doubt whether these functions are suitable and optimized for AArch64, because CMSIS is generally labeled as 32-bit and the documentation contains the following statement:
"For Neon version, this array is bigger. If numstages = 4x + y, then the array has size: 32*x + 5*y and it must be initialized using the function arm_biquad_cascade_df2T_compute_coefs_f32 which is taking the standard array coefficient as parameters."
So there is a Neon-specific function for 32-bit floats, but no trace of a 64-bit equivalent for Neon.
My application is quite performance-critical, so I would like to know whether the CMSIS functions will work on AArch64, and whether they are optimized for it.
The short answer is that the CMSIS-DSP functions will work, but largely aren't optimised specifically for AArch64 yet. AArch64 optimisation is on the to-do list, but I don't think it is being worked on currently.
Your example, arm_biquad_cascade_df2T_f64() in CMSIS-DSP, is written in pure C; there are no Neon intrinsics for this one. So its performance depends on what the compiler can do (auto-vectorization), and biquad filters are quite tricky to auto-vectorize.
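For reference, the recursion such a function computes can be sketched in plain C. This is a minimal illustration, not the CMSIS-DSP source: the struct and function names are hypothetical, and the per-stage coefficient order {b0, b1, b2, a1, a2} with the feedback terms added follows the convention the CMSIS-DSP documentation describes for the df2T functions (worth checking against your CMSIS version). The inner loop shows why auto-vectorization is hard: each output y feeds the next sample's d1 and d2, so consecutive samples cannot be computed independently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, NOT the CMSIS-DSP structs. Coefficients per stage:
 * {b0, b1, b2, a1, a2}, with a1/a2 added (CMSIS-style sign convention). */
typedef struct {
    size_t num_stages;
    const double *coeffs; /* 5 * num_stages values */
    double *state;        /* 2 * num_stages values: {d1, d2} per stage */
} df2t_cascade_f64;

/* Stores the pointers and zeroes the state so filtering starts from rest. */
void df2t_cascade_f64_init(df2t_cascade_f64 *f, size_t num_stages,
                           const double *coeffs, double *state)
{
    assert(f != NULL && coeffs != NULL && state != NULL);
    f->num_stages = num_stages;
    f->coeffs = coeffs;
    f->state = state;
    for (size_t i = 0; i < 2 * num_stages; i++)
        state[i] = 0.0;
}

/* Filters one block. The state is written back at the end, so a stream can
 * be processed block by block. src and dst may point to the same buffer. */
void df2t_cascade_f64_process(df2t_cascade_f64 *f, const double *src,
                              double *dst, size_t block_size)
{
    const double *in = src;
    for (size_t s = 0; s < f->num_stages; s++) {
        const double *c = &f->coeffs[5 * s];
        double d1 = f->state[2 * s];
        double d2 = f->state[2 * s + 1];
        for (size_t n = 0; n < block_size; n++) {
            double x = in[n];
            double y = c[0] * x + d1;      /* y needs d1 from the last sample */
            d1 = c[1] * x + c[3] * y + d2; /* and the new d1/d2 need this y:  */
            d2 = c[2] * x + c[4] * y;      /* a serial chain across samples   */
            dst[n] = y;
        }
        f->state[2 * s] = d1;
        f->state[2 * s + 1] = d2;
        in = dst; /* the next stage filters this stage's output */
    }
}
```

Because the state is stored back after each call, splitting a signal into blocks gives the same output as filtering it in one call.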
The comment you quote is for the f32 version of the function and doesn't apply to the f64 version.
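To make the difference concrete, the sizing rule from the quoted f32 note can be compared with the standard layout of five coefficients per stage, which is what the f64 function uses. The helper names below are hypothetical, not CMSIS-DSP API, and the sketch assumes (as the surrounding CMSIS-DSP documentation indicates) that the array the note refers to is the coefficient array.

```c
#include <assert.h>

/* Hypothetical helpers, not CMSIS-DSP API: they only encode the sizing
 * arithmetic from the documentation. */

/* Standard df2T layout (f64, and f32 without Neon): 5 coefficients per stage. */
unsigned df2t_std_coef_count(unsigned num_stages)
{
    return 5u * num_stages;
}

/* Neon f32 layout per the quoted rule: num_stages = 4x + y -> 32x + 5y. */
unsigned df2t_neon_f32_coef_count(unsigned num_stages)
{
    assert(num_stages > 0);
    unsigned x = num_stages / 4u; /* full groups of four stages */
    unsigned y = num_stages % 4u; /* remaining stages */
    return 32u * x + 5u * y;
}
```

For example, six stages need 30 coefficients in the standard layout but 32*1 + 5*2 = 42 in the Neon f32 layout.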
If you wish to implement some functions yourself in Neon to improve 64-bit performance, I can point you to the Neon Programmer's Guide for AArch64 and the more general Neon Programmer's Guide. The searchable Neon Intrinsics reference is also very useful.
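One common way to get Neon speed-ups despite the serial dependency inside a biquad (a technique suggested here, not something CMSIS-DSP provides) is to vectorize across independent channels: a float64x2_t register holds two doubles, so two channels can run the same biquad stage in lockstep. The sketch below assumes interleaved stereo samples and shared coefficients; the function name and data layout are hypothetical. It uses the AArch64 intrinsics vld1q_f64, vdupq_n_f64, vfmaq_f64, vmulq_f64 and vst1q_f64, and falls back to scalar C when not built for AArch64 Neon.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical sketch: one DF2T biquad stage applied to two channels at once.
 * c = {b0, b1, b2, a1, a2} (feedback terms added), shared by both channels.
 * io is interleaved: io[2*n] is channel 0, io[2*n+1] is channel 1.
 * state = {d1_ch0, d1_ch1, d2_ch0, d2_ch1}. Filtering is done in place. */
void biquad_df2t_stereo_f64(const double c[5], double state[4],
                            double *io, size_t frames)
{
    assert(c != NULL && state != NULL && (io != NULL || frames == 0));
#if defined(__aarch64__) && defined(__ARM_NEON)
    float64x2_t b0 = vdupq_n_f64(c[0]), b1 = vdupq_n_f64(c[1]);
    float64x2_t b2 = vdupq_n_f64(c[2]), a1 = vdupq_n_f64(c[3]);
    float64x2_t a2 = vdupq_n_f64(c[4]);
    float64x2_t d1 = vld1q_f64(&state[0]);
    float64x2_t d2 = vld1q_f64(&state[2]);
    for (size_t n = 0; n < frames; n++) {
        float64x2_t x = vld1q_f64(&io[2 * n]);       /* sample n, both channels */
        float64x2_t y = vfmaq_f64(d1, b0, x);        /* y  = d1 + b0*x          */
        d1 = vfmaq_f64(vfmaq_f64(d2, b1, x), a1, y); /* d1 = d2 + b1*x + a1*y   */
        d2 = vfmaq_f64(vmulq_f64(b2, x), a2, y);     /* d2 = b2*x + a2*y        */
        vst1q_f64(&io[2 * n], y);
    }
    vst1q_f64(&state[0], d1);
    vst1q_f64(&state[2], d2);
#else
    /* Portable scalar fallback: the same recursion, one channel at a time. */
    for (size_t ch = 0; ch < 2; ch++) {
        double d1 = state[ch], d2 = state[2 + ch];
        for (size_t n = 0; n < frames; n++) {
            double x = io[2 * n + ch];
            double y = c[0] * x + d1;
            d1 = c[1] * x + c[3] * y + d2;
            d2 = c[2] * x + c[4] * y;
            io[2 * n + ch] = y;
        }
        state[ch] = d1;
        state[2 + ch] = d2;
    }
#endif
}
```

Fused multiply-add can round slightly differently from a separate multiply and add, so the Neon and scalar paths may differ in the last bits for general inputs.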
I hope this helps,