Hi, I am working on a project doing realtime FFT processing on an audio signal using an STM32H730 processor with the CMSIS FFT library. I am currently trying to reduce latency by using a smaller block size and a higher overlap between FFTs, and I am looking for ways to reduce the processing overhead of the FFT function.
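To make the overhead side of this concrete, here is a rough sketch of how block size and overlap determine the FFT rate and the CPU cycle budget available per FFT. The helper names are hypothetical (they are not part of CMSIS-DSP), and the numbers in the usage note are illustrative assumptions, not values from the actual project.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of CMSIS-DSP. */

/* Hop size in samples: how far the analysis window advances between FFTs. */
static uint32_t hop_size(uint32_t fft_len, uint32_t overlap_percent)
{
    return fft_len - (fft_len * overlap_percent) / 100u;
}

/* FFTs that must be computed per second, rounded up. */
static uint32_t ffts_per_second(uint32_t sample_rate_hz, uint32_t hop)
{
    return (sample_rate_hz + hop - 1u) / hop;
}

/* CPU cycles available per FFT if the core did nothing else (rounded down). */
static uint32_t cycle_budget_per_fft(uint32_t cpu_hz,
                                     uint32_t sample_rate_hz,
                                     uint32_t hop)
{
    /* 64-bit intermediate avoids overflow of cpu_hz * hop. */
    return (uint32_t)(((uint64_t)cpu_hz * hop) / sample_rate_hz);
}
```

As an assumed example, a 256-point FFT with 75% overlap at a 48 kHz sample rate gives a hop of 64 samples and 750 FFTs per second; with an assumed 550 MHz core clock that leaves roughly 733,000 cycles per FFT for everything, including windowing and any post-processing. Halving the block size or raising the overlap shrinks that budget proportionally.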
Could performance be improved by replacing the arm_rfft_f32 function with the arm_rfft_f16 function? Would this be faster on the STM32H730 hardware? And would the fixed-point functions like arm_rfft_q31 be faster still?
Please let me know. I am also open to any other advice on how to optimize CMSIS FFT processing for STM32H730 hardware. Thanks!