CMSIS DSP Library FIR filter in realtime application

Hello,

I am new to DSP and have just built my first test project. It is a low-pass filter of order 31, so it has 32 coefficients. I also have an input buffer of 32 values that I implemented as a ring buffer. With these I programmed a standard FIR filter algorithm in C. At the sample frequency, one new value from the ADC is written into the input buffer. Then I run one filter calculation and get the output for this point in time, which goes to a DAC. All this runs fine: when I feed the system with a frequency sweep, I can see the low-pass behaviour in the DAC output signal on an oscilloscope.
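
To make the setup concrete, here is a minimal sketch of the kind of per-sample ring-buffer FIR described above. It is not the actual project code: the names `fir_state_t`, `fir_step` and `FIR_TAPS` are hypothetical, float samples are assumed, and the ADC/DAC access is left out.

```c
#include <stddef.h>

#define FIR_TAPS 32  /* order 31 -> 32 coefficients (assumed from the text) */

/* Hypothetical filter state: the last FIR_TAPS input samples. */
typedef struct {
    float  history[FIR_TAPS]; /* ring buffer of input samples */
    size_t head;              /* index where the next sample is written */
} fir_state_t;

/*
 * Store one new input sample and return the filter output for this
 * point in time: y[n] = sum over k of coeffs[k] * x[n - k].
 */
static float fir_step(fir_state_t *st, const float *coeffs, float sample)
{
    size_t idx;
    size_t k;
    float  acc = 0.0f;

    /* Overwrite the oldest sample with the newest one. */
    st->history[st->head] = sample;

    /* Walk backwards through the ring buffer, newest sample first. */
    idx = st->head;
    for (k = 0; k < FIR_TAPS; ++k) {
        acc += coeffs[k] * st->history[idx];
        idx = (idx == 0) ? (FIR_TAPS - 1) : (idx - 1);
    }

    /* Advance the write position, wrapping around at the end. */
    st->head = (st->head + 1) % FIR_TAPS;
    return acc;
}
```

In this sketch `fir_step` would be called once per sample period, for example from a timer or ADC interrupt, with the state zero-initialized beforehand (`fir_state_t st = {0};`).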

Now I want to switch to the CMSIS FIR functions to increase speed and to test the Q15 and Q31 types, but I am struggling to fit them to my realtime approach of one new value per calculation run. All the examples I found process a large block of stored samples and produce a complete output waveform instead of just the current output value. What do I have to set up?

Do I set blockSize = 1, so that *pSrc and *pDst each point to a single value? Will the previous values then remain stored in pState of the arm_fir_instance_f32 between calls? And do I still have to manage the ring buffer myself, or is that handled entirely inside the CMSIS functions?

I also found an STM32 application note (AN4841) stating that the fastest FIR calculation is with float32, followed by Q31, followed by Q15. I wonder how this can be. Don't the Q31 and Q15 versions use MAC instructions? Or does the FPU have equivalent instructions?

Thanks for any help

Martin