Support forums

Keil forum CMSIS IIR filter calculation speed

State Not Answered
+1 person also asked this people also asked this
Locked Locked
Replies 2 replies
Subscribers 23 subscribers
Views 8015 views
Users 0 members are here

Options

How was your experience today?

This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

CMSIS IIR filter calculation speed

MCL over 7 years ago

Hello!

I am trying to implement IIR filter algorithms on an STM32F767. I'm using the CMSIS library for the filter algorithms and they are working as expected.

However, the execution speed is very low and I'm not sure why this is happening?

I'm calculating the output samples from the input and output (circular) buffer with coefficients and data structures in place as recommended in the CMSIS documentation. Calculating a single sample takes around 285 clock cycles for the floating point implementation, and around 580 if I use fixed point data.

When thinking about a 2nd order IIR sample calculation there should be five multiplications and some rounding/shifting so I am very surprised about the big cycle count. I would expect something in the low two digit figures in terms of cycle count.

For cycle counting I am using the DWT_CYCCNT feature and a subtraction to display the acual counts used for a certain function calls/code snippets.

I'm using a sampling frequency of 48kHz. A 12.288MHz external clock gets the core running at 215.808MHz. The data comes in from a pair of 8ch ADC's via DMA upon SAI interrupt and is sent out via DMA as well.

I want to have faster calculation because the end application will use 16Ch of audio passing thru the µC and should apply filtering to each of them.

If I had to make a guess about the source of the problem, I would suspect that the data I am working with takes long to reach the FPU and is therefore slowing down the operation. However, I would be surprised if a CMSIS function would not be optimized in this regard.

While I'm aware that writing assembly code for that calculation would probably speed things up, I'm trying not to go that deep if not absolutely necessary.

Is there someone here who has made this experience and/or who could point me to the source of the problem?

Thank you very much in advance!

Michele