The DSP Concept guys say that it's time to use ARM Cortex-M microcontrollers for embedded DSP systems, so I looked at the CMSIS library of filtering functions and found that they are block-based.
As you know, the most painful feature of the ARM Cortex-M architecture is the lack of a circular buffer addressing mode.
I cannot find an example of applying these functions to a continuous, real-time signal because, I guess, there is a big problem with gathering the input sample blocks into a structure compatible with the CMSIS FIR function. This should be done by a DMA controller, as we don't want to lose core clock cycles, and this task is not easy. The CMSIS FIR function has an internal state buffer whose length equals block_size+numOfTaps-1.
In multiple steps (=block_size/4), the function copies 4 samples at a time from the input buffer to the state buffer (using the core!), and then, before the next input block is filtered, the last numOfTaps-1 samples in the state buffer must be moved to the beginning of this buffer.
It looks bad.
Maybe one of you has solved this problem and used this function in real time; if so, please write to me about it.
The CMSIS DSP FIR functions are designed for block-based operation with real-time signals. The function accepts a pointer to a buffer of input samples and a pointer to a buffer for the output samples it generates. You are right that the Cortex-M lacks circular addressing. To mitigate this issue, the function was written to use a FIFO rather than a circular buffer. Once per block, the data in the FIFO is shifted by blockSize samples. This requires that we read and write N words, where N is the length of the filter. The overhead is roughly N/blockSize operations per sample. The function does all of the shifting of the FIFO data and accepts a continuous stream of input blocks.
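To make the FIFO scheme described above concrete, here is a minimal sketch in plain C. It is not the CMSIS source: the names `fifo_fir_t`, `fifo_fir_init`, `fifo_fir_process`, `NUM_TAPS` and `BLOCK_SIZE` are made up for illustration, and the loop is not unrolled or optimized. It only shows the three steps per block: append the new block after the history, convolve, then shift the last NUM_TAPS-1 samples back to the start of the state buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_TAPS   4   /* filter length N (illustrative value) */
#define BLOCK_SIZE 8   /* samples per block (illustrative value) */

/* State layout: [ NUM_TAPS-1 history samples | BLOCK_SIZE new samples ] */
typedef struct {
    const float *coeffs;                      /* h[0] .. h[NUM_TAPS-1] */
    float state[NUM_TAPS - 1 + BLOCK_SIZE];
} fifo_fir_t;

static void fifo_fir_init(fifo_fir_t *f, const float *coeffs)
{
    assert(f != NULL && coeffs != NULL);
    f->coeffs = coeffs;
    memset(f->state, 0, sizeof f->state);     /* start with zero history */
}

/* Filters exactly BLOCK_SIZE samples from in[] into out[]. */
static void fifo_fir_process(fifo_fir_t *f, const float *in, float *out)
{
    assert(f != NULL && in != NULL && out != NULL);

    /* 1. Append the new block right after the saved history. */
    memcpy(&f->state[NUM_TAPS - 1], in, BLOCK_SIZE * sizeof(float));

    /* 2. Convolve: y[n] = sum over k of h[k] * x[n-k]. */
    for (int n = 0; n < BLOCK_SIZE; n++) {
        float acc = 0.0f;
        for (int k = 0; k < NUM_TAPS; k++) {
            acc += f->coeffs[k] * f->state[n + NUM_TAPS - 1 - k];
        }
        out[n] = acc;
    }

    /* 3. Shift the FIFO: the last NUM_TAPS-1 samples become the history
     *    for the next block. This is the once-per-block overhead. */
    memmove(&f->state[0], &f->state[BLOCK_SIZE],
            (NUM_TAPS - 1) * sizeof(float));
}
```

Calling `fifo_fir_process` once per completed input block gives the same output as filtering the whole stream at once, because the history carries over between calls. The shift in step 3 touches about N words per block, which is where the roughly N/blockSize per-sample overhead comes from.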
Many thanks for the kind answer.
Anyway, this function wastes time on unnecessary transfers.
The first one is the copy of samples (four samples at a time) from the input buffer to the FIR state buffer. Why not use the last blockSize cells of the state buffer for this purpose (as the input buffer)?
A possible conflict between the CPU and the DMA can be eliminated by using two ping-pong state buffers. If another DMA channel is used to move the last N samples from the stateA buffer to the beginning of the stateB buffer, many core clock cycles can be saved for the FIR convolutions.
I plan to modify the CMSIS functions this way and will let you know the results.
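A rough sketch of the ping-pong idea proposed above, in plain C. This is not a modified CMSIS function and it has not been run on hardware: all names (`pingpong_fir_t`, `pingpong_fir_input`, `pingpong_fir_process`, the `PP_` constants) are hypothetical, the producer that fills the input area stands in for the input DMA channel, and the `memcpy` of the history stands in for the second DMA channel. The number of history samples is assumed to be the filter length minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PP_NUM_TAPS   4                         /* illustrative value */
#define PP_BLOCK_SIZE 8                         /* illustrative value */
#define PP_HISTORY    (PP_NUM_TAPS - 1)
#define PP_STATE_LEN  (PP_HISTORY + PP_BLOCK_SIZE)

typedef struct {
    const float *coeffs;                        /* h[0] .. h[PP_NUM_TAPS-1] */
    float state[2][PP_STATE_LEN];               /* stateA and stateB */
    int active;                                 /* buffer being filled now */
} pingpong_fir_t;

static void pingpong_fir_init(pingpong_fir_t *f, const float *coeffs)
{
    assert(f != NULL && coeffs != NULL);
    f->coeffs = coeffs;
    memset(f->state, 0, sizeof f->state);
    f->active = 0;
}

/* Where the producer (input DMA in the real system) writes the next
 * block: the last PP_BLOCK_SIZE cells of the active state buffer. */
static float *pingpong_fir_input(pingpong_fir_t *f)
{
    assert(f != NULL);
    return &f->state[f->active][PP_HISTORY];
}

/* Call when the active buffer holds a complete block. */
static void pingpong_fir_process(pingpong_fir_t *f, float *out)
{
    assert(f != NULL && out != NULL);
    float *cur  = f->state[f->active];
    float *next = f->state[f->active ^ 1];

    /* Move the tail of the current buffer to the head of the other one.
     * In the proposal this would be done by a second DMA channel. */
    memcpy(next, &cur[PP_BLOCK_SIZE], PP_HISTORY * sizeof(float));
    f->active ^= 1;             /* the producer may now fill the other buffer */

    /* Convolve directly on the state buffer; no input copy by the core. */
    for (int n = 0; n < PP_BLOCK_SIZE; n++) {
        float acc = 0.0f;
        for (int k = 0; k < PP_NUM_TAPS; k++) {
            acc += f->coeffs[k] * cur[n + PP_HISTORY - k];
        }
        out[n] = acc;
    }
}
```

In a real system the history transfer must finish before the input DMA reaches a point where the core starts reading the next buffer, and the convolution on one buffer must finish before the input DMA wraps back to it; the sketch ignores that synchronization.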
Hello, I managed to implement it in real time! I found the right way.
Any insight you could provide to that end, instead of just stating it works?