CMSIS DSP Correlation

Hello,

I'm working on real-time processing that uses a correlation on an ARM Cortex-M4 (Kinetis K60), so execution time is important to me.

I tried several input sizes and different functions to make it as fast as possible.

But there is something I don't understand:

I tried arm_correlate_fast_q31 and arm_correlate_q31.

According to the documentation, the 'fast' function should be faster but less accurate, right?

I measured the execution times, and the basic correlation appears to be faster than the 'fast' one.

Example in my case:

arm_correlate_fast_q31(din0, 1024, din1, 1024, dout): 367 ms

arm_correlate_q31(din0, 1024, din1, 1024, dout): 272 ms

I use the latest version of CMSIS (V1.4.5 b) and GCC.

Is there something I did wrong or misunderstood?

Thank you,

Romain
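
For scale, the sizes in the post above imply the following workload. This is a rough sketch, not CMSIS code: the helper names are made up for illustration, the output length follows the CMSIS-DSP documentation for arm_correlate_q31 (2 * max(srcALen, srcBLen) - 1), and the multiply-accumulate count assumes a direct (non-FFT) correlation in which every pair of input samples is multiplied once.

```c
#include <assert.h>
#include <stdint.h>

/* Output length documented for arm_correlate_q31:
   2 * max(srcALen, srcBLen) - 1. */
uint32_t correlate_out_len(uint32_t src_a_len, uint32_t src_b_len)
{
    uint32_t longest = (src_a_len > src_b_len) ? src_a_len : src_b_len;
    assert(longest > 0u);
    return 2u * longest - 1u;
}

/* Assumption: direct correlation, so each pair of samples
   (one from each input) is multiplied exactly once. */
uint64_t correlate_mac_count(uint32_t src_a_len, uint32_t src_b_len)
{
    return (uint64_t)src_a_len * src_b_len;
}

/* Average time per multiply-accumulate in nanoseconds, rounded down. */
uint64_t ns_per_mac(uint64_t total_ms, uint32_t src_a_len, uint32_t src_b_len)
{
    uint64_t macs = correlate_mac_count(src_a_len, src_b_len);
    assert(macs > 0u);
    return (total_ms * 1000000u) / macs;
}
```

With the figures from the post, correlate_out_len(1024, 1024) is 2047, so dout needs room for 2047 samples, and the two timings work out to roughly 259 ns and 349 ns per multiply-accumulate.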

  • Hi Romain,

    I just wanted to make sure of one thing real quick. Did you measure these functions with no interrupts that could corrupt the timings? Did you use the SysTick to measure how long the functions took (or some other method)?

    I will look into reproducing this on my end and try to figure it out, though I don't have an answer on hand. These two functions were written by somebody else. If it turns out 'fast' is slower than normal for ARMCC and GCC on M0, M3, M4 and M7, I might just remove it. If it's faster than normal only on some architectures, it could still have a place in the library; I would just document the situations in which it is indeed faster.

    Dan
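
One way to answer the timing question without SysTick is a free-running 32-bit cycle counter (on Cortex-M3/M4/M7 parts that implement it, the DWT cycle counter). The sketch below only shows the arithmetic, with counter readings passed in as plain integers so it compiles anywhere; the function names and the 100 MHz clock mentioned afterwards are assumptions, not values from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed cycles between two readings of a free-running 32-bit up-counter.
   Unsigned subtraction stays correct across a single wrap of the counter. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds (rounded down) for a given core clock. */
uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz > 0u);
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}
```

At an assumed 100 MHz core clock, a 32-bit counter wraps after about 42.9 s, far longer than the hundreds of milliseconds reported here.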

  • Hello Romain,

    If you used the SysTick timer, I am afraid the timer overflowed.

    Could you try a smaller number of array elements?

    Best regards,

    Yasuhiko Koumoto.
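
The overflow concern can be put in numbers. SysTick is a 24-bit down-counter, so the longest interval a single pass can time depends on its clock. The sketch below assumes SysTick runs from the core clock; the 100 MHz figure used in the note after the code is an assumption for illustration, since the thread does not give the K60's clock frequency, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* SysTick on Cortex-M is a 24-bit down-counter. */
#define SYSTICK_COUNTER_BITS 24u

/* Longest interval, in microseconds (rounded down), that one pass of the
   counter can cover before it wraps. Assumes SysTick is clocked at core_hz. */
uint64_t systick_max_interval_us(uint32_t core_hz)
{
    assert(core_hz > 0u);
    const uint64_t max_ticks = (uint64_t)1 << SYSTICK_COUNTER_BITS; /* 16,777,216 */
    return (max_ticks * 1000000u) / core_hz;
}

/* Number of complete counter wraps that occur during duration_us. */
uint32_t systick_wraps(uint32_t core_hz, uint32_t duration_us)
{
    uint64_t ticks = ((uint64_t)core_hz * duration_us) / 1000000u;
    return (uint32_t)(ticks >> SYSTICK_COUNTER_BITS);
}
```

At 100 MHz one pass covers about 167.8 ms, so a 272 ms or 367 ms call would wrap the counter once or twice unless the wraps are counted, for example in the SysTick interrupt.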

  • Thank you for your responses.

    It's a minimal application (FreeRTOS plus one high-priority task performing the calculation).

    (I also tried disabling interrupts before entering the function, with the same result.)

    As for my measurement method, I use a GPIO and an oscilloscope.

  • Hi Romain et al,

    I see the same issue on an M3 target using arm_fir_q31 versus arm_fir_fast_q31 with 48 coefficients.

    Fast version: 120 - 139 µs

    Normal version: 96 - 115 µs

    I am using a system timer wrapped around the specific filter to measure. The filter is in the highest-priority ISR, all other ISRs have lower priority, and there is no RTOS. It must run to completion without pre-emption.

    I'm reasonably sure this is accurate: when I replace the filter with a wait_us(50) call, the timer measures a delay of 52 µs.

    Interesting, but frustrating. I was really hoping for an improvement.

    Aidan
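
The wait_us(50) check above can be turned into a correction. The sketch below treats the 2 µs difference as fixed measurement overhead, which assumes wait_us(50) really takes 50 µs; the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed overhead of the timing wrapper, estimated from a known delay. */
uint32_t timer_overhead_us(uint32_t measured_us, uint32_t expected_us)
{
    assert(measured_us >= expected_us);
    return measured_us - expected_us;
}

/* Remove the wrapper overhead from a raw measurement. */
uint32_t corrected_us(uint32_t raw_us, uint32_t overhead_us)
{
    assert(raw_us >= overhead_us);
    return raw_us - overhead_us;
}

/* How much slower 'fast' is than 'normal', in whole percent (rounded down). */
uint32_t slowdown_percent(uint32_t fast_us, uint32_t normal_us)
{
    assert(normal_us > 0u && fast_us >= normal_us);
    return ((fast_us - normal_us) * 100u) / normal_us;
}
```

With 2 µs removed, the ranges become 118 to 137 µs for the fast version and 94 to 113 µs for the normal one, so the fast version is still roughly 21 to 25 percent slower.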