
Use of smlad instruction in arm_fir_decimate_fast_q15

Hello ARM support team,

I hope you can help me.

I'm using your very nice DSP library, specifically the arm_fir_decimate_fast_q15 function. I see that the convolution multiplications are implemented with the signed dual multiply-accumulate instruction (SMLAD), like this:

__ASM volatile ("smlad %0, %1, %2, %3" : "=r" (acc0) : "r" (x0), "r" (c0), "r" (acc0) );

However, when I inspect the generated assembly, I see that each instance of smlad performs only a single 16x16-bit multiply, and the upper halfwords of the input registers are empty.

It seems the dual aspect of the smlad instruction is wasted, and I get the same performance when I substitute a regular 16x16-bit multiply. This is a problem for me, as I could really use the performance gain of a dual multiply.
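To make clear what I mean by "dual", here is a small portable C model of what I understand smlad to compute. This is only a sketch for illustration: the names pack_q15 and smlad_ref are my own (they are not part of CMSIS-DSP), and the Q flag that the real instruction sets on overflow is not modelled.

```c
#include <stdint.h>

/* Hypothetical helper (not a CMSIS function): pack two q15 values into
 * one 32-bit word, 'lo' in bits 15:0 and 'hi' in bits 31:16. */
static uint32_t pack_q15(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Hypothetical reference model of smlad:
 *   result = acc + lo(x) * lo(y) + hi(x) * hi(y)
 * Each halfword is treated as a signed 16-bit value. The sum wraps
 * modulo 2^32; the Q flag is not modelled here. */
static int32_t smlad_ref(uint32_t x, uint32_t y, int32_t acc)
{
    /* Sign-extend the bottom and top halfwords of each operand. */
    int32_t xl = (int16_t)(x & 0xFFFFu);
    int32_t xh = (int16_t)(x >> 16);
    int32_t yl = (int16_t)(y & 0xFFFFu);
    int32_t yh = (int16_t)(y >> 16);

    /* Accumulate in 64 bits, then truncate to 32 bits (wraps on the
     * usual two's-complement targets). */
    int64_t sum = (int64_t)acc + (int64_t)xl * yl + (int64_t)xh * yh;
    return (int32_t)(uint32_t)sum;
}
```

With both halfwords populated, one call performs two multiply-accumulates, e.g. smlad_ref(pack_q15(3, 5), pack_q15(7, 11), 100) gives 100 + 3*7 + 5*11. With the upper halfwords empty, as in my disassembly, the second product is always zero, so smlad_ref(pack_q15(3, 0), pack_q15(7, 0), 100) gives only 100 + 3*7.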

Can you please confirm whether this is the expected behaviour? If so, why can't this function take advantage of the dual multiply?

My system details:

CPU: Cortex-M33

MCU: STM32U3

Compiler: GNU 13.3 (STM32CubeIDE 1.18)

Optimisation: -Ofast

Many thanks!

John