Keil forum Use of smlad instruction in arm_fir_decimate_fast_q15

John Atkins 4 months ago

Hello ARM support team,

I hope you can help me.

I'm using your very nice DSP library, specifically the arm_fir_decimate_fast_q15 function. I see that the convolution multiplications are implemented with the dual 16-bit multiply-accumulate instruction (SMLAD), like this:

__ASM volatile ("smlad %0, %1, %2, %3" : "=r" (acc0) : "r" (x0), "r" (c0), "r" (acc0) );
However, when I inspect the generated assembly, I see that each instance of smlad performs only a single 16x16-bit multiply; the upper halfwords of the input registers are empty.
It seems the dual aspect of the smlad instruction is wasted, and I get the same performance when I substitute a regular 16x16-bit multiply. This is a problem for me, as I could really do with the performance gain of a dual multiply.
Can you please confirm whether this is the expected behaviour? And if so, why can't this function take advantage of the dual multiply?
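To make the question concrete, here is a minimal portable C sketch of what I understand smlad to compute. The names smlad_model, pack_q15x2 and smlad_model_self_check are hypothetical helpers of my own, not part of CMSIS-DSP, and the model ignores the Q flag that the real instruction sets on overflow. It shows that when the upper halfwords are zero, the result collapses to a single multiply-accumulate.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two q15 values into one 32-bit word: lo in bits 15..0, hi in bits 31..16. */
uint32_t pack_q15x2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Assumed model of SMLAD: acc + lo(x)*lo(y) + hi(x)*hi(y), with each
 * halfword treated as a signed 16-bit value. The sum wraps modulo 2^32;
 * the Q flag is not modelled. The narrowing casts rely on the usual
 * two's-complement behaviour of common compilers. */
int32_t smlad_model(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);

    /* Accumulate in unsigned arithmetic so wraparound is well defined. */
    uint32_t sum = (uint32_t)acc
                 + (uint32_t)((int32_t)x_lo * y_lo)
                 + (uint32_t)((int32_t)x_hi * y_hi);
    return (int32_t)sum;
}

/* Small self-check of the two cases discussed in this post. */
void smlad_model_self_check(void)
{
    /* Upper halfwords empty: only one product contributes (10 + 3*5). */
    assert(smlad_model(pack_q15x2(3, 0), pack_q15x2(5, 0), 10) == 10 + 3 * 5);

    /* Both halfwords populated: two products per instruction. */
    assert(smlad_model(pack_q15x2(3, -2), pack_q15x2(5, 7), 10)
           == 10 + 3 * 5 + (-2) * 7);
}
```

The first case is what I see in my disassembly; the second is what I was hoping for, with two samples and two coefficients packed into each register.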
My system details:
CPU: Cortex-M33
MCU: STM32U3
Compiler: GNU 13.3 (STM32CubeIDE 1.18)
Optimisation: -Ofast
Many thanks!
John