DSP instruction for x*x + y*y. Does it exist?

Matic over 9 years ago

Hello all!

I am new to the ARM community and this is my first question here. I work on embedded systems where we use Cortex-M4 based MCUs (specifically the STM32F3 series). I would like to ask whether there is a DSP instruction that calculates x*x + y*y.

x and y represent sine and cosine values (signed integers; 16-bit variables are sufficient). I would like to calculate the square of the amplitude (x*x + y*y).

Thanks in advance.

Matic over 9 years ago

Hi.
I have another question regarding these DSP instructions.
What is their real advantage? If I calculate x*x + y*y using the SMUAD instruction, I first have to pack the 32-bit register with (x << 16) | y. That gives me three instructions (<<, | and SMUAD). If I do the plain calculation with two multiplications and one addition (x*x + y*y), I also have three instructions. I know I will not gain a huge amount of time with so few instructions, but I am curious when the DSP instructions become preferable to the normal calculation. In my case there seems to be no advantage in using them at all.
Thanks
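
One detail of the (x << 16) | y packing is worth illustrating: if y is negative and sign-extended to 32 bits, the OR overwrites the x halfword unless y is masked first. The following is only a minimal sketch in portable C; the function names `pack_halfwords` and `pack_naive` are hypothetical and do not come from any library.

```c
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word:
 * x goes to the top halfword, y to the bottom halfword.
 * Casting through uint16_t keeps only the low 16 bits of each value,
 * so a negative y cannot spill into the top halfword. */
static uint32_t pack_halfwords(int16_t x, int16_t y)
{
    return ((uint32_t)(uint16_t)x << 16) | (uint32_t)(uint16_t)y;
}

/* For contrast only: the same expression without masking y.
 * Converting a negative int16_t to uint32_t sign-extends it,
 * so the upper 16 bits become all ones and x is lost. */
static uint32_t pack_naive(int16_t x, int16_t y)
{
    return ((uint32_t)x << 16) | (uint32_t)y;
}
```

With x = 3 and y = -4, `pack_halfwords` gives 0x0003FFFC while `pack_naive` gives 0xFFFFFFFC, so the unmasked version needs an extra instruction (or a different packing method) to be correct for negative y.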

G. Goodwin L. Pitos over 9 years ago in reply to Matic

Hi matic,
You can use PKHBT and PKHTB to pack the registers; see § 3.8, "Packing and unpacking instructions", of the Cortex-M4 Devices Generic User Guide, Revision r0p1. Next, tabulate the instructions used by the two methods (PKHBT/PKHTB plus SMUAD versus multiply and add), then calculate and compare the total number of cycles each one takes to evaluate the expression. I hope you can post the result; I'm curious too, but I have to log out now.
Regards,
Goodwin

Prasad over 9 years ago in reply to Matic

Hello Matic,
These instructions are a form of SIMD that works on the existing 32-bit ARM core registers.
Consider the scenario where you have a lot of x and y coordinates in memory, already stored in packed form. In that case you only need to load them as 32-bit values and perform the computation.
Even when x and y are computed first and x*x + y*y is evaluated afterwards, other SIMD instructions can be used to produce the x and y results already in packed format.
If you look at the computation in isolation, then yes, there is a packing overhead, and it can be large enough that the special instructions bring no advantage.
Regards, Prasad
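
The packed-in-memory scenario can be sketched in portable C as below. This is only an illustration under assumptions: `dual_mul_add` is a hypothetical software model of what SMUAD computes, and `amplitude_squared` and the layout (x in the top halfword, y in the bottom halfword of each word) are made up for the example. On a Cortex-M4 target the model would be replaced by the instruction itself, for instance through the CMSIS `__SMUAD` intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of SMUAD: signed (bottom * bottom) + signed (top * top).
 * The halfword casts assume the usual two's complement conversion.
 * The sum is formed in uint32_t so that the single overflowing case
 * (both halfwords of both operands equal to -32768) wraps as on hardware. */
static int32_t dual_mul_add(uint32_t a, uint32_t b)
{
    int32_t a_lo = (int16_t)(a & 0xFFFFu);
    int32_t a_hi = (int16_t)(a >> 16);
    int32_t b_lo = (int16_t)(b & 0xFFFFu);
    int32_t b_hi = (int16_t)(b >> 16);
    return (int32_t)((uint32_t)(a_lo * b_lo) + (uint32_t)(a_hi * b_hi));
}

/* Each input word holds one sample, already packed (x top, y bottom).
 * One 32-bit load plus one dual multiply per sample, no packing step. */
static void amplitude_squared(const uint32_t *packed, int32_t *out, size_t n)
{
    assert(n == 0 || (packed != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        out[i] = dual_mul_add(packed[i], packed[i]);
    }
}
```

Because the data is loaded as whole words, the per-sample work is one load and one multiply-add, which is where the packed format pays off compared with two halfword loads, a multiply and a multiply-accumulate.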

G. Goodwin L. Pitos over 9 years ago in reply to Matic

You may want to check whether the instructions below work:
PKHBT Rpck, Ry, Rx, LSL #16  ; Writes the bottom halfword of Ry to the bottom halfword
                              ; of Rpck, and the top halfword of (Rx shifted left by 16 bits),
                              ; i.e. x, to the top halfword of Rpck
SMUAD Rsumsqrs, Rpck, Rpck   ; Multiplies the bottom halfword of Rpck by the bottom
                              ; halfword of Rpck (y squared), adds the product of the top halfword
                              ; of Rpck and the top halfword of Rpck (+ x squared), writes to Rsumsqrs
1. Verify that the PKHBT instruction works as intended.
If registers Rx and Ry contain the signed 16-bit x and the signed 16-bit y, respectively, in their low-order halfwords, PKHBT packs them into register Rpck, with x in the high-order halfword and y in the low-order halfword. Interchanging Rx and Ry swaps the high-order and low-order halfwords in Rpck.
2. Verify that the operand form used for SMUAD is allowed.
With Rpck as both the first and the second operand of SMUAD, the sum of the square of the high-order halfword and the square of the low-order halfword of Rpck is stored in Rsumsqrs.
If this works, you need a total of 2 instructions (and 2 cycles) to compute x² + y², provided x and y are already in registers Rx and Ry before the PKHBT instruction.
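
To sanity-check the two-instruction sequence without hardware, its behaviour can be modelled in portable C. This is only a sketch: `pkhbt_model`, `smuad_model` and `sum_of_squares` are hypothetical names written for this post, following the instruction descriptions above. On a Cortex-M4 target the CMSIS intrinsics `__PKHBT` and `__SMUAD` would be the usual way to emit the real instructions from C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PKHBT Rd, Rn, Rm, LSL #shift:
 * bottom halfword taken from Rn, top halfword taken from (Rm << shift). */
static uint32_t pkhbt_model(uint32_t rn, uint32_t rm, unsigned shift)
{
    assert(shift < 32u); /* the shift amount is encoded as 0..31 */
    return (rn & 0x0000FFFFu) | ((rm << shift) & 0xFFFF0000u);
}

/* Model of SMUAD Rd, Rn, Rm:
 * signed (bottom * bottom) + signed (top * top).
 * The halfword casts assume the usual two's complement conversion. */
static int32_t smuad_model(uint32_t rn, uint32_t rm)
{
    int32_t n_lo = (int16_t)(rn & 0xFFFFu);
    int32_t n_hi = (int16_t)(rn >> 16);
    int32_t m_lo = (int16_t)(rm & 0xFFFFu);
    int32_t m_hi = (int16_t)(rm >> 16);
    /* Add in uint32_t so the one overflowing case wraps instead of
     * being undefined behaviour in C. */
    return (int32_t)((uint32_t)(n_lo * m_lo) + (uint32_t)(n_hi * m_hi));
}

/* x and y arrive sign-extended in 32-bit "registers", as Rx and Ry would. */
static int32_t sum_of_squares(int16_t x, int16_t y)
{
    uint32_t rx = (uint32_t)(int32_t)x;
    uint32_t ry = (uint32_t)(int32_t)y;
    uint32_t rpck = pkhbt_model(ry, rx, 16); /* x in top, y in bottom */
    return smuad_model(rpck, rpck);          /* y*y + x*x */
}
```

Note that PKHBT takes only the bottom halfword of Ry, so a sign-extended negative y does not disturb the x halfword; for example, x = -3 and y = 4 still yield 25 in this model.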

G. Goodwin L. Pitos over 9 years ago in reply to Matic

There may be further improvements you can make to the code, but more details about your application are needed. For example, if x and y can be interleaved in memory (so that together they occupy one word), the PKHBT instruction after the load is no longer needed. I suspect, though, that they are results of calculations (the real and imaginary components of a complex quantity) and are already in registers before the square of the amplitude is calculated.

