# MULSHIFT32 in 14 cycles

I have asked just about everyone whether there is a fast way to find the top 32 bits of a 32-bit x 32-bit multiply. There are instruction sequences that return all 64 bits, but they take 17 or 18 cycles and spend part of that on bits that are not used here, so:

MULSHIFT32:
lsrs r3, r0, #16 //Factor0 hi [16:31]
uxth r0, r0 //Factor0 lo [0:15]
uxth r2, r1 //Factor1 lo [0:15]
lsrs r1, r1, #16 //Factor1 hi [16:31]

muls r0, r1 //Factor0 lo * Factor1 hi
muls r2, r3 //Factor1 lo * Factor0 hi
muls r1, r3 //Factor1 hi * Factor0 hi

adds r0, r2 //(Factor0 lo * Factor1 hi) + (Factor1 lo * Factor0 hi)

movs r2, #0 //Clear r2 (MOVS with an immediate leaves C untouched)
adcs r2, r2 //r2 = 0 + 0 + C, so r2 is 0 or 1
lsls r2, r2, #16 //C --> bit 16 (r2 contains $00000000 or $00010000)

lsrs r3, r0, #16 //Extract partial result [bits 16-31]

adds r2, r3 //Partial [bits 16-47]
adds r1, r2 //Result [bits 32-63], left in r1
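
For anyone who wants to check the arithmetic on a PC before touching the target, here is a minimal C sketch of what the listing above computes, with the operands treated as unsigned (as the `lsrs`/`uxth` split implies). The function names `mulshift32_model` and `mulshift32_exact` are my own, not from any library. Note that the listing forms three of the four 16 x 16 partial products and leaves out lo * lo, so by my reading the result can come out one below the exact top word whenever the missing term would have carried into bit 32.

```c
#include <stdint.h>

/* Step-by-step model of the Thumb listing (hypothetical name).
 * Only three partial products are formed; lo * lo is not computed. */
uint32_t mulshift32_model(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16, a_lo = a & 0xFFFFu;   /* lsrs r3 / uxth r0 */
    uint32_t b_hi = b >> 16, b_lo = b & 0xFFFFu;   /* lsrs r1 / uxth r2 */

    uint32_t cross = a_lo * b_hi;                  /* muls r0, r1 */
    uint32_t other = b_lo * a_hi;                  /* muls r2, r3 */
    uint32_t top   = b_hi * a_hi;                  /* muls r1, r3 */

    cross += other;                                /* adds r0, r2 */
    uint32_t carry = (cross < other);              /* the C flag after the add */

    /* movs/adcs/lsls put the carry in bit 16; lsrs keeps bits 16-31 of the sum */
    uint32_t mid = (carry << 16) + (cross >> 16);

    return top + mid;                              /* adds r2, r3 / adds r1, r2 */
}

/* Exact top word from a full 64-bit product, for comparison on a host. */
uint32_t mulshift32_exact(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

With both operands at `0xFFFFFFFF` the model returns `0xFFFFFFFD` while the exact top word is `0xFFFFFFFE`, which is the one-count shortfall mentioned above. Whether that matters depends on the application.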

Now the problem I have is that I cannot find my copy of the red book (Joseph Yiu's book on programming the M0 & M0+). It currently takes 4 instructions to move C into bit 16 of a register, which looks like it MAY be possible to speed up, so that instead of two ADDS at the end there is a single ADDS Rd, Rn, Rm, since all registers are low.

So, now we are getting somewhere. I should add that my good friend Sarah Avory wrote the logic in C and simply tested it with every possible value to check it was correct. She was also able to save a cycle, which seems tiny by today's standards, but in certain applications MULSHIFT32 is used millions of times a second.
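
I do not have Sarah's C code to hand, so the following is only a sketch of how such a host-side check might look, not her test. All names (`mulhi_exact`, `mulhi_three_products`, `next_sample`, `worst_shortfall`) are hypothetical. It samples corner cases plus pseudo-random pairs rather than enumerating every pair, on the assumption that all 2^64 combinations of two 32-bit operands take too long for a quick run.

```c
#include <stdint.h>

typedef uint32_t (*mulhi_fn)(uint32_t, uint32_t);

/* Exact top word via a 64-bit product (host-side reference only). */
uint32_t mulhi_exact(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Compact form of the three-product arithmetic: hi*hi plus the top
 * of the two cross terms, with the lo*lo term left out. */
uint32_t mulhi_three_products(uint32_t a, uint32_t b)
{
    uint64_t cross = (uint64_t)(a & 0xFFFFu) * (b >> 16)
                   + (uint64_t)(b & 0xFFFFu) * (a >> 16);
    return (a >> 16) * (b >> 16) + (uint32_t)(cross >> 16);
}

/* One xorshift32 step; the state must be non-zero. */
uint32_t next_sample(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Largest (exact - candidate) seen over corner cases plus n random pairs.
 * If the candidate ever overshoots, the subtraction wraps to a huge value,
 * so that case shows up in the return value as well. */
uint32_t worst_shortfall(mulhi_fn candidate, uint32_t n)
{
    static const uint32_t corners[] = {
        0u, 1u, 0xFFFFu, 0x10000u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu
    };
    const unsigned count = sizeof corners / sizeof corners[0];
    uint32_t worst = 0, seed = 0x12345678u;

    for (unsigned i = 0; i < count; i++) {
        for (unsigned j = 0; j < count; j++) {
            uint32_t d = mulhi_exact(corners[i], corners[j])
                       - candidate(corners[i], corners[j]);
            if (d > worst) worst = d;
        }
    }
    for (uint32_t k = 0; k < n; k++) {
        uint32_t a = next_sample(&seed);
        uint32_t b = next_sample(&seed);
        uint32_t d = mulhi_exact(a, b) - candidate(a, b);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Calling `worst_shortfall(mulhi_three_products, 100000)` reports the largest gap found between the three-product form and the exact top word. Passing any other candidate with the same signature works the same way.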
