I have asked just about everyone whether there is a fast way to find the top 32 bits of a 32-bit x 32-bit multiply. There are multiply instructions that return all 64 bits, but they take 17 or 18 cycles and spend some of that on bits that are not used, so:

MULSHIFT32:
    lsrs r3, r0, #16    //Factor0 hi [16:31]
    uxth r0, r0         //Factor0 lo [0:15]
    uxth r2, r1         //Factor1 lo [0:15]
    lsrs r1, r1, #16    //Factor1 hi [16:31]
    muls r0, r1         //Factor0 lo * Factor1 hi
    muls r2, r3         //Factor1 lo * Factor0 hi
    muls r1, r3         //Factor1 hi * Factor0 hi
    adds r0, r2         //(Factor0 lo * Factor1 hi) + (Factor1 lo * Factor0 hi)
    movs r2, #0         //Clear r2 (C is left unchanged)
    adcs r2, r2         //r2 = C
    lsls r2, r2, #16    //C --> bit 16 (r2 contains $00000000 or $00010000)
    lsrs r3, r0, #16    //Extract partial result [bits 16-31]
    adds r2, r3         //Partial [bits 16-47]
    adds r1, r2         //Results [bit 32-63]

Now the problem I have is that I cannot find my copy of the red book (Joseph Yiu's book on programming the M0 & M0+).

It currently takes 4 instructions to move C into bit 16 of a register, and that looks like it MAY be possible to speed up, so that rather than two ADDS at the end there is a single ADDS Rd, Rn, Rm, since all registers are low.

So, now we are getting somewhere. I should add that my good friend Sarah Avory wrote the logic in C and simply tested it with every possible value to check it was correct. She was also able to save a cycle, which seems tiny by today's standards, but in certain applications MULSHIFT32 is used millions of times a second.
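For anyone who wants to check the listing on a PC, here is a rough C model of it. This is only a sketch and not Sarah's code: the function names mulhi_exact and mulshift32_model are made up for illustration, and the operands are assumed to be unsigned, as the LSRS/UXTH instructions suggest. One thing the model makes visible is that the listing as posted never forms the lo * lo product, so for some inputs the result comes out one lower than the exact top word.

```c
#include <stdint.h>

/* Exact reference: top 32 bits of the unsigned 64-bit product. */
static uint32_t mulhi_exact(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Hypothetical step-by-step model of the MULSHIFT32 listing.
 * Assumes unsigned operands; the lo * lo partial product is
 * omitted, exactly as in the assembly. */
static uint32_t mulshift32_model(uint32_t a, uint32_t b)
{
    uint32_t a_hi = a >> 16;        /* lsrs r3, r0, #16 */
    uint32_t a_lo = a & 0xFFFFu;    /* uxth r0, r0      */
    uint32_t b_lo = b & 0xFFFFu;    /* uxth r2, r1      */
    uint32_t b_hi = b >> 16;        /* lsrs r1, r1, #16 */

    uint32_t x  = a_lo * b_hi;      /* muls r0, r1 */
    uint32_t y  = b_lo * a_hi;      /* muls r2, r3 */
    uint32_t hh = b_hi * a_hi;      /* muls r1, r3 */

    uint32_t cross = x + y;         /* adds r0, r2 (wraps at 32 bits) */
    uint32_t carry = cross < x;     /* the C flag from that add       */

    /* movs/adcs/lsls put C in bit 16, lsrs extracts cross[31:16],
     * and the two final adds combine everything with hi * hi. */
    return hh + ((carry << 16) + (cross >> 16));
}
```

With 0xFFFFFFFF for both factors the model gives 0xFFFFFFFD while the exact top word is 0xFFFFFFFE. Whether that one-bit difference matters depends on the application.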
https://www.keil.com/support/man/docs/armasm/armasm_dom1361289891242.htm

At the moment I am looking at developing MP3 on the Raspberry Pi Pico so I can work out how low the clock can go and still complete each frame of audio.
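The clock question comes down to simple arithmetic once the cycle count per frame is known. As a rough sketch only: the function name min_clock_hz and the cycles_per_frame figure are hypothetical (the real number would have to be measured), and the frame is assumed to be an MPEG-1 Layer III frame of 1152 samples.

```c
#include <stdint.h>

/* Samples per frame, assuming MPEG-1 Layer III. */
#define MP3_SAMPLES_PER_FRAME 1152u

/* Smallest clock in Hz (rounded up) at which cycles_per_frame cycles
 * fit into the playing time of one frame. cycles_per_frame is a
 * hypothetical measured value, not something taken from this thread. */
static uint64_t min_clock_hz(uint32_t cycles_per_frame, uint32_t sample_rate_hz)
{
    uint64_t n = (uint64_t)cycles_per_frame * sample_rate_hz;
    return (n + MP3_SAMPLES_PER_FRAME - 1u) / MP3_SAMPLES_PER_FRAME;
}
```

For example, if a frame took 1,152,000 cycles at 44.1 kHz, min_clock_hz(1152000, 44100) returns 44100000, so the core would need at least 44.1 MHz before allowing any headroom.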
You should look at the Armv6-M reference manual. It does not list the immediate version of ROR.
Armv7-M RM:
A7.7.116 ROR (immediate)
Encoding T1 ARMv7-M
ROR{S}<c> <Rd>,<Rm>,#<imm5>
A7.7.117 ROR (register)
Encoding T1 All versions of the Thumb instruction set.
RORS <Rdn>,<Rm> Outside IT block.
ROR<c> <Rdn>,<Rm> Inside IT block.
Armv6-M RM:
A6.7.54 ROR (register)
Encoding T1 All versions of the Thumb instruction set.
RORS <Rdn>,<Rm>