My Cortex-M0+ computes a 32-bit x 32-bit -> 64-bit product in 17 cycles. Most of the time I only need bits 32-63. Does anyone know a faster way to compute just the upper 32 bits? Saving 1 cycle out of 17 (for example) may not sound like a big deal, but for me it's the dividing line between possible and impossible.
I should point out that the last instruction in the routine is the one that computes bits 32-63, so simply dropping the low-word calculation at the end saves nothing.