You have an interesting idea of fun. You might have to split the operation over two cycles. You could check the cycle timings in the ARM Technical Reference Manuals (TRMs), which might indicate whether ARM uses this approach. Just an observation, but you might find that the architecture is protected by various patents.