I am trying to write a piece of code on an M4 whose runtime is exactly known and independent of the input.
Currently my bottleneck is that the duration of a division (udiv) depends on the input, so its execution time varies. Is there a way to ensure that my division takes the same number of cycles for every input?
Note: I am trying to write this with as little overhead as possible due to rather extreme execution time constraints.
When does "early termination" happen? (How do up to 63 leading zeros become 12 cycles or less?) With a 16-bit numerator, presumably the range in the number of cycles is smaller than it would be with a full 32-bit numerator? Do you know how much variation you're seeing? I'm wondering whether you can multiply by constants first (perhaps in conjunction with CLZ) so that the division always takes the same number of cycles. (Well, I'm pretty sure you could, but I don't know whether it would beat the float solution you came up with.)
If the denominator is constant, you could consider a multiply-by-reciprocal approach.
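To make the multiply-by-reciprocal idea concrete, here is a minimal sketch, assuming a 16-bit numerator (as mentioned above) and a divisor between 2 and 65535 that is known in advance. The names `make_recip` and `div_by_recip` are made up for this illustration. The division is replaced by one 32x32->64 multiply and a shift; as far as I know that maps to a single UMULL on the M4, which should not depend on the operand values, but please check the cycle count on your own part.

```c
#include <stdint.h>

/* Hypothetical helper: precompute the reciprocal for a divisor d.
 * Assumes 2 <= d <= 65535. Do this once (or at compile time), since
 * this step itself uses a division and is NOT constant-time. */
static inline uint32_t make_recip(uint16_t d)
{
    /* m = floor(2^32 / d) + 1 */
    return (uint32_t)(0x100000000ull / d) + 1u;
}

/* Hypothetical helper: n / d for a 16-bit n, using the precomputed m.
 * One widening multiply, then keep the upper 32 bits of the product. */
static inline uint16_t div_by_recip(uint16_t n, uint32_t m)
{
    return (uint16_t)(((uint64_t)n * m) >> 32);
}
```

Why this is exact under those assumptions: m * d = 2^32 + e with 0 < e <= d, so n * m / 2^32 = n / d + (n * e) / (d * 2^32). Because n and e are both below 2^16, the error term is smaller than 1 / d and can never push the result up to the next integer. For a wider numerator the same trick needs a larger shift and a correction step, so this sketch does not carry over as-is.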
Thanks for the update and quick reply. I'll be sure to keep an eye on this thread. I ran into the same issue and came across your thread while looking for a fix. Thanks for creating it. Looking forward to a solution.