I am trying to write a piece of code for a Cortex-M4 with an exactly known runtime, independent of the input.
Currently my bottleneck is that the duration of a division (udiv) depends on the input, so its execution time varies. Is there a way to make the division take the same number of cycles for every input?
Note: I am trying to do this with as little overhead as possible, due to rather extreme execution time constraints.
If you are aiming for a constant number of cycles and the total does not matter (much), you could do two divisions:
one with the original numerator and a second with the inverted one. The total runtime of the two might be close to constant.
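A minimal sketch of that idea, reading "inverted" as the bitwise complement of the numerator (that reading is an assumption). On the Cortex-M4, UDIV terminates early depending on the leading ones and zeros of its operands, so the hope is that a short first division is paired with a long second one. The names udiv_paired and timing_sink are made up for illustration, and whether the sum really comes out constant would have to be measured on the target.

```c
#include <stdint.h>

/* Volatile sink so the compiler cannot drop the second (dummy) division. */
static volatile uint32_t timing_sink;

/* Hypothetical helper: divide once for the result and once more with the
   bitwise-inverted numerator purely for timing balance. Assumes den != 0. */
static inline uint32_t udiv_paired(uint32_t num, uint32_t den)
{
    uint32_t q = num / den;        /* the division whose result is used */
    timing_sink = (~num) / den;    /* dummy division with inverted numerator */
    return q;
}
```

The result of the second division is thrown away; it only has to be stored somewhere the optimizer cannot ignore.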
Hi Bastian, thanks a lot for your help.
For now I cast the operands to float, so that a floating-point division is performed instead, which always takes 14 cycles. Timing-wise everything is stable now.
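Roughly, the cast looks like the sketch below. The function name udiv_via_float is made up for illustration. It assumes the FPU is present and enabled, that the compiler actually emits VDIV.F32 for the float division, that both operands are below 2^24 (so the conversion to float is exact), and that the divisor is not zero.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned division routed through the FPU.
   Assumes num and den are below 2^24 and den != 0. */
static inline uint32_t udiv_via_float(uint32_t num, uint32_t den)
{
    float q = (float)num / (float)den;  /* single-precision divide */
    return (uint32_t)q;                 /* conversion truncates toward zero */
}
```

Larger operands lose precision in the conversion to float, so the quotient can then differ from the integer result. The two int-to-float conversions and the conversion back add a few cycles on top of the division itself.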
I will try your suggestion out of curiosity, though! :)
Have a nice day, Rens