I am trying to generate a piece of code on an M4 with an exactly known runtime, independent of the input.
Currently my bottleneck is that the duration of a division (udiv) depends on the operands, so its execution time is variable. Is there a way to ensure that my division takes the same number of cycles for every input?
Note: I am trying to write this with as little overhead as possible due to rather extreme execution-time constraints.
Hi Bastian, thanks a lot for your help.
What I have done for now is cast the operands to float, so that a floating-point division is performed instead, which always takes 14 cycles. Timing-wise it's all stable now.
I will try your suggestion out of curiosity though! :)
Have a nice day, Rens