I am trying to generate a piece of code on an ARM Cortex-M4 with an exact, known runtime that is independent of the input.
Currently my bottleneck is that the duration of a division (udiv) depends on the operand values (the Cortex-M4 divider takes 2 to 12 cycles), so its execution time is variable. Is there a way to ensure that my division takes the same number of cycles for every input?
Note: I am trying to write this with as little overhead as possible due to rather extreme execution-time constraints.
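
For reference, the only constant-time fallback I'm aware of is a branchless shift-subtract (restoring) divide with a fixed iteration count, sketched below (the function name udiv_const_time is just for illustration). It keeps the timing data-independent, but at roughly 32 iterations of several instructions each it is far more expensive than a single udiv, which is why I'm hoping for a cheaper trick:

```c
#include <stdint.h>

/* Sketch of a constant-time fallback: restoring long division with a
 * fixed 32-iteration loop and no data-dependent branches, so the same
 * instruction stream executes for every input. The compiler could in
 * principle reintroduce a branch for the comparison, so the generated
 * assembly would need to be checked. */
static uint32_t udiv_const_time(uint32_t num, uint32_t den)
{
    uint32_t quo = 0;
    uint32_t rem = 0;

    for (int i = 31; i >= 0; i--) {
        /* shift the next numerator bit into the partial remainder */
        rem = (rem << 1) | ((num >> i) & 1u);
        /* mask is all-ones when rem >= den, all-zeros otherwise */
        uint32_t mask = (uint32_t)-(uint32_t)(rem >= den);
        rem -= den & mask;
        quo |= (mask & 1u) << i;
    }
    return quo; /* the remainder is left in rem if needed */
}
```
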