I am trying to generate a piece of code on an M4 with an exact known runtime, independent of the input.
Currently my bottleneck is that the duration of a division (UDIV) depends on the input and is therefore variable in execution time. Is there a way to ensure that my division takes the same number of cycles for every input?
Note: I am trying to do this with as little overhead as possible due to rather tight execution-time constraints.
Hi Bastian, thanks for your reply!
Two good suggestions!
For the first option, the problem is that the numerator of the division is a 16-bit pseudorandom number, so the variation in leading zeros is large and unpredictable; adjusting for it would leave me with significant overhead, imho.
The NOP method would have been my first choice, but from other use cases on the M4 it has proven quite hard to match timing with NOPs: the NOP instruction is architecturally only a hint and does not necessarily delay by exactly one cycle per NOP. (Instruction alignment and pipelining seem to have a big impact here, but I have not found a satisfying explanation for this yet.)
I was mainly wondering whether there is a way to disable or prevent the early-termination mechanism?
If you are aiming for a constant number of cycles and the total amount does not matter (much), you might do two divisions: one with the original numerator and a second with the bitwise-inverted one. The total runtime of the two should be close to constant.
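A minimal C sketch of this idea (the balancing effect is an assumption about how the M4's UDIV early termination depends on leading zeros, not a guarantee; `div_balanced` is a hypothetical helper name):

```c
#include <stdint.h>

/* Sketch: pair the real division with a dummy division on the
 * bitwise-inverted numerator. The M4's UDIV terminates early
 * depending on the operands' leading zeros; n and ~n have
 * complementary leading bits, so the hope is that the combined
 * latency varies much less than a single UDIV. d must be nonzero. */
static uint32_t div_balanced(uint16_t n, uint16_t d)
{
    /* volatile keeps the compiler from eliding the dummy division */
    volatile uint32_t balance = (uint32_t)(uint16_t)~n / d;
    (void)balance;
    return (uint32_t)n / d;
}
```

Whether the two latencies really sum to a constant would have to be measured, e.g. with the DWT cycle counter.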
Hi Bastian, thanks a lot for your help.
Currently I cast to float so that a floating-point division is performed, which always takes 14 cycles. Timing-wise it's all stable now.
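For reference, the cast looks roughly like this (a sketch; on an M4 with FPU the division should compile to VDIV.F32 with fixed latency, and the exactness argument in the comment is my own reasoning, not from the thread):

```c
#include <stdint.h>

/* Constant-time division via the FPU: VDIV.F32 has a fixed latency
 * on the Cortex-M4, unlike the early-terminating UDIV. The result is
 * exact for 16-bit operands: both operands fit in a float's 24-bit
 * mantissa, and the rounding error of the quotient is too small to
 * push the truncated result across an integer boundary. */
static uint16_t div_via_float(uint16_t n, uint16_t d)
{
    return (uint16_t)((float)n / (float)d);
}
```

Note this only holds for single-precision hardware division; a soft-float fallback would reintroduce data-dependent timing.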
Will try your proposition out of curiosity though! :)
Have a nice day, Rens