Cortex-M0+ delay routine without timers

Hello, I'm trying to implement a small routine on a Cortex-M0+ that introduces a software-controlled delay, specified in microseconds. To do this, I wrote this small while() loop:

      while (CyclesToDelay > 0)
      {
         __no_operation();
         CyclesToDelay--;
      }
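For reference, the loop above could be wrapped as a standalone function along these lines. This is only a sketch: the function name delay_loops and the DELAY_NOP macro are hypothetical, and the counter is assumed to be a volatile variable in memory, which would be consistent with the LDR/STR pair on every pass in the listing further down. __no_operation() is the IAR intrinsic; other compilers get a stand-in.

```c
#include <stdint.h>

/* Stand-in for the NOP intrinsic (DELAY_NOP is a hypothetical name).
   IAR provides __no_operation() in <intrinsics.h>; GCC/Clang use an
   inline "nop"; anything else falls back to an empty statement. */
#if defined(__ICCARM__)
#include <intrinsics.h>
#define DELAY_NOP() __no_operation()
#elif defined(__GNUC__)
#define DELAY_NOP() __asm__ __volatile__("nop")
#else
#define DELAY_NOP() ((void)0)
#endif

/* Hypothetical wrapper around the loop from the post.
   The counter is accessed through a volatile pointer, so the compiler
   must reload and store it on every iteration instead of keeping it in
   a register or removing the loop. */
static void delay_loops(volatile uint32_t *cycles_to_delay)
{
    while (*cycles_to_delay > 0) {
        DELAY_NOP();
        (*cycles_to_delay)--;
    }
}
```

The caller loads the counter with the desired number of iterations and passes its address; on return the counter is zero. The exact instructions generated still depend on the compiler and optimization level, so the listing should be re-checked after any build setting change.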


So, knowing exactly which assembly instructions are executed, the core clock frequency, and the number of clock cycles per instruction, I can calculate the CyclesToDelay value needed for a desired delay.

Assembly:

__no_operation();
0002 73F8 NOP
CyclesToDelay--;
0002 73FA LDR R1, [R0]
0002 73FC SUBS R1, R1, #1
0002 73FE STR R1, [R0]
while (CyclesToDelay > 0)
0002 7400 LDR R1, [R0]
0002 7402 CMP R1, #0
0002 7404 BNE 0x000273F8


Instruction clk cycles, according to https://developer.arm.com/documentation/ddi0432/c/programmers-model/instruction-set-summary, are:

NOP   1 clk cycle
LDR   2 clk cycles
SUBS  1 clk cycle
STR   2 clk cycles
LDR   2 clk cycles
CMP   1 clk cycle
BNE   3 clk cycles

Total clk cycles per loop iteration = 12 clk cycles.
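The sum can be double-checked with a small table. This is just a sketch: the type and function names are made up for illustration, and the cycle counts are copied as listed in this post, not re-verified against the Cortex-M0+ manual. Note that the BNE figure applies to a taken branch; a conditional branch that falls through normally costs less, so the final pass of the loop would be slightly shorter than the others.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per instruction in the loop body (hypothetical type name). */
typedef struct {
    const char *mnemonic;
    uint32_t    cycles;   /* cycle count as listed in the post */
} instr_cost_t;

/* Loop body in execution order, counts copied from the post. */
static const instr_cost_t loop_body[] = {
    { "NOP",  1 },
    { "LDR",  2 },
    { "SUBS", 1 },
    { "STR",  2 },
    { "LDR",  2 },
    { "CMP",  1 },
    { "BNE",  3 },   /* taken-branch figure */
};

/* Adds up the cycle counts of all entries in the table. */
static uint32_t loop_cycles(const instr_cost_t *table, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += table[i].cycles;
    }
    return total;
}
```

Calling loop_cycles(loop_body, sizeof loop_body / sizeof loop_body[0]) gives the per-iteration total used in the calculation.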

CORE_CLK = 48 MHz



Then,

1) 1 CORE_CLK cycle = 1 / 48 us

2) 12 CORE_CLK cycles = 12/48 us (the number of microseconds one loop iteration should take)


Finally:

3)  CyclesToDelay * (12/48) = delay_we_want_to_introduce(us)

     or

     CyclesToDelay = delay_we_want_to_introduce(us) * 48 / 12
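In code, the conversion from step 3 might look like the sketch below. The names us_to_loops, CORE_CLK_HZ and CYCLES_PER_LOOP are hypothetical, the 48 MHz and 12-cycle figures are the ones from this post, and rounding up (so the delay is never shorter than requested) is my own choice, not something the formula requires.

```c
#include <stdint.h>

#define CORE_CLK_HZ      48000000u  /* core clock from the post */
#define CYCLES_PER_LOOP  12u        /* per-iteration cycle count from the post */

/* Converts a delay in microseconds into a loop count:
   loops = delay_us * (CORE_CLK_HZ / 1e6) / CYCLES_PER_LOOP,
   rounded up, and clamped to UINT32_MAX if it does not fit. */
static uint32_t us_to_loops(uint32_t delay_us)
{
    /* 64-bit intermediate so delay_us * 48 cannot overflow. */
    uint64_t cycles = (uint64_t)delay_us * (CORE_CLK_HZ / 1000000u);
    uint64_t loops  = (cycles + CYCLES_PER_LOOP - 1u) / CYCLES_PER_LOOP;

    return (loops > UINT32_MAX) ? UINT32_MAX : (uint32_t)loops;
}
```

With these numbers, 1 us maps to 4 iterations and 10 us to 40. The call overhead and the time spent in the conversion itself are not accounted for, which matters most for very short delays.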


However, when I measure the delays obtained with this method, they do not seem to be very accurate. I don't know whether this approach is deterministic, or whether it is possible at all on this Cortex-M0+. Any feedback would be appreciated. Many thanks.