I would like to use the number of oscillator periods for each instruction in delay calculations for instructions like NOP etc.
LOL. You are perfectly correct! For some very short delays, for instance where a device you're driving just needs to get its wits together to reply, it can be a saving to do a quick delay and then check whether the expected result has come back. If it has, then one is on track and knows exactly what to do next. Otherwise the easiest thing is to just return and handle it as a general interrupt whenever the device responds. This can give a useful increase in speed sometimes. However, it should only be done when the speed is very important.
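To make the "quick check, else leave it to the interrupt" idea concrete, here is a rough sketch in C. All the names (poll_briefly, ready_fn, fake_device and so on) are made up for illustration, and the poll limit is just a count of attempts rather than a calibrated time, so treat it as one possible shape and not a finished driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true once the device has its answer ready. */
typedef bool (*ready_fn)(void *ctx);

typedef enum {
    POLL_READY,     /* device answered within the short window */
    POLL_DEFERRED   /* gave up; leave it to the interrupt handler */
} poll_result;

/* Check the device at most max_polls times, then give up. */
static poll_result poll_briefly(ready_fn is_ready, void *ctx, uint32_t max_polls)
{
    assert(is_ready != NULL);
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (is_ready(ctx)) {
            return POLL_READY;
        }
    }
    return POLL_DEFERRED;
}

/* Stand-in device for trying this on a PC: it reports "not ready" a set
 * number of times and then reports "ready". Not real hardware. */
typedef struct {
    uint32_t polls_until_ready;
} fake_device;

static bool fake_ready(void *ctx)
{
    fake_device *dev = ctx;
    if (dev->polls_until_ready == 0) {
        return true;
    }
    dev->polls_until_ready--;
    return false;
}
```

On real hardware the callback would read a status register, and on POLL_DEFERRED the caller would simply return and let the device's interrupt finish the job.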
If substituting an M3 for an 8051, I'd have thought the best thing to do would be to use interrupts properly to start with, and the speed would be more than adequate. Longer times can be got using the system clock.
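For the longer times, one common arrangement (an assumption on my part, not the only way) is to have the SysTick timer interrupt at a fixed rate, say once a millisecond, and count ticks. The sketch below assumes SysTick has already been set up elsewhere. SysTick_Handler is the usual CMSIS handler name, while g_ticks, ticks_now and ticks_elapsed are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tick counter, incremented once per SysTick interrupt.
 * The tick period (for example 1 ms) depends on how SysTick was configured. */
static volatile uint32_t g_ticks;

/* Called by the hardware on each SysTick interrupt. */
void SysTick_Handler(void)
{
    g_ticks++;
}

/* Snapshot of the current tick count. */
static uint32_t ticks_now(void)
{
    return g_ticks;
}

/* True once at least 'duration' ticks have passed since 'start'.
 * Unsigned subtraction keeps this correct when the counter wraps to zero. */
static bool ticks_elapsed(uint32_t start, uint32_t duration)
{
    return (uint32_t)(ticks_now() - start) >= duration;
}
```

Typical use is to take start = ticks_now() when the operation begins and then test ticks_elapsed(start, n) from the main loop, so nothing sits spinning on NOPs.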