b .+2: B is the branch instruction, but I do not understand what .+2 means or how many cycles this instruction takes. I was looking for a book, but could not find one that explains this well. Can you recommend something?
Still sounds like some half-assed tuned loop for the output of GPIO "at some specific rate" where the model is to drive the processor into saturation. Something better addressed with a periodic timer and DMA, and a buffer large enough to decimate interrupt loading, so you can fill the buffer with new data while keeping it from being unduly massive.
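To put rough numbers on the timer-plus-DMA suggestion, here is a minimal sketch of the arithmetic in plain C. It assumes a timer with a 16-bit prescaler and a 16-bit auto-reload register clocked at the full 72 MHz, and a circular DMA buffer that raises an interrupt at the half-way and end points. Those hardware details, the struct, and the function names `timer_for_rate` and `buffer_len_for_irq_rate` are illustrative assumptions, not anything stated in the thread or taken from a vendor library.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t prescaler;  /* assumed 16-bit prescaler: divides by (prescaler + 1) */
    uint32_t reload;     /* assumed 16-bit reload: period is (reload + 1) ticks */
    uint32_t actual_hz;  /* update rate actually obtained after truncation */
} timer_cfg;

/* Pick the smallest prescaler for which the reload value fits in 16 bits.
 * Returns 0 on success, -1 if the requested rate cannot be represented. */
static int timer_for_rate(uint32_t clock_hz, uint32_t rate_hz, timer_cfg *out)
{
    if (rate_hz == 0u || rate_hz > clock_hz) {
        return -1;
    }
    uint32_t ticks = clock_hz / rate_hz;    /* timer ticks per output sample */
    uint32_t psc = (ticks - 1u) / 65536u;   /* keeps reload <= 65535 */
    if (psc > 65535u) {
        return -1;
    }
    uint32_t arr = ticks / (psc + 1u) - 1u;
    out->prescaler = psc;
    out->reload = arr;
    out->actual_hz = clock_hz / ((psc + 1u) * (arr + 1u));
    return 0;
}

/* With half-transfer and transfer-complete interrupts (assumed), the CPU is
 * interrupted once per half buffer. Return the total buffer length in samples
 * that keeps the interrupt rate at or below max_irq_hz, or 0 on bad input. */
static uint32_t buffer_len_for_irq_rate(uint32_t rate_hz, uint32_t max_irq_hz)
{
    if (max_irq_hz == 0u) {
        return 0u;
    }
    uint32_t half = rate_hz / max_irq_hz;   /* samples per half buffer */
    if (rate_hz % max_irq_hz != 0u) {
        half += 1u;                         /* round up */
    }
    return 2u * half;
}
```

Under these assumptions, `timer_for_rate(72000000u, 1000000u, &cfg)` gives a prescaler of 0 and a reload of 71 for a 1 MHz output rate. `buffer_len_for_irq_rate(1000000u, 1000u)` gives 2000 samples, so the CPU refills one 1000-sample half per millisecond while the DMA drains the other.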
Is this something that can be solved with a CPLD rather than grinding a 72 MHz Cortex-M3?
The compiler is not the place to write fine-tuned assembler.
Agreed. Delays like this could (occasionally) be justified in the past on something like an 8051 with highly predictable instruction timing, but a Cortex, nah.