Hi, I am trying to hand-optimize the ARM assembly code that gcc generates in release mode. The code below sits inside a loop, so it is executed many times. Could someone please tell me how I can reduce its cycle count? Are there any bottlenecks in this code? Thanks in advance.
Assembly code:
    ldr   r4, [r1, #8]
    ldr   r6, [r1]
    ldr   r7, [r7, #-12]
    and   r11, r4, #7
    ldr   r6, [r6, r4, lsr #3]
    rev   r6, r6
    lsl   r6, r6, r11
    lsr   r6, r6, #23
    lsl   r6, r6, #2
    add   r11, r7, r6
    ldrsh r6, [r7, r6]
    ldrsh r7, [r11, #2]
    add   r4, r7, r4
    ldr   r7, [r1, #16]
    cmp   r4, r7
    strls r4, [r1, #8]
    strhi r7, [r1, #8]
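For anyone who finds C easier to follow, here is a rough sketch of what the sequence above appears to compute. It is a reading of the instructions, not the original source. The names `BitReader`, `VlcEntry` and `decode_step` are made up for illustration, and the field comments only note which offset from r1 each field seems to match. Where the table pointer comes from (`ldr r7, [r7, #-12]`) is not visible in the snippet, so it is simply passed in as a parameter. The `ldr` plus `rev` pair is treated as a big-endian 32-bit read, which assumes a little-endian target. The sketch reads four bytes individually so that it stays portable.

```c
#include <stdint.h>

/* Hypothetical structures inferred from the addressing modes in the listing. */
typedef struct {
    const uint8_t *buf;   /* loaded by: ldr r6, [r1]      */
    uint32_t       pos;   /* bit position: ldr r4, [r1, #8]  */
    uint32_t       limit; /* upper bound:  ldr r7, [r1, #16] */
} BitReader;

typedef struct {
    int16_t sym;  /* ldrsh r6, [r7, r6]   */
    int16_t len;  /* ldrsh r7, [r11, #2]  */
} VlcEntry;

/* One iteration: peek 9 bits at the current bit position, look them up in
 * a 512-entry table, advance the position by the entry's length and clamp
 * it to the limit. */
static int decode_step(BitReader *br, const VlcEntry *table)
{
    uint32_t pos = br->pos;
    const uint8_t *p = br->buf + (pos >> 3);          /* byte offset */

    /* Big-endian 32-bit read (ldr followed by rev on a little-endian core). */
    uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    uint32_t idx = (w << (pos & 7)) >> 23;            /* top 9 bits */

    int sym = table[idx].sym;
    pos += (uint32_t)(int32_t)table[idx].len;         /* add r4, r7, r4 */

    /* Unsigned compare, matching strls / strhi. */
    br->pos = (pos <= br->limit) ? pos : br->limit;
    return sym;
}
```

Written this way, the dependency chain is easy to see: the position load feeds the buffer load, which feeds the shifts, which feed both table loads, which feed the final add and compare.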
Yes, I meant the link register, r14 (sorry for the mistake). I am not using any branch-with-link instructions in the function, so I think it is safe to use the link register as a scratch register (of course, I will push it on entry to the function and pop it on exit). I agree with you that sp is not something to mess with. Also, could you please let me know whether there is any free software that does pipeline stall analysis, so that I do not have to check each instruction manually? Thanks