
How to optimize this assembly code segment?

Hi, I am trying to rewrite the ARM assembly code generated by GCC in release mode, for optimization purposes. The code below is inside a loop, so it is executed many times. Could someone please let me know how I can optimize it for cycles? Are there any bottlenecks in this code? Thanks in advance.

Assembly code:

ldr r4, [r1, #8]
ldr r6, [r1]
ldr r7, [r7, #-12]
and r11, r4, #7
ldr r6, [r6, r4, lsr #3]
rev r6, r6
lsl r6, r6, r11
lsr r6, r6, #23
lsl r6, r6, #2
add r11, r7, r6
ldrsh r6, [r7, r6]
ldrsh r7, [r11, #2]
add r4, r7, r4
ldr r7, [r1, #16]
cmp r4, r7
strls r4, [r1, #8]
strhi r7, [r1, #8]

  • It is better to start from the C code when optimizing; there is no good indication here of what you are hoping the code will do. Changing the data structure or algorithm a bit is often the best path, using assembly only if you are really desperate and the C can't be improved. Also, you haven't included the whole loop: there is no branch. At a quick glance, though, I'd wonder what 'ldrsh r6, [r7, r6]' is doing, as r6 doesn't seem to be used afterwards. Also, is there a big difference in how often each of the two predicated stores at the end is executed?
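Since the original C was not posted, here is a rough guess at what the snippet computes, reconstructed from the instructions alone. Treat it as a sketch under assumptions, not the poster's actual source: the names `BitReader`, `Entry`, `load_be32` and `lookup_and_advance` are all hypothetical, and the struct only records the three fields the assembly touches (at offsets 0, 8 and 16 from r1 on the 32-bit target), so its layout will not match the real one. It assumes a little-endian target, where `ldr` followed by `rev` amounts to a big-endian 32-bit load.

```c
#include <stdint.h>

/* Hypothetical 4-byte table entry, matching the "lsl r6, r6, #2" scaling. */
typedef struct {
    int16_t value;  /* ldrsh r6, [r7, r6]  : loaded, unused in the posted snippet */
    int16_t bits;   /* ldrsh r7, [r11, #2] : amount added to the bit position */
} Entry;

/* Hypothetical reader state; only the fields the assembly uses are listed. */
typedef struct {
    const uint8_t *buf;  /* ldr r6, [r1]      : base of the byte buffer */
    uint32_t pos;        /* ldr r4, [r1, #8]  : current position in bits */
    uint32_t limit;      /* ldr r7, [r1, #16] : upper bound for pos */
} BitReader;

/* Big-endian 32-bit load built from bytes, so it is safe at any alignment. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static int16_t lookup_and_advance(BitReader *br, const Entry *table)
{
    uint32_t pos   = br->pos;
    uint32_t word  = load_be32(br->buf + (pos >> 3)); /* ldr ... lsr #3, then rev */
    uint32_t index = (word << (pos & 7)) >> 23;       /* next 9 bits of the stream */
    const Entry *e = &table[index];

    pos += (uint32_t)(int32_t)e->bits;                /* add r4, r7, r4 */

    /* cmp + strls/strhi: an unsigned min, written back with a single store. */
    br->pos = (pos <= br->limit) ? pos : br->limit;

    return e->value; /* returned here only so the first load is not dead code */
}
```

If this reading is right, the two predicated stores are just a clamp, so how often each one executes should matter little. The more likely cost is the chain of dependent loads (state, buffer word, table entry), which keeping `pos`, `limit` and the table pointer in registers across loop iterations might shorten.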
