
Inline Assembler 64 bit addition?

At first I was very happy to see that, with ARMCC 5, inline assembler now works in Thumb mode on the Cortex-M4.

However, I cannot manage to write an efficient inline assembler routine for 64-bit addition. (Later I want to add overflow checking, which is why I need inline assembly.)

If I use the following function in C++:

// Plain 32-bit add; ADDS also sets the N, Z, C and V flags (no saturation yet despite the name).
__forceinline int satAdd( int a, int b){
  int c;
  __asm{
    ADDS c, a, b
  }
  return c;
}

This generates very compact code (just one instruction, as expected).

But I have no clue how to do the same for 64-bit values.

In GCC inline assembler this would be very easy, because the operand modifiers %Q and %R select the low and the high register of a 64-bit operand:

  ADDS %Q2, %Q1, %Q0
  ADCS %R2, %R1, %R0

I fear there is no way to access the upper register of a 64-bit register pair in Keil C++ (something like "%R2")?

(Even accessing the lower register of such a register pair seems impossible: the Keil inline assembler apparently type-checks the variables used in the assembly block, so I cannot simply use a 64-bit variable as an operand of ADDS.)
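For reference, the ADDS/ADCS pair adds the low 32-bit words while setting the carry flag, then adds the high words plus that carry. A minimal portable C sketch of the same carry chain, useful as a reference model when checking an assembler version; the function name add64_via_halves is hypothetical and not part of any Keil API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of the ADDS/ADCS pair: add two 64-bit
   values using only 32-bit halves, propagating the carry by hand. */
static uint64_t add64_via_halves(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* ADDS on the low words: the carry is set if the 32-bit sum wrapped. */
    uint32_t r_lo = a_lo + b_lo;
    uint32_t carry = (r_lo < a_lo) ? 1u : 0u;

    /* ADCS on the high words: add the carry from the low-word add. */
    uint32_t r_hi = a_hi + b_hi + carry;

    return ((uint64_t)r_hi << 32) | r_lo;
}
```

For example, adding 1 to 0x00000001FFFFFFFF wraps the low word to 0 and carries 1 into the high word, giving 0x0000000200000000.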

Furthermore, the inline assembler crashes if I try to check the overflow flag with BVS/BVC (BEQ, BNE, BCS, BCC, BMI and BPL all work).
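For context, BVS/BVC test the V (signed overflow) flag. After r = a + b, V is set exactly when a and b have the same sign and r has the opposite sign. A minimal C sketch of that rule for 64-bit operands, assuming two's-complement integers; the helper name add64_sets_v is hypothetical. The sum is computed in uint64_t because signed overflow is undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when a + b overflows int64_t, i.e. when the
   V flag would be set after the final ADCS of a 64-bit ADDS/ADCS pair. */
static bool add64_sets_v(int64_t a, int64_t b)
{
    /* Wrapping add in unsigned arithmetic (well defined in C). */
    uint64_t r = (uint64_t)a + (uint64_t)b;
    /* Overflow iff the sign of r differs from the signs of both inputs. */
    uint64_t v = ((uint64_t)a ^ r) & ((uint64_t)b ^ r);
    return (v >> 63) != 0;
}
```

This branch-free test can stand in for BVS/BVC when the flag is not reachable from the inline assembler.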

SOS: I hope somebody can help here.


jp m, over 13 years ago, in reply to erik malund

When the compiler inlines the function, it can skip the prologue/epilogue code.

nice day, over 13 years ago, in reply to jp m

Inlining would indeed be very helpful: this 64-bit addition consists of only two instructions (three with the BVS check), so it is far too short to justify a function call.

With good inline assembler support, this could be done very quickly.

If I do it in an assembly function (I am trying the "embedded assembler" feature, i.e. declaring an assembly function in my C++ module), it now basically works with the following code:

__asm long long addSat( long long a, long long b)
{
  ADDS R0, R0, R2
  ADCS R1, R1, R3
  BVC AllOk
  BMI Oflw
  // underflow into pos. number range: limit to 0x8000...
  MOV R0, #0
  MOV R1, #1
  LSLS R1, R1, #31
  B AllOk
Oflw
  // overflow into neg. number range: limit to 0x7FFF...
  MOV R0, #0
  SUBS R0, #1
  LSRS R1, R0, #1
AllOk
  BX LR
}
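A portable C sketch of the same saturating addition, mirroring the branch structure of the assembler: no overflow (BVC) returns the plain sum, a negative wrapped result (BMI) means two positives overflowed and clamps to INT64_MAX, otherwise two negatives wrapped and it clamps to INT64_MIN. The name add_sat64 is hypothetical, it assumes two's-complement integers, and whether ARMCC 5 compiles it into an ADDS/ADCS/BVC sequence is not established here; treat it as a reference model for testing the assembler version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable model of the embedded-assembler addSat.
   static inline allows the compiler to inline it at each call site. */
static inline int64_t add_sat64(int64_t a, int64_t b)
{
    /* ADDS/ADCS: wrapping 64-bit add, done unsigned to avoid UB. */
    uint64_t r = (uint64_t)a + (uint64_t)b;

    /* BVC AllOk: no signed overflow, return the plain sum. */
    if (((((uint64_t)a ^ r) & ((uint64_t)b ^ r)) >> 63) == 0)
        return (int64_t)r;

    /* BMI Oflw: wrapped result is negative, so two positives overflowed. */
    if (r >> 63)
        return INT64_MAX;   /* 0x7FFFFFFFFFFFFFFF */

    /* Otherwise two negatives wrapped into the positive range. */
    return INT64_MIN;       /* 0x8000000000000000 */
}
```

Keeping the logic in C trades guaranteed instruction selection for inlinability: the compiler can place it directly in a loop without a call and return, at the cost of relying on its code generation for the flag handling.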

However, when the addition does not overflow, this wastes a lot of processor time. The processor needs at least two branches (the call into the function and the return from it). Counting only these, there are 2-4 cycles of overhead for a 3-cycle addition, i.e. roughly 100% overhead.

On top of that, the compiler has to do extra work to force the operands into R0-R1 and R2-R3, so the real overhead is considerably larger. Branches also cost additional time on a modern processor with a prefetch queue, because the queue has to be refilled after each taken branch.

This is really annoying if you want to do saturating additions several times in a time-critical loop (inline assembly would usually handle this much more efficiently).