Inline Assembler 64 bit addition?

At first I was very happy to see that with ARMCC 5 the inline assembler now works in Thumb mode on the Cortex-M4.

However, I cannot work out how to write an efficient inline assembler routine for 64-bit addition (later I want to add overflow checking, which is why I need inline assembly).

If I use the following function in C++:

__forceinline int satAdd( int a, int b){
  int c;
  __asm{
    ADDS c, a, b
  }
  return c;
}


This generates very nice and compact code (just one assembly line, as expected).

But I have no clue how to do this with 64-bit values.

In GNU CC inline assembler, this would be very easy:

  ADDS %2, %1, %0
  ADCS %d2, %d1, %d0

I am afraid there is no way to access the upper register of a 64-bit register pair in Keil C++ (something like "%d2").

(Even accessing the lower register of the pair seems to be impossible, because the Keil inline assembler appears to type-check the variables used in the assembly block, so a 64-bit variable cannot simply be used as an operand of an ADDS instruction.)
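
For reference, what the ADDS/ADCS pair computes can be sketched in portable C++ by splitting each 64-bit operand into 32-bit halves and propagating the carry by hand. The helper names (`lo32`, `hi32`, `join64`, `carry32`, `add64ViaHalves`) are hypothetical and not part of any Keil or ARMCC API. Whether handing such 32-bit halves to the inline assembler produces the desired two-instruction sequence is an assumption this sketch does not verify.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of them come from ARMCC or Keil.

// Lower 32 bits of a 64-bit value (the half that ADDS would operate on).
constexpr std::uint32_t lo32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v);
}

// Upper 32 bits of a 64-bit value (the half that ADCS would operate on).
constexpr std::uint32_t hi32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Reassemble a 64-bit value from its two 32-bit halves.
constexpr std::uint64_t join64(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Carry out of a 32-bit addition: 1 if a + b wrapped around, else 0.
constexpr std::uint32_t carry32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a + b) < a ? 1u : 0u;
}

// 64-bit addition built from two 32-bit additions:
// low halves first (ADDS), then high halves plus carry (ADCS).
constexpr std::uint64_t add64ViaHalves(std::uint64_t a, std::uint64_t b) {
    return join64(
        static_cast<std::uint32_t>(lo32(a) + lo32(b)),
        static_cast<std::uint32_t>(hi32(a) + hi32(b) + carry32(lo32(a), lo32(b))));
}
```

The point of the split is only that each half is an ordinary 32-bit value, which is the operand type the 32-bit ADDS example above already accepts.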

Furthermore, the inline assembler crashes if I try to check the overflow flag with BVS/BVC (BEQ, BNE, BCS, BCC, BMI and BPL all work).

SOS - I hope somebody can help here?

  • Inlining would indeed be very helpful, because this 64-bit addition consists of only 2 instructions (3 if I add the BVS check), so it is simply too short to justify a function call.

    With good inline assembler support, this could be done very fast.

    If I do it in an assembly function (I am trying the "embedded assembly" feature, i.e. declaring an assembly function in my C++ module), then it now works in principle with the following code:

    __asm long long addSat( long long a, long long b) {
            ADDS R0, R0, R2
            ADCS R1, R1, R3
            BVC AllOk
            BMI Oflw
                                                    // underflow into pos. number range: limit to 0x8000...
            MOV R0, #0
            MOV R1, #1
            LSLS R1, R1, #31
            B AllOk
    Oflw
                                                    // overflow into neg. number range: limit to 0x7FFFF...
            MOV R0, #0
            SUBS R0, #1
            LSRS R1, R0, #1
    AllOk
            BX LR
    }
    

    However, if the addition does not overflow, this is a terrible waste of processor time. The processor needs at least two branches (the call into the function and the return from it). Counting only these, we have 2-4 cycles of overhead for this 3-cycle addition, so an overhead of about 100%.

    If we also take into account that the compiler needs some extra work to force the variables into R0-R1 and R2-R3, the overhead becomes much larger, all the more so because branching usually costs quite a bit of additional time on a modern processor with a prefetch queue.

    This is really annoying if you want to do saturation-safe addition several times in a time-critical loop. (Inline assembly would usually handle this MUCH more efficiently.)
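
To check the saturation logic of the addSat routine above, a portable C++ reference can be handy. The following is a minimal sketch, not a replacement for the assembly: the names `wrapSum`, `signedOverflow` and `addSatRef` are hypothetical, and the overflow test recomputes by hand what the V flag reports after ADDS/ADCS. No claim is made about the code ARMCC generates for it.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical reference helpers for illustration; not part of the original post.

// Sum computed in unsigned arithmetic, where wraparound is well defined.
constexpr std::uint64_t wrapSum(std::int64_t a, std::int64_t b) {
    return static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b);
}

// Equivalent of the V flag: the sign of the sum differs from the sign of
// both operands, which can only happen when the operands share a sign.
constexpr bool signedOverflow(std::int64_t a, std::int64_t b) {
    return (((static_cast<std::uint64_t>(a) ^ wrapSum(a, b)) &
             (static_cast<std::uint64_t>(b) ^ wrapSum(a, b))) >> 63) != 0;
}

// Saturating signed 64-bit addition:
//   no overflow        -> plain sum            (the BVC path)
//   negative operands  -> clamp to 0x8000...   (the underflow path)
//   positive operands  -> clamp to 0x7FFF...   (the BMI / Oflw path)
constexpr std::int64_t addSatRef(std::int64_t a, std::int64_t b) {
    return !signedOverflow(a, b)
               ? a + b
               : (a < 0 ? std::numeric_limits<std::int64_t>::min()
                        : std::numeric_limits<std::int64_t>::max());
}
```

Comparing the assembly routine against `addSatRef` for the boundary values (largest and smallest 64-bit numbers, plus or minus one) covers both saturation branches and the normal path.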