Hi, I would like to convert the following function for saturating 64-bit addition to inline assembly:
__asm LLONG qadd_ll( LLONG a, LLONG b)
{
        ADDS    R0, R0, R2
        ADCS    R1, R1, R3
        BVS     Oflw_qadd_ll
        BX      LR
Oflw_qadd_ll
        BPL     Uflw_qadd_ll
        // overflow into neg.: limit to 0x7FFFF...
        MOV     R0, #0
        SUBS    R0, #1
        LSRS    R1, R0, #1
        BX      LR
Uflw_qadd_ll
        // underflow into pos.: limit to 0x8000...
        MOV     R0, #0
        MOV     R1, #1
        LSLS    R1, R1, #31
        BX      LR
}
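For reference, the behavior this routine is meant to have can be written down in portable C. This is only a sketch: it assumes LLONG is a typedef for long long (the typedef is not shown in this post), and qadd_ll_ref is a hypothetical name used purely for illustration. It is not meant to be as fast as the assembly version.

```c
#include <limits.h>

/* Assumption: LLONG is a 64-bit signed integer; the real typedef is not shown. */
typedef long long LLONG;

/* Hypothetical reference version of the saturating 64-bit add. */
LLONG qadd_ll_ref(LLONG a, LLONG b)
{
    /* Add as unsigned so that wrap-around is well defined (like ADDS/ADCS). */
    unsigned long long ua  = (unsigned long long)a;
    unsigned long long ub  = (unsigned long long)b;
    unsigned long long sum = ua + ub;

    /* Signed overflow (the V flag): both operands have the same sign
       and the sign of the wrapped sum differs from it. */
    if (((ua ^ sum) & (ub ^ sum)) >> 63) {
        /* Wrapped sum negative: positive overflow, clamp to 0x7FFF...
           Wrapped sum positive: negative overflow, clamp to 0x8000... */
        return (sum >> 63) ? LLONG_MAX : LLONG_MIN;
    }

    /* No overflow: the plain signed addition is safe. */
    return a + b;
}
```

Such a version can be useful for checking the assembly routine against corner cases such as LLONG_MAX + 1 or LLONG_MIN + (-1).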
This would speed up the critical parts of my software by a factor of 2, because it would avoid the function call overhead.
Does anyone have a hint on how this can be done with the Keil C/C++ compiler 5.05 (MDK 5.12)?
The problem is accessing the low and high registers of an LLONG C variable.
I tried the following:
typedef union{
    struct{
        int iLo;
        int iHi;
    };
    LLONG ll;
}U_LLONG;

LLONG func_c( LLONG a, LLONG b){
    U_LLONG ua;
    U_LLONG ub;
    U_LLONG uc;
    ua.ll= a;
    ub.ll= b;
    uc.ll= ua.ll+ub.ll;
    uc.ll= uc.ll+ub.ll;
    return uc.ll;
}

LLONG func_asm( LLONG a, LLONG b){
    U_LLONG ua;
    U_LLONG ub;
    U_LLONG uc;
    ua.ll= a;
    ub.ll= b;
    asm( "ADDS ub.iLo, ua.iLo, ub.iLo\n"
         "ADCS ub.iHi, ua.iHi, ub.iHi\n");
    asm( "ADDS uc.iLo, uc.iLo, ub.iLo\n"
         "ADCS uc.iHi, uc.iHi, ub.iHi\n");
    return uc.ll;
}
If I use func_c, the compiler generates nice register-based code.
If I use func_asm, it generates very cumbersome non-register code with lots of unnecessary stack allocation.
Does anybody have a hint on how to convince the compiler to use register code for the inline assembly 64-bit addition as well?