Is the 'S' necessary in the asm code?

Robert over 10 years ago

Hi,

Before writing some Cortex-A8 assembly code, I reviewed the disassembly of a small C program. In the following snippet, I don't understand why the 'S' in ADDS is needed. In fact, I don't see the point of the whole line:

ADDS R0, R0, #0

Is this compiler not very efficient?

I would like to have your opinion on the disassembly code.

typedef unsigned long long Uint64;

typedef int     Int32;

typedef unsigned int           Uint32;

40301660:   F1AD0D10 SUB.W           R13, R13, #16

40301664:   9000     STR             R0, [SP]

153           x = 0x76CF41F2 - ( d << 1 );                    /* initialized value x(0) */

40301666:   9900     LDR             R1, [SP]

40301668:   486F     LDR             R0, $C$CON3

4030166a:   EBA00041 SUB.W           R0, R0, R1, LSL #1

4030166e:   9001     STR             R0, [SP, #4]

156           tmp = ( Uint32 )( ( ( Uint64 )x * ( Uint64 )d ) >> 32 );

40301670:   9801     LDR             R0, [SP, #4]

40301672:   9900     LDR             R1, [SP]

40301674:   FBA11000 UMULL.W         R1, R0, R1, R0

40301678:   1C00     ADDS            R0, R0, #0

4030167a:   9002     STR             R0, [SP, #8]

Now, here is the question I have had from the beginning. In the above 32-bit * 32-bit multiplication, the result is in R0:R1. When I need only the high 32 bits (R0), it would be better to round the MSB of R1 into the LSB of R0. I do not need strictly symmetric rounding here; a simple R1+0x8000,0000 is enough, with the carry bit of that addition added to R0. I am still new to the Cortex-A8 instructions, and it seems that quite a few instructions are needed for the rounding.

I would like to know whether you have a good trick to get the result very efficiently.

Thanks,


0 Yasuhiko Koumoto over 10 years ago

Hello,
the 'S' suffix indicates that the instruction updates the condition flags.
You seem to be using Thumb instructions, and the 16-bit Thumb ADD (immediate) encoding always updates the condition flags when it is outside an IT block.
So a plain ADD is sometimes displayed as ADDS.
Some compilers (or assemblers) interpret ADD as ADDS, so the 'S' does not need to be written.
However, if the ADD is inside an IT block, the 'S' cannot be added, because the 16-bit encoding does not update the flags inside an IT block.
Regarding long multiplication (i.e. 32-bit x 32-bit -> 64-bit), any two registers can be specified for the result.
I am afraid you may have assumed that a normal MUL generates a 64-bit result.
Best regards,
Yasuhiko Koumoto.
0 Robert over 10 years ago in reply to Yasuhiko Koumoto

Thanks for your reply. The first part, on 'S', is very helpful to me.
I just noticed that my post was confusing because R0 and R1 appear too many times above, and I may not have explained my question clearly. I have therefore renamed the registers as follows to avoid ambiguity.
156          tmp = ( Uint32 )( ( ( Uint64 )x * ( Uint64 )d ) >> 32 );
40301670: 9801    LDR            R0, [SP, #4]
40301672: 9900    LDR            R1, [SP]
40301674: FBA11000 UMULL.W        R3, R2, R1, R0
40301678: 1C00    ADDS         R2, R2, #0
4030167a: 9002    STR            R2, [SP, #8]
First, I don't think the ADDS line (highlighted in blue in my listing) is necessary. The C compiler does not seem to be very good at this. Do you agree?
Second, in the above 32-bit * 32-bit multiplication, the result is in R2:R3. When I need only the high 32 bits (R2), it would be better to round the MSB of R3 into the LSB of R2. I do not need strictly symmetric rounding here; a simple R3+0x8000,0000 is enough, with the carry bit of that addition added to R2. I am still new to the Cortex-A8 instructions, and it seems that quite a few instructions are needed for the rounding.
Thanks, yasuhikokoumoto.
0 Yasuhiko Koumoto over 10 years ago in reply to Robert

Hello,
Regarding ADDS, I think it is there to set the Z flag.
Do you check whether the tmp variable is zero in your program?
Regarding UMULL, I think that because your code does not use the lower word, the compiler placed the upper (significant) word in R2.
I guess that without the right shift by 32 bits, UMULL would put the lower word in R2 and the higher word in R3.
By the way, what is the source code corresponding to 'R3+0x8000,0000'?
From the sample code alone, I cannot understand your intention.
Best regards,
Yasuhiko Koumoto.
0 Robert over 10 years ago in reply to Yasuhiko Koumoto

"By the way, what is the source code corresponding to 'R3+0x8000,0000'?
From the sample code alone, I cannot understand your intention."
Excuse me, I again did not make it clear in my previous post. The original C code does not do any rounding yet. I think it should, to reduce the quantization error. I tried to add the rounding to the (dis)assembly code only. My last question is whether there is a quick/short ARM instruction sequence for the rounding. Thanks,
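As a rough sketch of the rounding asked about here (not code from the original program): in C, adding 0x80000000 to the 64-bit product before the shift carries the MSB of the low word into the high word. The helper names `mulhi_u32` and `mulhi_round_u32` are made up for illustration, and the assembly in the comment is only one sequence that might plausibly be used; its register numbers follow the renamed listing above.

```c
#include <stdint.h>

/* Hypothetical helper: truncating high word, as in the original C line. */
static inline uint32_t mulhi_u32(uint32_t x, uint32_t d)
{
    return (uint32_t)(((uint64_t)x * (uint64_t)d) >> 32);
}

/* Hypothetical helper: high word rounded by adding half an LSB (2^31)
 * to the full product. The sum cannot overflow 64 bits, because the
 * largest product is (2^32 - 1)^2 = 2^64 - 2^33 + 1.
 *
 * One possible hand-written equivalent (an assumption, not compiler
 * output), with R2:R3 = high:low as in the renamed listing:
 *
 *     UMULL  R3, R2, R1, R0         ; R2:R3 = R1 * R0
 *     ADDS   R3, R3, #0x80000000    ; carry out = MSB of low word
 *     ADC    R2, R2, #0             ; add that carry to the high word
 */
static inline uint32_t mulhi_round_u32(uint32_t x, uint32_t d)
{
    uint64_t product = (uint64_t)x * (uint64_t)d;
    return (uint32_t)((product + 0x80000000u) >> 32);
}
```

In that assembly sketch the 'S' on ADDS does matter, since the following ADC consumes the carry flag that ADDS sets.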

0 Yasuhiko Koumoto over 10 years ago in reply to Robert

Hello again,

I compiled similar code with "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]".

It generated the code below.

        sub     sp, sp, #16
        ldr     r3, .L3
        ldr     r2, [sp, #12]
        sub     r3, r3, r2, lsl #1
        str     r3, [sp, #8]
        ldr     r2, [sp, #8]
        ldr     r3, [sp, #12]
        umull   r2, r3, r3, r2
        str     r3, [sp, #4]
        add     sp, sp, #16
        bx      lr
.L4:
        .align  2
.L3:
        .word   1993294322

In this code, the UMULL result is in R3:R2, that is, R3 holds the high word and R2 the low word.
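To make the register roles concrete, here is a small C model of `UMULL RdLo, RdHi, Rn, Rm`, assuming the usual operand order in which the first destination receives the low word and the second the high word. The names `umull_result`, `umull_model` and `tmp_from_d` are hypothetical and only serve the illustration; `tmp_from_d` mirrors the two C statements quoted in the first post.

```c
#include <stdint.h>

/* Hypothetical model of UMULL RdLo, RdHi, Rn, Rm. */
typedef struct {
    uint32_t rd_lo;  /* first destination register: bits 31..0  */
    uint32_t rd_hi;  /* second destination register: bits 63..32 */
} umull_result;

static umull_result umull_model(uint32_t rn, uint32_t rm)
{
    uint64_t product = (uint64_t)rn * (uint64_t)rm;
    umull_result r;
    r.rd_lo = (uint32_t)product;
    r.rd_hi = (uint32_t)(product >> 32);
    return r;
}

/* Mirrors the source lines from the thread:
 *   x   = 0x76CF41F2 - (d << 1);
 *   tmp = (Uint32)(((Uint64)x * (Uint64)d) >> 32);
 */
static uint32_t tmp_from_d(uint32_t d)
{
    uint32_t x = 0x76CF41F2u - (d << 1);
    return umull_model(x, d).rd_hi;  /* only the high word is kept */
}
```

For `umull r2, r3, r3, r2` in the GCC listing, `rd_lo` corresponds to r2 and `rd_hi` to r3, which is the register stored as tmp.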

Best regards,

Yasuhiko Koumoto.