Hi,

Before writing some Cortex-A8 assembly code, I am reviewing the disassembly of a small C program. In the following snippet, I don't understand why the 'S' suffix in ADDS is needed. In fact, I don't see the point of the whole line:

ADDS R0, R0, #0

Is this compiler not very efficient?

I would like to have your opinion on the disassembly code.

typedef unsigned long long Uint64;

typedef int Int32;

typedef unsigned int Uint32;

40301660: F1AD0D10 SUB.W R13, R13, #16

40301664: 9000 STR R0, [SP]

153 x = 0x76CF41F2 - ( d << 1 ); /* initialized value x(0) */

40301666: 9900 LDR R1, [SP]

40301668: 486F LDR R0, $C$CON3

4030166a: EBA00041 SUB.W R0, R0, R1, LSL #1

4030166e: 9001 STR R0, [SP, #4]

156 tmp = ( Uint32 )( ( ( Uint64 )x * ( Uint64 )d ) >> 32 );

40301670: 9801 LDR R0, [SP, #4]

40301672: 9900 LDR R1, [SP]

40301674: FBA11000 UMULL.W R1, R0, R1, R0

40301678: 1C00 ADDS R0, R0, #0

4030167a: 9002 STR R0, [SP, #8]

Now, a follow-up question. In the 32-bit x 32-bit multiply above, the 64-bit result is in R0:R1 (R0 holds the high word, R1 the low word). When I only need the high 32 bits (R0), it would be better to round using the MSB of R1 instead of truncating. I do not need strict symmetric rounding here; a simple round-half-up is enough. That is, add 0x80000000 to R1 and add the resulting carry to R0. I am still new to the Cortex-A8 instruction set, and it looks to me as if quite a few instructions are needed for this rounding.
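To make the rounding I have in mind concrete, here is a minimal C sketch. The function name mulhi_round is made up for illustration and is not in my program; I use the standard uint32_t/uint64_t types in place of my Uint32/Uint64 typedefs. The 64-bit add cannot overflow, because the largest possible product is 0xFFFFFFFE00000001.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the original program):
 * returns the high 32 bits of the 64-bit product x * d,
 * rounded half up by adding 0x80000000 to the low word and
 * letting the carry propagate into the high word. */
static uint32_t mulhi_round(uint32_t x, uint32_t d)
{
    uint64_t p = (uint64_t)x * (uint64_t)d;      /* full 64-bit product */
    return (uint32_t)((p + 0x80000000ULL) >> 32); /* round, keep high word */
}
```

For example, mulhi_round(3, 0x80000000) rounds the exact value 1.5 up to 2, whereas the truncating version in my listing would give 1.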

I would like to know whether you have a good trick to get the result very efficiently.

Thanks,