Hi,
Before writing some Cortex-A8 assembly code, I reviewed the disassembly of a small C program. In the following snippet, I don't understand why the 'S' in ADDS is needed. In fact, I don't see the point of the whole line:
ADDS R0, R0, #0
Is this compiler not very efficient?
I would like to have your opinion on the disassembly code.
typedef unsigned long long Uint64;
typedef int Int32;
typedef unsigned int Uint32;
40301660: F1AD0D10 SUB.W R13, R13, #16
40301664: 9000 STR R0, [SP]
153 x = 0x76CF41F2 - ( d << 1 ); /* initialized value x(0) */
40301666: 9900 LDR R1, [SP]
40301668: 486F LDR R0, $C$CON3
4030166a: EBA00041 SUB.W R0, R0, R1, LSL #1
4030166e: 9001 STR R0, [SP, #4]
156 tmp = ( Uint32 )( ( ( Uint64 )x * ( Uint64 )d ) >> 32 );
40301670: 9801 LDR R0, [SP, #4]
40301672: 9900 LDR R1, [SP]
40301674: FBA11000 UMULL.W R1, R0, R1, R0
40301678: 1C00 ADDS R0, R0, #0
4030167a: 9002 STR R0, [SP, #8]
Now I have a follow-up question. In the 32-bit x 32-bit multiplication above, the 64-bit result is in R0:R1. When I need only the high 32 bits (R0), it would be better to round using the MSB of R1 instead of truncating. I do not need strict symmetric rounding here; a simple R1 + 0x80000000 is enough, with the carry from that addition added to R0. I am still new to the Cortex-A8 instructions, and it seems that quite a few instructions are needed for this rounding.
I would like to know whether you have a good trick to get the result very efficiently.
Thanks,
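The rounding described above can be written out in C as a reference, before worrying about the instruction sequence. This is only a minimal sketch: the function name `mulhi_round` is hypothetical, and the variable names `lo`/`hi` stand for the low and high words of the product (R1 and R0 in the disassembly above).

```c
#include <stdint.h>

/* Hypothetical helper: high 32 bits of x * d, rounded by adding
   0x80000000 to the low word and propagating the carry into the
   high word (simple rounding, not symmetric rounding). */
static uint32_t mulhi_round(uint32_t x, uint32_t d)
{
    uint64_t prod = (uint64_t)x * (uint64_t)d;
    uint32_t lo = (uint32_t)prod;          /* low word  (R1 above) */
    uint32_t hi = (uint32_t)(prod >> 32);  /* high word (R0 above) */

    uint32_t sum = lo + 0x80000000u;       /* wraps modulo 2^32 */
    uint32_t carry = (sum < lo) ? 1u : 0u; /* 1 if the addition wrapped */

    return hi + carry;
}
```

For example, a product of 0x180000000 (x = 0x10000, d = 0x18000) has a high word of 1 and a low word of 0x80000000, so the rounded result is 2.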
By the way, which source code corresponds to 'R3+0x8000,0000'?
From the sample code alone, I cannot understand your intention.
Excuse me. I did not make this clear in my previous post. The original C code does not do the rounding yet. I think it should, to reduce the quantization error. So far I have tried to add the rounding only in the (dis)assembly code. My last question is whether there is a quick/short ARM instruction sequence that does the rounding. Thanks,
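One possible way to think about a short sequence: the ARMv7-A instruction UMLAL RdLo, RdHi, Rn, Rm adds the 64-bit product Rn * Rm to the 64-bit value in RdHi:RdLo. If RdLo is preloaded with 0x80000000 and RdHi with 0, the rounded high word ends up in RdHi after a single multiply-accumulate. The C sketch below models that idea; the function name `mulhi_round_acc` is hypothetical, and whether a given compiler actually emits UMLAL for it is an assumption that should be checked in the disassembly.

```c
#include <stdint.h>

/* Hypothetical helper: rounded high word of x * d, written as one
   64-bit multiply-accumulate. The accumulator starts at 0x80000000
   (rounding constant in the low word, high word zero), which models
   preloading RdHi:RdLo before a UMLAL. */
static uint32_t mulhi_round_acc(uint32_t x, uint32_t d)
{
    uint64_t acc = 0x80000000u;

    /* The sum cannot overflow 64 bits: the largest product is
       (2^32 - 1)^2 = 2^64 - 2^33 + 1, and adding 2^31 stays below 2^64. */
    acc += (uint64_t)x * (uint64_t)d;

    return (uint32_t)(acc >> 32);
}
```

The design choice here is to fold the rounding constant into the accumulator, so no separate add-with-carry step is needed after the multiplication.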
Hello again,
I compiled similar code with "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]".
It generated the code below.
    sub sp, sp, #16
    ldr r3, .L3
    ldr r2, [sp, #12]
    sub r3, r3, r2, lsl #1
    str r3, [sp, #8]
    ldr r2, [sp, #8]
    ldr r3, [sp, #12]
    umull r2, r3, r3, r2
    str r3, [sp, #4]
    add sp, sp, #16
    bx lr
.L4:
    .align 2
.L3:
    .word 1993294322
In this code, the UMULL result is in R3:R2.
Best regards,
Yasuhiko Koumoto.