Hi,
Before I write some Cortex-A8 assembly code, I am reviewing the disassembly of a small C program. In the following snippet, I don't understand why the 'S' in ADDS is needed. In fact, I don't see the point of the whole line:
ADDS R0, R0, #0
Is this compiler not very efficient?
I would like to have your opinion on the disassembly code.
typedef unsigned long long Uint64;
typedef int Int32;
typedef unsigned int Uint32;
40301660: F1AD0D10 SUB.W R13, R13, #16
40301664: 9000 STR R0, [SP]
153 x = 0x76CF41F2 - ( d << 1 ); /* initialized value x(0) */
40301666: 9900 LDR R1, [SP]
40301668: 486F LDR R0, $C$CON3
4030166a: EBA00041 SUB.W R0, R0, R1, LSL #1
4030166e: 9001 STR R0, [SP, #4]
156 tmp = ( Uint32 )( ( ( Uint64 )x * ( Uint64 )d ) >> 32 );
40301670: 9801 LDR R0, [SP, #4]
40301672: 9900 LDR R1, [SP]
40301674: FBA11000 UMULL.W R1, R0, R1, R0
40301678: 1C00 ADDS R0, R0, #0
4030167a: 9002 STR R0, [SP, #8]
Now, I have a follow-up question. In the above 32-bit * 32-bit multiplication, the result is in R0:R1. When I need only the high 32 bits (R0), it would be better to round the MSB of R1 into the LSB of R0. I do not need strict symmetric rounding here; a simple R1+0x8000,0000 is enough, with the carry from that addition added to R0. I am still new to ARM A8 instructions, and it seems that quite a few instructions are needed for the rounding.
I would like to know whether you have a good trick to get the result very efficiently.
Thanks,
Thanks for your reply. The first part on 'S' is very helpful to me.
I just noticed that my post was confusing because R0 and R1 appear too many times above, and I may not have explained my question clearly. So I have changed to the following registers to avoid ambiguity.
40301674: FBA11000 UMULL.W R3, R2, R1, R0
40301678: 1C00 ADDS R2, R2, #0
4030167a: 9002 STR R2, [SP, #8]
First, I don't think the ADDS line is necessary. The C compiler does not seem to be very good at this. Do you agree?
Second, in the above 32-bit * 32-bit multiplication, the result is in R2:R3. When I need only the high 32 bits (R2), it would be better to round the MSB of R3 into the LSB of R2. I do not need strict symmetric rounding here; a simple R3+0x8000,0000 is enough, with the carry from that addition added to R2. I am still new to ARM A8 instructions, and it seems that quite a few instructions are needed for the rounding.
Thanks, yasuhikokoumoto.
Hello,
Regarding ADDS, I think it is there to set the Z flag.
Does your program check whether the tmp variable is zero?
Regarding UMULL, I think that because your code does not use the lower word, the compiler assigned the upper (significant) word to R2.
I guess that without the right shift by 32 bits, the UMULL result would have R2 as the lower word and R3 as the higher word.
By the way, which source code corresponds to 'R3+0x8000,0000'?
From your sample code alone, I cannot understand your intention.
Best regards,
Yasuhiko Koumoto.
Excuse me, I again did not make myself clear in my previous post. The original C code does not have the rounding yet. I think it should have rounding to reduce the quantization error. So far I have tried to add the rounding only in the (dis)assembly code. My last question is whether there is a quick/short ARM assembly instruction sequence to do the rounding. Thanks,
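For reference, the rounding described here can also be written at the C level, so the compiler gets a chance to generate the carry handling itself. This is only a minimal sketch: the function names mul_hi_trunc and mul_hi_round are hypothetical and not from the original program, and the assembly sequence in the comment is one possible, untested option rather than what any particular compiler is known to emit.

```c
typedef unsigned long long Uint64;
typedef unsigned int Uint32;

/* High 32 bits of x*d, truncated. This matches source line 156 above. */
Uint32 mul_hi_trunc(Uint32 x, Uint32 d)
{
    return (Uint32)(((Uint64)x * (Uint64)d) >> 32);
}

/*
 * High 32 bits of x*d with simple (non-symmetric) rounding:
 * add 0x80000000 to the 64-bit product before shifting, so the carry
 * out of the low word propagates into the high word.
 * The sum cannot overflow 64 bits, because the largest product is
 * (2^32 - 1)^2 = 2^64 - 2^33 + 1.
 *
 * One possible hand-written equivalent (assumption, untested), using
 * the R3 = low word, R2 = high word convention from the post:
 *     UMULL R3, R2, R1, R0
 *     ADDS  R3, R3, #0x80000000   ; the 'S' sets the carry flag
 *     ADC   R2, R2, #0            ; adds that carry into the high word
 */
Uint32 mul_hi_round(Uint32 x, Uint32 d)
{
    return (Uint32)(((Uint64)x * (Uint64)d + 0x80000000ULL) >> 32);
}
```

For example, with x = 0x80000000 and d = 1 the low word is exactly 0x80000000, so mul_hi_trunc gives 0 while mul_hi_round gives 1. In the assembly sketch the 'S' suffix is essential, because ADC reads the carry flag that ADDS produces.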
Hello again,
I compiled similar code with "GCC: (GNU Tools for ARM Embedded Processors) 4.9.3 20150529 (release) [ARM/embedded-4_9-branch revision 224288]".
It generated the code below.
sub sp, sp, #16
ldr r3, .L3
ldr r2, [sp, #12]
sub r3, r3, r2, lsl #1
str r3, [sp, #8]
ldr r2, [sp, #8]
ldr r3, [sp, #12]
umull r2, r3, r3, r2
str r3, [sp, #4]
add sp, sp, #16
bx lr
.L4:
.align 2
.L3:
.word 1993294322
With this code, the multiplication result is in R3:R2.
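To make the register roles concrete: in UMULL RdLo, RdHi, Rn, Rm the first destination register receives the low 32 bits of the product and the second receives the high 32 bits, so in the listing above r2 holds the low word and r3 the high word that gets stored. A small C model of that behaviour is sketched below; umull_model is a hypothetical helper name used only for illustration.

```c
typedef unsigned long long Uint64;
typedef unsigned int Uint32;

/*
 * Models UMULL RdLo, RdHi, Rn, Rm:
 * the unsigned 64-bit product Rn * Rm is split into two 32-bit words.
 */
void umull_model(Uint32 rn, Uint32 rm, Uint32 *rd_lo, Uint32 *rd_hi)
{
    Uint64 product = (Uint64)rn * (Uint64)rm;

    *rd_lo = (Uint32)product;          /* low word, first destination  */
    *rd_hi = (Uint32)(product >> 32);  /* high word, second destination */
}
```

The C expression with ">> 32" simply selects the rd_hi half, which is why the compiler stores only that register and never touches the low word.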