I compiled the same sample C file with two compilers, but got very different assembly code.
The C file is:
void my_print(char *fmt, ...){ *((volatile int *) 0x1000) =(int) &fmt;}
void test(void){ my_print("a", 1, 2, 3, 4, 5, 6, 7, 8, 9);
}
With armclang (ARM Compiler 6.01 (build 22)), the command is "armclang -mcpu=cortex-a53 --target=aarch64-arm-none-eabi test.c", and the assembly code of my_print is:
0000000000008000 <my_print>:
8000: d10303ff sub sp, sp, #0xc0
8004: 3d801fe7 str q7, [sp,#112]
8008: 3d801be6 str q6, [sp,#96]
800c: 3d8017e5 str q5, [sp,#80]
8010: 3d8013e4 str q4, [sp,#64]
8014: 3d800fe3 str q3, [sp,#48]
8018: 3d800be2 str q2, [sp,#32]
801c: 3d8007e1 str q1, [sp,#16]
8020: 3d8003e0 str q0, [sp]
8024: f9005be7 str x7, [sp,#176]
8028: f90057e6 str x6, [sp,#168]
802c: f90053e5 str x5, [sp,#160]
8030: f9004fe4 str x4, [sp,#152]
8034: f9004be3 str x3, [sp,#144]
8038: f90047e2 str x2, [sp,#136]
803c: f90043e1 str x1, [sp,#128]
8040: f9005fe0 str x0, [sp,#184] // Jerry: x0, i.e., the address of fmt, located at the bottom of the stack.
8044: 9102e3e0 add x0, sp, #0xb8
8048: 2a0003e8 mov w8, w0
804c: 321403e9 orr w9, wzr, #0x1000
8050: 2a0903e0 mov w0, w9
8054: b9000008 str w8, [x0]
8058: 910303ff add sp, sp, #0xc0
805c: d65f03c0 ret
With gcc (aarch64-none-elf-gcc, Linaro GCC 4.8.3), the command is "aarch64-none-elf-gcc test.c -nostdlib", and the assembly code is:
400024: d10343ff sub sp, sp, #0xd0
400028: f9004fe1 str x1, [sp,#152]
40002c: f90053e2 str x2, [sp,#160]
400030: f90057e3 str x3, [sp,#168]
400034: f9005be4 str x4, [sp,#176]
400038: f9005fe5 str x5, [sp,#184]
40003c: f90063e6 str x6, [sp,#192]
400040: f90067e7 str x7, [sp,#200]
400044: 3d8007e0 str q0, [sp,#16]
400048: 3d800be1 str q1, [sp,#32]
40004c: 3d800fe2 str q2, [sp,#48]
400050: 3d8013e3 str q3, [sp,#64]
400054: 3d8017e4 str q4, [sp,#80]
400058: 3d801be5 str q5, [sp,#96]
40005c: 3d801fe6 str q6, [sp,#112]
400060: 3d8023e7 str q7, [sp,#128]
400064: f90003e0 str x0, [sp] // Jerry : x0, i.e., the address of fmt, located at the top of the stack.
400068: d2820000 mov x0, #0x1000 // #4096
40006c: 910003e1 mov x1, sp
400070: b9000001 str w1, [x0]
400074: 910343ff add sp, sp, #0xd0
400078: d65f03c0 ret
Is the armclang behavior as expected? And does gcc violate the Procedure Call Standard?
B.R
Jerry
Thanks for your comment, Oliver.
It is intended to write the address of fmt (sp+0xb8 for armclang, and sp for gcc) to 0x1000; this triggers my RTL test bench task to 'print' to the logfile.
I used ARM Compiler 5.x for Cortex-A9/A7; armcc put r0 (the address of fmt) at the top of the stack, and my 'print' worked well. Now I am using ARM Compiler 6, and armclang puts it at the bottom, so my 'print' doesn't work anymore. I am just wondering why armclang does not follow the old style.
B.R
Jerry
The behavior was never guaranteed, even to a weak extent. You could use 0x1008 for the pointer to the arguments, as in the following code. This is still not fully standards conformant, but it should work.
#include <stdarg.h>
void my_print(const char *fmt, ...){
va_list args;
va_start(args, fmt);
*((volatile va_list *) 0x1008) = args;
*((const char * volatile *) 0x1000) = fmt; // store the pointer itself, not a char
va_end(args);
}
Note that pointers are 64 bits on AArch64 and no longer fit into an int. I also like to use const whenever possible for things like fmt.
In my earlier reply I assumed that you only needed to store the location of fmt, but I see now that you actually need to access all of the variadic arguments to my_print from your test bench. Daith's solution of using the va_list type should work, though since you have switched from Cortex-A9/A7 to Cortex-A53 (in AArch64 mode) you will need to update your test bench, as AArch64 mode uses a different ABI with a different definition of va_list.
In the 32-bit ABI, va_list contains a single pointer (32 bits) to the next argument, and all arguments are stored contiguously. In the 64-bit ABI, arguments passed in registers are stored in separate blocks, and va_list is defined as follows:
typedef struct __va_list {
void *__stack; // next stack param
void *__gr_top; // end of GP arg reg save area
void *__vr_top; // end of FP/SIMD arg reg save area
int __gr_offs; // offset from __gr_top to next GP register arg
int __vr_offs; // offset from __vr_top to next FP/SIMD register arg
} va_list;
You will need to update your test bench to use this new structure. There are more details of how va_list works for AArch64 in appendix B of the procedure call standard: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
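As a rough illustration of what the test bench side has to do, here is a minimal sketch in C of fetching the next integer or pointer argument from that structure, following the va_arg procedure described in appendix B. The names aapcs64_va_list and next_gp_arg are made up for this example, and it assumes a little-endian target where every such argument occupies one 8-byte slot; floating-point arguments (the __vr_top / __vr_offs side) and larger types are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the AArch64 va_list layout quoted above.
 * Renamed so it cannot clash with the compiler's own va_list. */
typedef struct {
    void *stack;   /* next stack param */
    void *gr_top;  /* end of GP arg reg save area */
    void *vr_top;  /* end of FP/SIMD arg reg save area (unused here) */
    int   gr_offs; /* offset from gr_top to next GP register arg */
    int   vr_offs; /* offset from vr_top to next FP/SIMD register arg */
} aapcs64_va_list;

/* Return the next 64-bit general-purpose argument and advance ap.
 * Assumption: little-endian, one 8-byte slot per argument. */
uint64_t next_gp_arg(aapcs64_va_list *ap)
{
    const unsigned char *src;
    uint64_t value;
    int from_regs = 0;

    assert(ap != NULL);

    if (ap->gr_offs < 0) {
        /* Arguments may still be left in the register save area. */
        int offs = ap->gr_offs;
        ap->gr_offs = offs + 8;
        if (ap->gr_offs <= 0) {
            src = (const unsigned char *)ap->gr_top + offs;
            from_regs = 1;
        }
    }
    if (!from_regs) {
        /* Register save area exhausted: take the next stack slot. */
        src = (const unsigned char *)ap->stack;
        ap->stack = (unsigned char *)ap->stack + 8;
    }
    memcpy(&value, src, sizeof value);
    return value;
}
```

For the call my_print("a", 1, 2, ..., 9) from the original post, x1 to x7 carry the values 1 to 7, so __gr_offs would start at -56 and the values 8 and 9 would then come from __stack. A test bench model would read target memory instead of dereferencing host pointers, but the stepping logic is the same.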
Regards,
Oliver
Yes, you're right, thanks. That slipped my mind. I wonder how much overhead all that stuff adds; for instance, setting up va_list saves a lot of floating-point registers that one might never access. If there were a standards option requiring function prototypes to always be used, the whole business could be done much more efficiently: the variadic parameters could be passed on the stack every time, and accessing them would just get the next item off the stack. I think this is one of the nastier legacies of K&R C.