I compiled the same sample C file with two compilers, but got very different assembler code.
The C file is:
void my_print(char *fmt, ...) {
    *((volatile int *) 0x1000) = (int) &fmt;
}
void test(void) {
    my_print("a", 1, 2, 3, 4, 5, 6, 7, 8, 9);
}
With armclang (ARM Compiler 6.01 (build 22)), using the command "armclang -mcpu=cortex-a53 --target=aarch64-arm-none-eabi test.c", the assembly code of my_print is:
0000000000008000 <my_print>:
8000: d10303ff sub sp, sp, #0xc0
8004: 3d801fe7 str q7, [sp,#112]
8008: 3d801be6 str q6, [sp,#96]
800c: 3d8017e5 str q5, [sp,#80]
8010: 3d8013e4 str q4, [sp,#64]
8014: 3d800fe3 str q3, [sp,#48]
8018: 3d800be2 str q2, [sp,#32]
801c: 3d8007e1 str q1, [sp,#16]
8020: 3d8003e0 str q0, [sp]
8024: f9005be7 str x7, [sp,#176]
8028: f90057e6 str x6, [sp,#168]
802c: f90053e5 str x5, [sp,#160]
8030: f9004fe4 str x4, [sp,#152]
8034: f9004be3 str x3, [sp,#144]
8038: f90047e2 str x2, [sp,#136]
803c: f90043e1 str x1, [sp,#128]
8040: f9005fe0 str x0, [sp,#184] // Jerry: x0, i.e., the address of fmt, located at the bottom of the stack.
8044: 9102e3e0 add x0, sp, #0xb8
8048: 2a0003e8 mov w8, w0
804c: 321403e9 orr w9, wzr, #0x1000
8050: 2a0903e0 mov w0, w9
8054: b9000008 str w8, [x0]
8058: 910303ff add sp, sp, #0xc0
805c: d65f03c0 ret
With gcc (aarch64-none-elf-gcc, Linaro GCC 4.8.3), using the command "aarch64-none-elf-gcc test.c -nostdlib", the assembly code is:
400024: d10343ff sub sp, sp, #0xd0
400028: f9004fe1 str x1, [sp,#152]
40002c: f90053e2 str x2, [sp,#160]
400030: f90057e3 str x3, [sp,#168]
400034: f9005be4 str x4, [sp,#176]
400038: f9005fe5 str x5, [sp,#184]
40003c: f90063e6 str x6, [sp,#192]
400040: f90067e7 str x7, [sp,#200]
400044: 3d8007e0 str q0, [sp,#16]
400048: 3d800be1 str q1, [sp,#32]
40004c: 3d800fe2 str q2, [sp,#48]
400050: 3d8013e3 str q3, [sp,#64]
400054: 3d8017e4 str q4, [sp,#80]
400058: 3d801be5 str q5, [sp,#96]
40005c: 3d801fe6 str q6, [sp,#112]
400060: 3d8023e7 str q7, [sp,#128]
400064: f90003e0 str x0, [sp] // Jerry : x0, i.e., the address of fmt, located at the top of the stack.
400068: d2820000 mov x0, #0x1000 // #4096
40006c: 910003e1 mov x1, sp
400070: b9000001 str w1, [x0]
400074: 910343ff add sp, sp, #0xd0
400078: d65f03c0 ret
Is armclang's behavior as expected? And does gcc violate the Procedure Call Standard?
B.R
Jerry
In my earlier reply I assumed that you only needed to store the location of fmt, but I see now that you actually need to access all of the variadic arguments to my_print from your test bench. Daith's solution of using the va_list type should work. However, since you have switched from Cortex-A9/A7 to Cortex-A53 (in AArch64 mode), you will need to update your test bench, because AArch64 mode uses a different ABI with a different definition of va_list.
In the 32-bit ABI, va_list contains a single pointer (32 bits) to the next argument, and all arguments are stored contiguously. In the 64-bit ABI, arguments passed in registers are stored in separate blocks, and va_list is defined as follows:
typedef struct __va_list {
void *__stack; // next stack param
void *__gr_top; // end of GP arg reg save area
void *__vr_top; // end of FP/SIMD arg reg save area
int __gr_offs; // offset from __gr_top to next GP register arg
int __vr_offs; // offset from __vr_top to next FP/SIMD register arg
} va_list;
You will need to update your test bench to use this new structure. There are more details of how va_list works for AArch64 in appendix B of the procedure call standard: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
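As a rough illustration of the va_list approach, here is a minimal sketch that walks the variadic arguments with the standard <stdarg.h> macros instead of reading the structure fields directly. The function name collect_ints and the explicit count parameter are made up for this example and are not part of your test bench; it also assumes that every variadic argument is an int.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper (not from the original test bench): copies `count`
 * variadic arguments into `out`, assuming each one was passed as an int. */
void collect_ints(int *out, const char *fmt, size_t count, ...)
{
    va_list ap;
    size_t i;

    (void) fmt;              /* kept only to mirror my_print's signature */

    va_start(ap, count);     /* `count` is the last named parameter */
    for (i = 0; i < count; i++) {
        out[i] = va_arg(ap, int);   /* compiler picks the right save area */
    }
    va_end(ap);
}
```

A call such as collect_ints(buf, "a", 9, 1, 2, 3, 4, 5, 6, 7, 8, 9) would fill buf with the nine values. Because va_start and va_arg are expanded by the compiler for the target ABI, the same source should build for both the 32-bit and the 64-bit ABI without the code having to know how va_list is laid out.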
Regards,
Oliver
Yes, you're right, thanks. That slipped my mind. I wonder how much overhead all of that adds; for instance, a lot of floating-point registers are saved for va_list and one might never access them. If there were a standards option requiring function prototypes to always be used, this could be done much more efficiently: the variadic parameters could be passed on the stack every time, and accessing them would just mean getting the next item off the stack. I think this is one of the nastier legacies of K&R C.