my_printf compiling for aarch64, armclang vs gcc

I compiled the same sample C file with two compilers, but got very different assembler code.

The C file is:

void my_print(char *fmt, ...){
    *((volatile int *) 0x1000) =(int) &fmt;
}

void test(void){
    my_print("a", 1, 2, 3, 4, 5, 6, 7, 8, 9);

}

With armclang (ARM Compiler 6.01 (build 22)), using the command "armclang -mcpu=cortex-a53 --target=aarch64-arm-none-eabi test.c", the assembly code for my_print is:

0000000000008000 <my_print>:
    8000:   d10303ff    sub sp, sp, #0xc0
    8004:   3d801fe7    str q7, [sp,#112]
    8008:   3d801be6    str q6, [sp,#96]
    800c:   3d8017e5    str q5, [sp,#80]
    8010:   3d8013e4    str q4, [sp,#64]
    8014:   3d800fe3    str q3, [sp,#48]
    8018:   3d800be2    str q2, [sp,#32]
    801c:   3d8007e1    str q1, [sp,#16]
    8020:   3d8003e0    str q0, [sp]
    8024:   f9005be7    str x7, [sp,#176]
    8028:   f90057e6    str x6, [sp,#168]
    802c:   f90053e5    str x5, [sp,#160]
    8030:   f9004fe4    str x4, [sp,#152]
    8034:   f9004be3    str x3, [sp,#144]
    8038:   f90047e2    str x2, [sp,#136]
    803c:   f90043e1    str x1, [sp,#128]
    8040:   f9005fe0    str x0, [sp,#184]      // Jerry: x0, i.e., the address of fmt, located at the bottom of the stack.
    8044:   9102e3e0    add x0, sp, #0xb8
    8048:   2a0003e8    mov w8, w0
    804c:   321403e9    orr w9, wzr, #0x1000
    8050:   2a0903e0    mov w0, w9
    8054:   b9000008    str w8, [x0]
    8058:   910303ff    add sp, sp, #0xc0
    805c:   d65f03c0    ret

With gcc (aarch64-none-elf-gcc, Linaro GCC 4.8.3), using the command "aarch64-none-elf-gcc test.c -nostdlib", the assembly code is:

  400024:   d10343ff    sub sp, sp, #0xd0
  400028:   f9004fe1    str x1, [sp,#152]
  40002c:   f90053e2    str x2, [sp,#160]
  400030:   f90057e3    str x3, [sp,#168]
  400034:   f9005be4    str x4, [sp,#176]
  400038:   f9005fe5    str x5, [sp,#184]
  40003c:   f90063e6    str x6, [sp,#192]
  400040:   f90067e7    str x7, [sp,#200]
  400044:   3d8007e0    str q0, [sp,#16]
  400048:   3d800be1    str q1, [sp,#32]
  40004c:   3d800fe2    str q2, [sp,#48]
  400050:   3d8013e3    str q3, [sp,#64]
  400054:   3d8017e4    str q4, [sp,#80]
  400058:   3d801be5    str q5, [sp,#96]
  40005c:   3d801fe6    str q6, [sp,#112]
  400060:   3d8023e7    str q7, [sp,#128]
  400064:   f90003e0    str x0, [sp]               // Jerry : x0, i.e., the address of fmt, located at the top of the stack.
  400068:   d2820000    mov x0, #0x1000                 // #4096
  40006c:   910003e1    mov x1, sp
  400070:   b9000001    str w1, [x0]
  400074:   910343ff    add sp, sp, #0xd0
  400078:   d65f03c0    ret

Is the armclang behavior as expected? And does gcc violate the Procedure Call Standard?

B.R

Jerry

  • Hi Jerry,

    The PCS requires that fmt be passed into my_print in x0, and both compilers follow this.

    Your code takes the address of fmt, so it must be stored into memory somewhere. It is up to the compiler exactly where this is. Armclang chooses "sp + 0xb8", and gcc chooses "sp". Both compilers then emit the correct code to store that address into the volatile variable: note the lines "add x0, sp, #0xb8" in the armclang output and "mov x1, sp" in the gcc output.

    Is it possible that you meant to store the value of fmt, not the address? Your example code will store a pointer to a local variable, which will not be valid once my_print returns.
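
    A minimal sketch of the difference between the value of fmt and the address of fmt. The names record, saved_value, saved_address and value_matches are hypothetical, and ordinary variables stand in for the memory-mapped location 0x1000 so that the example can be built on a host machine:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memory-mapped test bench location. */
static volatile uintptr_t saved_value;    /* what fmt points to */
static volatile uintptr_t saved_address;  /* where the parameter fmt itself lives */

void record(const char *fmt, ...)
{
    assert(fmt != 0);
    /* Value of fmt: the address of the format string, passed in x0 on AArch64. */
    saved_value = (uintptr_t)fmt;
    /* Address of fmt: a slot in record()'s own frame, chosen by the compiler. */
    saved_address = (uintptr_t)&fmt;
}

/* Returns 1 if the saved value is the string that was passed in. */
int value_matches(const char *expected)
{
    return saved_value == (uintptr_t)expected;
}
```

    The saved value is the same whichever compiler is used; the saved address depends on the frame layout and is only meaningful while record() is still executing.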

    Oliver


  • Thanks for your comment, Oliver.

    Writing the address of fmt (sp+0xb8 for armclang, sp for gcc) to 0x1000 is intentional: it triggers a task in my RTL test bench that 'prints' to the log file.

    I used ARM Compiler 5.x for Cortex-A9/A7; armcc put r0 (the address of fmt) at the top of the stack, and my 'print' worked well. Now I am using ARM Compiler 6, and armclang puts it at the bottom, so my 'print' doesn't work anymore. I am just wondering why armclang does not follow the old style.

    B.R
    Jerry

  • That behavior was never guaranteed, even weakly. You could use 0x1008 for the pointer to the arguments, as in the following code. This is still not fully standards conformant, but it should work.

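    #include <stdarg.h>

    void my_print(const char *fmt, ...){
         va_list args;
         va_start(args, fmt);
         /* Copy the va_list object to 0x1008 (needs va_list to be assignable, as on AArch64). */
         *((volatile va_list *) 0x1008) = args;
         /* Store the pointer fmt itself, not a character, at 0x1000. */
         *((const char * volatile *) 0x1000) = fmt;
         va_end(args);
    }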


    Note that pointers no longer fit into an int: on AArch64 a pointer is 64 bits wide, while int is still 32 bits. I also like to use const wherever possible for things like fmt.
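
    A small sketch of passing a pointer through an integer that is wide enough to hold it, using uintptr_t from <stdint.h>. The names mailbox_word, post_pointer and read_pointer are hypothetical, and a plain variable stands in for the location 0x1000:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped word at 0x1000. */
static volatile uintptr_t mailbox_word;

void post_pointer(const void *p)
{
    /* Guard: the mailbox word must be wide enough for a pointer.
       An int would not be on AArch64 (4 bytes versus 8). */
    assert(sizeof(uintptr_t) >= sizeof p);
    mailbox_word = (uintptr_t)p;
}

const void *read_pointer(void)
{
    /* Converting back yields a pointer equal to the one that was posted. */
    return (const void *)mailbox_word;
}
```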

  • In my earlier reply I assumed that you only needed to store the location of fmt, but I see now that you actually need to access all of the variadic arguments to my_print from your test bench. Daith's solution of using the va_list type should work. However, since you have switched from Cortex-A9/A7 to Cortex-A53 (in AArch64 mode), you will need to update your test bench: AArch64 mode uses a different ABI, with a different definition of va_list.

    In the 32-bit ABI, va_list contains a single pointer (32 bits) to the next argument, and all arguments are stored contiguously. In the 64-bit ABI, arguments passed in registers are stored in separate blocks, and va_list is defined as follows:

    typedef struct __va_list {
      void *__stack;  // next stack param
      void *__gr_top; // end of GP arg reg save area
      void *__vr_top; // end of FP/SIMD arg reg save area
      int __gr_offs;  // offset from __gr_top to next GP register arg
      int __vr_offs;  // offset from __vr_top to next FP/SIMD register arg
    } va_list;

    You will need to update your test bench to use this new structure. There are more details of how va_list works for AArch64 in appendix B of the procedure call standard: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
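
    As a rough sketch of what the test bench side has to do, the following models how the next integer argument is located from the fields above. It is a simplification of the procedure in the PCS, not a copy of it: it handles only int-sized arguments, each occupying one 8-byte slot, and ignores the FP/SIMD area. The names model_va_list and model_next_int are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the AArch64 va_list fields, as a test bench model might see them. */
struct model_va_list {
    unsigned char *stack;   /* next stack param */
    unsigned char *gr_top;  /* end of GP arg reg save area */
    unsigned char *vr_top;  /* end of FP/SIMD arg reg save area (unused here) */
    int gr_offs;            /* negative offset from gr_top to next GP register arg */
    int vr_offs;            /* offset from vr_top (unused here) */
};

/* Fetch the next int-sized argument and advance the model. */
int model_next_int(struct model_va_list *ap)
{
    const unsigned char *slot;
    uint64_t raw;
    int offs = ap->gr_offs;

    assert(offs % 8 == 0);          /* GP registers are saved as 8-byte slots */
    if (offs < 0) {
        slot = ap->gr_top + offs;   /* still inside the register save area */
        ap->gr_offs = offs + 8;
    } else {
        slot = ap->stack;           /* registers exhausted: read from the stack */
        ap->stack += 8;
    }
    memcpy(&raw, slot, sizeof raw); /* read the whole 8-byte slot */
    return (int)(uint32_t)raw;      /* keep the low 32 bits */
}
```

    For the call my_print("a", 1, 2, 3, 4, 5, 6, 7, 8, 9), x1 to x7 hold the first seven variadic arguments, so the model would start with gr_offs at -56 and take the last two arguments from the stack.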

    Regards,

    Oliver

  • Yes, you're right, thanks. That slipped my mind. I wonder how much overhead all of that adds; for instance, the function saves all of the floating point argument registers for va_list even though one might never access them. If the standard had an option requiring function prototypes to always be used, this could be done much more efficiently: the variadic parameters could be passed on the stack every time, and accessing them would just mean taking the next item off the stack. I think this is one of the nastier legacies of K&R C.