Embedded assembly function problem

Hello all,

I wrote an embedded assembly function for an ARM Cortex-A9 (the specific device is a Zynq, from Xilinx) as follows:

float my_fun(float x)
{
    asm volatile ("vdup.f32 d0, r0 \n\t");
    my_neon_fun(x);
    asm volatile ("vmov.f32 r0, s0 \n\t");
};

my_neon_fun(x) is another function, already tested and working, that uses NEON assembly instructions and returns its result in S0. Following the ARM calling convention, I copy the return value into R0 as the last operation. However, when I compile the entire project, the compiler appends the following instruction to the generated assembly code as the last instruction of my_fun:

"mov r0, r3"

As a result, the R0 register, which should contain the return value, is overwritten with a wrong value.

Please note that this ONLY happens when I DO NOT activate compiler optimization (-O0). Does anyone know why this happens?

Thanks in advance

Regards,

Andrea

  • Hi formentini and welcome to the Community!

    I have moved your question to ARM Processors where I hope you will get your answer.

  • Andrea,

    It may be simpler (and safer) to change the Neon assembly code to use the same floating-point ABI as that used by the C-compiler - I assume at present the compiler is using "softfp" while the assembly code is using "hard".
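    A related alternative, sketched here under assumptions rather than taken from the suggestion above, is to leave the assembly alone and tell the compiler which calling convention the routine uses. GCC and Clang provide a pcs function attribute on 32-bit ARM for this. The sketch assumes that my_neon_fun only needs its argument in s0 and returns its result in s0. The NEON_PCS macro and the C body of my_neon_fun are hypothetical stand-ins so that the example builds on its own; in the real project my_neon_fun would remain the existing assembly routine and only its declaration would carry the attribute.

```c
/* On 32-bit ARM with VFP registers available, request the VFP variant of
   the procedure call standard for my_neon_fun (argument and result in s0).
   On any other target the macro expands to nothing.
   NEON_PCS is a hypothetical helper name, not part of the original code. */
#if defined(__arm__) && !defined(__SOFTFP__)
#define NEON_PCS __attribute__((pcs("aapcs-vfp")))
#else
#define NEON_PCS
#endif

/* Stand-in for the real NEON assembly routine (assumption: the real one
   takes its argument in s0 and returns in s0). Here it just doubles x. */
NEON_PCS float my_neon_fun(float x)
{
    return 2.0f * x;
}

/* Plain C wrapper with no inline asm: when NEON_PCS is active the compiler
   itself places x in s0 before the call and reads the result from s0. */
float my_fun(float x)
{
    return my_neon_fun(x);
}
```

    With this approach my_fun needs neither the vdup nor the vmov, and the missing return statement in the original wrapper is no longer an issue, because the value is returned through ordinary C.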

    Failing that, you can either investigate making the wrapper function "naked", or you can group the instructions within a single asm block; additionally (as shown below) you need to provide appropriate I/O and clobber lists:

    extern float my_neon_fun(float);
    
    float my_fun(float x)
    {
      asm volatile ("vdup.f32  d0,%[r_op]  \n\t"
                    "blx       %[r_fn]     \n\t"
                    "vmov.f32  %[r_op],s0  \n\t"
                    : [r_op]"+r" (x)
                    : [r_fn]"r"  (my_neon_fun)
                    : "r14" );
    
      return x;
    }
    

    Note that the above only marks the link register as clobbered; if my_neon_fun() modifies any other registers or memory, these would also need to be listed.

    hth

    Simon.

  • Hi Andrea,

    I think the "mov r0, r3" appears because you compiled the code in software float mode.

    Did you add the "-mhard-float" option?
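    One way to check which floating-point calling convention a file is actually being built with is to look at the compiler's predefined macros. The following is a minimal sketch, assuming GCC or Clang (where the float ABI is selected with -mfloat-abi=soft, softfp or hard). The function name float_abi_name is hypothetical, and the labels it returns are only for illustration.

```c
/* Returns a short label for the floating-point calling convention this
   translation unit was compiled for, based on predefined macros. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not-arm32";   /* not a 32-bit ARM target */
#elif defined(__ARM_PCS_VFP)
    return "hard";        /* float arguments and results in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";        /* floating point done in software */
#else
    return "softfp";      /* FP instructions used, floats passed in core registers */
#endif
}
```

    Printing the result from the same source file as my_fun, for example with printf("%s\n", float_abi_name());, shows whether the wrapper is built with the convention the assembly routine expects.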

    By the way, does my_neon_fun take a double as input and return a float or void?

    If so, "vcvt.f64.f32 d0, s0" would be inserted between "vdup.f32 d0, r0" and "bl my_neon_fun", because the variable 'x' is a float.
    To avoid this, you should follow Simon's suggestion.

    Best regards,

    Yasuhiko Koumoto.