How to generate a VCVT instruction with the #fbits parameter on Cortex-M4?

I have a C function that takes a signed 1.31 fixed-point number (passed as int32_t) and converts it to float:

#include <stdint.h>
float Fixed_to_FP(int32_t x) {
    return ((float) (x) / 0x1p31f);
}

When compiling the code using armclang with the following build options:

-march=armv7+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb

the tool emits two instructions for the conversion operation:

Fixed_to_FP:
        vmov    s0, r0
        vldr    s2, .LCPI0_0
        vcvt.f32.s32    s0, s0      ; (1) Convert from int32_t to float
        vmul.f32        s0, s0, s2  ; (2) Scale down by 2^31
        bx      lr
.LCPI0_0:
        .long   0x30000000          ; = 2^-31

However, the VCVT instruction has an optional #fbits parameter, which in this case could be used to eliminate the vmul instruction:

        vcvt.f32.s32    s0, s0, #31 ; Convert from S1.31 to float
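
To make the intended arithmetic explicit, here is a small reference model in portable C. It is only a sketch of the semantics (x treated as having fbits fraction bits, result x * 2^-fbits), not something the compiler is known to turn into the fixed-point VCVT. The name fixed_to_float_model and the assumed range 0 <= fbits <= 32 are mine, for illustration only:

```c
#include <stdint.h>

/* Hypothetical reference model of a signed fixed-point to float conversion:
   treats x as having `fbits` fraction bits and returns x * 2^-fbits.
   Assumes 0 <= fbits <= 32 (an assumption of this sketch). */
float fixed_to_float_model(int32_t x, unsigned fbits) {
    /* 2^fbits is a power of two, so it is exactly representable as a float. */
    float scale = (float)(UINT64_C(1) << fbits);
    /* (float)x rounds once; dividing by a power of two then only adjusts
       the exponent for the magnitudes involved here. */
    return (float)x / scale;
}
```

With fbits = 31 this matches Fixed_to_FP above: INT32_MIN maps to -1.0f and 0x40000000 maps to 0.5f.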

How can I get the compiler to emit this efficient instruction?

Thanks!

  • Hi

    My name is Stephen and I work for Arm.

    Sorry, armclang is unable to optimize that function in pure C code, but a workaround is to use inline assembly like this:

    #include <stdint.h>
    float Fixed_to_FP(int32_t x) {
      float f;
      __asm volatile ("vmov %[flt], %[integer]\n"
                      "vcvt.f32.s32 %[flt],%[flt],#31\n"
                      : [flt] "=w" (f) : [integer] "r" (x));
      return f;
    }

    when compiled with:

    armclang -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb

    generates:

    Fixed_to_FP:
        vmov    s0, r0
        vcvt.f32.s32    s0, s0, #31
        bx    lr

    Some notes on the inline assembler:
    You can't cast x to float and pass the result in as the input operand, or the compiler will insert its own vcvt for the cast.
    =w makes f an output operand held in a floating-point register. It is written only after the input has been read, so no early-clobber (&) modifier is needed.
    r makes x an input operand held in a core register.
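
    If the same source also has to build for other targets, the inline assembly can be guarded. The following is a minimal sketch, assuming the ACLE feature macro __ARM_FP (bit 0x4 set for single-precision hardware support) and the __arm__ predefine are available in your toolchain; the function name Fixed_to_FP_guarded is hypothetical:

```c
#include <stdint.h>

/* Sketch: use the fixed-point VCVT only when targeting 32-bit Arm with
   single-precision hardware floating point; otherwise fall back to plain C. */
float Fixed_to_FP_guarded(int32_t x) {
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    float f;
    /* Move the raw bits into an FP register, then convert S1.31 to float. */
    __asm volatile ("vmov %[flt], %[integer]\n\t"
                    "vcvt.f32.s32 %[flt], %[flt], #31"
                    : [flt] "=w" (f)
                    : [integer] "r" (x));
    return f;
#else
    /* Portable fallback: same value, computed as a convert plus a scale. */
    return (float)x / 0x1p31f;
#endif
}
```

    On a host build the fallback path is taken, so the function can still be unit-tested off-target.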

    Hope this helps

    Stephen

  • Thanks, Stephen!

    Yes, I am trying to avoid inline assembly, as I am not sure this will be acceptable in our project.

    One interesting thing to note is that the limitation is not in armclang as such, but specific to builds targeting the ARMv7 architecture. Apparently, when building for ARMv8, the optimizer emits the right instruction.

    For an example, you can follow the link in the answer to my question on Stack Overflow:

    stackoverflow.com/.../274579