How to generate a VCVT instruction with the #fbits param on CM4?

I have a C function that takes a signed 1.31 fixed-point number (passed as int32_t) and converts it to float:

#include <stdint.h>
float Fixed_to_FP(int32_t x) {
    return ((float) (x) / 0x1p31f);
}
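As a quick sanity check of what this conversion computes, here is a minimal host-side sketch. The helper name `fixed_to_fp_ref` is hypothetical and not part of the original post. In S1.31 format the integer x represents the value x / 2^31, so the representable range is [-1, 1).

```c
#include <stdint.h>

/* Reference conversion (hypothetical helper name):
 * an S1.31 value x represents x / 2^31. */
static float fixed_to_fp_ref(int32_t x) {
    return (float)x / 0x1p31f;
}
```

For example, `fixed_to_fp_ref(INT32_MIN)` is -1.0f and `fixed_to_fp_ref(0x40000000)` is 0.5f. `fixed_to_fp_ref(INT32_MAX)` comes out as exactly 1.0f, because a float carries only 24 significant bits and the int-to-float step rounds 2^31 - 1 up to 2^31.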

When compiling the code using armclang with the following build options:

-march=armv7+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb

the tool emits two instructions for the conversion operation:

Fixed_to_FP:
        vmov    s0, r0
        vldr    s2, .LCPI0_0
        vcvt.f32.s32    s0, s0      ; (1) Convert from int32_t to float
        vmul.f32        s0, s0, s2  ; (2) Scale down by 2^31
        bx      lr
.LCPI0_0:
        .long   0x30000000          ; = 2^-31
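The literal-pool word can be decoded by hand: 0x30000000 has sign 0, biased exponent 0x60 = 96 and a zero mantissa, so it encodes 2^(96 - 127) = 2^-31. In other words, the compiler has turned the division by 2^31 into a multiplication by its reciprocal. A minimal sketch to confirm this on a host machine, assuming IEEE-754 single-precision floats (the helper name `bits_to_float` is hypothetical):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision float.
 * memcpy is the well-defined way to type-pun in C. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this helper, `bits_to_float(0x30000000u) == 0x1p-31f` holds.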

However, the VCVT instruction has an optional #fbits operand, which in this case could be used to eliminate the vldr and vmul instructions:

        vcvt.f32.s32    s0, s0, #31 ; Convert from S1.31 to float

How can I get the compiler to emit this efficient instruction?

Thanks!

  • Hi

    My name is Stephen and I work for Arm.

    Sorry, armclang is unable to optimize that function in pure C code, but a workaround is to use inline assembly like this:

    #include <stdint.h>
    float Fixed_to_FP(int32_t x) {
      float f;
      __asm volatile ("vmov %[flt], %[integer]\n"
                      "vcvt.f32.s32 %[flt],%[flt],#31\n"
                      : [flt] "=w" (f) : [integer] "r" (x));
      return f;
    }

    When compiled with:

    armclang -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb

    this generates:

    Fixed_to_FP:
        vmov    s0, r0
        vcvt.f32.s32    s0, s0, #31
        bx    lr

    Some notes on the inline assembly:
    You can't cast x to float and pass the result in as the operand, or the compiler will insert its own vcvt.
    "=w" is an output operand held in a floating-point register. It is written only after the input has been read, so no early-clobber modifier is needed.
    "r" is an input operand held in a core (general-purpose) register.

    Hope this helps

    Stephen
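One way to keep such a workaround portable is to guard the inline assembly and fall back to plain C elsewhere. The following is only a sketch under stated assumptions: the function name `q31_to_float` is hypothetical, and it assumes the toolchain defines the ACLE macro `__ARM_FP` (with bit 0x04 set when single-precision hardware is available) and `__arm__` for 32-bit Arm targets. The `volatile` qualifier is omitted here because the block has no side effects beyond its output, which leaves the compiler free to drop or merge unused conversions.

```c
#include <stdint.h>

/* Convert an S1.31 fixed-point value to float (hypothetical wrapper). */
static inline float q31_to_float(int32_t x) {
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x04)
    /* Assumed: 32-bit Arm target with a single-precision FPU.
     * Move the raw bits into an FP register, then convert with 31
     * fraction bits in a single instruction. */
    float f;
    __asm ("vmov %[flt], %[integer]\n\t"
           "vcvt.f32.s32 %[flt], %[flt], #31"
           : [flt] "=w" (f)
           : [integer] "r" (x));
    return f;
#else
    /* Portable fallback: same arithmetic in plain C. */
    return (float)x / 0x1p31f;
#endif
}
```

Both paths should return the same values, for example -1.0f for `INT32_MIN` and 0.25f for `0x20000000`, since scaling by a power of two adds no rounding beyond the integer-to-float step.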
