I have a C function that takes a signed 1.31 fixed-point number (passed as int32_t) and converts it to float:
#include <stdint.h>

float Fixed_to_FP(int32_t x)
{
    return ((float) (x) / 0x1p31f);
}
When compiling the code using armclang with the following build options:
-march=armv7+fp -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb
the tool emits two instructions for the conversion operation:
Fixed_to_FP:
    vmov         s0, r0
    vldr         s2, .LCPI0_0
    vcvt.f32.s32 s0, s0        ; (1) Convert from int32_t to float
    vmul.f32     s0, s0, s2    ; (2) Scale down by 2^31
    bx           lr
.LCPI0_0:
    .long        0x30000000    ; = 2^-31 as a float
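For anyone who wants to check the literal pool entry, here is a minimal host-side sketch that decodes a 32-bit pattern as a single-precision value. The helper name bits_to_float is made up for this example, and it assumes the host uses IEEE 754 binary32 for float. The pattern 0x30000000 has a biased exponent of 96, so it decodes to 2^-31, which is why the vmul scales the result down by 2^31.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes float is IEEE 754 binary32 on the host.
   memcpy avoids the aliasing problems of a pointer cast. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Calling bits_to_float(0x30000000u) should compare equal to the hex float literal 0x1p-31f.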
However, the VCVT instruction takes an optional #fbits operand, which in this case could be used to eliminate the vmul instruction:
vcvt.f32.s32 s0, s0, #31 ; Convert from S1.31 to float
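To make the intended semantics concrete, here is a rough host-side model of what the fixed-point form of the conversion computes: the integer is treated as having fbits fractional bits, so the result is x * 2^-fbits. The function name vcvt_f32_s32_model is invented for this sketch, and it assumes fbits is in the range 1 to 32 and that the default round-to-nearest mode is in effect.

```c
#include <stdint.h>

/* Hypothetical host-side model of "vcvt.f32.s32 Sd, Sd, #fbits":
   interpret x as signed fixed-point with fbits fractional bits.
   Assumes 1 <= fbits <= 32 and the default round-to-nearest mode. */
static float vcvt_f32_s32_model(int32_t x, unsigned fbits)
{
    /* 2^fbits is exactly representable as a float, so the division
       only rescales the exponent of the already rounded (float)x. */
    float scale = (float)(UINT64_C(1) << fbits);
    return (float)x / scale;
}
```

With fbits set to 31 this computes the same value as Fixed_to_FP, which is why a single vcvt could replace the vcvt plus vmul pair.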
How can I get the compiler to emit this efficient instruction?
Thanks!
Hi
My name is Stephen and I work for Arm.
Sorry, armclang is unable to optimize that function in pure C code, but a workaround is to use inline assembly like this:
#include <stdint.h>

float Fixed_to_FP(int32_t x)
{
    float f;
    __asm volatile ("vmov %[flt], %[integer]\n"
                    "vcvt.f32.s32 %[flt],%[flt],#31\n"
                    : [flt] "=w" (f)
                    : [integer] "r" (x));
    return f;
}
when compiled with:
armclang -mfloat-abi=hard -mfpu=fpv4-sp-d16 -std=c11 --target=arm-arm-none-eabi -O3 -ffast-math -mcpu=cortex-m4 -mthumb
generates:
Fixed_to_FP:
    vmov         s0, r0
    vcvt.f32.s32 s0, s0, #31
    bx           lr
Some notes on the inline assembler:
- You can't cast x to f and pass in f, or the compiler will insert a vcvt.
- =w is a floating-point register that is written to after it has been read, so the compiler can assign the same register to it.
- r is a core register input to the block.
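If the inline assembly has to coexist with builds for other targets, one option is to guard it with the ACLE feature macros and fall back to the plain C expression elsewhere. The following is only a sketch: the name s131_to_float is made up, the check of __ARM_FP bit 0x4 (single-precision hardware floating point) is an assumption about what is a sufficient guard, and it further assumes that the target FPU implements the fixed-point form of VCVT, as the FPv4-SP unit on the Cortex-M4 does.

```c
#include <stdint.h>

/* Hypothetical wrapper: s131_to_float is a name chosen for this sketch. */
static inline float s131_to_float(int32_t x)
{
#if defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    /* Assumption: the FPU supports fixed-point VCVT (e.g. FPv4-SP). */
    float f;
    __asm volatile ("vmov %[flt], %[integer]\n"
                    "vcvt.f32.s32 %[flt], %[flt], #31\n"
                    : [flt] "=w" (f)
                    : [integer] "r" (x));
    return f;
#else
    /* Portable fallback: the same value computed in plain C. */
    return (float)x / 0x1p31f;
#endif
}
```

Keeping the assembly behind one small function like this limits it to a single place in the code base, and the fallback path can be used for host-side unit tests.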
Hope this helps
Stephen
Thanks, Stephen!
Yes, I am trying to avoid inline assembly, as I am not sure it will be acceptable in our project.
One interesting thing to note is that the limitation is not in armclang as such: it shows up specifically when building for the ARMv7 architecture. Apparently, when building for ARMv8, the optimizer emits the right instruction.
For an example, you can follow the link in the answer to my question in Stack Overflow:
stackoverflow.com/.../274579