
BUG: fmaf() far slower than a*b+c, __fmaf_hardfp() bugged

Using Arm Compiler 6 on a Cortex-M7, I found a serious performance bug when using fmaf() in a linear interpolation that runs inside a large 2D image loop.
When I change it to a plain a*b+c, I see the generated assembly change from a call to __fmaf_hardfp() to a vmla.f32 instruction.

With that change I now have real-time performance for the image loop. I also tried inline assembly to emit vfma.f32 directly, but haven't been successful.
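For reference, this is roughly the shape such an inline-assembly wrapper could take. It is only a sketch and is untested on the target: the name fma_acc_f32 is made up for illustration, it assumes the GCC-style extended asm syntax that Arm Compiler 6 accepts, and it assumes the "t" constraint selects a single-precision VFP register. On anything that is not a 32-bit Arm target with hardware FMA it falls back to a plain, non-fused multiply-add.

```c
/* Hypothetical helper: returns acc + a * b. */
static inline float fma_acc_f32(float acc, float a, float b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_FMA)
    /* vfma.f32 Sd, Sn, Sm computes Sd = Sd + Sn * Sm with a single rounding.
       "+t" marks acc as a read-write operand in a VFP register (assumption). */
    __asm__("vfma.f32 %0, %1, %2" : "+t"(acc) : "t"(a), "t"(b));
    return acc;
#else
    /* Fallback for other targets: not fused, the product is rounded first. */
    return acc + a * b;
#endif
}
```

With a helper like this the interpolation would read fma_acc_f32(fma_acc_f32(A, -tNorm, A), tNorm, B).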

What on earth is going on in __fmaf_hardfp()?

I am using the latest MDK-ARM with all the fast optimization settings on.


STATIC_INLINE_PURE float lerp(float const A, float const B, float const tNorm)
{
        // A + tNorm * (B - A), written as fma(t, B, fma(-t, A, A))
        return( fmaf(tNorm, B, fmaf(-tNorm, A, A)) );
}
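
For comparison, a minimal sketch of the a*b+c variant that gives me vmla.f32 instead of the library call. The name lerp_mul is only for illustration, and plain static inline stands in for my STATIC_INLINE_PURE macro. Which instruction the compiler actually picks for it presumably depends on the floating-point and optimization settings.

```c
/* Same interpolation as lerp() above, written with plain multiply-adds
   so the compiler can choose the instruction itself. */
static inline float lerp_mul(float const A, float const B, float const tNorm)
{
    float const base = -tNorm * A + A;   /* (1 - tNorm) * A */
    return tNorm * B + base;             /* tNorm * B + (1 - tNorm) * A */
}
```

Unlike fmaf(), this form does not guarantee a single rounding per multiply-add, so the results can differ from the fused version in the last bit.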