Using Arm Compiler 6 on a Cortex-M7, I tracked down a hard-to-find performance bug caused by using fmaf in a linear interpolation that runs inside a large 2D image loop. When I change it to a plain a*b+c, the generated assembly changes from a call to __fmaf_hardfp() to a vmla.f32 instruction.
With that change I now get real-time performance for the image loop. I also tried some inline assembly to use vfma.f32, but haven't been successful.
What on earth is going on in __fmaf_hardfp()?
I'm using the latest MDK-ARM with all fast optimizations on.
STATIC_INLINE_PURE float lerp(float const A, float const B, float const tNorm)
{
    // fma(t, v1, fma(-t, v0, v0));
    return( fmaf(tNorm, B, fmaf(-tNorm, A, A)) );
}