This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

I'm not seeing any flush-to-zero (FTZ) effects with NEON intrinsics on an ARM A9, any advice?

Hi everyone,

As the title states - I've had issues reproducing flush-to-zero (FTZ) using the NEON intrinsics provided in the 'arm_neon.h' header. For test purposes I'm using an iPhone 6 with an ARMv8-A dual-core ('Twister') CPU.

In the ARM information center under Home>Neon Programming>Flush-to-zero mode in NEO (ARM Information Center) I see that 'NEON always uses flush-to-zero mode'.

What calculation on NEON produces Flush-to-zero (FTZ) but does not on IEEE 754 floating point compatible processors?

As yet I don't see any difference between IEEE 754 and the results returned by ARM NEON intrinsic operations.

Here is an attempt I made:

  //Allocate some float buffers
    Float32 *inFloatsA = (Float32*)malloc(sizeof(Float32)*4);
    Float32 *inFloatsB = (Float32*)malloc(sizeof(Float32)*4);
    Float32 *outFloats = (Float32*)malloc(sizeof(Float32)*4);
   
    // Initialise input values
    for (int i = 0; i<4; i++)
    {
        inFloatsA[i] = (Float32)-2e-125;
        inFloatsB[i] = (Float32)1e-100;
    }
   
    //Subtract inFloatsB from inFloatsA and store in outFloats
    float32x4_t neonFloatsBufferA = vld1q_f32(&inFloatsA[0]);
    float32x4_t neonFloatsBufferB = vld1q_f32(&inFloatsB[0]);
    float32x4_t result = vsubq_f32(neonFloatsBufferA, neonFloatsBufferB);
    vst1q_f32(&outFloats[0], result);

    //Calculate the expected IEEE 754 value
    Float32 expected = inFloatsA[0] - inFloatsB[0];

    //Test if the IEEE 754 value matches the NEON output
    if (expected != outFloats[0])
    {    
          printf("Got a different value than IEEE 754!\n");
    }

Essentially I never see the log 'Got a different value than IEEE 754!'. Is there a set of initial input values that would create a FTZ effect on NEON?

Am I incorrect in thinking 'inFloatsA[0] - inFloatsB[0]' will use the IEEE 754 standard?

Kind Regards,

David L

+1 Simon Craske over 8 years ago
David,
The values you are looking for are in the range 2x10^-38 to 2x10^-45, i.e. the values you are using above are too small even for denormal / subnormals.
As an example, given something like:
float smallest_f32_denorm(void) { return 2e-45f; }
Adding the result of this function to itself using Neon/Advanced-SIMD in A32 state will always produce zero, while adding using VFP with FtZ disabled will produce a very small non-zero value.
However, care must be taken to ensure that the compiler is also in full IEEE754 mode, otherwise there is a potential that the compiler may choose to reduce the above function to just return zero (i.e. to implement FtZ during compilation).
hth
Simon.
Cancel
Up 0 Down

Cancel
0 David L over 8 years ago in reply to Simon Craske

Hi Simon,
Many thanks for the help!
The range info is very useful; I've tried performing addition using the value you provided and rather than a zero result, I'm seeing a very small value (2.802597e-45)
I'm running this in A64 state - does that change when I should expect FtZ in NEON operations?
I'm a little confused about this:
adding using VFP with FtZ disabled will produce a very small non-zero value.
Because I'm using NEON intrinsics, surely this should not influence NEON results? As the ARM documentation states:
NEON always uses flush-to-zero mode
Is there a compile flag for disabling FtZ on VFP?
Many thanks for your reply - it's much appreciated!
Kind Regards,
David L
Cancel
Up 0 Down

Cancel
+1 Simon Craske over 8 years ago in reply to David L

David,
In A64 (unlike A32) the Advanced-SIMD/Neon can support both FtZ and full denormal operation, controlled by the same bit that determines the regular FP operation mode.
Simon.
Cancel
Up 0 Down

Cancel