This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

I'm not seeing any flush-to-zero (FTZ) effects with NEON intrinsics on an ARM A9, any advice?

Hi everyone,

As the title states - I've had issues reproducing flush-to-zero (FTZ) using the NEON intrinsics provided in the 'arm_neon.h' header. For test purposes I'm using an iPhone 6 with an ARMv8-A dual-core ('Twister') CPU.

In the ARM information center under Home>Neon Programming>Flush-to-zero mode in NEO (ARM Information Center) I see that 'NEON always uses flush-to-zero mode'.

What calculation on NEON produces Flush-to-zero (FTZ) but does not on IEEE 754 floating point compatible processors?

As yet I don't see any difference between IEEE 754 and the results returned by ARM NEON intrinsic operations.

Here is an attempt I made:

  //Allocate some float buffers
    Float32 *inFloatsA = (Float32*)malloc(sizeof(Float32)*4);
    Float32 *inFloatsB = (Float32*)malloc(sizeof(Float32)*4);
    Float32 *outFloats = (Float32*)malloc(sizeof(Float32)*4);
   
    // Initialise input values
    for (int i = 0; i<4; i++)
    {
        inFloatsA[i] = (Float32)-2e-125;
        inFloatsB[i] = (Float32)1e-100;
    }
   
    //Subtract inFloatsB from inFloatsA and store in outFloats
    float32x4_t neonFloatsBufferA = vld1q_f32(&inFloatsA[0]);
    float32x4_t neonFloatsBufferB = vld1q_f32(&inFloatsB[0]);
    float32x4_t result = vsubq_f32(neonFloatsBufferA, neonFloatsBufferB);
    vst1q_f32(&outFloats[0], result);

    //Calculate the expected IEEE 754 value
    Float32 expected = inFloatsA[0] - inFloatsB[0];

    //Test if the IEEE 754 value matches the NEON output
    if (expected != outFloats[0])
    {    
          printf("Got a different value than IEEE 754!\n");
    }

Essentially I never see the log 'Got a different value than IEEE 754!'. Is there a set of initial input values that would create a FTZ effect on NEON?

Am I incorrect in thinking 'inFloatsA[0] - inFloatsB[0]' will use the IEEE 754 standard?

Kind Regards,

David L

Parents
  • David,

    The values you are looking for are in the range 2x10-38 to 2x10-45, i.e. the values you are using above are too small even for denormal / subnormals.

    As an example, given something like:

    float smallest_f32_denorm(void)
    {
      return 2e-45f;
    }
    

    Adding the result of this function to itself using Neon/Advanced-SIMD in A32 state will always produce zero, while adding using VFP with FtZ disabled will produce a very small non-zero value.

    However, care must be taken to ensure that the compiler is also in full IEEE754 mode, otherwise there is a potential that the compiler may choose to reduce the above function to just return zero (i.e. to implement FtZ during compilation).

    hth

    Simon.

Reply
  • David,

    The values you are looking for are in the range 2x10-38 to 2x10-45, i.e. the values you are using above are too small even for denormal / subnormals.

    As an example, given something like:

    float smallest_f32_denorm(void)
    {
      return 2e-45f;
    }
    

    Adding the result of this function to itself using Neon/Advanced-SIMD in A32 state will always produce zero, while adding using VFP with FtZ disabled will produce a very small non-zero value.

    However, care must be taken to ensure that the compiler is also in full IEEE754 mode, otherwise there is a potential that the compiler may choose to reduce the above function to just return zero (i.e. to implement FtZ during compilation).

    hth

    Simon.

Children