Hi everyone,
As the title states - I've had issues reproducing flush-to-zero (FTZ) using the NEON intrinsics provided in the 'arm_neon.h' header. For test purposes I'm using an iPhone 6 with an ARMv8-A dual-core ('Twister') CPU.
In the ARM Information Center, under Home > Neon Programming > Flush-to-zero mode in NEON, I see that 'NEON always uses flush-to-zero mode'.
Which calculation is flushed to zero (FTZ) on NEON but not on an IEEE 754-compliant floating-point processor?
So far I haven't seen any difference between IEEE 754 results and the results returned by ARM NEON intrinsic operations.
Here is an attempt I made:
//Allocate some float buffers
Float32 *inFloatsA = (Float32*)malloc(sizeof(Float32)*4);
Float32 *inFloatsB = (Float32*)malloc(sizeof(Float32)*4);
Float32 *outFloats = (Float32*)malloc(sizeof(Float32)*4);

// Initialise input values
for (int i = 0; i<4; i++) {
    inFloatsA[i] = (Float32)-2e-125;
    inFloatsB[i] = (Float32)1e-100;
}

//Subtract inFloatsB from inFloatsA and store in outFloats
float32x4_t neonFloatsBufferA = vld1q_f32(&inFloatsA[0]);
float32x4_t neonFloatsBufferB = vld1q_f32(&inFloatsB[0]);
float32x4_t result = vsubq_f32(neonFloatsBufferA, neonFloatsBufferB);
vst1q_f32(&outFloats[0], result);

//Calculate the expected IEEE 754 value
Float32 expected = inFloatsA[0] - inFloatsB[0];

//Test if the IEEE 754 value matches the NEON output
if (expected != outFloats[0]) {
    printf("Got a different value than IEEE 754!\n");
}
Essentially, I never see the log 'Got a different value than IEEE 754!'. Is there a set of initial input values that would create an FTZ effect on NEON?
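One thing worth checking in the attempt above: a single-precision float cannot hold values as small as -2e-125 or 1e-100. The smallest normal float is FLT_MIN (about 1.18e-38) and the smallest subnormal is about 1.4e-45, so both casts should already produce zero before NEON is involved, and the subtraction cannot show an FTZ effect either way. The sketch below is plain C with no NEON, and its helper names are made up for illustration. It checks how a value is classified after conversion to float, and builds an input that stays subnormal when flush-to-zero is off.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: returns 1 if x becomes (+/-)0 when converted to float. */
static int rounds_to_float_zero(double x)
{
    volatile float f = (float)x;   /* volatile keeps the conversion explicit */
    return fpclassify(f) == FP_ZERO;
}

/* Hypothetical helper: returns 1 if x is a subnormal (denormal) float. */
static int is_subnormal_float(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halves the smallest normal float at run time. With flush-to-zero off,
   the result is the subnormal value 2^-127; with it on, the result is
   expected to be flushed to zero. */
static float half_of_smallest_normal(void)
{
    volatile float smallest_normal = FLT_MIN;
    return smallest_normal * 0.5f;
}
```

Under these assumptions, rounds_to_float_zero(-2e-125) and rounds_to_float_zero(1e-100) both return 1, while is_subnormal_float(half_of_smallest_normal()) returns 1 only when subnormals are not being flushed. Inputs near FLT_MIN are therefore the ones that can tell the two modes apart.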
Am I incorrect in thinking 'inFloatsA[0] - inFloatsB[0]' will use the IEEE 754 standard?
Kind Regards,
David L
Hi Simon,
Many thanks for the help!
The range info is very useful. I've tried performing addition using the value you provided and, rather than a zero result, I'm seeing a very small value (2.802597e-45).
I'm running this in A64 state - does that change when I should expect FtZ in NEON operations?
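For reference, 2.802597e-45 is exactly twice the smallest subnormal float (2^-149, about 1.401298e-45), so the result above is itself a subnormal value rather than rounding noise. The sketch below is plain C, not NEON, and its helper names are invented for illustration. It rebuilds that value and reports its classification.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: multiplies the smallest subnormal float (2^-149)
   by n at run time. */
static float smallest_subnormal_times(int n)
{
    volatile float tiny = 0x1p-149f;   /* C99 hexadecimal float literal */
    return tiny * (float)n;
}

/* Hypothetical helper: prints a float and whether it is subnormal. */
static void describe(float x)
{
    printf("%e is %s\n", (double)x,
           fpclassify(x) == FP_SUBNORMAL ? "subnormal" : "not subnormal");
}
```

With flush-to-zero off, describe(smallest_subnormal_times(2)) prints "2.802597e-45 is subnormal". Seeing that value come out of a NEON addition suggests the operation ran with full denormal support.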
I'm a little confused about this:
'adding using VFP with FtZ disabled will produce a very small non-zero value.'
Because I'm using NEON intrinsics, surely this should not influence NEON results? As the ARM documentation states:
'NEON always uses flush-to-zero mode'
Is there a compile flag for disabling FtZ on VFP?
Many thanks for your reply - it's much appreciated!
David,
In A64 (unlike A32), Advanced SIMD/NEON supports both FtZ and full denormal operation, controlled by the same bit (FPCR.FZ) that determines the regular FP operation mode.
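As a rough sketch of what that looks like in code: in AArch64 the control bit is FZ, bit 24 of the FPCR register. The helper names below are invented for illustration, and the inline assembly assumes a GCC- or Clang-compatible compiler targeting AArch64. The bit manipulation itself is portable and is kept separate from the register access.

```c
#include <stdint.h>

/* FZ (flush-to-zero) is bit 24 of the AArch64 FPCR register. */
#define FPCR_FZ (UINT64_C(1) << 24)

/* Hypothetical helper: returns fpcr with the FZ bit set or cleared. */
static uint64_t fpcr_with_fz(uint64_t fpcr, int enable)
{
    return enable ? (fpcr | FPCR_FZ) : (fpcr & ~FPCR_FZ);
}

/* Hypothetical helper: returns 1 if the FZ bit is set in fpcr. */
static int fpcr_fz_is_set(uint64_t fpcr)
{
    return (fpcr & FPCR_FZ) != 0;
}

#if defined(__aarch64__)
/* Reads the current FPCR value (AArch64 only, GCC/Clang inline asm). */
static uint64_t read_fpcr(void)
{
    uint64_t value;
    __asm__ volatile("mrs %0, fpcr" : "=r"(value));
    return value;
}

/* Writes a new FPCR value (AArch64 only, GCC/Clang inline asm). */
static void write_fpcr(uint64_t value)
{
    __asm__ volatile("msr fpcr, %0" : : "r"(value));
}
#endif
```

On an AArch64 target, write_fpcr(fpcr_with_fz(read_fpcr(), 1)) would turn flush-to-zero on for the calling thread, and passing 0 would turn it off again. Both scalar FP and NEON operations issued afterwards should follow that setting.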
Simon.