Hi everyone,
As the title states - I've had issues reproducing flush-to-zero (FTZ) using the NEON intrinsics provided in the 'arm_neon.h' header. For test purposes I'm using an iPhone 6 with an ARMv8-A dual-core ('Twister') CPU.
In the ARM information center under Home>Neon Programming>Flush-to-zero mode in NEO (ARM Information Center) I see that 'NEON always uses flush-to-zero mode'.
What calculation on NEON produces Flush-to-zero (FTZ) but does not on IEEE 754 floating point compatible processors?
As yet I don't see any difference between IEEE 754 and the results returned by ARM NEON intrinsic operations.
Here is an attempt I made:
//Allocate some float buffers Float32 *inFloatsA = (Float32*)malloc(sizeof(Float32)*4); Float32 *inFloatsB = (Float32*)malloc(sizeof(Float32)*4); Float32 *outFloats = (Float32*)malloc(sizeof(Float32)*4); // Initialise input values for (int i = 0; i<4; i++) { inFloatsA[i] = (Float32)-2e-125; inFloatsB[i] = (Float32)1e-100; } //Subtract inFloatsB from inFloatsA and store in outFloats float32x4_t neonFloatsBufferA = vld1q_f32(&inFloatsA[0]); float32x4_t neonFloatsBufferB = vld1q_f32(&inFloatsB[0]); float32x4_t result = vsubq_f32(neonFloatsBufferA, neonFloatsBufferB); vst1q_f32(&outFloats[0], result); //Calculate the expected IEEE 754 value Float32 expected = inFloatsA[0] - inFloatsB[0]; //Test if the IEEE 754 value matches the NEON output if (expected != outFloats[0]) { printf("Got a different value than IEEE 754!\n"); }
Essentially I never see the log 'Got a different value than IEEE 754!'. Is there a set of initial input values that would create a FTZ effect on NEON?
Am I incorrect in thinking 'inFloatsA[0] - inFloatsB[0]' will use the IEEE 754 standard?
Kind Regards,
David L
David,
The values you are looking for are in the range 2x10-38 to 2x10-45, i.e. the values you are using above are too small even for denormal / subnormals.
As an example, given something like:
float smallest_f32_denorm(void) { return 2e-45f; }
Adding the result of this function to itself using Neon/Advanced-SIMD in A32 state will always produce zero, while adding using VFP with FtZ disabled will produce a very small non-zero value.
However, care must be taken to ensure that the compiler is also in full IEEE754 mode, otherwise there is a potential that the compiler may choose to reduce the above function to just return zero (i.e. to implement FtZ during compilation).
hth
Simon.
Hi Simon,
Many thanks for the help!
The range info is very useful; I've tried performing addition using the value you provided and rather than a zero result, I'm seeing a very small value (2.802597e-45)
I'm running this in A64 state - does that change when I should expect FtZ in NEON operations?
I'm a little confused about this:
adding using VFP with FtZ disabled will produce a very small non-zero value.
Because I'm using NEON intrinsics, surely this should not influence NEON results? As the ARM documentation states:
NEON always uses flush-to-zero mode
Is there a compile flag for disabling FtZ on VFP?
Many thanks for your reply - it's much appreciated!
In A64 (unlike A32) the Advanced-SIMD/Neon can support both FtZ and full denormal operation, controlled by the same bit that determines the regular FP operation mode.