Hi everyone,
I was working on developing the Armv8.7 feature FEAT_AFP. While doing so, I came across one of the bits it adds to the FPCR (Floating-point Control Register): the NEP bit, bit[2] of FPCR. The documentation describes it as follows (link to a screenshot attached):
/resized-image/__size/640x480/__key/communityserver-discussions-components-files/468/fpcr.nep.png
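When reading raw FPCR values in a debugger, it can help to decode the three control bits that FEAT_AFP adds to FPCR: FIZ (bit[0]), AH (bit[1]) and NEP (bit[2]). Below is a small Python sketch of that. The function name `decode_afp_bits` is hypothetical. Reading FPCR itself requires an `MRS` instruction on the target, which is not shown here.

```python
# Bit positions of the FEAT_AFP controls in FPCR.
FPCR_FIZ = 1 << 0  # flush denormal inputs to zero
FPCR_AH = 1 << 1   # alternate handling of floating-point behaviour
FPCR_NEP = 1 << 2  # output elements other than the lowest are merged from a source register


def decode_afp_bits(fpcr: int) -> dict:
    """Return the FEAT_AFP control bits of a raw FPCR value (e.g. copied from a debugger)."""
    return {
        "FIZ": bool(fpcr & FPCR_FIZ),
        "AH": bool(fpcr & FPCR_AH),
        "NEP": bool(fpcr & FPCR_NEP),
    }


# Example: an FPCR value with only NEP set.
print(decode_afp_bits(0x4))  # {'FIZ': False, 'AH': False, 'NEP': True}
```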
For instance, take FMADD:
FMADD (three input scalar version) : Floating-point fused Multiply-Add (scalar)
Here, as per the documentation, when NEP is set the bits of Vd above Sd are populated from the corresponding bits of Va. Taking 32-bit (single) precision as an example, the upper 128 - 32 = 96 bits of Vd are populated by the upper 96 bits of Va. I have verified this using the Trace32 debugger.
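To make that behaviour concrete, here is a rough Python model of the scalar single-precision FMADD Sd, Sn, Sm, Sa, with each 128-bit V register held as a Python int. The helper names are hypothetical. The arithmetic is done in double precision and then rounded to float32, so it is not a bit-exact fused multiply-add; only the handling of the upper 96 bits matters here.

```python
import struct

MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def f32_bits(x: float) -> int:
    """Raw IEEE-754 single-precision bits of x (rounded to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Interpret the low 32 bits of b as a float32."""
    return struct.unpack("<f", struct.pack("<I", b & MASK32))[0]


def fmadd_s(vn: int, vm: int, va: int, nep: bool) -> int:
    """Model of FMADD Sd, Sn, Sm, Sa; returns the new 128-bit value of Vd.

    Element 0 gets n*m + a (not bit-exact fused). Bits [127:32] come from Va
    when NEP is set, and are zeroed when NEP is clear.
    """
    n, m, a = bits_f32(vn), bits_f32(vm), bits_f32(va)
    elem0 = f32_bits(n * m + a)
    upper = (va & ~MASK32 & MASK128) if nep else 0
    return upper | elem0
```

For example, with Sn = 2.0, Sm = 3.0 and Sa = 1.0, both settings give 7.0 in element 0. With `nep=True`, bits [127:32] of the result equal those of Va. With `nep=False`, they are zero.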
But I cannot find a use case for this. Since the upper bits are copied directly from the addend register into the destination, it does not seem to increase precision in any way. Can anyone please explain the use case for this bit?
Thanks.
Until someone who knows for certain replies, here are my views:
It seems to me that the NEP bit allows the accumulator to be merged. For example, consider the scalar version of FMADD on single-precision floats:
a1 = n0*m0 + a0
a2 = n1*m1 + a1
etc.
With the NEP bit set, the entire accumulator register (128 bits, for example) is preserved, even though the calculations are performed only on the lowest element (element 0, spanning bits [31:0] of the [127:0] accumulator).
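The accumulation chain above can be sketched in Python by treating a 128-bit register as four float32 lanes and feeding each result back in as the next addend. This is a simplified model under my reading of NEP, not the architectural pseudocode. The names `fmadd_lane0` and `run_chain` are hypothetical, and the multiply-add is not bit-exact fused.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fmadd_lane0(n: float, m: float, acc: list, nep: bool) -> list:
    """Scalar FMADD on lane 0 of a 4-lane accumulator register.

    With NEP set, lanes 1..3 are carried over from the accumulator.
    With NEP clear, they are zeroed.
    """
    lane0 = to_f32(n * m + acc[0])
    upper = list(acc[1:]) if nep else [0.0, 0.0, 0.0]
    return [lane0] + upper


def run_chain(pairs: list, acc: list, nep: bool) -> list:
    """Compute a[i+1] = n[i]*m[i] + a[i], reusing the result as the next accumulator."""
    for n, m in pairs:
        acc = fmadd_lane0(n, m, acc, nep)
    return acc
```

Starting from the accumulator `[1.0, 10.0, 20.0, 30.0]` with pairs `(2, 3)` and `(4, 5)`, lane 0 ends at 27.0 in both modes. Lanes 1..3 survive only when `nep=True`.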
The NEP bit is consulted in the function IsMerging(); its pseudocode (and the FMADD pseudocode) is available in the Armv8 Architecture Reference Manual. A value of 1 in that bit tells the instruction to perform merging, and which register the upper elements are merged from depends on the instruction being executed.
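Here is a rough Python transliteration of the IsMerging() check, as I read it: merging happens when FEAT_AFP is implemented, the PE is not in AArch32, and FPCR.NEP is 1. Later revisions of the manual may add further conditions, which are not modelled here. The `have_afp` and `using_aarch32` parameters stand in for architectural state. The `merge_result` helper is hypothetical and shows how the lowest element is combined with a 128-bit merge source, which for FMADD would be Va.

```python
FPCR_NEP = 1 << 2
MASK128 = (1 << 128) - 1


def is_merging(fpcr: int, have_afp: bool, using_aarch32: bool = False) -> bool:
    """Simplified model of the IsMerging() pseudocode function."""
    return have_afp and not using_aarch32 and bool(fpcr & FPCR_NEP)


def merge_result(elem0: int, esize: int, merge_src: int, fpcr: int, have_afp: bool) -> int:
    """Build a 128-bit result: element 0 from elem0, the rest from merge_src or zeros.

    esize is the element size in bits (16, 32 or 64). merge_src is the register
    chosen by the instruction (e.g. Va for FMADD).
    """
    mask = (1 << esize) - 1
    base = merge_src if is_merging(fpcr, have_afp) else 0
    return (base & ~mask & MASK128) | (elem0 & mask)
```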
I checked glibc to see whether it explicitly enables NEP; as far as I can see, it does not. Support for this bit and the other FEAT_AFP controls is exposed to user mode (and glibc) through an ELF hwcap, HWCAP2_AFP. So it is likely that some application enabled NEP. Since you found NEP set, can you say which application you were running that may have set it? Are you on Linux, Windows, or Apple?
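Below is a minimal sketch of how user space on Linux could check HWCAP2_AFP before relying on FPCR.NEP. It calls glibc's `getauxval` through `ctypes`. The constants are copied by hand (`AT_HWCAP2` = 26 from `<elf.h>`, `HWCAP2_AFP` = bit 20 from the arm64 `<asm/hwcap.h>`), so please double-check them against your headers. The hwcap only reports that the feature is available; it says nothing about whether NEP is currently set.

```python
import ctypes
import platform
import sys

AT_HWCAP2 = 26        # auxiliary vector tag, from <elf.h>
HWCAP2_AFP = 1 << 20  # arm64 hwcap bit for FEAT_AFP (check against <asm/hwcap.h>)


def has_afp() -> bool:
    """Return True if the Linux kernel reports FEAT_AFP via AT_HWCAP2 on arm64."""
    # The bit meaning is arm64-specific, so do not interpret it on other platforms.
    if not sys.platform.startswith("linux") or platform.machine() not in ("aarch64", "arm64"):
        return False
    try:
        getauxval = ctypes.CDLL(None).getauxval
    except (OSError, AttributeError):
        return False
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    return bool(getauxval(AT_HWCAP2) & HWCAP2_AFP)
```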
Note also that other features of FEAT_AFP, such as flushing input and output denormals to zero, are also present in amd64 processors (and possibly other CPUs). Moreover, Apple's arm64 CPUs already had their own implementation before Arm added architecturally supported versions of these features in Armv8.7. The Armv8 manual states that such features allow more efficient processing by sacrificing IEEE conformance in these edge cases.
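To illustrate what flushing inputs and outputs means, here is a simplified Python model of a float32 multiply with separate "flush inputs" and "flush outputs" switches, roughly in the spirit of x86 DAZ/FTZ and the FEAT_AFP controls. It is not a model of the exact FPCR.FZ/FIZ/AH rules; rounding details and exception flags are ignored. The names `flush_f32` and `fmul_s` are hypothetical.

```python
import math
import struct

FLT_MIN = 2.0 ** -126  # smallest positive normal float32


def to_f32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def flush_f32(x: float) -> float:
    """Replace a float32 denormal (subnormal) with a zero of the same sign."""
    if x != 0.0 and abs(x) < FLT_MIN:
        return math.copysign(0.0, x)
    return x


def fmul_s(a: float, b: float, flush_inputs: bool, flush_outputs: bool) -> float:
    """float32 multiply with optional input and output denormal flushing."""
    if flush_inputs:
        a, b = flush_f32(a), flush_f32(b)
    r = to_f32(a * b)
    return flush_f32(r) if flush_outputs else r
```

For example, 2**-140 is a float32 denormal. Multiplying it by 2**20 gives 2**-120 normally, but 0.0 when inputs are flushed.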