Armv8.7 extension- FEAT_AFP : FPCR.NEP use case

Hi everyone,

I am working on implementing the Armv8.7 feature FEAT_AFP. While doing so I came across the NEP bit, bit[2] of the FPCR (Floating-point Control Register). The documentation says the following about it (screenshot attached):

[Screenshot: fpcr.nep.png, the FPCR.NEP field description from the documentation]

Take FMADD as an example:

FMADD (three-input scalar version): Floating-point fused Multiply-Add (scalar)

FMADD <Sd>, <Sn>, <Sm>, <Sa>
Sd = Sn*Sm + Sa
If FEAT_AFP is implemented:
- FPCR.NEP=0: no effect (the bits of Vd above the lowest element are zeroed, as without FEAT_AFP)
- FPCR.NEP=1: the elements of Vd other than the lowest are copied from Va

Here, as per the documentation, the upper bits of Vd are populated from the upper bits of Va. For single precision (32 bits), the upper (128-32)=96 bits of the Vd register are populated from the upper 96 bits of Va. I have verified this using the Trace32 debugger.
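The behaviour described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: a 128-bit V register is represented as a list of four single-precision lanes with lane 0 first, and the names `f32` and `fmadd_s` are hypothetical helpers, not anything from the Arm pseudocode.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fmadd_s(vn, vm, va, nep):
    """Hypothetical model of FMADD Sd, Sn, Sm, Sa on 4-lane registers.

    Lane 0 holds bits [31:0]. Only lane 0 is computed. The upper three
    lanes are copied from the addend register when nep is true and are
    zeroed otherwise.
    """
    # Caveat: a real FMADD rounds once. This model rounds in double and
    # then again to single, so use exactly representable values only.
    lane0 = f32(vn[0] * vm[0] + va[0])
    upper = list(va[1:]) if nep else [0.0, 0.0, 0.0]
    return [lane0] + upper

vn = [2.0, 9.0, 9.0, 9.0]
vm = [3.0, 9.0, 9.0, 9.0]
va = [1.0, 10.0, 20.0, 30.0]

print(fmadd_s(vn, vm, va, nep=False))  # upper lanes zeroed
print(fmadd_s(vn, vm, va, nep=True))   # upper lanes taken from va
```

The upper lanes of `vn` and `vm` never influence the result in either mode; only the addend register contributes to the upper 96 bits, and only when NEP is set.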

But I cannot find any use case for this. Since we are directly copying the upper bits of the addend into the destination register, we cannot say we are increasing the precision in some way. Can anyone please explain the use case of this bit?

Thanks.

0 a.surati over 2 years ago

Sarthak said:
Since, we are directly populating the addend to destination register we cannot say we are increasing the precision in some way.

That only means that the output elements other than the lowest behave as if they were calculated as Vd[e] = Vm[e] * Vn[e] + Va[e], where Vm[e] = Vn[e] = 0.0.

I suppose the equivalent vector/SIMD operation is: output = <0,0,0,m0> * <0,0,0,n0> + <a3,a2,a1,a0>
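The lane-wise analogy above can be tried out with a small Python sketch. It assumes lists with lane 0 first (the post writes the highest lane first), and `lanewise_fma` is a hypothetical helper. One caveat worth noting: the analogy holds for ordinary values, but an arithmetic `0.0 * 0.0 + a` is not a bit-exact copy of `a`. For example, a lane holding -0.0 comes out as +0.0 under round-to-nearest, whereas a merge that copies bits would keep the sign.

```python
import math

def lanewise_fma(m, n, a):
    """Lane-by-lane m*n + a (a hypothetical stand-in for a vector FMA)."""
    return [mi * ni + ai for mi, ni, ai in zip(m, n, a)]

# Lane 0 first: only lane 0 of m and n is non-zero.
m = [3.0, 0.0, 0.0, 0.0]
n = [2.0, 0.0, 0.0, 0.0]
a = [1.0, 10.0, -0.0, 30.0]

out = lanewise_fma(m, n, a)
print(out)

# Lanes 1 and 3 pass through unchanged, but the sign of the -0.0 in
# lane 2 is lost by the arithmetic formulation.
print(math.copysign(1.0, a[2]), math.copysign(1.0, out[2]))
```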

0 Sarthak over 2 years ago in reply to a.surati

Thanks for the response.

That's correct, that is what it means. What I want to understand is the use case of this particular extension. Is there any library support already present for these Arm extensions?

+1 a.surati over 2 years ago in reply to Sarthak

Unless someone who knows exactly arrives, below are my views:

It seems to me that the NEP bit allows the accumulator to be merged. For example, consider the scalar version of FMADD for single-precision floats:

a1 = n0*m0 + a0

a2 = n1*m1 + a1

etc.

With the NEP bit set, the entire accumulator (for example, 128 bits) is preserved, even though the calculations are performed only on the lowest element (element #0, spanning [31:0] of the [127:0] accumulator).
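The accumulation chain can be illustrated with a short Python sketch. It assumes a register is a list of four lanes with lane 0 first; `fmadd_lane0` and `accumulate` are hypothetical names used only for this illustration, and the values are chosen to be exactly representable so rounding does not matter.

```python
def fmadd_lane0(vn, vm, va, nep):
    """Compute lane 0 as vn[0]*vm[0] + va[0].

    The upper lanes come from the addend when nep is true and are
    zeroed otherwise.
    """
    upper = list(va[1:]) if nep else [0.0] * (len(va) - 1)
    return [vn[0] * vm[0] + va[0]] + upper

def accumulate(ns, ms, acc, nep):
    """Chain a1 = n0*m0 + a0, a2 = n1*m1 + a1, ... through one register."""
    for n, m in zip(ns, ms):
        acc = fmadd_lane0([n, 0.0, 0.0, 0.0], [m, 0.0, 0.0, 0.0], acc, nep)
    return acc

acc0 = [0.5, 10.0, 20.0, 30.0]   # lanes 1..3 hold unrelated state
ns = [1.0, 2.0, 3.0]
ms = [4.0, 5.0, 6.0]

print(accumulate(ns, ms, acc0, nep=True))   # lanes 1..3 survive the chain
print(accumulate(ns, ms, acc0, nep=False))  # lanes 1..3 lost after step one
```

In both cases lane 0 ends up with the same running sum; the only difference is whether the rest of the accumulator register survives.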

The NEP bit is consulted in the function IsMerging; its pseudocode (and the FMADD pseudocode) is available in the Armv8 manual. The bit being 1 is taken as the signal to perform merging, where the exact merge depends on the instruction being executed.
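As a rough illustration of that check, the following Python sketch extracts bit[2] from an FPCR value. It is not the manual's pseudocode: `is_merging` and the `have_afp` parameter are hypothetical, and the real IsMerging function may include further conditions that are not modelled here.

```python
FPCR_NEP_BIT = 2  # NEP is bit[2] of FPCR, as stated in the question

def is_merging(fpcr, have_afp=True):
    """Return True when merging applies: FEAT_AFP present and NEP set."""
    return have_afp and bool((fpcr >> FPCR_NEP_BIT) & 1)

print(is_merging(0b100))                  # NEP set
print(is_merging(0b011))                  # other low bits set, NEP clear
print(is_merging(0b100, have_afp=False))  # NEP has no meaning without FEAT_AFP
```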

I checked glibc to see if it explicitly enables NEP; as far as I can see, it does not. This bit, and the other bits that are part of FEAT_AFP, are advertised to user mode/glibc through one of the ELF hwcaps, HWCAP2_AFP. It is likely that some application enabled NEP. Since you encountered NEP being set, can you say which application you were running that may have set it? Are you on Linux, Windows, or an Apple platform?
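For anyone who wants to check the hwcap from user space on AArch64 Linux, here is a minimal Python sketch using `getauxval` through `ctypes`. The numeric values of `AT_HWCAP2` (26) and `HWCAP2_AFP` (bit 20) are assumptions taken from the Linux headers as I recall them, so please verify them against your kernel's `hwcap.h`. The helper names `has_afp` and `read_hwcap2` are hypothetical.

```python
import ctypes
import platform

AT_HWCAP2 = 26        # assumed value, from the Linux auxiliary vector definitions
HWCAP2_AFP = 1 << 20  # assumed value, from the arm64 uapi hwcap.h

def has_afp(hwcap2):
    """Return True if the HWCAP2_AFP bit is set in an AT_HWCAP2 value."""
    return bool(hwcap2 & HWCAP2_AFP)

def read_hwcap2():
    """Return AT_HWCAP2 on AArch64 Linux, or None when it is unavailable."""
    if platform.system() != "Linux" or platform.machine() != "aarch64":
        return None  # the bit layout is only meaningful on arm64 Linux
    try:
        getauxval = ctypes.CDLL(None).getauxval
    except (OSError, AttributeError, TypeError):
        return None
    getauxval.restype = ctypes.c_ulong
    getauxval.argtypes = [ctypes.c_ulong]
    return getauxval(AT_HWCAP2)

hwcap2 = read_hwcap2()
if hwcap2 is None:
    print("AT_HWCAP2 not available on this platform")
else:
    print("FEAT_AFP advertised:", has_afp(hwcap2))
```

Note that the hwcap only reports that the feature exists. It says nothing about whether any process has actually set FPCR.NEP.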

Note also that other features of FEAT_AFP, such as flushing output and input denormals to zero, are also present in amd64 processors (and maybe in other CPUs). Moreover, Apple's arm64 CPUs already had their own implementation before Arm came up with the architecturally supported features in Armv8.7. The Armv8 manual states that such features allow for more efficient processing by sacrificing IEEE conformance in such edge cases.

0 Sarthak over 2 years ago in reply to a.surati

That makes sense; accumulation can be a use case. I am actually using a debugger to validate this feature, so I am setting this bit explicitly. Thanks.