
Float behavior on AArch64

Hello,

forgive me if my question is a little weak in content and wording. I'm only a hobbyist and English is not my native language.

I'm trying to compile an app from Einstein@Home for AArch64 using GCC. Einstein@Home is a distributed-computing project using BOINC. The app mainly calculates FFTs in single precision using the FFTW library. After the calculations are finished it sends the results back to the server, which validates them against the same tasks calculated by a different host. Hosts can span many different CPU architectures.

The problem is that tasks from my AArch64 app version are mostly above the validator's threshold. Compiling the app with the same settings for the AArch32 instruction set on the same device delivers valid results. I've tried this with many different compiler options, GCC versions, and library versions, with and without NEON. The result is always the same: AArch64 invalid, AArch32 valid.

I've read through the ARMv8 documentation, but I can't find anything that would influence the precision of float calculations in a way that fits my problem.

Sorry for the vague question, but I don't know where to start debugging this.

Greetings from Germany

  • It is possible that some of the calculations are multiply-accumulate operations. In the original ARMv7-A with VFPv3, the VFP and NEON floating-point multiply-accumulate was not fused; as of VFPv4 there is a fused multiply-accumulate, and the AArch64 floating-point instructions (FMADD, FMLA) are fused. The difference is in the placement of the rounding step: the fused version rounds only once, after both the multiply and the add, so it is more 'accurate' and hence gives a different result. A bit-for-bit comparison of results will almost certainly fail if one system used fused operations and the other did not. Which one the compiler uses on a given ARM architecture revision and floating-point implementation, and on other architectures such as PowerPC or x86-64, depends on the compiler settings.

    Most compilers should have a switch to ensure that fused multiply-accumulate is not used (with GCC, something like -mno-fused-madd on older versions, or -ffp-contract=off), or you can restrict code generation to a VFP programmers' model that does not include the fused instructions. Even then, whether the compiler emits a multiply-accumulate with the other rounding behaviour or a separate floating-point multiply followed by a floating-point add is entirely up to the compiler.

    Koumoto-san is correct as well -- some implicit single<->double precision conversion may also be occurring, which is likewise up to your compiler, unfortunately.
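    To make the rounding difference concrete, here is a small standalone C sketch (my own illustration, not taken from the Einstein@Home sources) that contrasts a separate multiply-then-add with `fmaf`. The inputs are chosen so the exact product 1 - 2^-46 rounds to 1.0f in single precision, making the two-step result 0 while the fused result keeps the tiny residue:

    ```c
    /* Demo: fused vs. non-fused multiply-accumulate in single precision.
     * Build e.g. with: gcc -O2 -ffp-contract=off fma_demo.c -lm
     * so the compiler cannot silently contract a*b + c into an FMA. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float a = 1.0f + 0x1p-23f;   /* 1 + 2^-23, smallest float above 1 */
        float b = 1.0f - 0x1p-23f;   /* 1 - 2^-23, exactly representable  */
        float c = -1.0f;

        /* volatile keeps the two steps from being contracted into an FMA */
        volatile float prod = a * b; /* exact 1 - 2^-46 rounds to 1.0f    */
        float separate = prod + c;   /* second rounding: 1.0f - 1.0f = 0  */

        float fused = fmaf(a, b, c); /* single rounding: exactly -2^-46   */

        printf("separate = %a\n", separate); /* 0x0p+0   */
        printf("fused    = %a\n", fused);    /* -0x1p-46 */
        return 0;
    }
    ```

    If the reference hosts do not fuse, rebuilding the AArch64 app (and FFTW itself) with -ffp-contract=off would be a reasonable first experiment to see whether the validation failures disappear.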
