
Float behavior on AArch64

Hello,

forgive me if my question is a little weak in content and wording. I'm only a hobbyist and English is not my native language.

I'm trying to compile an app from Einstein@Home for AArch64 using GCC. Einstein@Home is a distributed-computing project using BOINC. The app mainly calculates FFTs in single precision using the FFTW library. After the calculations are finished it sends the results back to the server, which validates them against the same tasks calculated by a different host. Hosts can span many different CPU architectures.

The problem is that tasks from my AArch64 app version are mostly above the validator's threshold. Compiling the app with the same settings for the AArch32 instruction set on the same device delivers valid results. I've tried this with many different compiler options, GCC versions, and library versions, with and without NEON. The result is always the same: AArch64 invalid, AArch32 valid.

I've read through the ARMv8 documentation, but I can't find anything that would influence the precision of float calculations in a way that fits my problem.

Sorry for the vague question, but I don't know where to start debugging this.

Greetings from Germany

  • It is possible that some of the calculations are multiply-accumulate operations. In the original ARMv7-A with VFPv3, the VFP and NEON floating-point multiply-accumulate was not fused; as of VFPv4 there is a fused multiply-accumulate, and the AArch64 floating-point instructions (FMADD, FMLA) are fused. The difference is in the placement of the rounding step: the fused version rounds only once, after both the multiply and the add, so it is more 'accurate' and hence gives a different result. A bit-for-bit comparison of results will almost certainly fail if one system used fused operations and the other did not. Which one the compiler uses on a given ARM architecture revision and floating-point implementation, and on other architectures such as PowerPC or x86-64, depends on the compiler settings.

    Most compilers should have a switch to ensure that fused multiply-accumulate is not used (with GCC, something like -mno-fused-madd on older versions, or -ffp-contract=off), or you can restrict code generation to a VFP programmers' model that does not include the fused instructions. Even then, whether the compiler emits a multiply-accumulate with the other rounding behaviour or a separate floating-point multiply followed by a floating-point add is entirely up to the compiler.

    Koumoto-san is correct as well -- some implicit single<->double precision conversion may also be occurring, which is likewise up to your compiler, unfortunately.
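    To make the rounding difference concrete, here is a small standalone C sketch (my own illustration, not taken from the Einstein@Home sources) that contrasts a separate multiply-then-add with `fmaf`. The inputs are chosen so the exact product 1 - 2^-46 rounds to 1.0f in single precision, making the two-step result 0 while the fused result keeps the tiny residue:

    ```c
    /* Demo: fused vs. non-fused multiply-accumulate in single precision.
     * Build e.g. with: gcc -O2 -ffp-contract=off fma_demo.c -lm
     * so the compiler cannot silently contract a*b + c into an FMA. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        float a = 1.0f + 0x1p-23f;   /* 1 + 2^-23, smallest float above 1 */
        float b = 1.0f - 0x1p-23f;   /* 1 - 2^-23, exactly representable  */
        float c = -1.0f;

        /* volatile keeps the two steps from being contracted into an FMA */
        volatile float prod = a * b; /* exact 1 - 2^-46 rounds to 1.0f    */
        float separate = prod + c;   /* second rounding: 1.0f - 1.0f = 0  */

        float fused = fmaf(a, b, c); /* single rounding: exactly -2^-46   */

        printf("separate = %a\n", separate); /* 0x0p+0   */
        printf("fused    = %a\n", fused);    /* -0x1p-46 */
        return 0;
    }
    ```

    If the reference hosts do not fuse, rebuilding the AArch64 app (and FFTW itself) with -ffp-contract=off would be a reasonable first experiment to see whether the validation failures disappear.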
