Float behavior on AArch64

Hello,

Forgive me if my question is a little weak in content and language. I'm only a hobbyist and English is not my native language.

I'm trying to compile an app from Einstein@Home for AArch64 using GCC. Einstein@Home is a distributed-computing project using BOINC. The app mainly calculates FFTs in single precision using the FFTW library. After the calculations are finished, it sends the results back to the server, which validates them against the same tasks calculated by a different host. Hosts can vary across many different CPU architectures.

The problem is that tasks from my AArch64 app version are mostly above the validator's threshold. Compiling the app with the same settings for the AArch32 instruction set on the same device delivers valid results. I've tried this with many different compiler options, GCC versions, and library versions, with and without NEON. The result is always the same: AArch64 invalid, AArch32 valid.

I've read through the ARMv8 documentation, but I can't find anything that influences the precision of float calculations in a way that fits my problem.

Sorry for the vague question, but I don't know where to start debugging this.

Greetings from Germany

Yasuhiko Koumoto over 9 years ago in reply to Christian Dreihsig

Hi, please let us know the details of your problem. I would like to know the concrete operations and their results for both AArch32 and AArch64. Best regards, Yasuhiko Koumoto.

Christian Dreihsig over 9 years ago

OK, time for a little update:
The 32-bit version seems to be affected too. The error occurs if I set the architecture type in GCC to ARMv8-A. Compiled with -march=armv7-a, all results are valid. As soon as I switch to -march=armv8-a, almost all of my results fail validation on the server. The output seems to be slightly different. The difference between the two is very small. Any suggestions as to what could cause these differences?

Yasuhiko Koumoto over 9 years ago in reply to Christian Dreihsig

Hi,
my guess is that one is executed with 32-bit precision because of the float operations, while the other is executed with 64-bit (or even wider) precision in order to use the same execution logic. In that case, the float data would be converted to double and then converted back to float.
If this assumption is right, small differences in the bit patterns might occur.
Best regards,
Yasuhiko Koumoto.

Matt Sealey over 9 years ago in reply to Christian Dreihsig

It is possible that some of the calculations are multiply-accumulate operations. In the original ARMv7-A with VFPv3, the VFP and NEON floating-point multiply-accumulate was not fused. As of VFPv4 there is a fused multiply-accumulate. The difference is in the placement of the rounding step. Fused is more 'accurate', so you will get a different result, and a binary-to-binary comparison of the results will doubtless fail if different systems used fused vs. non-fused. Which one the compiler uses on which ARM architecture revision and floating-point implementation, and on other architectures such as PowerPC or x86-64, is up to the settings.
Most compilers should have a switch to ensure that fused multiply-accumulate is not used (something like -mno-fused-madd or -ffp-contract perhaps) or you can restrict code generation to a VFP programmers' model which does not include the fused versions. Depending on the compiler, though, whether it uses multiply-accumulate that is defined with the obverse rounding step or a floating-point multiply, then a floating-point add, is also totally up to the compiler.
Koumoto-san is correct, also -- it is possible that some implicit single<>double precision conversion may also be occurring, which is also up to your compiler, unfortunately.

Christian Dreihsig over 9 years ago in reply to Matt Sealey

The idea about the fused multiply-accumulate was good; unfortunately it was not my problem. Setting -ffp-contract=off or -ffp-contract=fast doesn't make any difference to the output.
I have written some lines of additional debug output into the source to see where the results diverge.
Debugging code that was not written by me is horrible for me with my hobby-level knowledge. Especially this project... too much advanced math for me. Let's see if I can get it fixed anyway.
