
ARM 23.04 compilers generate incorrect code for GROMACS from -O2

I ran into several issues when using the latest ARM 23.04.1 compilers with GROMACS on Fugaku (A64FX, 512-bit SVE).

This is the most problematic one. FWIW, ARM compilers 22.1 work great, even with -Ofast.

The issue can be reproduced with the latest GROMACS 2023.1 and its regression test suite, both of which can be downloaded from https://manual.gromacs.org/2023.1/download.html

This is an extract of the two tests that fail:

Testing awh_multibias . . .

gmx grompp -f /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/grompp.mdp -c /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/conf -r /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/conf -p /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/topol -ref /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/rotref -maxwarn 10 -n /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/index >grompp.out 2>grompp.err

gmx check -s1 /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err

gmx mdrun -ntmpi 1 -ntomp 1 -notunepme -cpi /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/continue -noappend >mdrun.out 2>&1

gmx check -e /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/reference_s.edr -e2 ener.part0002.edr -tol 0.001 -abstol 0.05 -lastener Potential >checkpot.out 2>checkpot.err

gmx check -f /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multibias/reference_s.trr -f2 traj.part0002.trr -tol 0.001 -abstol 0.05 >checkforce.out 2>checkforce.err

FAILED.

Check checkpot.out (200 errors), checkforce.out (38 errors) file(s) in awh_multibias for awh_multibias

Testing awh_multidim . . .

gmx grompp -f /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/grompp.mdp -c /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/conf -r /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/conf -p /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/topol -ref /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/rotref -maxwarn 10 -n /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/index >grompp.out 2>grompp.err

gmx check -s1 /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/reference_s.tpr -s2 topol.tpr -tol 0.0001 -abstol 0.001 >checktpr.out 2>checktpr.err

gmx mdrun -ntmpi 1 -ntomp 1 -notunepme >mdrun.out 2>&1

gmx check -e /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05 -lastener Potential >checkpot.out 2>checkpot.err

gmx check -f /home/rist/r00018/src/regressiontests-2023.1/complex/awh_multidim/reference_s.trr -f2 traj.trr -tol 0.001 -abstol 0.05 >checkforce.out 2>checkforce.err

FAILED. Check checkpot.out (106 errors), checkforce.out (3 errors) file(s) in awh_multidim for awh_multidim

This is how I built GROMACS with the ARM compilers and -O2:

/usr/bin/cmake -G 'Unix Makefiles' -DCMAKE_INSTALL_PREFIX:STRING=$HOME/local/gromacs-2023.1/arm-23.04.1/2 -DCMAKE_BUILD_TYPE:STRING=Release -DBUILD_TESTING:BOOL=OFF -DCMAKE_INTERPROCEDURAL_OPTIMIZATION:BOOL=OFF -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DGMX_INSTALL_LEGACY_API=ON -DGMX_HWLOC:BOOL=ON -DGMX_GPU:STRING=OFF -DGMX_SIMD=ARM_SVE -DGMX_SIMD_ARM_SVE_LENGTH=512 -DGMX_USE_RDTSCP:BOOL=OFF -DGMX_OPENMP:BOOL=ON -DGMX_USE_RDTSCP:BOOL=OFF -DGMX_CYCLE_SUBCOUNTERS:BOOL=ON '-DCMAKE_C_FLAGS_RELEASE=-O2 -DNDEBUG' '-DCMAKE_CXX_FLAGS_RELEASE=-O2 -DNDEBUG' -DGMX_FFT_LIBRARY=fftpack -DGMX_MPI:BOOL=OFF -DCMAKE_C_COMPILER=armclang -DCMAKE_CXX_COMPILER=armclang++ -DBUILD_SHARED_LIBS=OFF $HOME/src/gromacs-2023.1

make -j 48 install

And this is how I then ran the test suite:

. $HOME/local/gromacs-2023.1/arm-23.04.1/2/bin/GMXRC.bash

./gmxtest.pl -nt 1 -ntomp 1 -verbose all

This works just fine with ARM compilers 22.1, LLVM 16.0.2 and LLVM 16.0.6 (even with -Ofast), so the issue seems to be specific to the ARM 23.04 compilers.

ARM compilers 23.04.1 work just fine if -O1 is used instead of -O2.

I tried to identify the root cause and found that the problem comes from the BiasState::updateFreeEnergyAndAddSamplesToHistogram(...) member function, which is defined in

src/gromacs/applied_forces/awh/biasstate.cpp

A temporary workaround is to prepend the definition with

[[clang::optnone]]
