I work on Valgrind. I recently got a Raspberry Pi 5 and have been cleaning up the regression tests a bit, on both Raspberry Pi OS (32-bit ARM userland) and Ubuntu (all aarch64).
One failing testcase is called "neon64". It hasn't been updated for about 10 years. Originally, at least, it was compiled with -mfloat-abi=softfp, but it is now compiled without any ABI options, so it must be defaulting to the Raspberry Pi OS gnueabihf hard-float ABI.
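One way to confirm which float ABI the test is actually built with is to look at the compiler's predefined macros. The sketch below assumes GCC or Clang and relies on `__arm__`, `__ARM_PCS_VFP` and `__SOFTFP__`; the function names `float_abi_name` and `print_float_abi` are made up for illustration and are not part of the Valgrind test suite.

```c
#include <assert.h>
#include <stdio.h>

/* Name the 32-bit ARM float ABI this file was compiled for.
   Assumes GCC/Clang predefined macros; other compilers may differ. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";   /* e.g. aarch64 or x86 builds */
#elif defined(__ARM_PCS_VFP)
    return "hard";             /* FP arguments passed in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";             /* no FP instructions are emitted */
#else
    return "softfp";           /* FP instructions, integer-register calling convention */
#endif
}

/* Print the result, for example from a test's main(). */
void print_float_abi(FILE *stream)
{
    assert(stream != NULL);
    fprintf(stream, "float ABI: %s\n", float_abi_name());
}
```

Calling `print_float_abi(stdout)` in a file built with the same flags as neon64 should report "hard" if the gnueabihf default is in effect. Note that softfp and hard differ in the calling convention only; in both cases NEON instructions execute on the hardware.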
There are 3 differences that all look similar to
#define TESTINSN_VSTn(instruction, QD1, QD2, QD3, QD4) \
{ \
      unsigned int out[9]; \
      memset(out, 0x55, 8 * (sizeof(unsigned int)));\
      __asm__ volatile( \
            "mov         r4, %1\n\t" \
            "vldmia      %1!, {" #QD1 "}\n\t" \
            "vldmia      %1!, {" #QD2 "}\n\t" \
            "vldmia      %1!, {" #QD3 "}\n\t" \
            "vldmia      %1!, {" #QD4 "}\n\t" \
            "mov         %1, r4\n\t" \
            instruction ", [%0]\n\t" \
            "str         %0, [%2]\n\t" \
            : \
            : "r" (out), "r" (mem), "r"(&out[8]) \
            : #QD1, #QD2, #QD3, #QD4, "memory", "r4" \
      ); \
      fflush(stdout); \
      printf("%s :: Result %08x'%08x %08x'%08x " \
             "%08x'%08x %08x'%08x delta %d\n", \
             instruction, out[1], out[0], out[3], out[2], out[5], \
             out[4], out[7], out[6], (int)out[8]-(int)out); \
}
TESTINSN_VSTn("vst4.32 {d0[1],d1[1],d2[1],d3[1]}", d0, d1, d2, d4);
which gives a diff
-vst4.32 {d0[1],d1[1],d2[1],d3[1]} :: Result 0f0e0d0c'07060504 1f1e1d1c'17161514 55555555'55555555 55555555'55555555 delta 0
+vst4.32 {d0[1],d1[1],d2[1],d3[1]} :: Result 0f0e0d0c'07060504 55555555'17161514 55555555'55555555 55555555'55555555 delta 0
The Valgrind results look correct. If I run the application natively I get the same result as running it under Valgrind.
So my question is, does it seem plausible that there were some bugs in the soft float emulation that don't exist in the hardware?
Thinking about this some more, it looks to me as though there was a bug in the testcase.
I think that
TESTINSN_VSTn("vst4.32 {d0[1],d1[1],d2[1],d3[1]}", d0, d1, d2, d4);
should be
TESTINSN_VSTn("vst4.32 {d0[1],d1[1],d2[1],d3[1]}", d0, d1, d2, d3);
If I understand the macro correctly, it loads the NEON registers named by the last 4 macro arguments from mem, then does the vector store (vst4) from the registers into the local out array, and then prints the contents of out. The instruction string always stores from d0 to d3, so with d4 as the last argument d3 is never loaded by the macro.
The first 55555555 in the output comes from the memset and I think that it should have been overwritten.
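The reasoning above can be checked with a small portable C model of the macro. This is only a sketch, not the Valgrind test itself. It assumes that mem holds the bytes 0x00, 0x01, 0x02, ... (the post does not show mem, but this is consistent with the printed results) and that a register the macro never loads holds 0x55555555. The second assumption is there purely so the model reproduces the observed output; what d3 really held is not known from the post. The names `dreg_t`, `mem_word`, `model_vst4_lane1` and `format_result` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_DREGS 5                 /* d0..d4 are enough for this model */

typedef struct { uint32_t lane[2]; } dreg_t;   /* one 64-bit D register */

/* 32-bit word k of the source buffer, ASSUMING mem[i] == i, little-endian. */
static uint32_t mem_word(unsigned k)
{
    uint32_t b = 4u * k;
    return b | ((b + 1) << 8) | ((b + 2) << 16) | ((b + 3) << 24);
}

/* Model the macro body. regs[i] is the index of the D register named by
   macro argument QD(i+1), e.g. {0, 1, 2, 4} for d0, d1, d2, d4. */
void model_vst4_lane1(const int regs[4], uint32_t out[8])
{
    dreg_t d[NUM_DREGS];

    /* ASSUMPTION: registers not loaded by the macro hold 0x55555555. */
    memset(d, 0x55, sizeof d);
    memset(out, 0x55, 8 * sizeof out[0]);   /* same fill as the macro */

    /* Four "vldmia %1!, {dN}": each loads 8 bytes and advances the pointer. */
    for (unsigned i = 0; i < 4; i++) {
        assert(regs[i] >= 0 && regs[i] < NUM_DREGS);
        d[regs[i]].lane[0] = mem_word(2 * i);
        d[regs[i]].lane[1] = mem_word(2 * i + 1);
    }

    /* "vst4.32 {d0[1],d1[1],d2[1],d3[1]}, [out]": always reads d0..d3. */
    for (unsigned i = 0; i < 4; i++)
        out[i] = d[i].lane[1];
}

/* Format the first two result pairs in the same order as the test's printf. */
void format_result(const uint32_t out[8], char *buf, size_t len)
{
    snprintf(buf, len, "%08x'%08x %08x'%08x",
             (unsigned)out[1], (unsigned)out[0],
             (unsigned)out[3], (unsigned)out[2]);
}
```

Under these assumptions, passing the register list {0, 1, 2, 4} gives 0f0e0d0c'07060504 55555555'17161514, which matches the new output, and {0, 1, 2, 3} gives 0f0e0d0c'07060504 1f1e1d1c'17161514, which matches the old reference.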
I've just pushed a change that fixes the testcase. The reference has reverted to the previous state.