I've been using the ARM GCC release aarch64-none-elf-gcc-11.2.1 in a large bare-metal project that has successfully used libc functions (malloc/memcpy) many times without issue, linking with:

    -L $AARCH64_GCC_PATH/aarch64-none-elf/lib -lc -lnosys -lg
I recently saw an exception caused by an unaligned access inside memcpy, despite compiling with -mstrict-align. After isolating the issue and writing a unit test, I believe I've found a bug. Please ignore the specific addresses in the objdump and the memcpy call; they are made up for this test.
The failure occurs when performing a memcpy on Device-type memory with size = 0x8 + 0x4*n for odd n, i.e. any size that is 4-byte but not 8-byte aligned, the smallest being 0xc. Even though care may be taken to keep the src/dst pointers aligned, the instruction at 6009c in the objdump of memcpy below is ldur x7, [x4, #-8], where x4 = src + size. For a size-0xc copy this performs a 64-bit LDUR from a 32-bit-aligned address ending in 0x4 into a 64-bit x register, which results in a Data Abort.
    // unit test
    #include <stdlib.h>
    #include <string.h>

    volatile int bssTest;

    void swap(int a, int b)
    {
        memcpy((void *)0x500, (void *)0x1000, 0xc);
    }
    0000000000060040 <memcpy>:
       60040: f9800020  prfm  pldl1keep, [x1]
       60044: 8b020024  add   x4, x1, x2
       60048: 8b020005  add   x5, x0, x2
       6004c: f100405f  cmp   x2, #0x10
       60050: 54000209  b.ls  60090 <memcpy+0x50>  // b.plast
       60054: f101805f  cmp   x2, #0x60
       60058: 54000648  b.hi  60120 <memcpy+0xe0>  // b.pmore
       6005c: d1000449  sub   x9, x2, #0x1
       60060: a9401c26  ldp   x6, x7, [x1]
       60064: 37300469  tbnz  w9, #6, 600f0 <memcpy+0xb0>
       60068: a97f348c  ldp   x12, x13, [x4, #-16]
       6006c: 362800a9  tbz   w9, #5, 60080 <memcpy+0x40>
       60070: a9412428  ldp   x8, x9, [x1, #16]
       60074: a97e2c8a  ldp   x10, x11, [x4, #-32]
       60078: a9012408  stp   x8, x9, [x0, #16]
       6007c: a93e2caa  stp   x10, x11, [x5, #-32]
       60080: a9001c06  stp   x6, x7, [x0]
       60084: a93f34ac  stp   x12, x13, [x5, #-16]
       60088: d65f03c0  ret
       6008c: d503201f  nop
       60090: f100205f  cmp   x2, #0x8
       60094: 540000e3  b.cc  600b0 <memcpy+0x70>  // b.lo, b.ul, b.last
       60098: f9400026  ldr   x6, [x1]
       6009c: f85f8087  ldur  x7, [x4, #-8]
       600a0: f9000006  str   x6, [x0]
       600a4: f81f80a7  stur  x7, [x5, #-8]
       600a8: d65f03c0  ret
       600ac: d503201f  nop
       600b0: 361000c2  tbz   w2, #2, 600c8 <memcpy+0x88>
       600b4: b9400026  ldr   w6, [x1]
       600b8: b85fc087  ldur  w7, [x4, #-4]
       600bc: b9000006  str   w6, [x0]
       600c0: b81fc0a7  stur  w7, [x5, #-4]
       600c4: d65f03c0  ret
       600c8: b4000102  cbz   x2, 600e8 <memcpy+0xa8>
       600cc: d341fc49  lsr   x9, x2, #1
       600d0: 39400026  ldrb  w6, [x1]
       600d4: 385ff087  ldurb w7, [x4, #-1]
       600d8: 38696828  ldrb  w8, [x1, x9]
       600dc: 39000006  strb  w6, [x0]
       600e0: 38296808  strb  w8, [x0, x9]
       600e4: 381ff0a7  sturb w7, [x5, #-1]
       600e8: d65f03c0  ret
       600ec: d503201f  nop
       600f0: a9412428  ldp   x8, x9, [x1, #16]
       600f4: a9422c2a  ldp   x10, x11, [x1, #32]
       600f8: a943342c  ldp   x12, x13, [x1, #48]
       600fc: a97e0881  ldp   x1, x2, [x4, #-32]
       60100: a97f0c84  ldp   x4, x3, [x4, #-16]
       60104: a9001c06  stp   x6, x7, [x0]
       60108: a9012408  stp   x8, x9, [x0, #16]
       6010c: a9022c0a  stp   x10, x11, [x0, #32]
       60110: a903340c  stp   x12, x13, [x0, #48]
       60114: a93e08a1  stp   x1, x2, [x5, #-32]
       60118: a93f0ca4  stp   x4, x3, [x5, #-16]
       6011c: d65f03c0  ret
       60120: 92400c09  and   x9, x0, #0xf
       60124: 927cec03  and   x3, x0, #0xfffffffffffffff0
       60128: a940342c  ldp   x12, x13, [x1]
       6012c: cb090021  sub   x1, x1, x9
       60130: 8b090042  add   x2, x2, x9
       60134: a9411c26  ldp   x6, x7, [x1, #16]
       60138: a900340c  stp   x12, x13, [x0]
       6013c: a9422428  ldp   x8, x9, [x1, #32]
       60140: a9432c2a  ldp   x10, x11, [x1, #48]
       60144: a9c4342c  ldp   x12, x13, [x1, #64]!
       60148: f1024042  subs  x2, x2, #0x90
       6014c: 54000169  b.ls  60178 <memcpy+0x138>  // b.plast
       60150: a9011c66  stp   x6, x7, [x3, #16]
       60154: a9411c26  ldp   x6, x7, [x1, #16]
       60158: a9022468  stp   x8, x9, [x3, #32]
       6015c: a9422428  ldp   x8, x9, [x1, #32]
       60160: a9032c6a  stp   x10, x11, [x3, #48]
       60164: a9432c2a  ldp   x10, x11, [x1, #48]
       60168: a984346c  stp   x12, x13, [x3, #64]!
       6016c: a9c4342c  ldp   x12, x13, [x1, #64]!
       60170: f1010042  subs  x2, x2, #0x40
       60174: 54fffee8  b.hi  60150 <memcpy+0x110>  // b.pmore
       60178: a97c0881  ldp   x1, x2, [x4, #-64]
       6017c: a9011c66  stp   x6, x7, [x3, #16]
       60180: a97d1c86  ldp   x6, x7, [x4, #-48]
       60184: a9022468  stp   x8, x9, [x3, #32]
       60188: a97e2488  ldp   x8, x9, [x4, #-32]
       6018c: a9032c6a  stp   x10, x11, [x3, #48]
       60190: a97f2c8a  ldp   x10, x11, [x4, #-16]
       60194: a904346c  stp   x12, x13, [x3, #64]
       60198: a93c08a1  stp   x1, x2, [x5, #-64]
       6019c: a93d1ca6  stp   x6, x7, [x5, #-48]
       601a0: a93e24a8  stp   x8, x9, [x5, #-32]
       601a4: a93f2caa  stp   x10, x11, [x5, #-16]
       601a8: d65f03c0  ret
       601ac: 00000000  udf   #0
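To spell out the arithmetic behind the faulting load at 6009c, here is a minimal host-side illustration using the made-up pointers from the unit test (this is just to show the failure mode, not part of the repro itself):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uintptr_t src  = 0x1000;     /* made-up, 8-byte-aligned source pointer */
        uintptr_t size = 0xc;        /* 4-byte- but not 8-byte-aligned size */
        uintptr_t x4   = src + size; /* end pointer that memcpy keeps in x4 */
        uintptr_t addr = x4 - 8;     /* address used by "ldur x7, [x4, #-8]" */

        /* addr ends up as 0x1004: only 4-byte aligned, yet the load is
           64 bits wide, which faults on Device-type memory. */
        printf("load address 0x%lx, 8-byte aligned: %s\n",
               (unsigned long)addr, (addr % 8 == 0) ? "yes" : "no");
        return 0;
    }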
While I understand that care must be taken when using stdlib functions in a bare-metal application, given the nature of our codebase it would be very difficult to ensure that every call to memcpy has a 64-bit-aligned size. Shouldn't newlib/the compiler ensure that memcpy uses 32-bit w registers for any 32-bit-aligned copy anyway, especially with -mstrict-align?
What are my options for an immediate fix in the meantime? I suppose I could override the definition of memcpy, but in that case, what source should I base the replacement implementation on? Below is a rough sketch of what I have in mind.
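This is my own minimal sketch, not taken from the newlib sources: it never issues an access wider than the common alignment of dst, src, and n, so it should be safe on Device memory, at the cost of speed:

    #include <stddef.h>
    #include <stdint.h>

    void *memcpy(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* Use 32-bit accesses only when both pointers and the size are
           4-byte aligned; otherwise fall back to plain byte copies. */
        if ((((uintptr_t)dst | (uintptr_t)src | n) & 3) == 0) {
            uint32_t *dw = (uint32_t *)dst;
            const uint32_t *sw = (const uint32_t *)src;
            while (n) {
                *dw++ = *sw++;
                n -= 4;
            }
        } else {
            while (n--)
                *d++ = *s++;
        }
        return dst;
    }

One caveat I'm aware of: GCC can recognise a copy loop like this and turn it back into a memcpy call, so the replacement would presumably have to be compiled with -fno-builtin (or -ffreestanding) to avoid recursion. Is there a better-supported option?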
Any help on this is appreciated, thanks.