I've been using the Arm GCC release aarch64-none-elf-gcc-11.2.1 in a large bare-metal project that has successfully called libc functions (malloc/memcpy) many times without issue, linking with:
-L $AARCH64_GCC_PATH/aarch64-none-elf/lib -lc -lnosys -lg
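For context, the link step looks roughly like this (the linker script, object and output names here are placeholders, not our real build):

aarch64-none-elf-gcc -mstrict-align -nostartfiles -T link.ld main.o \
    -L $AARCH64_GCC_PATH/aarch64-none-elf/lib -lc -lnosys -lg -o app.elf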
I recently saw an exception due to an unaligned access during memcpy despite compiling with -mstrict-align.
After isolating the issue and creating a unit test, I believe I've found a bug. Please ignore the addresses in the objdump and the memcpy call; I made them up for this test.
The failure occurs when performing a memcpy on Device-type memory with size = 0x8 + 0x4*n, where n is any natural number. Even when care is taken to keep the src/dst pointers aligned, the instruction at 6009c in the objdump of memcpy below is ldur x7, [x4, #-8]. Since x4 = src + size, a copy of size 0xc turns this into a 64-bit LDUR from an address ending in 0x4, i.e. one that is only 32-bit aligned, which results in a Data Abort on Device memory.
// unit test
#include <stdlib.h>
#include <string.h>

volatile int bssTest;

void swap(int a, int b)
{
    /* size 0xc = 0x8 + 0x4*1 triggers the misaligned tail load */
    memcpy((void *)0x500, (void *)0x1000, 0xc);
}
0000000000060040 <memcpy>:
   60040: f9800020  prfm pldl1keep, [x1]
   60044: 8b020024  add x4, x1, x2
   60048: 8b020005  add x5, x0, x2
   6004c: f100405f  cmp x2, #0x10
   60050: 54000209  b.ls 60090 <memcpy+0x50>  // b.plast
   60054: f101805f  cmp x2, #0x60
   60058: 54000648  b.hi 60120 <memcpy+0xe0>  // b.pmore
   6005c: d1000449  sub x9, x2, #0x1
   60060: a9401c26  ldp x6, x7, [x1]
   60064: 37300469  tbnz w9, #6, 600f0 <memcpy+0xb0>
   60068: a97f348c  ldp x12, x13, [x4, #-16]
   6006c: 362800a9  tbz w9, #5, 60080 <memcpy+0x40>
   60070: a9412428  ldp x8, x9, [x1, #16]
   60074: a97e2c8a  ldp x10, x11, [x4, #-32]
   60078: a9012408  stp x8, x9, [x0, #16]
   6007c: a93e2caa  stp x10, x11, [x5, #-32]
   60080: a9001c06  stp x6, x7, [x0]
   60084: a93f34ac  stp x12, x13, [x5, #-16]
   60088: d65f03c0  ret
   6008c: d503201f  nop
   60090: f100205f  cmp x2, #0x8
   60094: 540000e3  b.cc 600b0 <memcpy+0x70>  // b.lo, b.ul, b.last
   60098: f9400026  ldr x6, [x1]
   6009c: f85f8087  ldur x7, [x4, #-8]
   600a0: f9000006  str x6, [x0]
   600a4: f81f80a7  stur x7, [x5, #-8]
   600a8: d65f03c0  ret
   600ac: d503201f  nop
   600b0: 361000c2  tbz w2, #2, 600c8 <memcpy+0x88>
   600b4: b9400026  ldr w6, [x1]
   600b8: b85fc087  ldur w7, [x4, #-4]
   600bc: b9000006  str w6, [x0]
   600c0: b81fc0a7  stur w7, [x5, #-4]
   600c4: d65f03c0  ret
   600c8: b4000102  cbz x2, 600e8 <memcpy+0xa8>
   600cc: d341fc49  lsr x9, x2, #1
   600d0: 39400026  ldrb w6, [x1]
   600d4: 385ff087  ldurb w7, [x4, #-1]
   600d8: 38696828  ldrb w8, [x1, x9]
   600dc: 39000006  strb w6, [x0]
   600e0: 38296808  strb w8, [x0, x9]
   600e4: 381ff0a7  sturb w7, [x5, #-1]
   600e8: d65f03c0  ret
   600ec: d503201f  nop
   600f0: a9412428  ldp x8, x9, [x1, #16]
   600f4: a9422c2a  ldp x10, x11, [x1, #32]
   600f8: a943342c  ldp x12, x13, [x1, #48]
   600fc: a97e0881  ldp x1, x2, [x4, #-32]
   60100: a97f0c84  ldp x4, x3, [x4, #-16]
   60104: a9001c06  stp x6, x7, [x0]
   60108: a9012408  stp x8, x9, [x0, #16]
   6010c: a9022c0a  stp x10, x11, [x0, #32]
   60110: a903340c  stp x12, x13, [x0, #48]
   60114: a93e08a1  stp x1, x2, [x5, #-32]
   60118: a93f0ca4  stp x4, x3, [x5, #-16]
   6011c: d65f03c0  ret
   60120: 92400c09  and x9, x0, #0xf
   60124: 927cec03  and x3, x0, #0xfffffffffffffff0
   60128: a940342c  ldp x12, x13, [x1]
   6012c: cb090021  sub x1, x1, x9
   60130: 8b090042  add x2, x2, x9
   60134: a9411c26  ldp x6, x7, [x1, #16]
   60138: a900340c  stp x12, x13, [x0]
   6013c: a9422428  ldp x8, x9, [x1, #32]
   60140: a9432c2a  ldp x10, x11, [x1, #48]
   60144: a9c4342c  ldp x12, x13, [x1, #64]!
   60148: f1024042  subs x2, x2, #0x90
   6014c: 54000169  b.ls 60178 <memcpy+0x138>  // b.plast
   60150: a9011c66  stp x6, x7, [x3, #16]
   60154: a9411c26  ldp x6, x7, [x1, #16]
   60158: a9022468  stp x8, x9, [x3, #32]
   6015c: a9422428  ldp x8, x9, [x1, #32]
   60160: a9032c6a  stp x10, x11, [x3, #48]
   60164: a9432c2a  ldp x10, x11, [x1, #48]
   60168: a984346c  stp x12, x13, [x3, #64]!
   6016c: a9c4342c  ldp x12, x13, [x1, #64]!
   60170: f1010042  subs x2, x2, #0x40
   60174: 54fffee8  b.hi 60150 <memcpy+0x110>  // b.pmore
   60178: a97c0881  ldp x1, x2, [x4, #-64]
   6017c: a9011c66  stp x6, x7, [x3, #16]
   60180: a97d1c86  ldp x6, x7, [x4, #-48]
   60184: a9022468  stp x8, x9, [x3, #32]
   60188: a97e2488  ldp x8, x9, [x4, #-32]
   6018c: a9032c6a  stp x10, x11, [x3, #48]
   60190: a97f2c8a  ldp x10, x11, [x4, #-16]
   60194: a904346c  stp x12, x13, [x3, #64]
   60198: a93c08a1  stp x1, x2, [x5, #-64]
   6019c: a93d1ca6  stp x6, x7, [x5, #-48]
   601a0: a93e24a8  stp x8, x9, [x5, #-32]
   601a4: a93f2caa  stp x10, x11, [x5, #-16]
   601a8: d65f03c0  ret
   601ac: 00000000  udf #0
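To make the arithmetic concrete, here is the size 0xc case traced through the 8-to-16-byte path above. This is just a host-side illustration using the made-up addresses from my unit test; tail mirrors the x4 - 8 address that the ldur at 6009c loads from:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uintptr_t src  = 0x1000;      /* 8-byte aligned, as in the unit test */
    uintptr_t size = 0xc;         /* 0x8 + 0x4*1 */
    uintptr_t x4   = src + size;  /* computed at 60044: add x4, x1, x2 */
    uintptr_t tail = x4 - 8;      /* address read by ldur x7, [x4, #-8] */

    /* Prints: tail = 0x1004, 8-byte aligned: no -> Data Abort on Device memory */
    printf("tail = %#lx, 8-byte aligned: %s\n",
           (unsigned long)tail, (tail % 8 == 0) ? "yes" : "no");
    return 0;
}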
While I understand that care must be taken when using stdlib functions in a bare-metal application, the nature of our codebase makes it very difficult to ensure that every call to memcpy has a 64-bit-aligned size. Shouldn't newlib/the compiler ensure that memcpy uses 32-bit w registers for any 32-bit-aligned memcpy anyway, especially with -mstrict-align?
What are my options for an immediate fix in the meantime? I suppose I could override the definition of memcpy, but in that case what source should I base the replacement implementation on?
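For discussion, the kind of interim replacement I have in mind is something like the sketch below. This is my own illustration rather than newlib source: it only issues 8- or 4-byte accesses when both pointers are naturally aligned, falls back to byte copies otherwise, and would need to be compiled with -fno-builtin so GCC doesn't optimise the loops back into a memcpy call. Since the linker resolves symbols from application objects before searching libc.a, defining it in the image should preempt the library version:

#include <stddef.h>
#include <stdint.h>

void *memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* 64-bit copies only when both pointers are 8-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 0x7) == 0) {
        while (n >= 8) {
            *(uint64_t *)d = *(const uint64_t *)s;
            d += 8; s += 8; n -= 8;
        }
    }

    /* 32-bit copies only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 0x3) == 0) {
        while (n >= 4) {
            *(uint32_t *)d = *(const uint32_t *)s;
            d += 4; s += 4; n -= 4;
        }
    }

    /* Remaining (or mutually misaligned) bytes one at a time. */
    while (n--)
        *d++ = *s++;

    return dst;
}

Obviously this loses the tuned performance of the assembly version, but it never generates an access wider than the alignment of its operands, which is the property Device memory needs.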
Any help on this is appreciated, thanks.