We ran some simple tests on the STM32F100 Value Line Eval Board:
//------------------------------------------------------------------------------
// Variables
static unsigned char sDstBuf[1024]; // 1KiB
static unsigned char sSrcBuf[sizeof(sDstBuf)];

printf("Copying words from misaligned src to aligned dst buffer... ");
memset(sDstBuf, 0xcd, sizeof(sDstBuf));
With optimize level 3 and optimize for time, this takes 120 µs.
With optimize level 0, it takes 155 µs.
The timing is almost the same if memcpy is used:
memcpy(sDstBuf, (const void *)0xcd, sizeof(sDstBuf));
It runs into a hard fault if the optimize level is >= 1 and optimize for time is not set.
I think this is a compiler error.
We ran into this before with MDK 4.60, now we use 4.70A
Werner
Sorry, there is more to it. The problem is not memset / memcpy; I had not understood the code correctly.
The offending code is:
for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
     pSrcWord = (unsigned int*) (sSrcBuf + 1); // Misaligned!
     pSrcWord < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
     pSrcWord++)
{
  *pDstWord = *pSrcWord;
}
With optimize level >= 1, optimizing for size, the same loop leads to this disassembly:

0x08002446 CC02      LDM      r4!,{r1}        ; >>>> after this: Hardfault occurs
0x08002448 6001      STR      r1,[r0,#0x00]
   372:      pSrcWord
   373:        < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
   376:   *pDstWord = *pSrcWord;
   377: }
0x0800244A 42B4      CMP      r4,r6
0x0800244C D3FB      BCC      0x08002446

Optimize level 0 produces this:

   370:      pSrcWord = (unsigned int*) (sSrcBuf + 1);
   371:      // Misaligned!
   372:      pSrcWord
   373:        < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
0x080027FE 4C39      LDR      r4,[pc,#228]    ; @0x080028E4
0x08002800 1C64      ADDS     r4,r4,#1
0x08002802 E002      B        0x0800280A
   376:   *pDstWord = *pSrcWord;
   377: }
0x08002804 6820      LDR      r0,[r4,#0x00]
0x08002806 6038      STR      r0,[r7,#0x00]
   374:      pSrcWord++)
   375: {
   376:   *pDstWord = *pSrcWord;
   377: }
0x08002808 1D24      ADDS     r4,r4,#4
   372:      pSrcWord
   373:        < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
   376:   *pDstWord = *pSrcWord;
   377: }
0x0800280A 4847      LDR      r0,[pc,#284]    ; @0x08002928
0x0800280C 42A0      CMP      r0,r4
0x0800280E D8F9      BHI      0x08002804
Offending instruction: LDM r4!,{r1} ; >>>> after this: Hardfault occurs
I think you need to have a look at the user manual of your chip to understand how LDR interacts with unaligned addresses. Many ARM chips differ in that sense.
More correctly, you need to have a look at the assembly manual of your toolchain (using ARM compiler...?).
It's the assembly produced by the compiler and shown in the uVision debugger: ARM MDK 4.70A, ARMCC.EXE V5.03.0.24.
The first post mentions this. Optimise for speed or optimise level 0 runs without problems.
But what does the manual say about LDR's behavior under such conditions?
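For what it's worth, on the Cortex-M3 core used in the STM32F100, a single-word LDR or STR tolerates an unaligned address (as long as the UNALIGN_TRP trap is not enabled), while LDM and STM always fault on an address that is not word-aligned. That matches both listings: the level 0 code uses LDR and runs, the size-optimised code uses LDM and faults. Dereferencing a misaligned unsigned int* is undefined behaviour in C anyway, so the compiler is free to pick either instruction. A minimal sketch of an alignment-safe version of the loop follows; it is an illustration only, and the helper names load_u32_unaligned and copy_words_from_offset1 are hypothetical, not from the original project.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 32-bit word from an address that may be misaligned.
 * memcpy makes no alignment assumption about its source, so the
 * compiler has to emit code that is safe for any address. */
static uint32_t load_u32_unaligned(const unsigned char *src)
{
    uint32_t word;
    memcpy(&word, src, sizeof(word));
    return word;
}

/* Hypothetical helper mirroring the loop in the post: walk the source
 * buffer in word-sized steps starting at byte offset 1 and store each
 * word to *dst (the original loop never advances pDstWord either).
 * Precondition: src_size >= sizeof(uint32_t).
 * Returns the number of words read. */
static size_t copy_words_from_offset1(uint32_t *dst,
                                      const unsigned char *src,
                                      size_t src_size)
{
    size_t count = 0;
    const unsigned char *p = src + 1;                            /* misaligned start */
    const unsigned char *end = src + src_size - sizeof(uint32_t);

    for (; p < end; p += sizeof(uint32_t)) {
        *dst = load_u32_unaligned(p);
        count++;
    }
    return count;
}
```

With ARMCC specifically, declaring the source pointer with the __packed qualifier is another way to tell the compiler the address may be unaligned; the memcpy form above has the advantage of being portable C.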
I do not write assembly, and I do not know much about it. The assembly code is produced by the compiler from the C source (as mentioned in my second post).
With optimise >= 1 it produces the first listing (the offending code); with optimise 0 it produces the second listing (which works fine).
Look, it does not matter that you don't work with assembly directly. You need to understand what's wrong, and the answer is right under your nose. It is up to you to decide whether to burn the 250 calories finding out...
Optimisation very often breaks flawed code.
How are you sure that those casts don't end up giving you unaligned addresses...?
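One way to answer that question at run time is to test the address before casting. The sketch below assumes a C99 toolchain with stdint.h and stdbool.h; is_aligned is a hypothetical helper name, not something from the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if ptr is a multiple of
 * `alignment` bytes. `alignment` must be a power of two. */
static bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1u)) == 0u;
}
```

For the loop in question, is_aligned(sSrcBuf + 1, sizeof(unsigned int)) can only be true if sSrcBuf itself is not word-aligned, so at least one of the two addresses is unsafe to access through an unsigned int*.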
You don't need to be so stupid like that.