We ran some simple tests on an STM32F100 Value Line Eval Board:
//------------------------------------------------------------------------------
// Variables
static unsigned char sDstBuf[1024]; // 1KiB
static unsigned char sSrcBuf[sizeof(sDstBuf)];

printf("Copying words from misaligned src to aligned dst buffer... ");
memset(sDstBuf, 0xcd, sizeof(sDstBuf));
With optimization level 3 and "optimize for time", this takes 120 usec.
With optimization level 0, it takes 155 usec.
It is almost the same if memcpy is used: memcpy(sDstBuf, (const void *)0xcd, sizeof(sDstBuf));
It runs into a hard fault if the optimization level is >= 1 and "optimize for time" is not set.
I think this is a compiler error.
We ran into this before with MDK 4.60, now we use 4.70A
Werner
Sorry, there is more to it. It is not memset / memcpy; I had not understood the code correctly.
The offending code is:
for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
     pSrcWord = (unsigned int*) (sSrcBuf + 1); // Misaligned!
     pSrcWord < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
     pSrcWord++)
{
    *pDstWord = *pSrcWord;
}
Optimize >= 1 for size:

for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
     pSrcWord = (unsigned int*) (sSrcBuf + 1); // Misaligned!
     pSrcWord < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
     pSrcWord++)
{
    *pDstWord = *pSrcWord;
}

leads to this disassembly part:

0x08002446 CC02      LDM      r4!,{r1}        ; >>>> after this: Hardfault occurs
0x08002448 6001      STR      r1,[r0,#0x00]
   372:      pSrcWord
   373:      < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
   376:     *pDstWord = *pSrcWord;
   377: }
0x0800244A 42B4      CMP      r4,r6
0x0800244C D3FB      BCC      0x08002446

Optimize 0 does this:

   370:      pSrcWord = (unsigned int*) (sSrcBuf + 1);
   371:      // Misaligned!
   372:      pSrcWord
   373:      < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
0x080027FE 4C39      LDR      r4,[pc,#228]    ; @0x080028E4
0x08002800 1C64      ADDS     r4,r4,#1
0x08002802 E002      B        0x0800280A
   376:     *pDstWord = *pSrcWord;
   377: }
0x08002804 6820      LDR      r0,[r4,#0x00]
0x08002806 6038      STR      r0,[r7,#0x00]
   374:      pSrcWord++)
   375: {
   376:     *pDstWord = *pSrcWord;
   377: }
0x08002808 1D24      ADDS     r4,r4,#4
   372:      pSrcWord
   373:      < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:      pSrcWord++)
   375: {
   376:     *pDstWord = *pSrcWord;
   377: }
0x0800280A 4847      LDR      r0,[pc,#284]    ; @0x08002928
0x0800280C 42A0      CMP      r0,r4
0x0800280E D8F9      BHI      0x08002804
Offending command: LDM r4!,{r1} ; >>>> after this: Hardfault occurs
I think you need to have a look at the user manual of your chip to understand how LDR interacts with unaligned addresses. Many ARM chips differ in that sense.
More correctly, you need to have a look at the assembly manual of your toolchain (using ARM compiler...?).
It's the assembly that is produced and shown by the uVision debugger: ARM MDK 4.70A, ARMCC.EXE V5.03.0.24.
The first post mentions this. Optimizing for speed or using optimization level 0 runs without problems.
But what does the manual say about LDR's behavior under such conditions?
I do not write assembly, and I do not know much about it. The assembly code is produced by compiling the C source (mentioned in my second post).
With optimization level >= 1 it produces the first (offending) code; with optimization level 0 the second code is produced, which works fine.
Look, it does not matter that you don't work with assembly directly. You need to understand what's wrong, and the answer is right under your nose. It is up to you to decide whether to burn the 250 calories finding out...
Optimisation very often breaks flawed code.
How are you sure that those casts don't end up giving you unaligned addresses...?
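One way to answer that question is to test the pointer value before casting it. The following is a minimal sketch, not code from the original project: `is_aligned` and the `sSrc` union are hypothetical names, and the conversion of a pointer to `uintptr_t` is implementation-defined, although it behaves as expected on common flat-address targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of alignment, else 0.
   alignment must be a non-zero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Illustrative buffer: the union forces the byte array onto the
   alignment of unsigned int, so the results below are predictable. */
static union {
    unsigned int  words[256];
    unsigned char bytes[1024];
} sSrc;
```

With this, `is_aligned(sSrc.bytes, sizeof(unsigned int))` is true, while `is_aligned(sSrc.bytes + 1, sizeof(unsigned int))` is false, which is exactly the situation created by the `sSrcBuf + 1` cast in the test loop.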
I just think the compiler should not produce code that leads to a hard fault.
Sorry mate, but a statement like that sends a shiver up my spine.
Casting a 1-byte-aligned pointer to a pointer to a 4-byte-aligned type misleads the compiler.
For a 1-byte-aligned pointer -> LDR
For a 4-byte-aligned pointer with higher optimization -> LDM
You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...).
ARM7 can in principle support LDR and STR accesses at 2-byte-aligned addresses, but this is rather pointless, as it is no faster than two 4-byte (32-bit aligned) accesses. So you should switch this off in the compiler. (If you want to use it, you have to switch it on in the CPU: see the "system ... .c" file. It is best to search for the keyword "aligned" in the ARM7 TRM / STM32F4 Programming Manual / Cortex M4 TRM.)
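For reference, on Cortex-M3/M4 the CPU-side setting is the UNALIGN_TRP bit (bit 3) of the Configuration and Control Register, SCB->CCR at 0xE000ED14. When the bit is clear, single LDR/STR accesses to unaligned addresses are carried out by the core; when it is set, they fault. LDM/STM fault on an unaligned address regardless of this bit, which matches the LDM hard fault shown earlier in the thread. The sketch below takes the register address as a parameter so that it can also be exercised off-target; `set_unaligned_trap` and `CCR_UNALIGN_TRP` are hypothetical names, not part of CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 3 of the Configuration and Control Register: trap unaligned accesses. */
#define CCR_UNALIGN_TRP ((uint32_t)1u << 3)

/* Hypothetical helper: set or clear UNALIGN_TRP in the register that
   ccr points to, leaving all other bits untouched. */
static void set_unaligned_trap(volatile uint32_t *ccr, int enable)
{
    assert(ccr != 0);
    if (enable)
        *ccr |= CCR_UNALIGN_TRP;
    else
        *ccr &= ~CCR_UNALIGN_TRP;
}
```

On a target with the CMSIS headers available, the call would presumably look like `set_unaligned_trap(&SCB->CCR, 1);`. Enabling the trap during development makes every unaligned LDR/STR fault immediately, rather than only the ones the optimizer happens to turn into LDM/STM.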
Thank you all for your insights and warnings. I will report back if I learn something from Keil support.
Note that some memory controllers can hide unaligned access - they just force the core to wait extra wait states while the memory controller performs multiple memory accesses and then glues together the partial reads.
I hope no chip gets a memory controller that hides unaligned accesses like this for any peripheral device, or really bad things can happen: for peripherals, it isn't always safe to do an extra read. An unaligned memory access can also trigger special hardware logic for the neighboring word, potentially signalling that a UART status register has been read and is now "cleared".
In almost all situations, code should make sure zero unaligned accesses are performed - the main exception is when storing a big array of "data records" where a significant amount of memory can be saved by packing the data.
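As an illustration of that exception, here is a small sketch of a packed record array. The record layout and all names are invented for the example, and the GCC-style `__attribute__((packed))` is an assumption about the toolchain (armcc also has a `__packed` qualifier; check the compiler manual for the exact spelling it accepts). Because the struct type itself is declared packed, the compiler knows the fields may be misaligned and generates a safe access sequence for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record, packed: 1 + 4 + 2 = 7 bytes, no padding. */
struct PackedRecord {
    uint8_t  channel;
    uint32_t timestamp;   /* sits at offset 1, so it is misaligned */
    uint16_t value;
} __attribute__((packed));

/* Same fields with natural alignment: typically 12 bytes. */
struct PaddedRecord {
    uint8_t  channel;
    uint32_t timestamp;
    uint16_t value;
};

/* Reading through the packed type lets the compiler pick an access
   sequence that is valid for a misaligned field. */
static uint32_t record_timestamp(const struct PackedRecord *records, size_t index)
{
    assert(records != 0);
    return records[index].timestamp;
}
```

For a large array the packed layout saves several bytes per record, at the cost of slower field access. What remains unsafe is taking the address of `timestamp` and passing it around as a plain `uint32_t *`, because that discards the information that the field is misaligned.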
>> You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...). <<
No; that's not what --no_unaligned_access means.
When you use --no_unaligned_access, it tells armcc that it must not access unaligned data with LDR/STR (and so the processor can be set to disallow unaligned access). This means that other, less efficient code sequences will be used to access unaligned data. Accesses to data that is guaranteed to be aligned, like (int *), will still use LDR/STR (or even LDM/STM).
Using --no_unaligned_access does *not* allow you to cast unaligned values to (int *). Doing that is *undefined behavior*, and the compiler can cause anything it wants to happen, up to and including, but not limited to, causing you to waste a lot of effort tracking down the problem in the hope that you'll learn never to lie to the compiler again.
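A well-defined way to get the same effect as the test loop is to keep the source as a byte pointer and let memcpy move each word. This is a sketch under assumptions, not the original benchmark: `load_word` and `copy_words` are hypothetical names, and unlike the test loop, which keeps writing to the first destination word, this version advances the destination as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read one unsigned int from an address with no alignment guarantee.
   The source stays an unsigned char pointer, so nothing is promised
   to the compiler that is not true. */
static unsigned int load_word(const unsigned char *p)
{
    unsigned int w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Copy word_count words from a possibly misaligned byte address
   into an aligned word buffer. */
static void copy_words(unsigned int *dst, const unsigned char *src, size_t word_count)
{
    size_t i;
    assert(dst != 0 && src != 0);
    for (i = 0; i < word_count; i++)
        dst[i] = load_word(src + i * sizeof(unsigned int));
}
```

A call such as `copy_words((unsigned int *)sDstBuf, sSrcBuf + 1, n)` would then be valid as long as sDstBuf really is word-aligned. Compilers commonly turn a fixed-size memcpy like this into whatever load sequence is legal for the target and the selected unaligned-access option, so the cost is usually small, but that is worth confirming in the disassembly.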