Hard fault at misaligned memcpy memset

We ran some simple tests on the STM32F100 Value Line Eval Board:

//------------------------------------------------------------------------------
// Variables
static unsigned char sDstBuf[1024]; // 1KiB
static unsigned char sSrcBuf[sizeof(sDstBuf)];

printf("Copying words from misaligned src to aligned dst buffer... ");
memset(sDstBuf, 0xcd, sizeof(sDstBuf));

With optimization level 3, optimizing for time, this takes
120usec

With optimization level 0 it takes
155usec

The timing is almost the same if memcpy is used:
memcpy(sDstBuf, (const void *)0xcd, sizeof(sDstBuf));

It runs into a hard fault if the optimization level is >= 1 and "optimize for time" is not set.

I think this is a compiler error.

We ran into this before with MDK 4.60; now we use 4.70A.

Werner

0 Werner Meier over 11 years ago

Sorry, there is more to it. The problem is not memset / memcpy; I had not understood the code correctly:

the offending code is

    for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
    pSrcWord = (unsigned int*) (sSrcBuf + 1);
        // Misaligned!
        pSrcWord
            < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
        pSrcWord++)
    {
        *pDstWord = *pSrcWord;
    }

optimize >= 1 for size:

leads to this disassembly part:
0x08002446 CC02      LDM      r4!,{r1} ; >>>> after this: Hardfault occurs
0x08002448 6001      STR      r1,[r0,#0x00]
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x0800244A 42B4      CMP      r4,r6
0x0800244C D3FB      BCC      0x08002446


optimize 0 does this:
   370:     pSrcWord = (unsigned int*) (sSrcBuf + 1);
   371:         // Misaligned!
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
0x080027FE 4C39      LDR      r4,[pc,#228]  ; @0x080028E4
0x08002800 1C64      ADDS     r4,r4,#1
0x08002802 E002      B        0x0800280A
   376:         *pDstWord = *pSrcWord;
   377:     }
0x08002804 6820      LDR      r0,[r4,#0x00]
0x08002806 6038      STR      r0,[r7,#0x00]
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x08002808 1D24      ADDS     r4,r4,#4
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x0800280A 4847      LDR      r0,[pc,#284]  ; @0x08002928
0x0800280C 42A0      CMP      r0,r4
0x0800280E D8F9      BHI      0x08002804

Offending command:
LDM r4!,{r1} ; >>>> after this: Hardfault occurs

0 Tamiryan Michael over 11 years ago in reply to Werner Meier

I think you need to have a look at the user manual of your chip to understand how LDR interacts with unaligned addresses. Many ARM chips differ in that sense.

0 Tamiryan Michael over 11 years ago in reply to Tamiryan Michael

More correctly, you need to have a look at the assembly manual of your toolchain (using ARM compiler...?).

0 Werner Meier over 11 years ago in reply to Tamiryan Michael

It is the assembly produced by the compiler and shown by the uVision debugger:
ARM MDK 4.70A
ARMCC.EXE V5.03.0.24

The first post mentions this.
Optimise for speed or optimise Level 0 runs without problems.

Werner

0 Tamiryan Michael over 11 years ago in reply to Werner Meier

But what does the manual say about LDR's behavior under such conditions?

0 Werner Meier over 11 years ago in reply to Tamiryan Michael

I do not write assembly, and I do not know much about it. The assembly code is produced by compiling the C source (as mentioned in my second post).

With optimise >=1 the compiler produces the first listing (the offending code).
With optimise 0 it produces the second listing (which works fine).

Werner

0 Tamiryan Michael over 11 years ago in reply to Werner Meier

Look, it does not matter that you don't work with assembly directly. You need to understand what's wrong, and the answer is right under your nose. It is up to you to decide whether to burn the 250 calories finding out...

0 Andy Neil over 11 years ago in reply to Werner Meier
>>
with optimise >=1 it produces the first (offending code)
with optimise 0 the second code is produced (which works fine)
<<

Optimisation very often breaks flawed code.

the offending code is

for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned! pSrcWord = (unsigned int*) (sSrcBuf + 1); // Misaligned! pSrcWord < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord)); pSrcWord++) { *pDstWord = *pSrcWord; }

How are you sure that those casts don't end up giving you unaligned addresses...?
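
One way to answer that question at run time is to test the address before dereferencing it. A minimal sketch, assuming a C11 compiler (for `_Alignof`) and a flat address space in which converting a pointer to `uintptr_t` yields its address; `is_word_aligned` is a hypothetical helper name, not part of the original test code:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is suitably aligned for an
 * unsigned int access, 0 otherwise.
 * Assumption: (uintptr_t)p is the numeric address of p, which holds on
 * flat-memory targets such as Cortex-M but is implementation-defined in C. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(unsigned int)) == 0;
}
```

With the buffers from the first post, `is_word_aligned(sSrcBuf + 1)` and `is_word_aligned(sSrcBuf)` cannot both be true on a target where `unsigned int` needs 4-byte alignment, so at least one of the two casts yields a pointer that must not be dereferenced as `unsigned int *`.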

0 Andy Neil over 11 years ago in reply to Andy Neil

    for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
    pSrcWord = (unsigned int*) (sSrcBuf + 1);
        // Misaligned!
        pSrcWord
            < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
        pSrcWord++)
    {
        *pDstWord = *pSrcWord;
    }

0 software release note over 11 years ago

I just think the compiler should not produce code that leads to a hard fault.

Sorry mate, but a statement like that sends a shiver up my spine.

0 John Linq over 11 years ago

Casting a 1-byte-aligned pointer to a 4-byte-aligned pointer type
would confuse the compiler.

For a 1-byte-aligned pointer -> LDR
For a 4-byte-aligned pointer with higher optimization -> LDM

0 nice day over 11 years ago

You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...).

ARM7 can in principle support access at 2-byte addresses for LDR and STR instructions, but this is rather pointless, as it is no faster than two 4-byte (32-bit aligned) accesses. So you should switch this off in the compiler. (If you want to use it, you have to switch it on in the CPU; see the "system ... .c" file. It is best to search for the keyword "aligned" in the ARM7 TRM / STM32F4 Programming Manual / Cortex M4 TRM.)

0 Werner Meier over 11 years ago

Thank you all for your insights and warnings. I will report back if I learn something from Keil support.

Werner

0 ImPer Westermark over 11 years ago in reply to nice day

Note that some memory controllers can hide unaligned access: they just force the core to wait extra wait states while the memory controller performs multiple memory accesses and then glues together the partial reads.

I hope no chip gets a memory controller that performs such unaligned hiding for any peripheral device, or really bad things can happen. For peripherals, it isn't always safe to do an extra read. An unaligned memory access can also trigger special hardware logic for the neighboring word, potentially signalling that a UART status register has been read and is now "cleared".

In almost all situations, code should make sure zero unaligned accesses are performed. The main exception is when storing a big array of "data records" where a significant amount of memory can be saved by packing the data.
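
For that packed-records case, the fields can be read without any unaligned word access by assembling them from bytes. A minimal sketch under stated assumptions: the record layout (a 1-byte tag followed by a 4-byte little-endian value), `RECORD_SIZE`, and `record_value` are all hypothetical and chosen only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: 1-byte tag, then a 4-byte little-endian
 * value. Each record is 5 bytes, so the value is never word-aligned
 * for every record in the array. */
enum { RECORD_SIZE = 5 };

/* Reads the 32-bit value of record `index` one byte at a time, so the
 * CPU only ever performs byte loads, whatever the alignment. */
uint32_t record_value(const unsigned char *records, size_t index)
{
    const unsigned char *p = records + index * RECORD_SIZE + 1;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Byte-wise assembly costs a few extra instructions per field, which is the trade for the memory saved by packing.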

0 Scott Douglass over 11 years ago in reply to nice day

>>
You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...).
<<

No; that's not what --no_unaligned_access means.

When you use --no_unaligned_access it tells armcc that it must not access unaligned data with LDR/STR (and so the processor can be set to disallow unaligned access). This means that other, less-efficient code sequences will be used to access unaligned data. Accessing data that is guaranteed to be aligned, like (int *), will still use LDR/STR (or even LDM/STM).

Using --no_unaligned_access does *not* allow you to cast unaligned values to (int *). Doing that is *undefined behavior* and the compiler can cause anything to happen that it wants, up to and including, but not limited to, causing you to waste a lot of effort tracking down the problem in the hope that you'll learn never to lie to the compiler again.
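
A version of the test loop that stays within defined behavior never forms a misaligned `unsigned int *` at all; it keeps the source as a byte pointer and lets `memcpy` move each word. A minimal sketch, not the original test code: `load_word_unaligned` and `copy_words_from_unaligned` are hypothetical names, and unlike the loop in the second post this one also advances the destination.

```c
#include <stddef.h>
#include <string.h>

/* Reads one unsigned int from an address with no alignment guarantee.
 * memcpy works on bytes, so the compiler may not assume p is aligned. */
unsigned int load_word_unaligned(const unsigned char *p)
{
    unsigned int w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Copies `count` words from src (any alignment) into dst (aligned,
 * because it really is an array of unsigned int). */
void copy_words_from_unaligned(unsigned int *dst,
                               const unsigned char *src,
                               size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        dst[i] = load_word_unaligned(src + i * sizeof(unsigned int));
    }
}
```

Called as `copy_words_from_unaligned(dstWords, sSrcBuf + 1, n)`, the source offset of 1 is harmless because no word-sized pointer to it ever exists. armcc also documents a `__packed` qualifier for pointers to possibly unaligned data; whether that or `memcpy` generates better code for a given target is something to check in the disassembly rather than assume.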