
Hard fault at misaligned memcpy memset

We ran some simple tests on the STM32F100 Value Line Eval Board:

//------------------------------------------------------------------------------
// Variables
static unsigned char sDstBuf[1024]; // 1KiB
static unsigned char sSrcBuf[sizeof(sDstBuf)];

printf("Copying words from misaligned src to aligned dst buffer... ");
memset(sDstBuf, 0xcd, sizeof(sDstBuf));

With optimize Level 3 and optimize for time, this takes 120 usec.

With optimize Level 0, it takes 155 usec.

almost the same if memcpy is used:
memcpy(sDstBuf, (const void *)0xcd, sizeof(sDstBuf));

It runs into a hard fault if the optimize level is >= 1 and optimise for time is not set.

I think this is a compiler error.

We ran into this before with MDK 4.60; now we use 4.70A.

Werner

  • Sorry, there is more to it. It is not memset / memcpy; I had not understood the code correctly:

    the offending code is

        for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
        pSrcWord = (unsigned int*) (sSrcBuf + 1);
            // Misaligned!
            pSrcWord
                < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
            pSrcWord++)
        {
            *pDstWord = *pSrcWord;
        }
    

    optimize >= 1 for size:

    leads to this disassembly part:
    0x08002446 CC02      LDM      r4!,{r1} ; >>>> after this: Hardfault occurs
    0x08002448 6001      STR      r1,[r0,#0x00]
       372:         pSrcWord
       373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
       374:         pSrcWord++)
       375:     {
       376:         *pDstWord = *pSrcWord;
       377:     }
    0x0800244A 42B4      CMP      r4,r6
    0x0800244C D3FB      BCC      0x08002446
    
    
    optimize 0 does this:
       370:     pSrcWord = (unsigned int*) (sSrcBuf + 1);
       371:         // Misaligned!
       372:         pSrcWord
       373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
       374:         pSrcWord++)
       375:     {
    0x080027FE 4C39      LDR      r4,[pc,#228]  ; @0x080028E4
    0x08002800 1C64      ADDS     r4,r4,#1
    0x08002802 E002      B        0x0800280A
       376:         *pDstWord = *pSrcWord;
       377:     }
    0x08002804 6820      LDR      r0,[r4,#0x00]
    0x08002806 6038      STR      r0,[r7,#0x00]
       374:         pSrcWord++)
       375:     {
       376:         *pDstWord = *pSrcWord;
       377:     }
    0x08002808 1D24      ADDS     r4,r4,#4
       372:         pSrcWord
       373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
       374:         pSrcWord++)
       375:     {
       376:         *pDstWord = *pSrcWord;
       377:     }
    0x0800280A 4847      LDR      r0,[pc,#284]  ; @0x08002928
    0x0800280C 42A0      CMP      r0,r4
    0x0800280E D8F9      BHI      0x08002804
    

    Offending command:
    LDM r4!,{r1} ; >>>> after this: Hardfault occurs

  • I think you need to have a look at the user manual of your chip to understand how LDR interacts with unaligned addresses. Many ARM chips differ in that sense.

  • More correctly, you need to have a look at the assembly manual of your toolchain (using ARM compiler...?).

  • It's the assembly that is produced and shown by the uVision debugger:
    ARM MDK 4.70A
    ARMCC.EXE V5.03.0.24

    The first post mentions this.
    Optimise for speed or optimise Level 0 runs without problems.

    Werner

  • But what does the manual say about LDR's behavior under such conditions?

  • I do not write assembly, and I do not know much about it. The assembly code is produced by compiling the C source (as mentioned in my second post).

    with optimise >=1 it produces the first (offending code)
    with optimise 0 the second code is produced (which works fine)

    Werner

  • Look, it does not matter that you don't work with assembly directly. You need to understand what's wrong, and the answer is right under your nose. It is up to you to decide whether to burn the 250 calories finding out...

  • with optimise >=1 it produces the first (offending code)
    with optimise 0 the second code is produced (which works fine)

    Optimisation very often breaks flawed code.

    the offending code is

        for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
        pSrcWord = (unsigned int*) (sSrcBuf + 1);
            // Misaligned!
            pSrcWord
                < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
            pSrcWord++)
        {
            *pDstWord = *pSrcWord;
        }
    

    How are you sure that those casts don't end up giving you unaligned addresses...?

  •     for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
        pSrcWord = (unsigned int*) (sSrcBuf + 1);
            // Misaligned!
            pSrcWord
                < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
            pSrcWord++)
        {
            *pDstWord = *pSrcWord;
        }
    

  • I just think the compiler should not produce code that leads to a hard fault.

    Sorry mate, but a statement like that sends a shiver up my spine.

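  • Casting a 1-byte-aligned pointer to a 4-byte-aligned pointer type misleads the compiler.

    For a 1-byte-aligned pointer -> LDR
    For a 4-byte-aligned pointer with higher optimization -> LDM

    On Cortex-M3, LDR tolerates an unaligned address (unless unaligned trapping is enabled), whereas LDM always faults on one. That is why the Level 0 code runs and the LDM version ends in a hard fault.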
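
A portable way to express the misaligned read from the loop above, without casting a byte pointer to (unsigned int *), is to copy the bytes with memcpy. This is only a minimal sketch: the helper names load_u32_unaligned and copy_words_from_unaligned are hypothetical and do not appear in the original test code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original test code): read a 32-bit
 * value from an address that may not be 4-byte aligned. The source is an
 * unsigned char pointer, so the compiler has no basis to assume word
 * alignment when it expands the memcpy. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Copy 'count' 32-bit words from a possibly misaligned byte address into
 * an aligned destination array. */
void copy_words_from_unaligned(uint32_t *dst, const unsigned char *src,
                               size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        dst[i] = load_u32_unaligned(src + i * sizeof(uint32_t));
    }
}
```

With this shape, a call such as copy_words_from_unaligned(dst, sSrcBuf + 1, n) states the misalignment honestly through the pointer type, and the compiler is free to pick whatever load sequence is legal for the target.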

  • You should set the compiler switch "--no_unaligned_access" in Keil for Cortex-M3/M4. (In fact, it would be better if it were already set by default ...)

    ARM7 can in principle support LDR and STR accesses at 2-byte addresses, but this is rather pointless, as it is no faster than two 4-byte (32-bit aligned) accesses. So you should switch this off in the compiler. (If you want to use it, you have to switch it on in the CPU: see the "system ... .c" file. It is best to search for the keyword "aligned" in the ARM7 TRM / STM32F4 Programming Manual / Cortex M4 TRM.)

  • Thank you all for your insights and warnings. I will report back if I learn something from Keil support.

    Werner

  • Note that some memory controllers can hide unaligned access - they just force the core to wait extra wait states while the memory controller performs multiple memory accesses and then glues together the partial reads.

    I hope no chip gets a memory controller that hides unaligned accesses like this for any peripheral device, or really bad things can happen: for peripherals, it isn't always safe to do an extra read. An unaligned memory access can also trigger special hardware logic for the neighboring word, for example signaling that a UART status register has been read and is now "cleared".

    In almost all situations, code should make sure zero unaligned accesses are performed - the main exception is when storing a big array of "data records" where a significant amount of memory can be saved by packing the data.
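
For the packed "data records" exception, a field at an odd offset can be read by assembling it from individual bytes, so no word load ever touches an unaligned address. The sketch below assumes a record layout invented purely for illustration (1-byte tag, 4-byte little-endian counter, 2-byte little-endian value); all names in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed record layout, invented for this example:
 *   offset 0: 1-byte tag
 *   offset 1: 4-byte little-endian counter (never 4-byte aligned)
 *   offset 5: 2-byte little-endian value
 */
enum {
    REC_OFF_TAG     = 0,
    REC_OFF_COUNTER = 1,
    REC_OFF_VALUE   = 5,
    REC_SIZE        = 7
};

/* Assemble a little-endian 32-bit value from four single-byte reads. */
uint32_t rec_get_u32le(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Assemble a little-endian 16-bit value from two single-byte reads. */
uint16_t rec_get_u16le(const unsigned char *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* Read the counter field of record 'index' from a packed byte array. */
uint32_t rec_counter(const unsigned char *records, size_t index)
{
    return rec_get_u32le(records + index * REC_SIZE + REC_OFF_COUNTER);
}

/* Read the value field of record 'index' from a packed byte array. */
uint16_t rec_value(const unsigned char *records, size_t index)
{
    return rec_get_u16le(records + index * REC_SIZE + REC_OFF_VALUE);
}
```

The byte-wise form costs a few extra instructions per field, which is the trade-off for the memory saved by packing; it also fixes the byte order of the stored data regardless of the CPU.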

  • >>
    You should set the compiler switch "--no_unaligned_access" in Keil for Cortex-M3/M4. (In fact, it would be better if it were already set by default ...)
    <<

    No; that's not what --no_unaligned_access means.

    When you use --no_unaligned_access, it tells armcc that it must not access unaligned data with LDR/STR (and so the processor can be set to disallow unaligned access). This means that other, less efficient code sequences will be used to access unaligned data. Accesses to data that is guaranteed to be aligned, such as through an (int *), will still use LDR/STR (or even LDM/STM).

    Using --no_unaligned_access does *not* allow you to cast unaligned values to (int *). Doing that is *undefined behavior*, and the compiler can cause anything it wants to happen, up to and including, but not limited to, causing you to waste a lot of effort tracking down the problem in the hope that you'll learn never to lie to the compiler again.
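
One way to avoid lying to the compiler is to check an address before treating it as an (unsigned int *). The following is only a sketch: the helper names are hypothetical, and it assumes that converting a pointer to uintptr_t yields its numeric address, which holds on ARM and other flat-memory targets but is implementation-defined in the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if 'p' is a multiple of sizeof(unsigned int).
 * The required alignment always divides the size, so this check is
 * conservative: it never reports an unaligned address as aligned. */
int is_aligned_for_uint(const void *p)
{
    return ((uintptr_t)p % sizeof(unsigned int)) == 0;
}

/* Hypothetical helper: number of bytes to skip from 'p' to reach the next
 * address that passes is_aligned_for_uint (0 if already aligned). */
size_t bytes_to_next_uint_boundary(const void *p)
{
    size_t rem = (size_t)((uintptr_t)p % sizeof(unsigned int));
    return rem == 0 ? 0 : sizeof(unsigned int) - rem;
}
```

A check like this would flag sSrcBuf + 1 from the test loop as unsuitable for a word pointer whenever sSrcBuf itself starts on a word boundary.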