Problem in copying functions to RAM on ARM Cortex-M

I'm (again) facing a very strange problem in my project for ARM Cortex-M4 (STM32F301K8). The project requires some of the functions to be executed from RAM (it's actually a bootloader with encryption and an option to self-update, but that doesn't matter here). In my startup code I have a loop that "initializes" blocks of data by copying them from flash to a given address in RAM. The most common use of this code is to copy the .data section, and it works flawlessly, because it is brain-dead simple.

In my linker script I have something like this:

          /* sub-section: data_array */

          . = ALIGN(4);
          __data_array_start = .;
          PROVIDE(__data_array_start = __data_array_start);

          LONG(LOADADDR(.data)); LONG(ADDR(.data)); LONG(ADDR(.data) + SIZEOF(.data));
          LONG(LOADADDR(.ram_text)); LONG(ADDR(.ram_text)); LONG(ADDR(.ram_text) + SIZEOF(.ram_text));

          . = ALIGN(4);
          __data_array_end = .;
          PROVIDE(__data_array_end = __data_array_end);

          /* end of sub-section: data_array */

Then in my startup code I have this code:

     // Initialize sections from data_array (including .data)
     ldr     r4, =__data_array_start
     ldr     r5, =__data_array_end

1:   cmp     r4, r5                  // outer loop - addresses from data_array
     ittte   lo
     ldrlo   r1, [r4], #4            // start of source address
     ldrlo   r2, [r4], #4            // start of destination address
     ldrlo   r3, [r4], #4            // end of destination address
     bhs     3f                      // table exhausted - leave the loop

2:   cmp     r2, r3                  // inner loop - section initialization
     ittt    lo
     ldrlo   r0, [r1], #4            // load one word from flash
     strlo   r0, [r2], #4            // store it to RAM
     blo     2b

     b       1b                      // go back to start

3:                                   // all sections initialized
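For readers who find the assembly hard to follow, the same table walk can be sketched in C. This is only an illustration under assumptions: the names `struct init_entry` and `init_sections` are hypothetical and not part of the project, and on the real target the table bounds would come from `__data_array_start` and `__data_array_end` instead of being passed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/* One table entry, mirroring the three LONG() values emitted per section.
 * (Hypothetical type name, used only for this sketch.) */
struct init_entry {
    const uint32_t *src;       /* load address (flash) */
    uint32_t *dst_start;       /* run address (RAM) */
    uint32_t *dst_end;         /* one past the last destination word */
};

/* Walk the table (outer loop) and copy each section word by word
 * (inner loop), just like the two loops in the startup code. */
static void init_sections(const struct init_entry *start,
                          const struct init_entry *end)
{
    assert(start <= end);
    for (const struct init_entry *e = start; e < end; ++e) {
        const uint32_t *src = e->src;
        for (uint32_t *dst = e->dst_start; dst < e->dst_end; ++dst)
            *dst = *src++;
    }
}
```

On a host machine this can be exercised with ordinary arrays standing in for flash and RAM: fill a table with `{ source, destination, destination + length }` entries and call `init_sections(table, table + count)`.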

The problem I'm facing right now is that _ONE_ single word in RAM is not stored correctly... It is very strange: when I have 0x00000000 in RAM and 0x12345678 is loaded in the register (r0 in my case), after the write I have 0x00005678 in RAM... Somehow only "half" of the data is written and the other half in RAM is not modified. This happens in the middle of the block, so it's not a problem of a wrong range; all the data before and after the problematic spot is copied correctly. The problem happens at the same address (for example, right now that is 0x20000148), but from time to time the particular address changes. If I just move the block to some different address, the problem just moves to some different spot within this block. If I take another chip, the problem persists but at a different address.
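One way to pin such a fault down without a debugger is to compare the RAM copy against its flash source right after the copy and record which word, and which half of it, differs. The following is a minimal sketch, assuming hypothetical helper names (`first_mismatch`, `classify_word`); it is not code from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which part of a 32-bit word differs between the source and the copy. */
enum mismatch_kind { MATCH = 0, LOW_HALF = 1, HIGH_HALF = 2, BOTH_HALVES = 3 };

static enum mismatch_kind classify_word(uint32_t expected, uint32_t actual)
{
    uint32_t diff = expected ^ actual;
    unsigned kind = MATCH;
    if (diff & 0x0000FFFFu)
        kind |= LOW_HALF;
    if (diff & 0xFFFF0000u)
        kind |= HIGH_HALF;
    return (enum mismatch_kind)kind;
}

/* Return the index of the first word that differs, or `count` if the
 * copy is identical. `dst` is volatile so each word is really re-read. */
static size_t first_mismatch(const uint32_t *src,
                             const volatile uint32_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i)
        if (src[i] != dst[i])
            return i;
    return count;
}
```

For the values quoted above, `classify_word(0x12345678, 0x00005678)` reports `HIGH_HALF`, meaning the upper 16 bits never reached RAM.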

As I wrote above, this is the second time I'm having this issue. Previously I saw it on an STM32F103, and nothing helped on the first day: copying with words, bytes, half-words, double-words, memcpy(). I went to sleep without solving the issue, and from the next morning on everything has worked flawlessly with absolutely no fix. Identical code that didn't work one day worked perfectly fine the next...

One guy suggested that this may have something to do with the Flash Patch and Breakpoint unit in the core. When I check it with the debugger I see that it is indeed enabled (0x261 in the FP_CTRL register), but all the comparators are disabled (0 in FP_COMPx).

Has anyone faced this issue and found a reliable solution? Thanks in advance for any hints!
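The FP_CTRL value can be decoded by hand to confirm that reading. The sketch below follows the ARMv7-M field layout (ENABLE in bit 0, NUM_CODE in bits [7:4] with its upper part in bits [14:12], NUM_LIT in bits [11:8], and an ENABLE bit 0 in each FP_COMPn); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of the FPB control register (ARMv7-M layout). */
struct fp_ctrl_fields {
    unsigned enable;    /* bit 0: FPB unit enabled */
    unsigned num_code;  /* number of instruction address comparators */
    unsigned num_lit;   /* number of literal address comparators */
};

static struct fp_ctrl_fields decode_fp_ctrl(uint32_t value)
{
    struct fp_ctrl_fields f;
    f.enable = value & 1u;
    /* NUM_CODE is split: low nibble in [7:4], high bits in [14:12]. */
    f.num_code = ((value >> 4) & 0xFu) | (((value >> 12) & 0x7u) << 4);
    f.num_lit = (value >> 8) & 0xFu;
    return f;
}

/* A comparator only takes effect when its own ENABLE bit (bit 0) is set. */
static int fp_comp_enabled(uint32_t comp_value)
{
    return (int)(comp_value & 1u);
}
```

Decoding 0x261 this way gives an enabled unit with 6 instruction comparators and 2 literal comparators, and a comparator register reading 0 is disabled, which matches what the debugger shows.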

  • What I meant earlier, in detail:

    r1 and r2 start off being correct. r0 is read fine from the address in r1 for the first N words.

    Then suddenly something happens, which by accident sets bit 1 in r1, so r0 is now read from an address that spans two words; you'll see a 'skewed' value, but the block being written is "perfectly contiguous". Bit 1 in r1 stays set for a while, then it might get cleared (perhaps when the code is done executing, because the chip is cooling down).

    This could happen if the chip had too much heat during soldering, but it would never happen during debugging, because you'd give the CPU time to cool off.

    Thus after the copy is done, both r1 and r2 would look fine.
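To illustrate what such a 'skewed' read would look like, here is a small host-side sketch. It assumes little-endian memory (as on Cortex-M) and that the core performs the unaligned word load rather than faulting; `read_le32` is a hypothetical helper that models the load byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model a 32-bit little-endian load from byte offset `off` in `mem`.
 * An offset with bit 1 set straddles two aligned words. */
static uint32_t read_le32(const uint8_t *mem, size_t size, size_t off)
{
    assert(off + 4 <= size);
    return (uint32_t)mem[off]
         | ((uint32_t)mem[off + 1] << 8)
         | ((uint32_t)mem[off + 2] << 16)
         | ((uint32_t)mem[off + 3] << 24);
}
```

With the words 0x12345678 and 0x9ABCDEF0 stored back to back, a load at offset 0 returns 0x12345678, while a load at offset 2 returns 0xDEF01234: the upper half of the first word combined with the lower half of the second.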

    You could try running the chip at a raised temperature (for instance by placing the board under a hot lamp) and see if it starts acting funny.

    Making a small "heater" before the copying might help trigger the error:

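    mov.w   r1, #(1 << 24)
    loop:   subs.n  r1, r1, #1
    bpl.n   loop

    (note: the constant does not fit the 16-bit movs encoding, so the 32-bit mov.w is used. The three instructions take 8 bytes in total, which keeps the alignment of the following code the same as you had previously, as different alignment can cause different execution timing).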

    If the error shows up now, you could change the registers to r4, r5 and r6, just to see if it still happens.

    (If the problem goes away, try changing back to r0, r1 and r2).

    Note: The chip could also have been damaged by ESD if it at some point had not been handled correctly; but it's not likely that two chips have the exact same symptoms due to ESD; it would be more likely that it had too much heat during soldering.
