I'm (again) facing a very strange problem in my project for ARM Cortex-M4 (STM32F301K8). The project requires some of the functions to be executed from RAM (it's actually a bootloader with encryption and an option to self-update, but that doesn't matter here). In my startup code I have a loop that "initializes" blocks of data by copying them from flash to a given address in RAM. The most common use of this code is to copy the .data section, and it works flawlessly, because it's brain-dead simple.
In my linker script I have something like this:
/* sub-section: data_array */
. = ALIGN(4);
__data_array_start = .;
PROVIDE(__data_array_start = __data_array_start);

LONG(LOADADDR(.data));
LONG(ADDR(.data));
LONG(ADDR(.data) + SIZEOF(.data));

LONG(LOADADDR(.ram_text));
LONG(ADDR(.ram_text));
LONG(ADDR(.ram_text) + SIZEOF(.ram_text));

. = ALIGN(4);
__data_array_end = .;
PROVIDE(__data_array_end = __data_array_end);
/* end of sub-section: data_array */
Then in my startup code I have this code:
// Initialize sections from data_array (including .data)
    ldr     r4, =__data_array_start
    ldr     r5, =__data_array_end
1:  cmp     r4, r5              // outer loop - addresses from data_array
    ittte   lo
    ldrlo   r1, [r4], #4        // start of source address
    ldrlo   r2, [r4], #4        // start of destination address
    ldrlo   r3, [r4], #4        // end of destination address
    bhs     3f
2:  cmp     r2, r3              // inner loop - section initialization
    ittt    lo
    ldrlo   r0, [r1], #4
    strlo   r0, [r2], #4
    blo     2b
    b       1b                  // go back to start
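For readers who find the conditional Thumb-2 sequence hard to follow, here is a rough C equivalent of the same table-driven copy. It is only a sketch: the names `init_entry` and `init_sections` are made up for illustration, and it assumes each table entry is three pointer-sized words (source, destination start, destination end), which matches the three `LONG()` values per section only on a 32-bit target.

```c
#include <stdint.h>

/* One entry of the data_array table, in the order the linker script
 * emits the words. The struct name and field names are hypothetical. */
struct init_entry {
    const uint32_t *src;      /* LOADADDR(section): load address in flash */
    uint32_t       *dst;      /* ADDR(section): run address in RAM */
    uint32_t       *dst_end;  /* ADDR(section) + SIZEOF(section) */
};

/* Walk the table [begin, end) and copy every section word by word.
 * The for loop corresponds to label 1 in the assembly, the while loop
 * to label 2. */
void init_sections(const struct init_entry *begin,
                   const struct init_entry *end)
{
    for (const struct init_entry *e = begin; e < end; ++e) {
        const uint32_t *src = e->src;
        uint32_t *dst = e->dst;
        while (dst < e->dst_end) {
            *dst++ = *src++;
        }
    }
}
```

On the target this would be called with the addresses of `__data_array_start` and `__data_array_end`; in practice the copy usually stays in assembly, because it has to run before any C code that relies on initialized data.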
Now the problem I'm facing is that _ONE_ single word in RAM is not stored correctly... The problem is very strange: when I have 0x00000000 in RAM and 0x12345678 loaded in the register (r0 in my case), after the write I have 0x00005678 in RAM... Somehow only "half" of the data is written and the other half in RAM is not modified. This happens in the middle of the block, so it's not a problem of a wrong range; all the data before and after the problematic spot is copied correctly. The problem stays at the same address for a while (for example, right now it is 0x20000148), but from time to time that address changes. If I move the block to a different address, the problem just moves to a different spot within the block. If I take another chip, the problem persists but at a different address.
As I wrote above, this is the second time I'm having this issue. Previously I've seen it on STM32F103 and nothing helped on the first day - copying with words, bytes, half-words, double-words, memcpy(). After I went to sleep without solving the issue, the next morning everything worked flawlessly ever since with absolutely no fix - identical code that didn't work on one day worked perfectly fine on the other day...
One guy suggested to me that this may have something to do with the Flash Patch and Breakpoint unit in the core. When I check it with the debugger, I see that the unit is indeed enabled (0x261 in the FP_CTRL register), but all the comparators are disabled (0 in FP_COMPx).
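To make the 0x261 value easier to read, here is a small decoding sketch. It assumes the FP_CTRL field layout from the ARMv7-M architecture (ENABLE in bit 0, NUM_CODE split over bits [14:12] and [7:4], NUM_LIT in bits [11:8]) and that bit 0 of each FP_COMPx register is its enable bit; the struct and function names are invented for this example.

```c
#include <stdint.h>

/* Decoded FP_CTRL fields (layout assumed from the ARMv7-M architecture). */
struct fp_ctrl_fields {
    unsigned enable;    /* bit 0: FPB unit enabled */
    unsigned num_code;  /* bits [14:12] and [7:4]: instruction comparators */
    unsigned num_lit;   /* bits [11:8]: literal comparators */
};

struct fp_ctrl_fields decode_fp_ctrl(uint32_t value)
{
    struct fp_ctrl_fields f;
    f.enable   = value & 1u;
    f.num_code = (((value >> 12) & 0x7u) << 4) | ((value >> 4) & 0xFu);
    f.num_lit  = (value >> 8) & 0xFu;
    return f;
}

/* A comparator only takes effect when its own enable bit (bit 0) is set. */
int fp_comp_enabled(uint32_t fp_comp_value)
{
    return (int)(fp_comp_value & 1u);
}
```

Under those assumptions, 0x261 decodes to "unit enabled, 6 instruction comparators, 2 literal comparators", and a comparator register reading 0 is disabled, so no remapping should be active.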
Anyone faced this issue and found a reliable solution? Thanks in advance for any hints!
First of all - this is not a problem of flashing, because the data I see in flash is correct. Second thing - this is definitely a problem with writing, because correct value is read from flash to register r0, and both index registers have correct values. I can write the data "manually" to the RAM address using OpenOCD and it works perfectly fine, so the RAM is working correctly. It's also not related to OpenOCD's loader, because the problem exists after a fresh power-up of the chip without debugger. Clock, crystal, PLL or any other peripheral cannot be related, because the code is executed right after reset and absolutely NOTHING is enabled.
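Since the fault also shows up without a debugger attached, one option is to let the firmware check its own copy right after the loop. The sketch below is a hypothetical helper (the name `first_mismatch` is not from any library) that compares the RAM block against its flash image and reports the first differing word, together with an XOR mask showing which bits are wrong.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare n words of dst (RAM) against src (flash image).
 * Returns the index of the first differing word, or n if the blocks match.
 * On a mismatch, *diff receives src ^ dst, i.e. the bits that are wrong. */
size_t first_mismatch(const uint32_t *src, const uint32_t *dst,
                      size_t n, uint32_t *diff)
{
    for (size_t i = 0; i < n; ++i) {
        if (src[i] != dst[i]) {
            *diff = src[i] ^ dst[i];
            return i;
        }
    }
    return n;
}
```

With the values from the description (0x12345678 expected, 0x00005678 found), the mask comes out as 0x12340000, which says that only the upper half-word is affected. Logging the index and mask over several power cycles would show whether the faulty address really moves around.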
As before, the problem suddenly disappeared: the same code now works perfectly every time and I cannot reproduce the issue anymore... I'd still be glad to find the root cause, because it seems there's some pattern here...
What I meant earlier, in detail:
r1 and r2 start off being correct. r0 is read fine from the address in r1 for the first N words.
Then suddenly something happens which, by accident, sets bit 1 in r1, so r0 is now read from an address that spans two words; you'll see a 'skewed' value, but the block being written is "perfectly contiguous". Bit 1 in r1 stays set for a while, then it might get cleared (perhaps when the code is done executing, because the chip is cooling down).
This could happen if the chip had too much heat during soldering, but it would never happen during debugging, because you'd give the CPU time to cool off.
Thus after the copy is done, both r1 and r2 would look fine.
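To illustrate what such a 'skewed' read would look like, here is a small host-side sketch. It builds a little-endian memory image by hand and reads a word at an aligned offset and at an offset with bit 1 set; `load_le32` is a made-up helper, and the example assumes the core allows unaligned word loads (as the Cortex-M4 does when unaligned-access trapping is off).

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit little-endian word from four bytes starting at byte
 * offset off, the way an LDR from that address would see memory. */
uint32_t load_le32(const uint8_t *mem, size_t off)
{
    return (uint32_t)mem[off]
         | ((uint32_t)mem[off + 1] << 8)
         | ((uint32_t)mem[off + 2] << 16)
         | ((uint32_t)mem[off + 3] << 24);
}
```

For a memory image holding the words 0x12345678 and 0x9ABCDEF0, a read at offset 0 gives 0x12345678, while a read at offset 2 gives 0xDEF01234: the upper half of the first word combined with the lower half of the second.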
You could try running the chip at a raised temperature (for instance, by placing the board under a hot lamp) and see if it starts acting funny.
Making a small "heater" before the copying might help triggering the error:
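mov.w r1,#(1 << 24)
loop: subs.n r1,r1,#1
bpl.n loop
(note: 1 << 24 does not fit the 8-bit immediate of a 16-bit movs, so the first instruction uses the 32-bit mov.w encoding. The three instructions add up to 8 bytes, which keeps the alignment of the following code the same as you had previously; different alignment can cause different execution timing).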
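If the error shows up now, you could change the copy registers to be r4, r5 and r6, just to see if it still happens (your outer loop already uses r4 and r5 as table pointers, so those would have to move to other registers first).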
(If the problem goes away, try changing back to r0, r1 and r2).
Note: The chip could also have been damaged by ESD if it at some point had not been handled correctly; but it's not likely that two chips have the exact same symptoms due to ESD; it would be more likely that it had too much heat during soldering.