Problem in copying functions to RAM on ARM Cortex-M

I'm (again) facing a very strange problem in my project for ARM Cortex-M4 (STM32F301K8). The project requires some of the functions to be executed from RAM (it's actually a bootloader with encryption and an option to self-update, but that doesn't matter here). In my startup code I have a loop that "initializes" blocks of data by copying them from flash to a given address in RAM. The most common use of this code is to copy the .data section, and it works flawlessly because it's brain-dead simple.

In my linker script I have something like this:

          /* sub-section: data_array */

          . = ALIGN(4);
          __data_array_start = .;
          PROVIDE(__data_array_start = __data_array_start);

          LONG(LOADADDR(.data)); LONG(ADDR(.data)); LONG(ADDR(.data) + SIZEOF(.data));
          LONG(LOADADDR(.ram_text)); LONG(ADDR(.ram_text)); LONG(ADDR(.ram_text) + SIZEOF(.ram_text));

          . = ALIGN(4);
          __data_array_end = .;
          PROVIDE(__data_array_end = __data_array_end);

          /* end of sub-section: data_array */

Then in my startup code I have this code:

     // Initialize sections from data_array (including .data)
     ldr     r4, =__data_array_start
     ldr     r5, =__data_array_end

1:   cmp     r4, r5              // outer loop - addresses from data_array
     ittte   lo
     ldrlo   r1, [r4], #4        // start of source address
     ldrlo   r2, [r4], #4        // start of destination address
     ldrlo   r3, [r4], #4        // end of destination address
     bhs     3f                  // no entries left - leave the loop

2:   cmp     r2, r3              // inner loop - section initialization
     ittt    lo
     ldrlo   r0, [r1], #4
     strlo   r0, [r2], #4
     blo     2b

     b       1b                  // go back to start

3:                               // all sections initialized, startup continues here
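
For readers who find the assembly hard to follow, the same table-driven copy can be sketched in C. This is only an illustration, not the code from the project: `struct copy_entry`, `init_sections()` and `init_sections_demo()` are hypothetical names, and the demo uses ordinary arrays in place of real flash and RAM so that it can run on a host.

```c
#include <assert.h>
#include <stdint.h>

/* One table entry, mirroring the three LONG() values emitted per section. */
struct copy_entry {
    const uint32_t *src;   /* load address (in flash) */
    uint32_t *dst_start;   /* run address (in RAM) */
    uint32_t *dst_end;     /* one past the last destination word */
};

/* Outer loop walks the table, inner loop copies one section word by word. */
void init_sections(const struct copy_entry *entry,
                   const struct copy_entry *table_end)
{
    for (; entry < table_end; ++entry) {
        const uint32_t *src = entry->src;
        for (uint32_t *dst = entry->dst_start; dst < entry->dst_end; ++dst)
            *dst = *src++;
    }
}

/* Host-side usage example: two fake "flash" images copied into fake "RAM". */
int init_sections_demo(void)
{
    static const uint32_t flash_data[3] = {0x11111111u, 0x22222222u, 0x33333333u};
    static const uint32_t flash_text[2] = {0x12345678u, 0x9ABCDEF0u};
    uint32_t ram_data[3] = {0};
    uint32_t ram_text[2] = {0};
    const struct copy_entry table[] = {
        {flash_data, ram_data, ram_data + 3},
        {flash_text, ram_text, ram_text + 2},
    };

    init_sections(table, table + 2);

    assert(ram_data[0] == 0x11111111u && ram_data[2] == 0x33333333u);
    return ram_text[0] == 0x12345678u && ram_text[1] == 0x9ABCDEF0u;
}
```

On the target, the table bounds would presumably come from the linker symbols, for example by declaring `__data_array_start` and `__data_array_end` as `extern const struct copy_entry[]`. That relies on pointers being 32 bits wide, which holds on Cortex-M.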

Now the problem I'm facing right now is that _ONE_ single word in RAM is not stored correctly... The problem is very strange, because when I have 0x00000000 in RAM and 0x12345678 is loaded in the register (r0 in my case), after the write I have 0x00005678 in RAM... Somehow only "half" of the data is written and the other half in RAM is not modified. This happens in the middle of the block, so it's not a problem of a wrong range: all the data before and after the problematic spot is copied correctly. The problem keeps happening at the same address (for example, right now it is 0x20000148), but from time to time that address changes. If I just move the block to some different address, the problem moves to some different spot within this block. If I take another chip, the problem persists but at a different address.

As I wrote above, this is the second time I'm having this issue. Previously I've seen it on STM32F103 and nothing helped on the first day - copying with words, bytes, half-words, double-words, memcpy(). After I went to sleep without solving the issue, the next morning everything worked flawlessly ever since with absolutely no fix - identical code that didn't work on one day worked perfectly fine on the other day...

Someone suggested to me that this may have something to do with the Flash Patch and Breakpoint unit in the core, but when I check it with the debugger I see that although it is indeed enabled (0x261 in the FP_CTRL register), all the comparators are disabled (0 in FP_COMPx).

Anyone faced this issue and found a reliable solution? Thanks in advance for any hints!

Jens Bauer over 11 years ago

I had a similar problem with my LPC1342, so I think it might not be unheard of. Similar but not identical; I think it was a problem with flashing the chip. A few data words (at random) went haywire.
I remember that if I ran the microcontroller at a low speed, the problem went away, but as soon as I ran it at full speed, it went erratic (even if I had flash-programmed it at a low speed).
Looking at the code and comparing it with the symptoms, it suggests that it's the *read* that goes wrong, not the write.
That is, it sounds like the source pointer might somehow 'jump' back or forward by 2.
This could be caused by running the microcontroller at some high (overclocked) speed by accident.
-So first thing: Try using OpenOCD and issue a few mdw commands to dump the RCC registers (something like 'mdw 0x40021000 40' will probably do fine), and then use the Reference Manual to find out what speed the MCU is actually running at. This is a much better approach than reading code, because you can look at the code over and over and never see the error.
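As a rough aid for interpreting such a dump, the clock source and PLL multiplier can be pulled out of the RCC_CFGR word with a few shifts. This is only a sketch: the field positions (SWS in bits 3:2, PLLMUL in bits 21:18, encoded as multiplier minus 2 and capped at x16) are assumptions based on the usual STM32F1/F3 layout and should be checked against the Reference Manual for the exact part. The function and enum names are made up for this example.

```c
#include <stdint.h>

/* Assumed encoding of the SWS (system clock switch status) field. */
enum sysclk_source {
    SYSCLK_HSI = 0,
    SYSCLK_HSE = 1,
    SYSCLK_PLL = 2,
    SYSCLK_RESERVED = 3
};

/* SWS is assumed to live in bits 3:2 of RCC_CFGR. */
enum sysclk_source rcc_cfgr_sysclk_source(uint32_t cfgr)
{
    return (enum sysclk_source)((cfgr >> 2) & 0x3u);
}

/* PLLMUL is assumed to live in bits 21:18: 0 means x2, 14 and 15 both mean x16. */
unsigned rcc_cfgr_pll_multiplier(uint32_t cfgr)
{
    unsigned mul = ((cfgr >> 18) & 0xFu) + 2u;
    return mul > 16u ? 16u : mul;
}
```

For example, under these assumptions a dumped value of 0x001C000A would decode as "running from the PLL with a x9 multiplier", while the reset value 0x00000000 decodes as HSI.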
I think the first thing you might need to do is to check that the chip gets the power it needs.
If it's a Discovery-board, then it probably does already, but if it's your own design, it's important to remember that something could have gone wrong (also from the PCB manufacturer's side).
What I'm going to suggest is of course trivial (and probably a little annoying).
Check that each of your 100nF VDD capacitors is soldered correctly.
Check that there's a stable voltage on those pins.
Now a bit worse: Make sure your external clock crystal's capacitors are correct.
This may require some advanced equipment; if you have the equipment, then it's cool.
If you don't, then the best bet will be to verify that there are no open connections between the XTAL pins and the crystal's terminals, plus that the capacitors are soldered correctly.
Also the value of the capacitors would most likely be in the range 6pF to 10pF.
If they're for instance 22pF, I'm pretty sure you'll need to re-calculate the values.
-But instead of checking the crystal and capacitors, it might be a lot quicker to switch to using the internal oscillator, run at a low frequency and see if the problem persists.
Please let me know about your findings.

Jens Bauer over 11 years ago

I just remembered that I was running my LPC1343 from the internal oscillator, thus my issue was not due to a problem with the external components/circuit.
However, I used OpenOCD for flash-programming the chip, and OpenOCD uploads a small program into the SRAM of the chip, so it actually performs a block copy; I don't know whether that could have caused the problem in my case.

Freddie Chopin over 11 years ago in reply to Jens Bauer

First of all - this is not a problem of flashing, because the data I see in flash is correct. Second thing - this is definitely a problem with writing, because correct value is read from flash to register r0, and both index registers have correct values. I can write the data "manually" to the RAM address using OpenOCD and it works perfectly fine, so the RAM is working correctly. It's also not related to OpenOCD's loader, because the problem exists after a fresh power-up of the chip without debugger. Clock, crystal, PLL or any other peripheral cannot be related, because the code is executed right after reset and absolutely NOTHING is enabled.
As before, the problem suddenly disappeared, and now the same code works perfectly every time and I cannot reproduce the issue anymore... I'd still be glad to find the root cause of the problem, because it seems there's some pattern here...

Jens Bauer over 11 years ago in reply to Freddie Chopin

What I meant earlier, in detail:
r1 and r2 start off being correct. r0 is read fine from the address in r1 for the first N words.
Then suddenly something happens, which by accident sets bit 1 in r1, thus r0 is now read from an address that spans two words; you'll see a 'skewed' value, but the block being written is "perfectly contiguous". Bit 1 in r1 stays set for a while, then it might get cleared (perhaps when the code is done executing, because the chip is cooling down).
This could happen if the chip had too much heat during soldering, but it would never happen during debugging, because you'd give the CPU time to cool off.
Thus after the copy is done, both r1 and r2 would look fine.
You could try running the chip in raised room temperature (for instance placing the board under a hot lamp) and see if it starts acting funny.
Making a small "heater" before the copying might help trigger the error:
movs.n r1,#1
lsls.n r1,r1,#24
loop: subs.n r1,r1,#1
bpl.n loop
(note: the narrow movs can only encode an 8-bit immediate, so the constant 1 << 24 is built with a shift. The sequence is four 16-bit instructions, 8 bytes in total, to keep the alignment the same as you had previously, as different alignment can cause different execution timing).
If the error shows up now, you could change the registers to be r4, r5 and r6; just to see if it still happens.
(If the problem goes away, try changing back to r0, r1 and r2).
Note: The chip could also have been damaged by ESD if it at some point had not been handled correctly; but it's not likely that two chips have the exact same symptoms due to ESD; it would be more likely that it had too much heat during soldering.

Mikey over 10 years ago

Hi guys,
I am having exactly the same issue with STM32F429 and STM32F427. I wonder if you have found a solution to the problem?

Yasuhiko Koumoto over 10 years ago in reply to Mikey

Hello,
did you check the value of r1 (i.e. the source address)?
If it was taken from a function pointer or a function label, the LSB of the address would be '1' (the Thumb bit).
Therefore, the copying might be done in an unaligned manner.
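A minimal sketch of the usual remedy, assuming the source address really is derived from a function pointer: clear bit 0 before using the value as a data pointer for the copy. The helper name `code_address_for_copy()` is hypothetical.

```c
#include <stdint.h>

/* Strip the Thumb bit (bit 0) from a function address so that the result
 * points at the first instruction and can be used as a copy source. */
uintptr_t code_address_for_copy(uintptr_t function_address)
{
    return function_address & ~(uintptr_t)1;
}
```

For instance, a function pointer value of 0x08000141 would give a copy source of 0x08000140, while an address that is already even is returned unchanged.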
I'm sorry if I am pointing you in the wrong direction.
Best regards,
Yasuhiko Koumoto.

Mikey over 10 years ago in reply to Yasuhiko Koumoto

My code is written in C, but I have stepped through the instructions one by one: the address values in the registers look OK and the values loaded from those addresses into the registers look OK, but the value in RAM (only one, at one particular address!) gets mutilated after it is stored from the register. For all other data values and addresses, everything works fine.

Freddie Chopin over 10 years ago in reply to Mikey

I have to say that my problem actually _WAS_ caused by breakpoints. When you have a breakpoint in RAM during the copying operation, the symptoms will be as described in the first post. The affected address is the one where the breakpoint was placed.
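One plausible way this produces the 0x00005678 symptom can be simulated on a host. The sketch below rests on an assumption that is not confirmed in this thread: that the debugger implements a breakpoint in RAM by saving the original half-word, writing a Thumb BKPT opcode over it, and later writing the saved (now stale) half-word back. All names are hypothetical, and the 4-byte array stands in for one little-endian word of target RAM.

```c
#include <stdint.h>

#define BKPT_OPCODE 0xBE00u /* Thumb "BKPT #0" */

/* Little-endian accessors on a byte array, so the demo does not depend
 * on the endianness of the host it runs on. */
static uint16_t read_half(const uint8_t *ram, unsigned off)
{
    return (uint16_t)(ram[off] | (ram[off + 1] << 8));
}

static void write_half(uint8_t *ram, unsigned off, uint16_t value)
{
    ram[off] = (uint8_t)(value & 0xFFu);
    ram[off + 1] = (uint8_t)(value >> 8);
}

static void write_word(uint8_t *ram, unsigned off, uint32_t value)
{
    write_half(ram, off, (uint16_t)(value & 0xFFFFu));
    write_half(ram, off + 2, (uint16_t)(value >> 16));
}

static uint32_t read_word(const uint8_t *ram, unsigned off)
{
    return (uint32_t)read_half(ram, off) |
           ((uint32_t)read_half(ram, off + 2) << 16);
}

uint32_t simulate_breakpoint_clobber(void)
{
    uint8_t ram[4] = {0, 0, 0, 0}; /* RAM word before the copy: 0x00000000 */

    /* 1. Debugger plants a breakpoint on the upper half-word and keeps
     *    the old contents (0x0000) so it can remove the breakpoint later. */
    uint16_t saved = read_half(ram, 2);
    write_half(ram, 2, BKPT_OPCODE);

    /* 2. Startup code copies the real word from flash over it. */
    write_word(ram, 0, 0x12345678u);

    /* 3. Debugger removes the breakpoint by restoring the stale half-word. */
    write_half(ram, 2, saved);

    return read_word(ram, 0);
}
```

Under these assumptions the function returns 0x00005678: the lower half-word comes from the copy and the upper half-word is whatever was in RAM when the breakpoint was planted.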

Mikey over 10 years ago in reply to Freddie Chopin

You are right! I have just checked, and it turns out that every location where I set a breakpoint in the part of the software that gets copied to RAM ends up corrupted! Now the question is how to debug it?

Freddie Chopin over 10 years ago in reply to Mikey

I was able to debug my code in RAM - or at least the things I did during the debugging worked correctly. The only issue is that you cannot have a breakpoint placed in RAM while you copy the code from flash to the RAM region that contains this breakpoint. After the code is copied you can do whatever you want and place as many breakpoints as you like - just not during the flash->RAM transfer.
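If it is unclear whether a leftover breakpoint has damaged the RAM copy, one option is to compare the copy against the flash image half-word by half-word once the transfer is done. This is a hedged sketch with a hypothetical function name, not something used in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first half-word that differs between the flash
 * image and its RAM copy, or `count` if the two are identical. */
size_t first_mismatch(const uint16_t *flash_image, const uint16_t *ram_copy,
                      size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (flash_image[i] != ram_copy[i])
            return i;
    }
    return count;
}
```

A half-word granularity is used because Thumb instructions, and therefore breakpoint locations, are 2-byte aligned. An index smaller than `count` points at the spot worth checking against the debugger's breakpoint list.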

Mikey over 10 years ago in reply to Freddie Chopin

Thanks freddiechopin! That was very helpful!