A war story involving RL-ARM 4.05

Hello,

I dealt with this problem until this morning. It made me sweat (a little), and I'm using uv4/RL-ARM, so I thought you might be interested!

It all started with upgrading RL-ARM to 4.05. Everything seemed to work fine, except the power failure handling: when the machine is turned off, critical system data is written to a NAND flash. That must be completed within about 40 ms, and it worked well in previous releases of the product. Once a power failure is detected (via an I/O pin), a call in interrupt mode saves the data to flash and then wakes up a high-priority RTX task to do things that cannot be done in interrupt mode (close file handles etc.). An LED blinks once RTX is done (this shutdown behavior is imposed by the hardware). But immediately after installing 4.05 this stopped working - the LED blinked all right, but the NAND flash contained corrupt data, clearly indicated by a checksum error during the next startup!
I figured out rather quickly that it was RTX's "fault". Using an older RTX with the latest MDK and FlashFS/TCPNet seemed to solve the problem.
But RTX uses only one hardware timer - did the timing change? Had a failure in the software been exposed by the new release? Another source of agony and confusion was the fact that another flavor of the product used the same power failure handling code and all the latest software from Keil (including RTX 4.05), but did not exhibit this failure.
I stumbled upon the solution somewhat later. I recalled that 4.05 added an event queue between interrupt service routines and tasks, and that a checksum error can be caused by writing to a NAND flash without first erasing the respective pages (or at least, without careful planning). This could happen because the new queue between the ISR and the kernel/task probably introduces some delay; an existing mistake in the software allowed the flash to be rewritten if the ISR was triggered again (once or more) before RTX could schedule the power failure task.
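
For anyone hitting something similar, here is a minimal sketch of one way to guard against the repeated trigger: latch the power failure in the ISR so the flash is programmed only once, no matter how often the pin fires before the task runs. This is not the actual product code. The names `nand_save_critical_data`, `wake_power_fail_task` and `power_fail_latched` are hypothetical stand-ins, and the two helpers are host-side stubs that only count calls.

```c
#include <stdbool.h>

/* Hypothetical stubs: they only count calls so the behaviour can be
 * checked on a host. On the target these would be the NAND driver
 * and the RTX wake-up call. */
static unsigned nand_program_count = 0;
static unsigned task_wakeups = 0;

static void nand_save_critical_data(void)
{
    nand_program_count++;   /* one program operation on the (erased) page */
}

static void wake_power_fail_task(void)
{
    task_wakeups++;         /* assumption: on RL-ARM this would be an isr_evt_set() call */
}

/* Set on the first power-fail interrupt and never cleared before reset. */
static volatile bool power_fail_latched = false;

/* Power-fail interrupt handler: may fire several times before the
 * high-priority task gets scheduled. */
void power_fail_isr(void)
{
    if (power_fail_latched) {
        return;             /* already saved: do not program the page again */
    }
    power_fail_latched = true;

    nand_save_critical_data();
    wake_power_fail_task();
}
```

With the latch in place, a delay between the ISR and the task no longer matters for the flash contents, because later triggers return before touching the NAND.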

  • Indeed not - it is really quite common that a change of tools will highlight bugs that were previously "latent".

    Even without a change of tools, a change of settings can often have the same effect - the most common example being a change of optimisation level highlighting flaws such as missing 'volatile' qualifiers and unwarranted timing assumptions.

    Even without a change of tools or settings, an "apparently" unrelated change in your own code can cause previously unnoticed bugs to manifest; e.g., a buffer overrun into unused memory may be benign - but will cause problems when something starts to use that memory, or when there is no longer any spare memory to overrun into...

    I call this "Proven-Product Syndrome" - the assumption that, because a system has been running for some time, it must be "right".
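
The missing 'volatile' case mentioned above could look roughly like the sketch below. The names `shutdown_requested`, `pin_change_isr` and `poll_shutdown` are made up for illustration. The point is only that a flag written in an ISR and polled elsewhere should be declared `volatile`; without the qualifier an optimising compiler is allowed to read the flag once and keep the value in a register, so the loop may never see the update.

```c
#include <stdbool.h>

/* Written from interrupt context, read from the polling loop.
 * 'volatile' forces a real memory read on every access. */
static volatile bool shutdown_requested = false;

/* Hypothetical interrupt handler: only records the request. */
void pin_change_isr(void)
{
    shutdown_requested = true;
}

/* Polls the flag for at most max_spins iterations.
 * Returns true as soon as the flag is seen set, false on timeout. */
bool poll_shutdown(unsigned max_spins)
{
    while (max_spins-- > 0) {
        if (shutdown_requested) {
            return true;
        }
    }
    return false;
}
```

Code like this often "works" at a low optimisation level even without the qualifier, which is exactly how such a bug stays latent until the settings change.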
