OT Corrupted internal flash on AT89C51ED2

Sorry for the OT, but I'm in big trouble!

From time to time, my devices with the above-mentioned processor corrupt the contents of their internal program memory.
Has anyone else seen this behavior on the AT89C51ED2?

Dozsa Gyoergy

P.S.
Keil C51 user since '91

  • 99% of the users of flash based '51s agree that if you use an RC reset (which does not hold the uC in reset during power down), your chip will occasionally lose its flash contents.

    This thread is not about 99% of users of flash based 8051 derivatives, though. It is about the AT89C51ED2.

    Do you still think your solution will fix the OP's problem?

  • Hello Jack,
    Could you quickly explain how (and why) this kind of data loss can occur?

  • Do you still think your solution will fix the OP's problem?
    I'll bet you dollars to doughnuts that it will.

    to the OP:
    you have not replied to: "do you have a RC reset?"

    Erik

  • Hello Jack,
    Could you quickly explain how (and why) this kind of data loss can occur?

    I've no idea - that's why I'm asking Erik why replacing the supposed RC reset circuit with a reset supervisor would fix this problem with the AT89C51ED2.

  • that's why I'm asking Erik why replacing the supposed RC reset circuit with a reset supervisor would fix this problem with the AT89C51ED2

    already answered: (which does not hold the uC in reset during power down)

    Even the magnificent Jack Sprat can not tell what a uC will do when outside the Vcc limits and not in reset.

    These problems are always reported as "during power up" but they actually happen during power down.

    Erik

  • Yes, I use a reset IC.
    This device has been in (my) production for seven years without significant modification. In the past two years, customers have reported sporadic errors, e.g. flash corruption.
    I can't reproduce this error in the lab, even after testing with more than 10,000 power on/off cycles.

    Georg

  • Erik,
    Is this a problem specific to 51s (as suggested above) due to a design flaw?

  • Is this a problem specific to 51s (as suggested above) due to a design flaw?
    no, it applies to all flash based micros.

    the 'design flaw' is the developer not realizing what happens at power down; it is not the chipmaker's.

    Erik

  • Yes, I use a reset IC.
    good, then we have to hunt for the remaining 1% of causes. Which reset IC? Part number, please. Do you have a decoupling cap directly across Vcc and Gnd on each and every chip on the board?

    This device has been in (my) production for seven years without significant modification. In the past two years, customers have reported sporadic errors, e.g. flash corruption.
    is this one particular customer (or a few), or across the board?

    I can't reproduce this error in the lab, even after testing with more than 10,000 power on/off cycles.
    it seems nobody can.

    Erik

  • already answered: (which does not hold the uC in reset during power down)

    You keep repeating this without explaining why it would matter.

    Even the magnificent Jack Sprat can not tell what a uC will do when outside the Vcc limits and not in reset.

    I'm sorry to disappoint you, but I can.

  • Yes, RC resets are deadly to all forms of non-volatile data. But any chip with IAP capabilities can also corrupt its own program if the program runs AWOL and the chip has no hardware-based lock support.

    Have you seen any watchdog resets or other trouble that might indicate a stack overflow or stack corruption? Are you using function pointers?
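
    One common software-side precaution against a runaway program reaching the IAP routine is to require a two-step "arm" sequence before any flash write. The following is only a minimal sketch in portable C, not the AT89C51ED2's actual IAP interface: the key values, `sim_flash`, `raw_flash_write`, and all function names are hypothetical, and the "flash" is a RAM array so the sketch can run on a PC. It reduces the chance of a stray jump causing a write; it is no substitute for hardware lock support or a proper reset circuit.

```c
#include <stdint.h>

#define IAP_KEY1 0xA5u          /* arbitrary illustrative key values */
#define IAP_KEY2 0x5Au
#define SIM_FLASH_SIZE 16u

/* Stand-in for real flash so the sketch can run on a PC (assumption). */
uint8_t sim_flash[SIM_FLASH_SIZE];

static volatile uint8_t iap_key1;
static volatile uint8_t iap_key2;

/* Call these from two different places in the legitimate update path. */
void iap_arm_step1(void) { iap_key1 = IAP_KEY1; }
void iap_arm_step2(void) { iap_key2 = IAP_KEY2; }

/* Hypothetical placeholder for the device's real byte-program call. */
static void raw_flash_write(uint16_t addr, uint8_t value)
{
    sim_flash[addr] = value;
}

/* Returns 0 if the write was performed, -1 if it was refused. */
int guarded_flash_write(uint16_t addr, uint8_t value)
{
    int armed = (iap_key1 == IAP_KEY1) && (iap_key2 == IAP_KEY2);

    /* Disarm unconditionally so every write needs a fresh arm sequence. */
    iap_key1 = 0;
    iap_key2 = 0;

    if (!armed || addr >= SIM_FLASH_SIZE) {
        return -1;
    }
    raw_flash_write(addr, value);
    return 0;
}
```

    Code that jumps straight into `guarded_flash_write` without having passed through both arm steps gets -1 back and nothing is written.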

  • Can you get one of the corrupt devices back and read the FLASH contents? I had this problem a number of years ago on devices with a reset supervisory circuit.

    The corruption was always the same, memory was filled with 7F 7F 7F ... or EF EF EF ...

    After contacting the chip vendor, I discovered that the device was likely entering a diagnostic mode that injected a test pattern into the FLASH memory. This was triggered when a particular pin rose so many volts above VCC (a condition that should NEVER happen in a stable design).

    In my case, the units were getting struck by lightning which caused GND (and a number of other pins) to rise above VCC. WOW.

    Maybe this information will be helpful to you.
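
    For anyone who does get a corrupt device back and reads it out, a quick way to spot this kind of test-pattern fill is to look for the longest run of identical bytes in the dump. This is just a rough sketch in plain C for use on a PC; the function name `longest_run` and the in-memory dump buffer are my own assumptions, and getting the dump out of the chip is up to your programmer tool. Keep in mind that long runs of FF are normally just unprogrammed flash, so it is runs of other values (such as 7F or EF) that deserve a closer look.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the length of the longest run of identical bytes in the dump.
 * If value is not NULL and the dump is not empty, *value receives the
 * byte that makes up that run (the first such run if there is a tie). */
size_t longest_run(const uint8_t *dump, size_t len, uint8_t *value)
{
    size_t best = 0;
    size_t run = 0;
    size_t i;

    if (dump == NULL) {
        return 0;
    }
    for (i = 0; i < len; i++) {
        /* Extend the current run or start a new one. */
        run = (i > 0 && dump[i] == dump[i - 1]) ? run + 1 : 1;
        if (run > best) {
            best = run;
            if (value != NULL) {
                *value = dump[i];
            }
        }
    }
    return best;
}
```

    If the longest run covers most of the code area and its value is not FF, the image was probably overwritten with a pattern rather than randomly damaged.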

    Jon

  • Whether it is useful or not, it is an interesting war story! A couple of colleagues of mine had a similar problem some time ago: violating the electrical interface requirements of a sensor corrupted its flash memory. But they didn't bother to investigate thoroughly, and left it to the manufacturer.

  • I can't reproduce this error in the lab, even after testing with more than 10,000 power on/off cycles.
    [...]
    it seems nobody can.

    Well, at least the usual laboratory power-supply on/off cycle often can't. It can take rather more "creative" power-cycles (bouncing switches, strange loads in parallel to the board, nearly dead batteries, ...) to hit the vulnerable spot of the system.

  • This is exactly the kind of real-world information that makes reading this forum worth it. We need more of this kind of "it happened to me 'cuz of ex-why-n-zee."

    The gem to glean from this is that chip vendors have many 'undocumented features' so finding the right person at the chip-vendor can be your best bet when you have exhausted all of the "It is You Who Screwed Up" situations. (That can take forever, so get started immediately).

    As far as corrupting non-volatile memory goes, I once sleuthed out what was occurring in a vendor's *new* chip by looking into the vendor's other products that also used non-volatile memory. They had quite a few application notes about protecting data during power-up and power-down on a "different component." I compared the exact specifications of the device I was having problems with against those of the other "problem" devices, and figured the new chip used the same die or design methods. I then implemented and enabled, on the *new chip*, the various protection schemes they recommended for those other chips.

    The actual "cause" was indeed the power-down process.

    Your particular problem might be in hardware (e.g. the power down process) or it still might be in software.

    If you are doing some in-application flash writing, the fix might include using any write protection the device offers. Some chips allow you to block all write operations on entire blocks using only software methods. Others require physical pin configurations (the software protection registers essentially "connect" those physical write-protect signals) to reach their protected state.

    So, check the AT89C51ED2 "User Memory Lock Bits", which are described as "(only programmable by programmer tools)." Check whether that could be your problem... the programming tools you are using might not be giving you the optimum level of memory locking.

    Step 1) Make sure your code handles any-and-all non-volatile transfers EXACTLY as documented. Take special note of where your code deviates. (That is IF you are reprogramming these things in the field)

    Step 2) Make sure that all of your hardware is operating per the specification data sheets.

    Step 3) Make sure that your hardware is not adversely driving ANY of the controller's pins as the system powers down or powers up. (For example, some I/O peripheral attached to an inductive circuit that suddenly sends a little spike to one of the MCU pins when the power is shut down, or some pins turning on/off before/after other pins do, within reason of course.)

    Step 4) Re-check steps 1-3 and have another competent person with you as you do it again, while you explain why you think you are right at every validation point between the specs and how it is implemented.

    Step 5) Post a "What The...?" on this forum, (with the relevant information just to keep erik and Andy from explaining how poor your original post/question is, and thus, running the risk of endless tangents into hyperbolic rants--- rants by non-erik and non-Andy 'contributors' of course)

    Step 6) Call your chip vendor... ask for their applications engineer and get enough information to find out who you actually need to talk to, such as the QA department or one of the engineers who worked on the chip (that's always fascinating!). Be prepared to explain steps 1 through 5, in detail. (And, the odds are, they will tell you it is your fault. If you are lucky, they'll explain why it is your fault.)

    Step 7) If steps one through six fail, design it using all analog circuits, and market it as a "Completely RISC Oriented (CRISCO) Architecture."

    Step 8) When step seven fails, apply at McDonald's. (or become an embedded consultant: the annual salary is about the same, but at least McD's will look better on a resume ;)

    --Cpt. Vince Foster
    2nd Cannon Place
    Fort Marcy Park, VA