Flash reliability problems

We are using an LPC2138.
I have just received a batch of boards that fail functional test.
When I reprogram the binary, they work perfectly.
So I read back the flash contents of a failing board and compared them with a read-back taken after reprogramming the board.
They are identical.

The LPC2138 errata sheet does list an erratum for the MAM:
"Under certain conditions when the MAM is fully enabled (Mode 2) code execution from internal Flash can fail."

However, this does not explain why it works when I reprogram the boards. Also the boards do not crash.

Is there anything else I can test before I re-program them all?

  • One thing - was it just the size of the application you read out, or did you read out every single bit?

    Another thing - the flash doesn't just store your application. Every sector also contains ECC information that lets the device detect and automatically correct bit errors. If the stored data is marginal, that correction can sometimes fail: when too many bits read back with the wrong value, the ECC information is no longer sufficient to compute the correction. This is strongly affected by the temperature of the device. The device temperature when you programmed the units also affects the safety margin.

    Another thing - are you 100% sure that your devices always have a proper supply voltage? If the supply voltage depends on the quality of the external supply, you can run into trouble when programming the flash. Maybe the factory had a broken PSU?
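
To illustrate why "too many bits" defeats correction, here is a minimal sketch using a textbook Hamming(7,4) code. This is an assumption chosen for illustration only: the thread does not say which ECC scheme the LPC2138 flash uses, and the function names are hypothetical. The code corrects any single flipped bit, but with two flipped bits it "corrects" to the wrong data without reporting an error.

```c
#include <stdint.h>

/* Encode 4 data bits into a 7-bit Hamming codeword.
 * Codeword positions 1..7 are stored in bits 0..6:
 *   pos: 1  2  3  4  5  6  7
 *        p1 p2 d0 p4 d1 d2 d3 */
uint8_t hamming74_encode(uint8_t data)
{
    uint8_t d0 = (data >> 0) & 1u;
    uint8_t d1 = (data >> 1) & 1u;
    uint8_t d2 = (data >> 2) & 1u;
    uint8_t d3 = (data >> 3) & 1u;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* parity over positions 1,3,5,7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* parity over positions 2,3,6,7 */
    uint8_t p4 = d1 ^ d2 ^ d3;  /* parity over positions 4,5,6,7 */

    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a 7-bit codeword, correcting at most one flipped bit.
 * The syndrome is the XOR of the positions of all set bits; for a
 * valid codeword it is 0, for a single error it is the error position. */
uint8_t hamming74_decode(uint8_t code)
{
    uint8_t syndrome = 0;
    uint8_t pos;

    for (pos = 1; pos <= 7; pos++) {
        if ((code >> (pos - 1)) & 1u) {
            syndrome ^= pos;
        }
    }
    if (syndrome != 0) {
        /* Assume a single-bit error and flip the indicated bit.
         * With two or more errors this flips the wrong bit. */
        code ^= (uint8_t)(1u << (syndrome - 1));
    }
    return (uint8_t)(((code >> 2) & 1u) |
                     (((code >> 4) & 1u) << 1) |
                     (((code >> 5) & 1u) << 2) |
                     (((code >> 6) & 1u) << 3));
}
```

For example, encoding 0xB and flipping one bit still decodes to 0xB, while flipping two bits decodes to a different value. A real flash ECC is wider and may also detect double errors, but the principle of a limited correction budget is the same.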

  • It does not give a data abort; the application appears to run partially.

    I am reading the full 512K bytes as that is what is programmed.

    I am puzzled that I can read back the 512K correctly (the binary compare matches), yet everything works perfectly only after re-programming.

    I am interested in the power/programming voltage. I cannot verify this because the programming is done in India.

    I do remember that years ago, when programming E2, we used to write over and over several times and then read back.

    Is it possible for the program to read back correctly while the ECC information causes a problem because there are too many errors? This is something I know nothing about.
    Any information would be very welcome.
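
Since marginal flash may only misread occasionally, a single matching read-back is weak evidence. One possible extra test, sketched below under assumptions, is to checksum the same region many times and count how often the result differs from the known-good value. The function names are hypothetical, and a plain bitwise CRC-32 is used only because it is easy to check; any checksum would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table.
 * The pointer is volatile so every pass really re-reads the memory. */
uint32_t crc32_region(const volatile uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= p[i];
        for (bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Read the region `passes` times and return the number of passes
 * whose CRC differs from `expected`. Zero means every read agreed. */
unsigned count_unstable_reads(const volatile uint8_t *region, size_t len,
                              uint32_t expected, unsigned passes)
{
    unsigned mismatches = 0;
    unsigned i;

    for (i = 0; i < passes; i++) {
        if (crc32_region(region, len) != expected) {
            mismatches++;
        }
    }
    return mismatches;
}
```

On the target, `region` would point at the start of the programmed flash and `expected` would be the CRC of the original binary image. Running the loop with the board warm and at the lower end of its supply range is more likely to expose a marginal cell than one read at room temperature.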

  • You can't control the programming voltage. It is generated internally. But during programming it is important that the processor is at a reasonable temperature, that the external supply voltage is stable and correct, and that the clock frequency is correct and matches what the programming algorithm assumes.

    If the external voltage varies, the internal voltage pump may not be able to produce a stable programming voltage.

    If the temperature is too high, the programming algorithm may fail.

    If the clock frequency is wrong, then the programming algorithm may erase the flash sector for too short a time or may write the data with the wrong timing.
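
The clock dependency shows up directly if the boards are programmed through the on-chip IAP routines, because the caller has to tell the routine the core clock. The sketch below builds the parameter block for a "Copy RAM to Flash" request. The command number (51), the kHz unit, and the size and alignment rules are recalled from the LPC213x user manual and should be checked against it; the struct and function names are hypothetical, and the actual call into the IAP entry point is deliberately left out.

```c
#include <stdint.h>

#define IAP_CMD_COPY_RAM_TO_FLASH 51u  /* assumed from the user manual */

/* Parameter block in the order the IAP routine is assumed to expect:
 * command code followed by four parameters. `ok` is bookkeeping only. */
typedef struct {
    uint32_t command;
    uint32_t flash_dst;  /* destination, 256-byte aligned */
    uint32_t ram_src;    /* source in RAM, word aligned */
    uint32_t nbytes;     /* 256, 512, 1024 or 4096 */
    uint32_t cclk_khz;   /* core clock in kHz */
    int ok;              /* 1 if the arguments passed the checks */
} iap_copy_cmd_t;

/* Validate the arguments and convert the clock from Hz to kHz.
 * Passing a cclk_hz that does not match the real core clock is the
 * failure mode described above: the flash timing would be wrong. */
iap_copy_cmd_t iap_copy_cmd(uint32_t flash_dst, uint32_t ram_src,
                            uint32_t nbytes, uint32_t cclk_hz)
{
    iap_copy_cmd_t cmd = { IAP_CMD_COPY_RAM_TO_FLASH, 0, 0, 0, 0, 0 };

    if (flash_dst % 256u != 0u || ram_src % 4u != 0u) {
        return cmd;  /* misaligned address */
    }
    if (nbytes != 256u && nbytes != 512u &&
        nbytes != 1024u && nbytes != 4096u) {
        return cmd;  /* unsupported block size */
    }
    if (cclk_hz < 1000u) {
        return cmd;  /* clock would round down to 0 kHz */
    }
    cmd.flash_dst = flash_dst;
    cmd.ram_src = ram_src;
    cmd.nbytes = nbytes;
    cmd.cclk_khz = cclk_hz / 1000u;
    cmd.ok = 1;
    return cmd;
}
```

If the factory uses the serial bootloader or a JTAG tool instead, the same clock value is normally entered in that tool's configuration, so it is worth checking what crystal frequency it was set up for.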

  • In that case I wouldn't read back the program correctly though? Is it possible that the program looks correct but hasn't been programmed correctly?

  • Yes. With marginal contents, it can read back correctly 9999 times out of 10000. If it affects your main loop, every iteration may then have a 1/10000 chance of failing. And if the processor gets hot when running, that failure rate may suddenly become 1/100 for every iteration.
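
To put those per-iteration figures in perspective, the chance of seeing at least one failure in n iterations is 1 - (1 - p)^n, assuming each iteration fails independently with probability p. That independence is an idealization, and the function name below is hypothetical.

```c
/* Probability of at least one failure in n independent trials,
 * each failing with probability p: 1 - (1 - p)^n.
 * Uses exponentiation by squaring so no math library is needed. */
double prob_any_failure(double p, unsigned long n)
{
    double survive = 1.0;   /* accumulates (1 - p)^n */
    double base = 1.0 - p;  /* (1 - p)^(2^k) at step k */

    while (n > 0) {
        if (n & 1ul) {
            survive *= base;
        }
        base *= base;
        n >>= 1;
    }
    return 1.0 - survive;
}
```

With p = 1/10000, `prob_any_failure(0.0001, 10000)` is about 0.63, so a main loop that runs ten thousand times is more likely than not to hit a bad read. At p = 1/100 the same probability is reached after only 100 iterations, which is consistent with a board that appears to run partially rather than crash outright.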