
C51 data memory test

Hi,

I need to fill data memory both internal and external with known patterns and read back to verify the entire C51 RAM. I prefer doing this in C instead of assembly but have not seen a good example to follow. Any suggestions?

Thanks

Parents
  • An important aspect here is that whenever external memory is involved, there is a real possibility of bad solder joints. So it is always good to scan every signal that can be scanned to verify correct connectivity.

    But the problems are huge.

    For example: you turn on the equipment when it is cold. A test run cold tells you nothing about how the hardware behaves after it has been running for a while and reached its steady-state temperature.

    The next issue is that smaller microcontrollers often use 3.3 V or 5 V and often have most external components run at the same voltage. Higher-end processors normally use multiple voltages: a very low voltage for the CPU core, a higher voltage for the GPIO signals, a third voltage for the RAM chips. Now you suddenly have more problems. The reliability of the RAM depends on it getting the correct voltage. With too high a voltage, it may overheat and start to misbehave after a while. With too low a voltage, it may run too slowly, leaving too little timing margin and producing random bit errors. And with an incorrect interface voltage between processor and memory, the high logic level may be sampled incorrectly.

    So in the end, a startup test often has very limited value except for finding blatantly broken hardware. In real-life situations, safety-critical equipment needs to perform constant supervision while delivering the live service it is designed for. So you should store checksums for all data structures you can get away with and constantly scan through them to verify that they are correct. This not only verifies memory integrity but also catches bad pointer accesses that corrupt memory. After all, the majority of failures an end user is likely to see come from software errors, unless you have fouled up the oscillator or the power supply and its filtering.

    Obviously, tests designed to run concurrently with normal operation can't perform memory tests the way a by-the-book factory test does. You will have to investigate all hardware parts and all software modules and create a testability document where you figure out how running equipment can gain reasonable confidence that everything works as expected. And you will have to figure out what external hardware (watchdogs or whatever) you may need in order to produce reasonable error messages for the end user in case the system is so broken that it can't perform its normal self-tests and present normal error messages.

    A question here is what courses or past experience you have in writing this type of software, where it isn't enough to make sure an RTOS always meets all required deadlines; you also need to supervise all individual actions and all produced output signals and, if possible, double-check all input signals.

    In the end, the device (if the standard so requires) must be designed so that it really doesn't matter how bad the processor or any other hardware is. But the system should either stay pitch black or present a clear error indication in case it can't produce the normal results with the required confidence. Or, for some standards, it may continue to produce results but must then clearly indicate that there are confidence problems.

    Are you the only software developer? Note that many of the standards involving human safety are not satisfied with "we are doing it" but also require you to produce large amounts of documentation about all your design steps, the reasons for your decisions, etc. So if you are stuck right now on memory tests, you might get seriously more stuck later, when the company has invested more money into the product and a review red-flags your development process, the design, the documentation, the testing, ...

Reply
