Logging catastrophic software error on LPC1768

We are developing on the LPC1768 and use a number of subsystems: TCP, SD card, SPI.

For some reason that we don't understand, the software occasionally fails in a way that causes the system to reset itself.

My question is: what mechanism could we use to log the reason for the failure? We need to know at which precise moment the software failed so that we can examine it afterwards and correct the issue.

Thank you.

ImPer Westermark, over 12 years ago, in reply to Tamiryan Michael

Another alternative is to configure a block of RAM as no-init.

Keep a rotating log there.

If you get an unexpected reset (your startup code should be able to see whether it was a power-on reset, an external reset, or some other kind of reset), you set a flag telling your logging code not to add any more log entries, because this RAM region now contains important trace information for you to extract and investigate.
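
A minimal sketch of that idea in C, under a few assumptions: `NOINIT` stands for whatever attribute and linker or scatter-file entry your toolchain needs to keep the region out of zero-initialisation (it is left empty here so the code also builds on a PC), and `reset_cause_t`, `trace_startup`, `trace_log` and `trace_release` are hypothetical names. The startup code would derive the reset cause from the chip's reset-source register (RSID on the LPC17xx, as far as I know; check the user manual).

```c
#include <stdint.h>
#include <string.h>

/* Assumption: NOINIT expands to the toolchain-specific attribute that keeps
 * this variable out of zero-initialisation. Left empty for host builds. */
#ifndef NOINIT
#define NOINIT
#endif

#define TRACE_MAGIC   0x54524143u   /* marks the region as holding a valid log */
#define TRACE_ENTRIES 64u           /* must be a power of two */

typedef struct {
    uint32_t magic;                 /* TRACE_MAGIC once initialised */
    uint32_t frozen;                /* non-zero: keep contents, stop logging */
    uint32_t head;                  /* number of entries written so far */
    uint32_t entry[TRACE_ENTRIES];  /* rotating log, oldest entry is overwritten */
} trace_region_t;

NOINIT static trace_region_t trace;

/* Hypothetical reset causes, as classified by the startup code. */
typedef enum { RESET_POWER_ON, RESET_EXTERNAL, RESET_UNEXPECTED } reset_cause_t;

/* Call once, early in startup, before anything logs. */
void trace_startup(reset_cause_t cause)
{
    if (cause == RESET_POWER_ON || trace.magic != TRACE_MAGIC) {
        /* RAM content is garbage after power-on: start a fresh log. */
        memset(&trace, 0, sizeof trace);
        trace.magic = TRACE_MAGIC;
    } else if (cause == RESET_UNEXPECTED) {
        /* Preserve the evidence until someone has extracted it. */
        trace.frozen = 1u;
    }
}

void trace_log(uint32_t value)
{
    if (trace.frozen) {
        return;                     /* the log holds crash evidence */
    }
    trace.entry[trace.head & (TRACE_ENTRIES - 1u)] = value;
    trace.head++;
}

/* Call after the frozen log has been dumped (UART, SD card, ...). */
void trace_release(void)
{
    trace.frozen = 0u;
    trace.head = 0u;
}
```

Logging is just a store to RAM, so it is cheap enough to call thousands of times per second. The magic word is there because after a real power-on the region contains random data and must not be mistaken for a log.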

So you might log "enter" and "leave" in critical call chains. And you might log the stack position at critical points.
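
Something along these lines could do the enter/leave logging. It is only a sketch: `trace_put`, the `TRACE_ENTER`/`TRACE_LEAVE` macros, `ID_SD_WRITE` and `sd_write_block` are made-up names, and the small array stands in for the no-init log region. The stack position is approximated by the address of a local variable so the code stays portable; on the target you could read the stack pointer with the CMSIS function `__get_MSP()` instead.

```c
#include <stdint.h>

typedef enum { EV_ENTER = 1, EV_LEAVE = 2 } trace_event_t;

typedef struct {
    uint8_t   event;   /* EV_ENTER or EV_LEAVE */
    uint8_t   id;      /* which critical section */
    uintptr_t sp;      /* approximate stack position at that point */
} trace_rec_t;

#define REC_COUNT 32u
static trace_rec_t rec[REC_COUNT];  /* stand-in for the no-init log region */
static uint32_t rec_head;

static void trace_put(trace_event_t ev, uint8_t id, uintptr_t sp)
{
    trace_rec_t *r = &rec[rec_head % REC_COUNT];
    r->event = (uint8_t)ev;
    r->id = id;
    r->sp = sp;
    rec_head++;
}

/* The address of a local gives a rough idea of the current stack depth. */
#define TRACE_MARK(ev, id) \
    do { volatile uint8_t m_; trace_put((ev), (id), (uintptr_t)&m_); } while (0)
#define TRACE_ENTER(id) TRACE_MARK(EV_ENTER, (id))
#define TRACE_LEAVE(id) TRACE_MARK(EV_LEAVE, (id))

enum { ID_SD_WRITE = 1 };           /* hypothetical id for one critical chain */

int sd_write_block(int fail)
{
    TRACE_ENTER(ID_SD_WRITE);
    int rc = fail ? -1 : 0;         /* stands in for the real work */
    TRACE_LEAVE(ID_SD_WRITE);
    return rc;
}
```

If the log ends with an "enter" that has no matching "leave", you know which chain the processor was in when it died, and the stored stack position tells you whether the stack was unusually deep at that point.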

The problem with flash logging is that it just can't log with enough time resolution without you burning the flash to cinders very quickly. You are likely to need thousands of log entries every second, unless you have a program that spends most of its time sleeping, waiting for some infrequent interrupt to wake it up. And when you don't know what fails, you can't just write something to the flash every 10 minutes and expect it to be meaningful - because what exactly is meaningful when the processor can go from "everything is well" to "let's crash and burn" in microseconds...

The next thing you can do is to try some defensive programming. If you still have code space and free MHz, you could add asserts, and you could recompute the expected state of variables and compare the current state with the recomputed state. If you do see a problem, then you know what to log.
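
As an illustration of the recompute-and-compare idea, here is a small sketch. Everything in it is hypothetical: `CHECK` is a home-made assert that records the failing line instead of aborting, and `rx_state_t` is an invented receive-buffer state where the byte count is redundant because it can be recomputed from the head and tail indices.

```c
#include <stdint.h>

static volatile uint32_t fault_line;   /* stand-in for a real log entry */
static volatile uint32_t fault_count;

static void check_failed(uint32_t line)
{
    fault_line = line;                 /* now you know what to log */
    fault_count++;
}

/* Home-made assert: records where the check failed and carries on. */
#define CHECK(cond) \
    do { if (!(cond)) check_failed((uint32_t)__LINE__); } while (0)

#define RX_SIZE 256u                   /* buffer size, a power of two */

typedef struct {
    uint16_t rx_head;                  /* next write index */
    uint16_t rx_tail;                  /* next read index */
    uint16_t rx_count;                 /* redundant: bytes currently buffered */
} rx_state_t;

/* Recompute the expected count from the two indices. */
static uint16_t rx_expected_count(const rx_state_t *s)
{
    return (uint16_t)((s->rx_head - s->rx_tail) & (RX_SIZE - 1u));
}

/* Call at critical points, e.g. before and after touching the buffer. */
void rx_check(const rx_state_t *s)
{
    CHECK(s->rx_head < RX_SIZE);
    CHECK(s->rx_tail < RX_SIZE);
    CHECK(s->rx_count == rx_expected_count(s));
}
```

A check like this often fires well before the actual crash, for example when a stray pointer or a stack overflow has trampled the state, which narrows down where to look.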

You should obviously make sure you have nice exception handlers that can turn on some SOS LED or something and then just busy-loop while waiting for someone to get there and dump out as much information as possible - in this case all register and RAM contents.
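
The C side of such a handler might look roughly like this. It is a sketch with made-up names (`fault_capture`, `fault_record`, `sos_led_on`); the small assembly wrapper that the real fault handler needs is toolchain-specific and only described in the comment. The frame layout is the one a Cortex-M3 pushes on exception entry.

```c
#include <stdint.h>
#include <string.h>

/* Registers the Cortex-M3 pushes on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

typedef struct {
    uint32_t        valid;   /* non-zero once a fault has been captured */
    stacked_frame_t regs;    /* pc is the instruction that was executing */
} fault_record_t;

static fault_record_t fault_record;   /* would live in no-init RAM on target */

/* Hypothetical board hook: switch on whatever LED signals "SOS". */
static void sos_led_on(void)
{
}

/* On the target, the fault handler would be a few lines of assembly that
 * pick the active stack pointer (MSP or PSP, depending on bit 2 of the
 * EXC_RETURN value in LR), pass it to this function, and then busy-loop
 * until someone attaches a debugger or resets the board. */
void fault_capture(const uint32_t *stacked_sp)
{
    memcpy(&fault_record.regs, stacked_sp, sizeof fault_record.regs);
    fault_record.valid = 1u;
    sos_led_on();
}
```

The captured `pc` and `lr` values can be looked up in the map file or listing to find the faulting instruction and its caller.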