Hello,
I was hoping to hear your opinion about a serious problem I have: either I solve it, or I reduce my LPC2478 CPU speed from 72 MHz to 64 MHz (an 11% loss; the problem does not seem to occur at the lower clock setting). I posted about this in the past, but it was a long time ago.

When I place a controller in an environmental chamber and raise the temperature to 80+ degrees Celsius, I often see data abort exceptions, and sometimes I get the impression that the PC takes a hike (even the firmware LED that blinks every 1 second becomes irregular for a while before it stops). The program is launched by a boot loader and has a lower-level supporting firmware layer that handles some interrupts (not all). I also see that if RTX is not started at all (the application hangs in a "for (;;)" loop instead, so the bootloader and firmware layer are still involved, but the application is idle), the system never crashes! As far as I can tell, I have ruled out the external memory and RTX, although I still suspect RTX a little (even though my test programs never crashed).

My questions: did you ever encounter such a situation? Where should I look first? Can this be the result of a misbehaving peripheral? NXP have confirmed the LPC2478 itself is not the cause.
If I empty all the RTX tasks, I experience no crash.
Hmmm... if there's nothing being done any more by any task, and the system crashes: how exactly did you expect to experience that fact?
How can the temperature determine the behavior, assuming all the components support these temperatures (they do)?
Maybe because that assumption is wrong, or some components are actually hotter than you think they are, or, this close to their thermal limits, enough components have begun to change in behaviour that your electronic design has been driven beyond at least one of its design margins.
This is pretty much guaranteed to be a hardware problem. The only way heat can affect software behaviour is by affecting hardware first.
"The only way heat can affect software behaviour is by affecting hardware first."
Unless the OP is [once again] using undocumented RTX calls ;)
Hans-Bernhard,
You are right. I am now trying to determine the upper thermal limits of the entire system. It seems a lot more stable now with the production hardware in use - the previous one did use some components whose thermal limits were below 80 degrees. So far, 79.3 degrees seems to do no harm!
The RTX call in question wasn't undocumented. As a matter of fact, the RTX call might actually be incorrectly - or at least misleadingly - documented.
Without the source for the function, I can't prove the correctness of the documentation, but the documentation did miss one important issue: how the delay function behaves if called at too low a frequency. And the no-no part of the documentation looked more like a "you should think about ..." section that had somehow been upgraded into a no-no without a real reason.
"... a no-no ... "
If something is stated as being a no-no, and someone decides to use it, then they are (in my book) using undocumented details.
Hmmm... the sign says don't open this locked door. I'll break it open anyway. Whoops, they didn't tell me that there was a pride of lions on the other side.
This is an off-topic discussion that has been had already, but here is a quick summary.
The information missing from the os_itv_wait() documentation is that if you get back to the function after a delay longer than the interval time, the missed intervals are counted, and you will get one or more instant returns from os_itv_wait() when you finally call it, until you have consumed the back-log.
The reason why Keil said that os_itv_wait() and os_dly_wait() shouldn't be used at the same time is just that a long os_dly_wait() call will result in os_itv_wait() having multiple ticks of back-log. That is not a no-no reason; it is just a "watch out for this".
If Keil thought it important to mention os_dly_wait() and os_itv_wait() for the same thread, then they should have documented the real reason, since the real reason can be triggered in many other ways. Any other wait function will give the same result. A too-high IRQ load will give the same result. A high-priority task with a lot of work to do will give the same result.
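A minimal sketch of what that back-log looks like in practice, assuming RL-RTX for ARM and a 100 Hz system tick; the task name, interval, and delay values below are made up for illustration and are not taken from the original application:

    #include <RTL.h>

    __task void periodic_task (void) {
      os_itv_set (10);          /* ask to be woken every 10 system ticks     */
      for (;;) {
        os_itv_wait ();         /* normally blocks until the next interval   */

        /* Anything that keeps the task away from os_itv_wait() for longer
           than one interval - a long os_dly_wait(), heavy IRQ load, or a
           busy higher-priority task - builds up a back-log. Here a delay
           of 35 ticks misses roughly three intervals, so the next three
           calls to os_itv_wait() return immediately instead of blocking.  */
        os_dly_wait (35);
      }
    }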
So in the end, Keil updated a "think about" into a "no-no" for the wrong reason, while giving the user too little information to be able to catch all the other _potential_ problems of using os_itv_wait() in relation to other code.
Just remember how the post started:
"If you violate RTX's rule ..."
Anyway ... As you say, off topic. But slightly more relevant to Keil tools than the use of environmental chambers!
"...the previous one did use some components whose thermal limits were below 80 degree."
And you found it was failing at 80+ degrees. Well, there's a surprise (not).
This thread might well have been started with:
"If you violate the thermal specifications ..."