A mysterious problem with AT91SAM7S256

Note: This was originally posted on 4th October 2011 at http://forums.arm.com

This is likely to be highly vendor specific, but I am hoping someone can shed some light.

We are setting up the watchdog timer (WDT) to trigger both an internal and an external reset.
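For reference, here is a minimal sketch of that kind of WDT mode-register setup. The bit positions and the WDT_MR address (0xFFFFFD44) come from our reading of the AT91SAM7S datasheet and should be checked against it. The helper name `wdt_mr_full_reset` is hypothetical. With WDRPROC cleared, a watchdog fault resets the processor and the peripherals. As we understand the datasheet, whether NRST is also driven externally depends on the reset controller settings (ERSTL in RSTC_MR), which this sketch does not touch.

```c
#include <stdint.h>

/* WDT_MR field layout as we read it in the AT91SAM7S datasheet
   (Watchdog Timer chapter); verify against your datasheet revision. */
#define WDT_MR_WDV_POS   0u
#define WDT_MR_WDD_POS   16u
#define WDT_MR_FIELD_MSK 0xFFFu       /* WDV and WDD are 12-bit fields */
#define WDT_MR_WDRSTEN   (1u << 13)   /* watchdog fault triggers a reset */
#define WDT_MR_WDRPROC   (1u << 14)   /* 1 = processor-only reset */

/* Hypothetical helper: build a WDT_MR value for a full-chip watchdog
   reset (WDRSTEN = 1, WDRPROC = 0) with the restart window disabled
   by setting WDD equal to WDV. */
static uint32_t wdt_mr_full_reset(uint32_t wdv)
{
    wdv &= WDT_MR_FIELD_MSK;
    return (wdv << WDT_MR_WDV_POS)
         | (wdv << WDT_MR_WDD_POS)
         | WDT_MR_WDRSTEN;
}

/* On the target, WDT_MR is written exactly once, e.g. (address assumed
   from the datasheet):
   *(volatile uint32_t *)0xFFFFFD44u = wdt_mr_full_reset(0xFFFu); */
```

Computing the value in a pure function keeps the single write to the write-once register trivial and lets the bit layout be checked off-target.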

The problem is that the chip gets stuck in a reset loop, resetting approximately every 20 ms. The WDT can be set to any timeout up to the 16-second maximum, and we get the same 20 ms loop regardless.
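The 16-second maximum follows from the counter arithmetic. As we read the datasheet, the 12-bit down counter is clocked by the slow clock divided by 128, so the timeout is WDV × 128 / SLCK. The sketch below assumes that relationship and a nominal 32.768 kHz slow clock. Both helper names are hypothetical.

```c
#include <stdint.h>

#define WDT_CLK_DIV  128u    /* WDT counter is clocked by SLCK / 128 */
#define WDT_WDV_MAX  0xFFFu  /* 12-bit down counter */

/* Timeout in seconds for a given WDV at slow clock frequency slck_hz. */
static double wdt_timeout_s(uint32_t wdv, double slck_hz)
{
    return (double)wdv * WDT_CLK_DIV / slck_hz;
}

/* Largest WDV whose timeout does not exceed timeout_s, clamped to 12 bits. */
static uint32_t wdt_wdv_for_timeout(double timeout_s, double slck_hz)
{
    double counts = timeout_s * slck_hz / WDT_CLK_DIV;
    if (counts <= 0.0)
        return 0u;
    if (counts >= WDT_WDV_MAX)
        return WDT_WDV_MAX;
    return (uint32_t)counts;  /* truncate toward zero */
}
```

With WDV = 0xFFF this gives about 15.996 s. For comparison, at the nominal clock a 20 ms period corresponds to a WDV of only about 5. As we understand it, the SAM7S slow clock comes from an internal RC oscillator whose frequency varies considerably between parts, so these figures are nominal.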

We considered several possible causes: a bad PCB; a PLL issue (but the problem happens even when the PLL is not used); a crystal startup issue (but the problem also happens with other types of crystals, and oscillator startup looks good); a VDD rise time issue (but we verified that the rise time is significantly faster than required); and a bypassing issue (but the PCB runs fine with almost all bypass caps removed, and adding more does not affect the problem).

Interestingly, the WDT values that the errata list as "working" values for rev A silicon seem to make the problem appear less often than other values do.
https://www.ledato.de/download/SAM7S256_128_errata_%28update-13Nov%29.pdf

We have seen this problem with the window both enabled and disabled, and we intend to run with the window disabled. If a given board is failing, we can get the chip to boot properly by heating or cooling it by a fraction of a degree, or by applying a slight torque or twist to the PCB. The direction of the force matters, but it differs from board to board. Once a given board has booted properly, it is extremely robust: boards survive power line disturbances of 2.5 kV with a 10 ns rise time, EMI above 190 V/m, and 16 kV ESD events while operating.
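The window mentioned above can be modeled as a simple comparison. As we read the datasheet, the counter is loaded with WDV and counts down. A restart is accepted only while the counter is at or below WDD. A restart issued earlier than that is flagged as a watchdog error. The function names below are hypothetical and only illustrate that rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restart window rule as we read the AT91SAM7S datasheet: a restart
   (WDRSTT) is accepted only when the down counter is <= WDD; otherwise
   it is flagged as a watchdog error. */
static bool wdt_restart_allowed(uint32_t counter, uint32_t wdd)
{
    return counter <= wdd;
}

/* The counter never exceeds WDV, so WDD >= WDV allows every restart,
   i.e. the window is effectively disabled. */
static bool wdt_window_disabled(uint32_t wdv, uint32_t wdd)
{
    return wdd >= wdv;
}
```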

While the mechanical sensitivity would seem to indicate a PCB problem, we have replicated this on an Atmel evaluation kit board. The amount of flex required varies; as little as 1/16 inch over 8 inches of board length is enough to take a board into or out of failure. The thermal sensitivity is also extreme: heat from a fingertip lightly touching the CPU for only a couple of seconds is enough to start or stop the rebooting. I can't imagine the die temperature is changing by more than a fraction of a degree.

The problem shows up with rev A, B, and C silicon.
We implemented an extremely stripped-down version of the code that only flashes some LEDs to indicate that the application is running, and this code also exhibits the problem.


So far, the best we have been able to do is establish that the problem seems to depend on when the WDT is initialized relative to the rising edge of /RESET. For a given chip, if we shift the point in time at which the WDT is configured by inserting NOPs, we can create or eliminate the problem on that board. Some chips don't seem to exhibit the problem, but since temperature, timing, and PCB flex all seem to be part of the equation, we are only comfortable saying that a given system "has not been observed to have the problem".
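The timing experiment can be sketched as a delay followed by the single write to WDT_MR. The helper name and the loop-based delay are hypothetical; our tests inserted NOPs. A busy loop's duration depends on the compiler and on flash wait states, so the loop count is only a coarse knob. The register is passed as a pointer so the logic can be exercised on a host. On the target it would point at WDT_MR, which we read in the datasheet as 0xFFFFFD44.

```c
#include <stdint.h>

/* Hypothetical helper: spin for delay_loops iterations, then perform
   the one-and-only write of mr_value to the WDT mode register. */
static void wdt_configure_after_delay(volatile uint32_t *wdt_mr,
                                      uint32_t mr_value,
                                      uint32_t delay_loops)
{
    for (volatile uint32_t i = 0; i < delay_loops; ++i) {
        /* volatile counter keeps the compiler from removing the loop */
    }
    *wdt_mr = mr_value;  /* WDT_MR is write-once until a processor reset */
}
```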

There may be a relationship with the phase or timing of the slow clock at the moment the WDT is configured, but we have not yet been able to find one.

Out of a couple hundred of our boards, roughly 10% exhibit the problem with a given code set. If three boards from a batch fail, say "A", "B", and "C", and we then change the delay before the WDT is initialized and reprogram the batch, perhaps "A", "D", and "W" would fail instead. We have a code set that has never been observed to fail, but given the nature of the problem, we are extremely nervous.

We have been through the errata and the datasheet extensively, both on our own and with Atmel's technical people in San Jose. So far nobody can explain why we are seeing this.

Has anyone here seen this problem?  Solved it?

Thanks.