All,
I am trying to track down a problem in some code that was written by an 'overseas' 3rd party (I will be nice and not state the country of origin).
This code uses a single timer set at a 1 ms interrupt rate to determine whether the SPI is still communicating externally with its master. If no communications are detected, the timer handler resets the SPI port, clears the interrupt, and jumps to the reset vector. The problem is that the SPI port remains dead until the board is power cycled.
The obvious fix is to use the watchdog (which is what I will eventually do), but I would like to understand why this does not work (yes, bad coding practice is the real reason)...
Since the code jumps to the reset vector this is what I have been able to analyze:
(1) Since this is not a true reset (i.e., via watchdog), the hardware registers are not reset - problem potential here.
(2) The jump to the reset vector was made while in supervisor mode, so the privileged registers (e.g., SP) can be written.
(3) The timer interrupt was cleared prior to making the jump to the reset vector, so all interrupts are still enabled.
(4) Since this is not a true reset, all resident code can still execute (e.g., interrupt handlers).
(5) The startup code will reset all initialized data, registers, etc. prior to jumping to program main(), effectively returning data to a power-up state.
One reason I can currently come up with for the SPI never being functional after this occurs is that an interrupt may fire while the startup code is running (clearing a tracking variable or resetting the processor registers). But the interrupt would also hold up the startup code until it was serviced. This potential cause is (probably) not the only reason for this issue, which is why I am asking for your input.
Unfortunately, this board has no JTAG to connect, so stepping through the code is not an option. I could write to the serial port if it were connected, but it isn't. Right now I am trying to analyze my way through this code before using a 'hammer' approach to solving this problem.
What else am I missing in this analysis? Thanks.
Let's say that your SPI data may never contain 10 bytes in a row with value 0x00.
So if the master doesn't get an answer - send 12 or more zeroes. Then make a pause of say 50 ms. Then start sending real data again.
The slave should be able to notice the long row of zeroes and know that it is a request to synchronize. When it then sees the pause, it can reinitialize the SPI controller and start waiting for more data, having cleared the internal SPI bit counter.
This method wastes a bit of time on synchronizing, but has the advantage that once the sender and receiver are synchronized, you can keep quite a high speed without wasting time on a lot of bit manipulation in the slave. Counting the number of consecutive 0x00 bytes is quite cheap. And only after having seen at least 10 zero bytes do you need to start measuring whether there is a pause in the transfer (which is needed since you may get 10, 11 or 12 zero bytes).
The next thing to do if the interface suffers from noise is of course to make sure that all messages have strong integrity checking. At least CRC-32, but possibly even better. Maybe you should even consider a two-dimensional scheme.
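The slave side of this scheme could be sketched roughly as below. This is only an illustration under assumptions: the names (sync_detector, sync_on_byte, sync_on_tick) are made up, the byte handler is assumed to be called from the SPI receive interrupt and the tick handler from the existing 1 ms timer, and the 20 ms pause threshold is my own choice, picked shorter than the master's 50 ms pause so the slave is ready before real data resumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYNC_ZERO_COUNT 10u  /* zero bytes that mark a sync request (from the post) */
#define SYNC_PAUSE_MS   20u  /* assumed: shorter than the master's 50 ms pause */

typedef struct {
    uint32_t zero_run;  /* consecutive 0x00 bytes seen so far */
    uint32_t idle_ms;   /* milliseconds since the last received byte */
} sync_detector;

void sync_init(sync_detector *d)
{
    assert(d != NULL);
    d->zero_run = 0;
    d->idle_ms = 0;
}

/* Call for every received byte, e.g. from the SPI receive interrupt. */
void sync_on_byte(sync_detector *d, uint8_t byte)
{
    d->idle_ms = 0;
    if (byte != 0x00) {
        d->zero_run = 0;            /* real data breaks the run */
    } else if (d->zero_run < UINT32_MAX) {
        d->zero_run++;              /* saturate instead of wrapping */
    }
}

/* Call once per millisecond from the timer tick.
 * Returns true when the caller should reinitialize the SPI controller. */
bool sync_on_tick(sync_detector *d)
{
    if (d->zero_run < SYNC_ZERO_COUNT) {
        return false;               /* not armed: no pause timing needed */
    }
    if (++d->idle_ms < SYNC_PAUSE_MS) {
        return false;               /* master may still be sending zeroes */
    }
    d->zero_run = 0;                /* disarm until the next sync request */
    d->idle_ms = 0;
    return true;
}
```

When sync_on_tick() returns true, the slave would reinitialize its SPI peripheral (which clears the internal bit counter) and wait for the next message. Note that a misaligned bit counter does not disturb the detection, since a stream of zero bits reads as 0x00 at any bit offset.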
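For reference, a minimal bitwise CRC-32 could look like the sketch below. It assumes the common IEEE 802.3 / zlib variant (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); the function name crc32_update is just illustrative, and a table-driven version would be faster if the slave is short on cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3 variant, reflected polynomial 0xEDB88320).
 * Pass crc = 0 for the first chunk; pass the previous return value to
 * continue over further chunks of the same message. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);

    crc = ~crc;                          /* undo the final XOR / apply the initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;                         /* final XOR */
}
```

The sender would append the 4-byte result to each message, and the receiver would recompute it over the payload and discard the message on a mismatch.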
Per,
An even better, less time-consuming recovery method!
Excellent suggestion. Thanks.