All,
I am trying to track down a problem in some code that was written by an 'overseas' 3rd party (I will be nice and not state the country of origin).
This code uses a single timer set at a 1 ms interrupt rate in order to determine if the SPI is still communicating externally with its master. If no communications are detected, the timer resets the SPI port, clears the interrupt, and jumps to the reset vector. The problem is: the SPI port remains dead until the board is power cycled.
The obvious fix is to use the watchdog (which is what I will eventually do), but I would like to understand why this does not work (yes, bad coding practice is the real reason)...
Since the code jumps to the reset vector this is what I have been able to analyze:
(1) Since this is not a true reset (i.e., via watchdog), the hardware registers are not reset - problem potential here.
(2) The jump to the reset vector was accomplished while in supervisor mode, so the privileged registers (i.e., SP, etc.) can be written.
(3) The timer interrupt was cleared prior to making the jump to the reset vector, so all interrupts are still enabled.
(4) Since this is not a true reset, all resident code can still execute (i.e., interrupt handlers).
(5) The startup code will reset all initialized data, registers, etc. prior to jumping to program main(), effectively returning data to a power-up state.
One reason I can currently come up with as to why the SPI is never functional after this occurs is that maybe an interrupt occurs while in the startup code (clearing a tracking variable or resetting the processor registers). But the interrupt would also hold up the startup code until it was serviced. This potential cause is (probably) not the only reason for this issue, which is why I am asking for your input(s).
Unfortunately, this board has no JTAG to connect, so stepping through the code is not an option. I could write to the serial port if it was connected, but it isn't. Right now I am trying to analyze my way through this code before using a 'hammer' approach to solving this problem.
What else am I missing in this analysis? Thanks.
Another thought occurred to me that I'd like to put up for review....
Let's say that the reset sequence is OK, by some sort of bad design luck.
If the slave's SPI was reset and then comes up while the master is actually sending data on the port, say on bit 3 of the data byte, the slave SPI would be out of sync with the data transmitted and always receive bad data.
Possible?
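To see what that looks like on the wire, here is a minimal host-side sketch (not target code) of a slave that starts sampling partway into the master's bit stream. It assumes MSB-first, 8-bit frames and no slave-select; the function name `slave_receive_misaligned` and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the slave's shift register starts sampling `skip`
 * bits into the master's MSB-first bit stream and then frames a byte
 * every 8 clocks from that point on. Returns the number of whole bytes
 * stored in rx. */
size_t slave_receive_misaligned(const uint8_t *tx, size_t tx_len,
                                unsigned skip,
                                uint8_t *rx, size_t rx_cap)
{
    assert(skip < 8);

    size_t total_bits = tx_len * 8;
    size_t n = 0;          /* bytes framed by the slave so far */
    uint8_t shift = 0;     /* slave shift register */
    unsigned count = 0;    /* bits clocked into the current byte */

    for (size_t bit = skip; bit < total_bits && n < rx_cap; bit++) {
        /* Bit the master drives on this clock (MSB first). */
        uint8_t b = (uint8_t)((tx[bit / 8] >> (7 - bit % 8)) & 1u);
        shift = (uint8_t)((shift << 1) | b);
        if (++count == 8) {
            rx[n++] = shift;
            count = 0;
            shift = 0;
        }
    }
    return n;
}
```

With the master sending 0xA5 0x3C 0xFF and the slave coming up 3 bits late, this model frames 0x29 0xE7: every byte straddles two of the master's bytes, and nothing in the stream itself pulls the slave back into alignment.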
I guess this is possible. Found this post relating to the operation of the SPI port:
"Seems it doesn't need the /CS signal unless you really want to use it.
As long as you understand that any glitches on the SCK line during power up will be interpreted as clock pulses and will put the SPI out of synch with no way back without using the /SS line. ie any data can be 1 or more bits out of synch with the clock and will be meaningless."
This SPI is also not using the SSEL line.
Interesting...
If you are not using slave-select or other hw means to make sure that master and slave are in sync, then you would have to consider bit-stuffing, so that there are unique bit patterns that represent the start of a transmission.
The bad thing with this - when you don't have HDLC hardware - is that you will have to perform a lot of bit shifting in your receive code, both to detect the stuffing and to combine data from multiple words and shift and merge it into correctly aligned bytes.
It would then probably have been better if an ARM chip with HDLC hw support had been selected.
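A rough idea of the receive-side work involved, as a sketch only: it hunts for the HDLC flag pattern 0x7E at any bit offset in the received bytes and then re-packs the bits that follow into aligned bytes. The names `find_payload_start` and `extract_bytes` are hypothetical, the code works bit by bit for clarity rather than speed, and removing stuffed zero bits from the payload is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_FLAG 0x7Eu  /* HDLC flag pattern 01111110 */

/* Read one bit (MSB-first numbering) from a byte buffer. */
static unsigned get_bit(const uint8_t *buf, size_t bit)
{
    return (buf[bit / 8] >> (7 - bit % 8)) & 1u;
}

/* Slide an 8-bit window over the raw stream and return the index of the
 * first bit after the flag, or -1 if no flag is found. */
long find_payload_start(const uint8_t *rx, size_t rx_len)
{
    assert(rx != NULL);
    uint8_t window = 0;
    for (size_t bit = 0; bit < rx_len * 8; bit++) {
        window = (uint8_t)((window << 1) | get_bit(rx, bit));
        if (bit >= 7 && window == FRAME_FLAG)
            return (long)(bit + 1);
    }
    return -1;
}

/* Re-pack whole bytes starting at an arbitrary bit position. A faster
 * version would shift and merge adjacent bytes instead of single bits. */
size_t extract_bytes(const uint8_t *rx, size_t rx_len, size_t start_bit,
                     uint8_t *out, size_t out_cap)
{
    assert(rx != NULL && out != NULL);
    size_t n = 0;
    while (n < out_cap && start_bit + 8 <= rx_len * 8) {
        uint8_t byte = 0;
        for (unsigned i = 0; i < 8; i++)
            byte = (uint8_t)((byte << 1) | get_bit(rx, start_bit + i));
        out[n++] = byte;
        start_bit += 8;
    }
    return n;
}
```

For example, if the slave has framed the bytes 0x0F 0xD4 0xA0, the flag sits 3 bits into the stream, the payload starts at bit 11, and the first aligned payload byte comes out as 0xA5.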
Per,
Thank you for that interesting suggestion concerning the use of an HDLC (or equivalent) protocol to ensure proper bit synchronization. If this device ever gets to the point of an upgrade, this suggestion will be near the top of the list. I also did a Layer 2 protocol for a SS freq hop system in the ISM band that used a preamble/delimiter type data frame that could also work.
In addition to the awful (being nice again) design of this device to begin with, the SPI has known issues concerning noise sensitivity (surprise, surprise - NOT). In so far as correcting this synchronization issue I have been toying around with an algorithm on the master side that may accomplish a re-synchronization of the master/slave devices in the following manner (again, this is just in the Hmmm stage):
If the slave device does not respond or does not properly respond to a command, do the following:
Switch the master SCK line to GPIO and send 7 clock pulses to the slave. Switch the master SCK back to SPI and then resend the last command. If the response is still corrupted, perform the sequence above again, except send 6 clocks. Continue in this manner (down to a single clock) until a correct response is received, then return to normal SPI operation.
Sounds like it would work, but switching the SCK from SCK to GPIO may induce unwanted glitches on the line which would nullify any chance that this would work at all.
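As a sanity check on the arithmetic only, here is a small host-side model of that sequence. It assumes an idealized slave whose only state is how many bits it has already clocked into the current byte, that a command is always a whole number of bytes (so it does not change the alignment), and that the pin switch itself adds no glitch clocks. The type `slave_model` and the functions `pulse_sck`, `command_ok` and `resync` are all hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Idealized slave: phase = bits already clocked into the current byte. */
typedef struct {
    unsigned phase;   /* 0..7; 0 means aligned with the master */
} slave_model;

/* Extra SCK pulses bit-banged on the GPIO shift the slave's alignment. */
void pulse_sck(slave_model *s, unsigned n)
{
    s->phase = (s->phase + n) % 8;
}

/* Stand-in for "resend the last command and check the response":
 * in this model the response is good only when the slave is aligned. */
bool command_ok(const slave_model *s)
{
    return s->phase == 0;
}

/* Try 7 extra clocks, then 6, ... down to 1, retrying the command after
 * each burst. Returns the number of bursts used (0 if already in sync),
 * or -1 if the slave never answered correctly. */
int resync(slave_model *s)
{
    assert(s != 0 && s->phase < 8);
    if (command_ok(s))
        return 0;
    for (unsigned clocks = 7; clocks >= 1; clocks--) {
        pulse_sck(s, clocks);
        if (command_ok(s))
            return (int)(8 - clocks);
    }
    return -1;
}
```

In this model the cumulative shifts after each burst are 7, 13, 18, 22, 25, 27, 28 clocks, which modulo 8 are 7, 5, 2, 6, 1, 3, 4, so every possible misalignment gets corrected within seven bursts. Sending a single extra clock per attempt would cover the same offsets; either way the model says nothing about glitches during the pin switch.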
Thanks.