I have been facing some embarrassing problems recently.
After some troubleshooting, I found the following. We have Board-A and Board-B, which communicate with each other over a UART at TTL levels. The cable is about 80 cm long. During communication, I get a lot of UART errors.
My task is to make the communication between Board-A and Board-B more reliable, but I am not allowed to modify the hardware design or the baud rate.
To me, it is not wise to use TTL-level UART communication between two boards. However, I am told that it is very common here to do exactly that.
I tried to find some articles/documentation to convince the people involved that they should not use TTL-level UART communication between two boards, but I could not find anything useful. All I could find was something like: "The UART usually does not directly generate or receive the external signals used between different items of equipment."
My question is: where can I find convincing articles/documentation to show the people involved? (This is to avoid future problems.) And if I am not allowed to modify the hardware design or the baud rate, what options do I have to make the communication more reliable?
I believe my former hardware partner at my previous company could make the TTL-level drive solid. If you can show the schematic (only what is connected to the TX and RX pins at both ends), I'm sure someone will have a solution.
HOWEVER, that does not belong in a Keil forum, and since you say "toolset none" I can't suggest one to use.
Erik
jeff thomson???
Is Erik playing around and using monikers again?
;)
"Is Erik playing around and using monikers again?" No, but Keil is.
Jeff is the one here who signs off on licences, and evidently, when Jeff signed on at this computer, Keil changed the name used for forum access.
Thanks for spotting it, it will be corrected.
Also check that Keil didn't reactivate email notification for every new post in threads; they like to turn that setting on whenever they can.
"Is Erik playing around and using monikers again?" I have never used a moniker. Unlike many, I do not hide behind monikers; I am willing to stand behind what I say.
"If I am not allowed to modify the hardware design and baud-rate, what choices do I have to build a more reliable communication?"
Exactly what problems do you have?
The normal way to get more reliable communication is to add error-detection and error-recovery.
So, one step up is to add parity. Then you detect any odd number of bit errors in a character. This works well for both stream and block-level transfers.
If you have block-level transfers, it's natural to add a checksum (which in this case should not be a normal additive sum, but something like CRC-16, CRC-32, Adler-32 or similar) to improve the chance of detecting a broken packet.
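Many UARTs generate and check the parity bit in hardware, so in practice parity is a configuration setting, and it has to be enabled on both boards. The sketch below only illustrates what the even-parity bit is and why it catches odd numbers of flipped bits but not even numbers; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns 1 if b contains an odd number of 1 bits, otherwise 0.
   Each XOR-shift step folds the byte in half, so bit 0 ends up
   holding the XOR of all eight bits. */
static uint8_t odd_ones(uint8_t b)
{
    b ^= (uint8_t)(b >> 4);
    b ^= (uint8_t)(b >> 2);
    b ^= (uint8_t)(b >> 1);
    return (uint8_t)(b & 1u);
}

/* Even parity: the transmitter picks the parity bit so that the
   8 data bits plus the parity bit contain an even number of 1s. */
static uint8_t even_parity_bit(uint8_t data)
{
    return odd_ones(data);
}

/* Receiver check: false means an odd number of bits (1, 3, 5, ...)
   flipped. An even number of flipped bits goes undetected. */
static bool even_parity_ok(uint8_t data, uint8_t parity_bit)
{
    return (odd_ones(data) ^ (parity_bit & 1u)) == 0u;
}
```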
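A minimal sketch of one of the CRCs mentioned, assuming the CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF, no bit reflection, no final XOR). The function name is hypothetical, and how the CRC is placed in a frame (for example high byte first after the payload) is an illustrative choice that both boards would have to agree on.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE over len bytes.
   The receiver recomputes it over the payload and compares it with
   the CRC bytes that arrived; any mismatch means a broken frame. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;                       /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);          /* feed next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u); /* divide by polynomial */
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The standard check value for this variant is 0x29B1 for the ASCII string "123456789", which is a quick way to confirm an implementation on both boards matches.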
Of course, detection must be complemented with recovery, i.e. retransmission in case of transfer errors.
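Recovery can be organised as a non-blocking stop-and-wait state machine: send a frame, wait for an ACK carrying the same sequence number, resend on timeout, and give up after a fixed number of attempts instead of retrying forever. This is a sketch only; MAX_ATTEMPTS, the tx_* names and the ACK format are hypothetical, and the actual transmit and timer calls are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ATTEMPTS 4u  /* hypothetical: 1 initial send + 3 retries */

typedef enum { TX_IDLE, TX_WAIT_ACK, TX_DONE, TX_FAILED } tx_state_t;

typedef struct {
    tx_state_t state;
    uint8_t seq;       /* sequence number of the frame in flight */
    uint8_t attempts;  /* how many times it has been sent so far */
} tx_ctx_t;

/* Call after handing a new frame to the UART for the first time. */
void tx_start(tx_ctx_t *c, uint8_t seq)
{
    c->state = TX_WAIT_ACK;
    c->seq = seq;
    c->attempts = 1;
}

/* Call when an ACK frame with a valid CRC arrives.
   ACKs for other sequence numbers (stale or duplicated) are ignored. */
void tx_on_ack(tx_ctx_t *c, uint8_t ack_seq)
{
    if (c->state == TX_WAIT_ACK && ack_seq == c->seq)
        c->state = TX_DONE;
}

/* Call when the ACK timer expires.
   Returns true if the caller should resend the same frame;
   returns false once the retry budget is spent (state becomes TX_FAILED). */
bool tx_on_timeout(tx_ctx_t *c)
{
    if (c->state != TX_WAIT_ACK)
        return false;
    if (c->attempts >= MAX_ATTEMPTS) {
        c->state = TX_FAILED;  /* report a link failure upwards */
        return false;
    }
    c->attempts++;
    return true;
}
```

Making the retry limit explicit means a persistently bad link ends up as a reported failure rather than an endless resend loop.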
A stupid way to solve potential transfer problems is to just send everything multiple times. This is acceptable unless you have commands that must not be executed more than once.
Anyway, much of the potential help depends on whether you send streams of data or blocks of data on that serial link. If it isn't block-based commands but stream transfers, you would normally instead reduce the number of data-carrying bits in each character sent, so that you can use more bits for error detection and potentially error correction. In short: try to get a higher Hamming distance between the symbols.
In some cases (such as when transmitting sound streams), you can improve the reliability by interleaving multiple sound samples over a couple of transmitted bytes. So if you get a "drop out", you only lose bandwidth of the transmitted signal: you recreate the lost data from bits transmitted before/after the drop. If you can step up the bandwidth, you could recover without quality loss by having enough redundancy.
Interleaving without stepping up the bandwidth normally only works with sound, where transfer errors only affect the quality. If sending commands or text or similar, then you just have to have retransmission or redundant transfers to be able to get a 100% correct recovery (or to know that the link is bad beyond repair).
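If every message is simply sent three times, the receiver can rebuild it with a bitwise 2-of-3 majority vote. A sketch, assuming all three copies arrive complete and aligned (a dropped or extra byte breaks the alignment, so in practice this needs some framing); the function names are hypothetical. The voted message should then be executed exactly once, which is the caveat about commands that must not run twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise 2-of-3 majority: each output bit takes the value that
   at least two of the three copies agree on. */
static uint8_t majority3(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

/* Rebuild one message of len bytes from three received copies.
   Any bit corrupted in only one copy is corrected. */
static void vote_copies(const uint8_t *c1, const uint8_t *c2,
                        const uint8_t *c3, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = majority3(c1[i], c2[i], c3[i]);
}
```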
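One concrete way to trade data bits for Hamming distance in a stream is a Hamming(7,4) code: each character carries only 4 data bits plus 3 check bits, giving a minimum distance of 3, so any single flipped bit in a character can be corrected. This is a sketch with hypothetical function names; it halves the payload per character and, like any coding scheme, has to be implemented on both boards.

```c
#include <stdint.h>

/* Codeword bit k (k = 0..6) holds Hamming position k+1.
   Positions 1, 2 and 4 are parity; positions 3, 5, 6, 7 carry d0..d3. */
static uint8_t cw_bit(uint8_t cw, int pos)
{
    return (uint8_t)((cw >> (pos - 1)) & 1u);
}

uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t d0 = nibble & 1u, d1 = (nibble >> 1) & 1u;
    uint8_t d2 = (nibble >> 2) & 1u, d3 = (nibble >> 3) & 1u;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* covers positions 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* covers positions 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;  /* covers positions 5, 6, 7 */
    return (uint8_t)(p1 | p2 << 1 | d0 << 2 | p4 << 3 |
                     d1 << 4 | d2 << 5 | d3 << 6);
}

/* Returns the 4 data bits, correcting up to one flipped bit. */
uint8_t hamming74_decode(uint8_t cw)
{
    /* The syndrome is the position (1..7) of a single-bit error, or 0. */
    uint8_t s = (uint8_t)(
        (cw_bit(cw, 1) ^ cw_bit(cw, 3) ^ cw_bit(cw, 5) ^ cw_bit(cw, 7)) |
        (cw_bit(cw, 2) ^ cw_bit(cw, 3) ^ cw_bit(cw, 6) ^ cw_bit(cw, 7)) << 1 |
        (cw_bit(cw, 4) ^ cw_bit(cw, 5) ^ cw_bit(cw, 6) ^ cw_bit(cw, 7)) << 2);
    if (s != 0)
        cw ^= (uint8_t)(1u << (s - 1));  /* flip the erroneous bit back */
    return (uint8_t)(cw_bit(cw, 3) | cw_bit(cw, 5) << 1 |
                     cw_bit(cw, 6) << 2 | cw_bit(cw, 7) << 3);
}
```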
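A minimal sketch of the interleaving idea, assuming one-byte samples in hypothetical blocks of 8: even-indexed samples are sent first and odd-indexed samples second, so a burst that destroys neighbouring bytes on the wire removes samples that are not neighbours in time. The receiver then replaces each lost sample by interpolating between its surviving neighbours. The block size and function names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 8u  /* hypothetical block length; must be even */

/* Wire order: in[0], in[2], in[4], in[6], in[1], in[3], in[5], in[7]. */
void interleave(const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    for (size_t i = 0; i < BLOCK / 2; i++) {
        out[i] = in[2 * i];
        out[BLOCK / 2 + i] = in[2 * i + 1];
    }
}

/* Undo the interleaving. lost[k] marks wire byte k as damaged
   (e.g. a framing error). A lost sample is replaced by the average of
   its two time neighbours if both survived, else by the one that did. */
void deinterleave_conceal(const uint8_t wire[BLOCK], const bool lost[BLOCK],
                          uint8_t out[BLOCK])
{
    bool ok[BLOCK];
    for (size_t i = 0; i < BLOCK / 2; i++) {
        out[2 * i] = wire[i];
        ok[2 * i] = !lost[i];
        out[2 * i + 1] = wire[BLOCK / 2 + i];
        ok[2 * i + 1] = !lost[BLOCK / 2 + i];
    }
    for (size_t t = 0; t < BLOCK; t++) {
        if (ok[t])
            continue;
        bool left = t > 0 && ok[t - 1];
        bool right = t + 1 < BLOCK && ok[t + 1];
        if (left && right)
            out[t] = (uint8_t)((out[t - 1] + out[t + 1] + 1) / 2);
        else if (left)
            out[t] = out[t - 1];
        else if (right)
            out[t] = out[t + 1];
        /* else: both neighbours lost as well; keep the damaged value */
    }
}
```

With a linear ramp, losing two adjacent wire bytes (which carry samples 1 and 3) is recovered exactly; with real audio the interpolation only approximates the lost samples.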
Hi Erik,
Many thanks for your help.
Board-B uses a Fujitsu MB90350 MCU; I only have its schematic on paper, and I have nothing at all for Board-A. Somehow "they" apply really strict policies to everything. I was once questioned for emailing a screen capture of a few lines of source code to the FAE at Fujitsu's local distributor. They don't even provide HEX files to customers.
I believe "they" would not modify the hardware; it is a product that has already [passed] testing and qualification and has already been released to the market/end-users. (This product also has several derived series.) And the complaints from customers and end-users never stop.
==========
Hi Andy,
It is very basic electronics for a good HW engineer. But it is not easy to find a good engineer here. I don't know if our consultant can make the TTL-level drive solid; even if he can, "we" will not leverage his knowledge. "We" believe "we" are good enough to do our own development. The problems "we" encounter are only due to inexperience and bad luck. Our management keeps introducing more and more process management/project management procedures to the R&D department.
Hi Per,
I will come back to try to learn something about "buffer chips".
Many thanks to all.
My supervisor wants me to develop a new firmware for Board-B, but doesn't want me to change anything critical.
You do have a big issue here.
Buffer chips should be used on both sides.
If you have buffer chips on your side, but their side doesn't use Schmitt-trigger inputs, then you may be in trouble.
If their drivers can't handle the cable capacitance, you can get into trouble, especially if you don't have Schmitt-trigger inputs on your side.
If you can't control the software on their side, then there is nothing you can do to improve the reliability using more clever software.
So in the end, the only option available might be to consider what type of cable is used. Is it a cable with signals on every wire, or a cable with a ground line between each pair of data lines, similar to the newer PATA IDE cables on a PC? (Those use an 80-conductor cable instead of the original 40-conductor one, with the extra wires as grounds, just to allow a higher bandwidth without too much emission or noise pickup.)
Another thing: does the signal quality improve if you put ferrites on both ends of the cable, to stop other noise from using the cable to jump from one board to the other? There is a large number of different ferrite designs, allowing the ferrite to be clipped onto the cable, or the cable to be wound around the ferrite.
In short: exactly what parameters are _you_ allowed to change on side A, side B, or both? For hardware, software, timing, or cabling?
"even he can, "we" will not leverage his knowledge."
Sounds like you are in an impossible situation; evidently "they" are total idiots. You have a definite hardware problem, and fixing hardware by software never yields good results.
That leaves only what Per suggests: reduced baud rates and error checking by CRC/checksum. With that implemented, what are you going to do when 47111 transmissions of the same record error out?
The original problem is that Board-B fails to respond to requests from Board-A, and I observed a lot of framing errors.
One day later, the hardware engineer told one of my team members that he had replaced a broken electronic component, so the communication should now be solid enough.
Then I was forced to look for another reason why Board-B fails to respond. After more troubleshooting, I found another software bug.
After that, the customer reported new issues and urged us to provide explanations/solutions for old existing issues.
Unless I get new instructions, I will simply close this communication-error issue.
The cable I have has only three wires:
1. Tx
2. Rx
3. Wake-up / communication enable
For a demonstration/test:
If you have access to a scope, have a look at the pulses.
If they look something like this (I suspect they do):
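(ASCII sketch of the expected waveform: slowly ramping rising edges that drop sharply back to low, with flat low stretches in between, i.e. a sawtooth rather than a clean square wave)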
Try the following: in "the middle of the cable" (anywhere will do), connect a 1k ohm resistor from Tx to +5V, another 1k ohm resistor from Rx to +5V, and a third 1k ohm resistor from Wake-up to +5V, and see what happens.
What - no ground??
Or does it have a shield used as ground?
"You have a definite hardware problem, and fixing hardware by software never yields good results."
Let's rewrite as "seldom yield good results".
It's just a question of how close you already are to the limits when you start looking for a sw workaround. Software workarounds can sometimes produce excellent results. But there must be some margin available somewhere for the sw to "grow into".
"What - no ground??"
That was my initial response too.
Anyway, with a problematic communication link, the first step is to look at the signals. That will tell whether a baudrate change can make a difference, or the addition of EMI filters, ferrites, pull-ups etc. It could also indicate whether it would be meaningful to replace TX with a twisted TX/GND pair and RX with a twisted RX/GND pair.
Right now, we only know that you have framing errors.
- We don't know if there is a baudrate issue (a good source of framing errors).
- We don't know if the edges are very bad, making it hard to pinpoint the start bit well enough (also a good source of framing errors).
- We don't know if there is a level issue, with one side not seeing enough voltage swing to move safely through the dead zone between a definite zero and a definite one.
- We don't know if there is noise pickup from some other electronics.
- We don't know if there is a grounding problem, either producing hum or maybe leaving the two boards without the same ground potential.
- We don't know if both sides have some form of noise suppression in the UART that requires a minimum length for start-bit detection, to avoid detecting spurious noise as false start bits.
- We don't know if the problem is symmetric, or if A=>B behaves differently from B=>A.
Most of the above can be figured out from curve shapes captured with an oscilloscope. Hopefully, the datasheets tell a bit about the UART, but scope pictures together with printouts of serial data can reveal a UART with stupid start-bit detection. Both scope images and printouts can give indications of asymmetric behaviour.
A baudrate error can be corrected by trying to step the baudrate divisor up or down a tick or two. A processor that allows the RX pin to be switched to a GPIO input pin (or, even better, a timer capture pin) can measure bit lengths to find the exact baudrate of the other side, expressed in ticks of its own clock. So if one or both UARTs have bad oscillator frequencies, the baudrate can be adjusted automatically to compensate.
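A sketch of the bit-length measurement, assuming the RX pin feeds a timer capture unit that records a free-running 32-bit tick count at every edge (hypothetical; the capture hardware and timer width on the actual MCU are not covered here). The shortest interval approximates one bit; every interval is then rounded to a whole number of bits, and total ticks divided by total bits gives the other side's bit time in local ticks. Having the other side send 0x55 ('U'), whose start, data and stop bits alternate, yields many single-bit intervals.

```c
#include <stddef.h>
#include <stdint.h>

/* edges[]: capture timestamps of consecutive RX edges (any polarity).
   Returns the estimated bit time in timer ticks, or 0 if unusable.
   Unsigned subtraction keeps intervals correct across a 32-bit wrap. */
uint32_t estimate_bit_ticks(const uint32_t *edges, size_t n)
{
    if (n < 2)
        return 0;
    uint32_t min_iv = UINT32_MAX;
    for (size_t i = 1; i < n; i++) {
        uint32_t iv = edges[i] - edges[i - 1];
        if (iv < min_iv)
            min_iv = iv;
    }
    if (min_iv == 0)
        return 0;
    uint32_t total_ticks = 0, total_bits = 0;
    for (size_t i = 1; i < n; i++) {
        uint32_t iv = edges[i] - edges[i - 1];
        total_ticks += iv;
        total_bits += (iv + min_iv / 2) / min_iv;  /* round to whole bits */
    }
    return (total_ticks + total_bits / 2) / total_bits;
}

/* Convert a bit time back into a baud rate for a timer running at timer_hz. */
uint32_t baud_from_bit_ticks(uint32_t timer_hz, uint32_t bit_ticks)
{
    return bit_ticks ? (timer_hz + bit_ticks / 2) / bit_ticks : 0;
}
```

Comparing the measured value with the nominal one shows which way, and roughly how far, the own divisor would need to be nudged.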
Level errors can often be fixed with pullup resistors, or with series resistors.
Earth problems are normally fixed with a strong, common ground point that makes sure the two boards don't have the "strength" to move their ground up or down relative to the other gear in the box.
Noise can normally be filtered, as long as the frequency of the noise is well above the baudrate used.
Edges that are too slow can sometimes be helped with pullup resistors, sometimes with other cables. Preferably with a suitable buffer, but that might be outside the scope in this case.
Stupid UARTs triggering on noise would require filtering of the signal.
Work on software workarounds should not be started until a reasonable explanation of the underlying hw issues has been found. Not all sw workarounds work well for all hw issues. Some hw issues cannot be solved by sw workarounds, or only by accepting a significant drop in usable transfer rate (either by just reducing the baudrate, or by keeping the baudrate but adding redundancy or retransmissions).
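As a worked example of "well above the baudrate": a first-order RC low-pass in front of a receiving input has its corner frequency at f_c = 1/(2πRC), and attenuates noise above it by about 20 dB per decade. The values below (1 kΩ, 1 nF, and 9600 baud) are assumptions for illustration only; the thread does not state the actual baudrate.

$$
f_c = \frac{1}{2\pi R C} = \frac{1}{2\pi \cdot 10^{3}\,\Omega \cdot 10^{-9}\,\mathrm{F}} \approx 159\,\mathrm{kHz},
\qquad
\tau = RC = 1\,\mu\mathrm{s} \ll T_\text{bit} = \frac{1}{9600}\,\mathrm{s} \approx 104\,\mu\mathrm{s}
$$

Because the filter's time constant is about 1% of a bit time at that rate, the data edges are barely slowed, while noise at several MHz is attenuated substantially. Noise close to the baudrate cannot be separated this way.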
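What a less "stupid" UART does can be sketched in software: sample RX at 16 times the baudrate and only accept a start bit after the line has stayed low for half a bit time, so shorter glitches are ignored. OVERSAMPLE, MIN_START_LOW and the start_detect_* names are hypothetical, and the sketch stops at start-bit detection; a full receiver would then go on to sample the data bits near their centres.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVERSAMPLE    16u                /* hypothetical: RX samples per bit */
#define MIN_START_LOW (OVERSAMPLE / 2u)  /* low must last half a bit */

typedef struct {
    uint8_t low_count;  /* consecutive low samples seen so far */
} start_detect_t;

/* Feed one RX sample (0 = low, 1 = high), e.g. from a timer interrupt.
   Returns true exactly once per low period, when the line has been low
   for MIN_START_LOW consecutive samples (the middle of a plausible
   start bit). Any high sample resets the count, so short spikes are
   never mistaken for a start bit. */
bool start_detect_feed(start_detect_t *d, uint8_t rx_level)
{
    if (rx_level) {
        d->low_count = 0;
        return false;
    }
    if (d->low_count < UINT8_MAX)
        d->low_count++;
    return d->low_count == MIN_START_LOW;
}
```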
Per,
Many thanks for your detailed and well-organized guidelines. (I had already read these guidelines, but did not say "thanks".)
==============================================================
NO. No ground, no shield.
They said that Board-A and Board-B get their power from the same power source, so an extra ground pin is unnecessary.
Board-A and Board-B are assembled into the customer's end product in the customer's factory, so the real cable is manufactured by the customer. They said the real cable is similar to the cable I use: no ground, no shield. If this information is correct, then our customer does not see the need for an extra ground pin either.
If it is true that neither our customer nor we see the need for ground pins, well, then this is a problem for the local industry.
The problems with Board-B were raised again yesterday. Two or three people will be sent to the customer side next Monday to do the troubleshooting/explanation: one to Mainland China, one to northern Taiwan.
I have a feeling that many engineers here work to destroy their companies and their customers: busy producing terrible hardware and software, busy with endless troubleshooting.
I am in charge of developing a new product which integrates the functionality of Board-A and Board-B. It seems that they prefer to use the hardware of Board-B for the new product. So the MCU would be the Fujitsu MB90350, an MCU first released to the market in 2003. How wonderful and amazing.
People were talking about ISO 26262 here. And I noticed that TI recently announced some new ARM Cortex-based MCUs, the Hercules RM4x/TMS570/TMS470M, which have some kind of hardware support for ISO 26262. But all of this is just decoration to us, because our products are already solid enough.