Hello,
I have implemented a web server using RL-ARM. The problem I am trying to resolve is that, occasionally, the web server will 'hang' for about two seconds while in the middle of serving an HTTP response to the browser. This does not happen very frequently; 95% of the time, complete pages are served almost instantly.
Using Wireshark, I see that what's happening is that TCPNET is sometimes sending out a TCP packet (containing HTTP data) that has an incorrect checksum. Wireshark actually marks the packet as "Continuation or non-HTTP traffic", and the 'bad checksum' flag is 'true'.
About two seconds after the bad packet is issued, I can see that TCPNET retransmits the packet. Wireshark marks it as "[TCP Retransmission]". Inspection of this retransmitted packet shows that it contains exactly the same HTTP data as the bad packet, except this time the packet is usually a little longer (perhaps just several bytes, or sometimes tens of bytes longer) and has a good checksum value.
So what's happening is that the browser ignores the packet with the bad checksum, and the 'hang' is when it awaits the retransmitted packet.
The retransmitted packet is almost always longer. It's as if the bad packet, with the wrong checksum, has somehow become slightly truncated.
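A captured segment can be checked by hand against the symptom described above. The following is a minimal sketch of the standard Internet checksum (RFC 1071) applied to a TCP segment with its IPv4 pseudo-header. The function names `tcp_checksum` and `tcp_checksum_ok` are made up for illustration and are not part of TCPNET or RL-ARM. It assumes plain IPv4 and that `seg` points at the first byte of the TCP header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum as big-endian 16-bit
 * words. Only the final chunk passed in may have an odd length. */
static uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;   /* odd trailing byte, zero padded */
    }
    return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/* Checksum over the IPv4 pseudo-header plus the TCP segment.
 * With the checksum field zeroed, this returns the value to store there.
 * With the checksum field filled in, a good segment returns 0. */
uint16_t tcp_checksum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                      const uint8_t *seg, size_t seg_len)
{
    /* zero byte, protocol 6 (TCP), 16-bit TCP length */
    uint8_t pseudo[4] = { 0, 6, (uint8_t)(seg_len >> 8), (uint8_t)seg_len };
    uint32_t sum = 0;

    sum = csum_add(sum, src_ip, 4);
    sum = csum_add(sum, dst_ip, 4);
    sum = csum_add(sum, pseudo, 4);
    sum = csum_add(sum, seg, seg_len);
    return (uint16_t)~csum_fold(sum);
}

/* Returns 1 if the checksum stored in the segment is consistent. */
int tcp_checksum_ok(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                    const uint8_t *seg, size_t seg_len)
{
    return tcp_checksum(src_ip, dst_ip, seg, seg_len) == 0;
}
```

Because the TCP length is part of the pseudo-header, a segment that loses a few trailing bytes after its checksum was calculated will normally fail this check, which would be consistent with a packet that is both slightly short and flagged as having a bad checksum.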
This is the only issue I am experiencing; everything else seems to be running absolutely fine with our web server. It has been going through very extensive testing and I've never seen anything else that would point to data corruption.
The platform is the ST ARM9. The software in use is:
uVision V4.00, MDK-ARM V4.00, RL-ARM V4.00
The problem has been present ever since we started developing using earlier V3.x versions of uVision, MDK and RL-ARM. This isn't something that has been introduced with any particular release of Keil software.
In our application there are three tasks running: Main application (middle priority), a serial communications task (highest priority), and web server task. The web server is set to lowest priority. As a test, I have tried making it the highest priority task but this didn't eliminate the checksum errors. At the moment, I am in the process of disabling as much of the main application as I can, along with interrupts, etc. to see if I can determine what, if anything, in our code could be upsetting TCPNET.
In the meantime I am just curious as to whether anyone has experienced anything similar to this. It's something that I'm finding very tricky to debug.
Thanks,
Trevor.
Trevor,
I think I am seeing a very similar situation - And have been trying to narrow it down on and off for the past five months.
I also use an ST ARM9, started seeing my problem with uVision 3.40, continued to see it with 3.50, and am still seeing it with 4.00.
What I see is an occasional TCP packet being transmitted (as viewed by WireShark) that looks like it is being truncated and then padded to the Ethernet minimum. The padding always seems to be NULLs and invariably the TCP checksum is corrupt.
As a consequence of this bad packet, the receiver ignores it and the TCP stack on the STR9 re-transmits it after a few seconds.
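One way to catch such a frame in a capture, or in a debug hook placed before the driver, is to compare the length claimed by the IPv4 header with the number of bytes actually present. The sketch below is illustrative only: `check_ipv4_frame_len` and the enum names are hypothetical, not part of STR9_ENET.C, and it assumes an untagged Ethernet II frame carrying IPv4, with `frame_len` excluding the 4-byte FCS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14u   /* destination MAC, source MAC, EtherType */
#define IPV4_MIN_HDR   20u   /* IPv4 header without options */

enum frame_check {
    FRAME_OK,          /* IP total length fits inside the frame */
    FRAME_NOT_IPV4,    /* not an IPv4 frame, or too short to tell */
    FRAME_TRUNCATED    /* IP header claims more bytes than are present */
};

/* Compare the IPv4 total-length field with the bytes in the frame.
 * A frame padded up to the Ethernet minimum is fine as long as the IP
 * datagram fits; a frame whose IP length exceeds what is present has
 * lost data somewhere between the stack and the wire. */
enum frame_check check_ipv4_frame_len(const uint8_t *frame, size_t frame_len)
{
    uint32_t ip_total;

    assert(frame != NULL);

    if (frame_len < ETH_HDR_LEN + IPV4_MIN_HDR) {
        return FRAME_NOT_IPV4;
    }
    if (frame[12] != 0x08 || frame[13] != 0x00) {   /* EtherType 0x0800 */
        return FRAME_NOT_IPV4;
    }
    if ((frame[14] >> 4) != 4) {                    /* IP version nibble */
        return FRAME_NOT_IPV4;
    }

    /* Total length is a big-endian field at offset 2 of the IP header. */
    ip_total = ((uint32_t)frame[16] << 8) | frame[17];

    if (ETH_HDR_LEN + ip_total > frame_len) {
        return FRAME_TRUNCATED;
    }
    return FRAME_OK;
}
```

A 60-byte frame whose IP header claims, say, 256 bytes would be reported as `FRAME_TRUNCATED`, matching the "truncated and then padded to the Ethernet minimum" pattern seen in Wireshark.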
My investigations suggested that there is a problem with the STR9_ENET.C module. I have proven (to my own satisfaction) that the packet is being sent to the module correctly and, after transmission, the packet is still valid; i.e., there is no immediately obvious memory corruption occurring. I strongly suspect there is a DMA transfer issue, but cannot think of a way of proving it.
Producing a minimalistic example suitable for Keil support to test has been a problem for me. Four months ago I contacted Keil support, sent them the code and only in the past couple of days have been told that I was doing something wrong in the TCP callback.
This thing I am doing wrong is not documented in any way that I can see and is (apparently) causing corruption of the TCP stack. Regardless of that, I have modified the code accordingly and still I see the problem.
Unfortunately Keil support are not now seeing the problem.
In desperation for a resolution, I modified the STR9_ENET.C module. So far, I have seen no corrupt packets when this updated module is used.
If you like, I can send you my modified module for you to try - If it works for you, then maybe I/you/we can contact Keil to see if they will follow it through.
That is most fascinating that you're also having the same issue. About the packets sometimes being padded out, do you see this happen if the packet being sent to send_frame is less than around 60 bytes?
As it happens I currently have breakpoints in send_frame and I have been looking for an opportunity to inspect the contents of *frame when Wireshark shows a bad packet has been sent. I too have been wondering if it could be something to do with the DMA transfer process. Really, it's the only thing I feel I can try and debug, because with the rest of TCPNET being sealed in a library I feel quite in the dark.
I'd be extremely grateful if I could take a look at your ethernet file and give it a go. Would it be possible to post the amendments you made on here, or perhaps send via Sendspace?
Regards
Trevor
Please send an email to: support.intl@keil.com and ask for an updated driver.
Franc,
Does this mean there is now an updated driver that corrects this problem?
Just before seeing Franc's post, I had sent another project to Keil Support that seems to reliably show the problem.
I eagerly await the updated driver to try.
Hmmm....
Just received the updated driver, and still get the problem :(
I remember trying the very same sort of fix on it myself three or four months ago and then going on to try something else.
Keil support have my updated project so I'll pass the details on to them.
Also noticed that the update is based upon an older version of the code, so there is one part that has reverted.
Up to about 3.22 (and still in the update), it was:
void int_enable_eth (void) {
  /* Ethernet Interrupt Enable function. */
  VIC0->INTER |= 1 << 11;
}
From about 3.22, this changed to:
void int_enable_eth (void) {
  /* Ethernet Interrupt Enable function. */
  VIC0->INTER = 1 << 11;
}
I confirm also that using Keil's modified send_frame() just immediately brings the problem back for me too.
I shall continue using your modifications for now if that's alright...!
"I shall continue using your modifications for now if that's alright...!"
Sure you can.
I've notified Keil support and suggested that they try my app to see if they can re-create it.
I'll keep you informed.
At the moment, Keil support are unable to replicate the error with my code. With that very same code, I'm seeing the problem on the MCBSTR9 board and our own board.
If you can do something that you think would show the problem more consistently, I think they would appreciate a copy.
Please try the last driver that you have received from support and change the number of TX buffers in the header file:
#define NUM_TX_BUF 3
It seems that this solves the problem.
I've been trying my 'test' project with this fix and so far have seen no errors (after more than 1.5 million packets).
It looks promising.
I'm now going to put it into my 'live' project and set up a test to run over the weekend.
Unfortunately, the tests on my 'live' project were not successful; i.e., I still see the error.
I believe I also have this problem. Could I also have a copy of the updated driver?
Stuart.
Hi Stuart,
If you send an email to KeilSupportIntl (at) arm (dot) com and give this thread as a reference I expect they'll send it to you quite quickly.
Trev
Keil are now looking into this problem further.
I found that their 'updated' driver would also fail.
It looks like the link I gave for my modified version is still alive:
www.sendspace.com/.../2oq3q5
You may want to give that a try.
One problem they're having in chasing this issue is reliably reproducing it. I had a small-ish test program that would show it up, but when I used Keil's update it worked - only to then fail in my 'real' project :(
As Trevor suggests, it is worth you contacting Keil support and asking for their update. I suspect they might be interested in your results.
Just checked Mr Sausage's code and yes it fixes my problem. So I will stick with that for now, and also ask Keil for the fix and see if that works for me.
While it didn't stop my program from working it was annoying getting silly pauses everywhere!
Thanks!
Ok I have now checked Keil's fixed driver, and I still have the problem.
So for the moment I will stick with Mr Sausage's code which works.
Thanks.
One day someone is going to ask why there's a comment saying "thanks to Silly Sausage for this bit" in our code.
:-)
Maybe they will send you to therapy for your obvious eating disorder :)
It was one of them things I was laughing about as I was writing it!