
TCPNET HTTP server: TCP checksum errors

Hello,

I have implemented a web server using RL-ARM. The problem I am trying to resolve is that, occasionally, the web server will 'hang' for about two seconds in the middle of serving an HTTP response to the browser. This does not happen very often; 95% of the time, complete pages are served almost instantly.

Using Wireshark, I see that what's happening is that TCPNET is sometimes sending out a TCP packet (containing HTTP data) that has an incorrect checksum. Wireshark actually marks the packet as "Continuation or non-HTTP traffic", and the 'bad checksum' flag is 'true'.

About two seconds after the bad packet is sent, I can see TCPNET retransmit the packet. Wireshark marks it as "[TCP Retransmission]". Inspecting this retransmitted packet shows that it contains exactly the same HTTP data as the bad packet, except that this time the packet is usually a little longer (sometimes by just a few bytes, sometimes by tens of bytes) and has a good checksum value.

So what's happening is that the browser ignores the packet with the bad checksum, and the 'hang' is when it awaits the retransmitted packet.

The retransmitted packet is almost always longer. It's as if the bad packet, with the wrong checksum, has somehow become slightly truncated.
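To illustrate why a segment that is a few bytes short fails the check, here is a minimal sketch of the standard Internet checksum (RFC 1071) as TCP uses it over IPv4. This is not TCPNET's code. The addresses, ports, payload and the `truncation_demo` helper are hypothetical, and the scenario it models (checksum computed over the full segment, fewer bytes actually sent) is only one possible explanation of what the capture shows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add a buffer to a running 32-bit sum as big-endian 16-bit words
   (RFC 1071). An odd trailing byte is treated as if padded with zero. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        acc += (uint32_t)data[0] << 8;
    return acc;
}

/* Fold carries back into 16 bits and take the ones' complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc & 0xFFFFu) + (acc >> 16);
    return (uint16_t)~acc;
}

/* TCP checksum over the IPv4 pseudo-header plus the segment. The
   pseudo-header length is seg_len, i.e. what the receiver derives from
   the IP header. Run over a segment whose checksum field is already
   filled in, a correct segment yields 0. */
static uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                             const uint8_t *seg, size_t seg_len)
{
    uint8_t pseudo[12];
    memcpy(pseudo, src, 4);
    memcpy(pseudo + 4, dst, 4);
    pseudo[8] = 0;
    pseudo[9] = 6;                          /* protocol number: TCP */
    pseudo[10] = (uint8_t)(seg_len >> 8);   /* TCP length, big-endian */
    pseudo[11] = (uint8_t)(seg_len & 0xFF);
    return fold(sum16(seg, seg_len, sum16(pseudo, sizeof pseudo, 0)));
}

/* Hypothetical demo: the sender computes the checksum over the whole
   segment, but only the first (len - drop) bytes go out on the wire.
   Returns the receiver's verification result (0 means "good"). */
static uint16_t truncation_demo(size_t drop)
{
    static const uint8_t src[4] = {192, 168, 0, 10};
    static const uint8_t dst[4] = {192, 168, 0, 20};
    static const char payload[] = "<html>hello</html>";  /* 18 bytes */
    uint8_t seg[20 + sizeof payload - 1] = {0};
    size_t len = sizeof seg;

    seg[0] = 0x00; seg[1] = 0x50;   /* source port 80 */
    seg[2] = 0xC0; seg[3] = 0x00;   /* destination port 49152 */
    seg[12] = 0x50;                 /* data offset: 5 words, no options */
    seg[13] = 0x18;                 /* flags: PSH, ACK */
    seg[14] = 0x10; seg[15] = 0x00; /* window */
    memcpy(seg + 20, payload, sizeof payload - 1);

    /* Sender: checksum field (bytes 16-17) is zero while computing. */
    uint16_t c = tcp_checksum(src, dst, seg, len);
    seg[16] = (uint8_t)(c >> 8);
    seg[17] = (uint8_t)(c & 0xFF);

    /* Receiver: verifies only the bytes that actually arrived. */
    return tcp_checksum(src, dst, seg, len - drop);
}
```

In this sketch `truncation_demo(0)` returns 0 (checksum good), while `truncation_demo(6)` does not: both the dropped bytes and the smaller length in the pseudo-header change the sum, which is consistent with Wireshark flagging a shortened packet and accepting the full-length retransmission.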

This is the only issue I am experiencing; everything else seems to be running absolutely fine with our web server. It has been going through very extensive testing and I've never seen anything else that would point to data corruption.

The platform is the ST ARM9. The software in use is:

uVision V4.00
MDK-ARM V4.00
RL-ARM V4.00

The problem has been present ever since we started developing with earlier V3.x versions of uVision, MDK and RL-ARM. It isn't something that was introduced by any particular release of Keil software.

In our application there are three tasks running: Main application (middle priority), a serial communications task (highest priority), and web server task. The web server is set to lowest priority. As a test, I have tried making it the highest priority task but this didn't eliminate the checksum errors. At the moment, I am in the process of disabling as much of the main application as I can, along with interrupts, etc. to see if I can determine what, if anything, in our code could be upsetting TCPNET.

In the meantime I am just curious as to whether anyone has experienced anything similar to this. It's something that I'm finding very tricky to debug.

Thanks,

Trevor.

  • "Which particular ST ARM9 device are / were you using for your project?"

    The initial development started with the Keil MCBSTR9 board, fitted with an STR912FW44X6 (rev G).

    Our development board uses an STR912FW44X6 (rev H).

    The project uses raw TCP sessions.

    The problem seemed to start when I went beyond the Keil examples and started putting in the 'real-life' code. I've tried going back to the Keil examples (as have Keil support) and I see no problem.

    I believe that the problem is due to some sort of interaction between the basic TCP communication and 'something else'. Since the 'something else' is missing from the Keil examples, the problem is not seen there.

    I also found that a minor change in a part of the project seemingly unrelated to TCP communication would affect whether the problem was visible at all, and how often it occurred.

    In one particular example that I gave Keil support, I created two binary images of the project. One repeatedly failed and the other ran for hours without a problem. The only difference between the two images was a sequence of five instructions in a different order. As far as I could tell, the two orderings were functionally identical, and I could not see what was causing the difference. Could there be some instruction queuing/caching effect? I just don't know.

    If you could spend the time to create some code that repeatedly and easily fails, then maybe we can convince Keil support to look at it again.
