I'm using TCPNET and the RTX kernel on an STR912 ARM9.
A simple question first: can TCPNET be configured to respond to a broadcast PING?
Second, I have noticed that sometimes the response time to PINGs gets erratic and long. Normally the response time is ~1ms, but sometimes (for no reason I have been able to debug yet) the response times go up to 1 or 2 seconds (frequently almost exactly 1 or 2 seconds).
The only cure seems to be to power-cycle my board. This happens both on my own board and on an STRB9 from Keil. When I was running the http demo I noticed that when the ping time goes up, the HTTP response also becomes very slow. What is happening?
Christopher Hicks ==
NORMAL:
hiss:~# ping 192.168.2.12
PING 192.168.2.12 (192.168.2.12) 56(84) bytes of data.
64 bytes from 192.168.2.12: icmp_seq=1 ttl=128 time=1.03 ms
64 bytes from 192.168.2.12: icmp_seq=2 ttl=128 time=0.967 ms
64 bytes from 192.168.2.12: icmp_seq=3 ttl=128 time=0.998 ms
64 bytes from 192.168.2.12: icmp_seq=4 ttl=128 time=1.00 ms
64 bytes from 192.168.2.12: icmp_seq=5 ttl=128 time=0.908 ms
SOMETIMES:
hiss:~# ping 192.168.2.12
PING 192.168.2.12 (192.168.2.12) 56(84) bytes of data.
64 bytes from 192.168.2.12: icmp_seq=1 ttl=128 time=352 ms
64 bytes from 192.168.2.12: icmp_seq=2 ttl=128 time=2001 ms
64 bytes from 192.168.2.12: icmp_seq=3 ttl=128 time=1940 ms
64 bytes from 192.168.2.12: icmp_seq=4 ttl=128 time=941 ms
64 bytes from 192.168.2.12: icmp_seq=5 ttl=128 time=1001 ms
64 bytes from 192.168.2.12: icmp_seq=6 ttl=128 time=1186 ms
64 bytes from 192.168.2.12: icmp_seq=7 ttl=128 time=1011 ms
64 bytes from 192.168.2.12: icmp_seq=8 ttl=128 time=510 ms
Hi Christopher,
We are experiencing very similar behaviour here with an LPC2368 ARM7-based board.
I haven't run tests with the http demo, only with our own firmware. I was investigating relationships with other tasks in our application, but your post suggests to me that the problem could be somewhere else.
Did you find a deterministic way to make the Ethernet performance fall off? As far as I can tell, I can make it happen with the following procedures:
1. by enabling tasks that require an ISR to run at relatively high frequency (every 20 ms);
2. more simply, by stopping the application anywhere with the debugger and resuming it a while later.
An interesting thing I noticed today: if I disconnect the Ethernet cable while the application is stopped and plug it back in after restarting, the problem does not show up! If I just leave the cable connected while the CPU is stopped for a while, the slowdown appears as soon as the application starts again.
I confirm that, once the performance decreases, nothing but a power cycle can put it back to full speed.
What version of TCPNet are you using? Are you running the http demo on the RTX RTOS or compiling it as a standalone app?
I don't know enough about the LPC2368 architecture, but... could this be due to some misconfiguration of the CPU's internal MAC, or of the DMA module the MAC uses?
Does anybody out there have the same problem?
TIA
Could you have lost the transmit interrupt, so that the stack can only send when it receives a packet and gets a receive interrupt?
In that case, the ping answer would hang in the stack until a new ping is received. In the same way, an http answer would hang until a new request or something else is seen.
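As a minimal sketch of the mechanism I mean (hypothetical names, not the actual TCPNet or driver source):

/* A send path gated on a flag that only the TX-done interrupt     */
/* clears. If that interrupt is ever lost, the flag stays set and  */
/* the queued reply sits in the stack until some other event       */
/* (e.g. the next receive interrupt) runs the driver again.        */

static volatile int tx_busy;      /* set on send, cleared in TX ISR */

int eth_send (void *frame, int len) {
  if (tx_busy) {
    return 0;                     /* reply stays queued in the stack */
  }
  tx_busy = 1;
  /* ...program the DMA descriptor and start the transmission... */
  return 1;
}

void tx_done_isr (void) {         /* if this interrupt is lost...   */
  tx_busy = 0;                    /* ...tx_busy is never cleared    */
}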
Mmm, thanks... I had thought of this, because if I "flood" ping at a very high rate then everything seems OK (response times stay short), and the 1 or 2 second delay is an exact multiple of the interval at which I am sending ping packets in the non-flood case.
I can make the behaviour happen both with debug and non-debug versions of TCPNet, and with or without the RTX kernel. I am beginning to suspect the STR9_ENET ethernet device driver code - I am using the "stock" code supplied by Keil.
Thanks again,
Christopher Hicks
There was a problem in the early device adaptations. However, this was corrected a long time ago.
Are you using the latest RL-ARM release available on this web page?
Thank you, Reinhard. I had updated the library, but not the copy of the driver in my project (partly oversight, and partly because of some local modifications). The new driver seems to work better: there is an occasional long response time, but it seems to always recover immediately:
16 bytes from 192.168.2.11: icmp_seq=55 ttl=128 time=0.693 ms
16 bytes from 192.168.2.11: icmp_seq=56 ttl=128 time=0.664 ms
16 bytes from 192.168.2.11: icmp_seq=57 ttl=128 time=1000 ms
16 bytes from 192.168.2.11: icmp_seq=58 ttl=128 time=1.42 ms
16 bytes from 192.168.2.11: icmp_seq=59 ttl=128 time=0.676 ms
What hardware are you using? We also need to know the PC side, so that we are able to replicate the problem.
The embedded board is a Keil STRB9 board (and a Hitex STR912 eval board behaves the same). I am sending the pings from an old PC running Linux 2.4.27, and lspci tells me this about the Ethernet card:
Ethernet controller: Lite-On Communications Inc LNE100TX (rev 20)
One weird thing I have noticed: if I have two STR912 boards on the network (different MAC and IP addresses obviously :-) ), and I send ping packets to both, then the delayed packets happen at about the same times for both boards.
There are switches, but no routers, between the host sending the pings and the two STR912 boards. The network is all 100MBit, and it is in use for other stuff but the load is not high.
I have a Windows PC on the same hub as the STR912 boards and running EtherSnoop on this seems to confirm that the delay happens inside the STR912 boards, and not elsewhere on the network.
On an MCB2300 demo board (ver. 3), using the Keil HTTP-Demo example, I did the following:
1) load the application and let it run, in debug mode
2) start pinging the demo board: response time < 1 ms
3) stop the application with the debugger (obviously, after a few seconds I see "ping timeout" on my PC)
4) restart the application
5) after a few seconds, ping resumes, but the response time is very long (1000 ms and more)
Hi
Was this issue ever resolved?
I too am experiencing this problem with RL-TCPnet (see thread no. 10561). I am using the MCB2300 with an LPC2368 and MDK 3.11.
If you have any further information on this issue (or, even better, a resolution), could you let me know?
Many Thanks
Des
Not fully resolved.
The situation was much improved with the (V3.05) driver. The long trains of long response times are gone, but occasionally there is still a single, isolated long response time.
This happens about once per minute, pinging at 1 second intervals, with a 100 ms TCPNet timer_tick(). Here is a typical sequence:
64 bytes from 192.168.2.203: icmp_seq=226 ttl=128 time=1.20 ms
64 bytes from 192.168.2.203: icmp_seq=227 ttl=128 time=1.61 ms
64 bytes from 192.168.2.203: icmp_seq=228 ttl=128 time=1.32 ms
64 bytes from 192.168.2.203: icmp_seq=229 ttl=128 time=1001 ms
64 bytes from 192.168.2.203: icmp_seq=230 ttl=128 time=2.88 ms
64 bytes from 192.168.2.203: icmp_seq=231 ttl=128 time=1.51 ms
64 bytes from 192.168.2.203: icmp_seq=232 ttl=128 time=1.25 ms
64 bytes from 192.168.2.203: icmp_seq=233 ttl=128 time=1.64 ms
This shows two interesting features:
1. The lengthened response time is often, but not always, almost exactly 1 second.
2. The response time to the ping immediately following the long one is always approximately double the normal response time.
Taken together, this evidence strongly suggests to me that sometimes a packet is received, but TCPNet is not informed of this until the subsequent packet arrives, and then both are processed together, thankfully in the order in which they arrived.
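To illustrate the failure mode I have in mind (only a sketch with made-up names, not the actual driver source):

/* If the RX interrupt handler pops exactly one frame per interrupt, */
/* a frame that the MAC has queued but whose interrupt was already   */
/* consumed is not noticed until the NEXT frame arrives - which      */
/* would produce exactly the pattern above: one reply delayed by a   */
/* full ping interval, then two replies processed back to back.      */

void rx_isr (void) {
  /* suspected behaviour: handle a single frame only */
  process_frame (rx_consume_index);
  rx_consume_index = (rx_consume_index + 1) % NUM_RX_FRAG;

  /* what it presumably should do: drain every frame already queued */
  /* while (rx_consume_index != rx_produce_index) { ... }           */
}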
CH ==
Thanks Christopher.
I am using the V3.05 driver but my PING response times are much greater:
64 bytes from dev_00 (10.51.21.48): icmp_seq=35 ttl=128 time=0.734 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=36 ttl=128 time=0.780 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=37 ttl=128 time=0.759 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=38 ttl=128 time=0.747 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=39 ttl=128 time=0.771 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=40 ttl=128 time=200.714 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=41 ttl=128 time=44.451 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=42 ttl=128 time=266.451 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=43 ttl=128 time=162.466 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=44 ttl=128 time=195.019 ms
64 bytes from dev_00 (10.51.21.48): icmp_seq=45 ttl=128 time=214.114 ms
It would appear the PING response time jumps whenever my TCP socket connects. If I disable TELNET in the NETCONFIG.C file and do not make a call to TCP_LISTEN, the PING responses are of the order of 0.7 ms. If TELNET is enabled, or I make a call to TCP_LISTEN, the PING response times jump up once the TCP connection is made.
As an aside, I have also noticed the PING response times increase (with no TCP socket connection or TELNET) if you exit the debugger, i.e. build your code, download to flash via the debugger, and then exit the debugger.
I have raised these issues with Keil and will let you know their findings whenever they get back to me.
Maybe that is expected behaviour (lengthened ping response times while there is TCP activity).
Remember that TCPNet is single-threaded, so while a TCP packet is being processed (i.e. while in the TCP socket callback), any other incoming packets (including incoming pings) are buffered, and processed once the TCP callback has completed (or maybe the next time you call main_TcpNet() - I am not sure).
This is in contrast to bigger systems where typically separate threads/processes respond to each TCP/UDP socket. In this case even if one thread takes a long time to respond to a packet on a given socket, the pings are handled by a separate thread and so respond immediately.
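For anyone following along, the usual RL-TCPnet super-loop looks roughly like this (a sketch; process_request() is made up, and the details may differ between versions):

/* Everything - sockets, ICMP, timers - runs in this one thread.  */
/* While the TCP callback below is executing, a buffered ping     */
/* request cannot be answered until main_TcpNet() runs again.     */

U16 tcp_callback (U8 soc, U8 evt, U8 *ptr, U16 par) {
  if (evt == TCP_EVT_DATA) {
    process_request (ptr, par);  /* hypothetical; if this is slow, */
  }                              /* pings queue up behind it       */
  return (0);
}

int main (void) {
  init_TcpNet ();
  /* timer_tick() must also be called every 100 ms, e.g. from a timer interrupt */
  while (1) {
    main_TcpNet ();              /* single-threaded packet processing */
  }
}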
In my case I have no IP activity except the pings themselves.
CH
Hi Des, while investigating this issue on the LPC2366 we noticed a possible problem in lpc23_emac.c (at least in the version included in RL-ARM 3.10).
Its ISR, interrupt_ethernet(), appears to ignore the case of multiple frames already queued in the RX buffer. IMHO there should be a loop checking the consume and produce indexes to pop all the available frames from the buffer (this can happen when one or more frames are received before the ISR gets triggered). It should be something like this (note the additional loop):
static void interrupt_ethernet (void) __irq {
  /* EMAC Ethernet Controller Interrupt function. */
  OS_FRAME *frame;
  U32 idx, int_stat, RxLen, info;
  U32 *sp, *dp;

  while ((int_stat = (MAC_INTSTATUS & MAC_INTENABLE)) != 0) {
    MAC_INTCLEAR = int_stat;
    if (int_stat & INT_RX_DONE) {
      while (MAC_RXCONSUMEINDEX != MAC_RXPRODUCEINDEX) {  /* <-- additional loop */
        /* Packet received, check if packet is valid. */
        idx = MAC_RXCONSUMEINDEX;
        info = Rx_Stat[idx].Info;
        if (!(info & RINFO_LAST_FLAG)) {
          goto rel;
        }
        RxLen = (info & RINFO_SIZE) - 3;
        if (RxLen > ETH_MTU || (info & RINFO_ERR_MASK)) {
          /* Invalid frame, ignore it and free buffer. */
          goto rel;
        }
        /* Flag 0x80000000 to skip sys_error() call when out of memory. */
        frame = alloc_mem (RxLen | 0x80000000);
        /* if 'alloc_mem()' has failed, ignore this packet. */
        if (frame != NULL) {
          dp = (U32 *)&frame->data[0];
          sp = (U32 *)Rx_Desc[idx].Packet;
          for (RxLen = (RxLen + 3) >> 2; RxLen; RxLen--) {
            *dp++ = *sp++;
          }
          put_in_queue (frame);
        }
rel:    if (++idx == NUM_RX_FRAG) idx = 0;
        /* Release frames from EMAC buffer. */
        MAC_RXCONSUMEINDEX = idx;
      }
    }
    if (int_stat & INT_TX_DONE) {
      /* Frame transmit completed. */
    }
  }
  /* Acknowledge the interrupt. */
  VICVectAddr = 0;
}
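For what it's worth, this would also fit the symptom Christopher described earlier in the thread: without the extra loop, a frame that was already queued when the interrupt status was cleared sits in the descriptor ring until the next frame raises a fresh interrupt, at which point both are handed to the stack together. (This is our reading of the user manual's RxProduceIndex/RxConsumeIndex description; please correct us if we have misunderstood the hardware.)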
This change was suggested by a different implementation of the driver found in the example code bundle for the LPC2300 from NXP (www.standardics.nxp.com/.../code.bundle.lpc23xx.lpc24xx.uvision.zip).
We are still digging into the LPC2300 user manual to fully understand all the magic behind its internal Ethernet controller, but our preliminary patch to the driver shows promising results.
Maybe this is the same fix that Christopher Hicks reported improved Ethernet performance in his setup after he upgraded to the latest driver source. If so, since he is using a different CPU (an ARM9), the fix might not have been propagated to lpc23_emac.c. I have not had time to verify this yet, so please confirm.
Please let us know if we are on the right track to solving this issue, and whether it suggests any other changes to the MAC driver.