I have an identical problem to the ETH_IRQHandler() interrupt bug described on the original thread with this title (i.e. TCPNet / Ethernet - Preventing 'interrupt storm')
i.e. OS_ERR_FIFO_OVF gets raised, and the network crashes. I was hoping to monitor the TCP socket traffic in order to get an idea of why the network crash occurs and raised a separate thread "ARM: Definitive guide to using Net_Debug", but unfortunately didn't receive any feedback on that.
I've implemented the fix described in the original thread (i.e. replace ETH->DMASR = INT_RBUIE; with ETH->DMASR = (INT_AISE | INT_RBUIE);) and the system appears to be holding up so far. But I'd really like to know the nature of this bug. What are the circumstances that cause this system crash? What socket traffic causes TCPNet to hang?
It would be really nice to get definitive closure on this issue. For everyone's benefit.
I did reply in your other thread, but you didn't follow up.
Now, regarding the interrupt storm issue, I have had the same problem with Keil's Ethernet driver for the NXP LPC23xx family (which is ARM7, not Cortex-M). The problem is that within the Ethernet interrupt function, there is a loop that tries to exhaust the RX descriptors (i.e. read all received fragments). If the ingress packet flow is high enough (try a DoS attack), this loop will never end (because there will always be a new fragment to process), and the interrupt function never returns. This will of course kill the system, since nothing else is permitted to run.
I have no experience with the CPU you are using, but it might be something similar that is happening.
I "fixed" the problem by breaking the loop and disabling ethernet interrupts for a while if the packet flow was too high to handle. Packets will of course be lost, but it is better than entering a boot loop (due to the watchdog barking).
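As an illustration only (this is not the code from the actual driver), the workaround above can be sketched as a receive loop with a per-interrupt budget. The example runs on a host and simulates the hardware: eth_sim_t, RX_BUDGET, eth_irq_handler_bounded() and eth_rx_irq_reenable() are hypothetical names, and the "registers" are plain struct fields standing in for the real RX descriptors and interrupt-enable bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUDGET 8u  /* hypothetical: max frames handled per interrupt */

/* Simulated hardware state; stand-ins for real descriptors/registers. */
typedef struct {
    uint32_t pending_frames;  /* frames waiting in the RX descriptor ring */
    bool     rx_irq_enabled;  /* stands in for the RX interrupt enable bit */
    uint32_t processed;       /* frames handed to the stack so far */
} eth_sim_t;

static bool rx_frame_available(const eth_sim_t *eth)
{
    return eth->pending_frames > 0u;
}

static void rx_process_one(eth_sim_t *eth)
{
    assert(eth->pending_frames > 0u);
    eth->pending_frames--;
    eth->processed++;
}

/* Interrupt handler with a bounded loop: it handles at most RX_BUDGET
 * frames. If more are still pending after that, it disables the RX
 * interrupt and returns so the rest of the system can run.
 * Returns the number of frames handled in this call. */
uint32_t eth_irq_handler_bounded(eth_sim_t *eth)
{
    uint32_t handled = 0u;

    while (rx_frame_available(eth)) {
        if (handled == RX_BUDGET) {
            eth->rx_irq_enabled = false;  /* back off for a while */
            break;
        }
        rx_process_one(eth);
        handled++;
    }
    return handled;
}

/* Assumed to be called later, e.g. from a periodic timer tick. */
void eth_rx_irq_reenable(eth_sim_t *eth)
{
    eth->rx_irq_enabled = true;
}
```

With 20 frames pending, one call handles 8, leaves 12 in the ring and switches the interrupt off until eth_rx_irq_reenable() is called. With 3 frames pending, all 3 are handled and the interrupt stays enabled. On real hardware the frames left behind would eventually be overwritten or dropped, which is the packet loss mentioned above.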
Best regards -Øyvind
Hi Øyvind,
Thank you for your response. I note that you did indeed respond to my "Net_Debug" thread, and yesterday morning I replied and thanked you for your input. Unfortunately my post seems to have got lost (probably due to an "interrupt storm" :0) ). However, I *still* don't know how to use Net_Debug. Compiling the TCPD_CM3.lib library in is the easy bit, but how do I tell TCPD_CM3.lib which USART to use, which putc() to register, etc.?
I can see from your description that this is indeed the case with the Ethernet interrupt. I was just wondering what types of Ethernet traffic might cause this. Some examples would be nice to consider. For what it's worth, I think Keil need to make this bug (?) fix well publicized. My search skills are quite limited, but searching for "OS_ERR_FIFO_OVF" revealed nothing that connects cause and effect for this situation.
Anyhow, the new build is still standing, and the old build has crashed several times. Let's keep that going for a couple more days and I'll be happy.
Best, Richard.
Traffic that could cause this would be any type of broadcast or multicast traffic with a high enough volume/rate. And of course unicast traffic directed at the unit (like with a DOS-attack). To experiment, it would be perfectly valid to use the well known hacker tool LOIC to spam your unit with traffic. Just ensure that you're not going through a switch/router with protection against such traffic, or else you might think your unit handled the traffic, when in fact the switch/router acted as a protection.
Another possible culprit could be if your unit was connected to a LAN which also carried multicast traffic (for instance IPTV), and the switches in the network do not use IGMP snooping to avoid relaying the traffic to all their ports.
Thanks Øyvind, you've been a great help.
Needless to say, we do have lots of multicast traffic whizzing around our LAN. We're going to have a play and launch a DoS attack of our own.
In the MDK5 driver, this problem is solved. The way Rx data is processed has changed, and this prevents the Ethernet interrupt storm problem.
The interrupt handler does nothing more than send a notification to the Ethernet thread in the library. This action is fast and takes only a few microseconds. The thread, which runs at high priority, then reads the frame from DMA and releases the DMA block.
If packets are coming too fast, the Ethernet receive DMA fills all available DMA blocks, and Rx DMA is stalled. In this state, Rx DMA does not generate further Rx interrupts. All Rx packets are dropped by the Ethernet controller without CPU intervention. Receiving resumes when the Ethernet thread reads a packet from an Rx DMA block and releases the block back to ETH-DMA.
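The behaviour described above can be modelled roughly as follows. This is a host-side simulation under assumed names, not the MDK5 driver source: rx_dma_sim_t, NUM_RX_BLOCKS, frame_arrives() and eth_thread_step() are hypothetical, and a real driver would use an RTOS signal or event flag where this sketch sets thread_notified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RX_BLOCKS 4u  /* hypothetical number of Rx DMA blocks */

typedef struct {
    uint32_t owned_by_cpu;    /* blocks filled by DMA, waiting for the thread */
    uint32_t dropped;         /* frames discarded by the controller */
    uint32_t irq_count;       /* Rx interrupts raised so far */
    bool     thread_notified; /* stands in for an RTOS signal/event flag */
} rx_dma_sim_t;

/* Interrupt handler: only notifies the thread, never touches frame data. */
static void eth_isr(rx_dma_sim_t *d)
{
    d->irq_count++;
    d->thread_notified = true;
}

/* Hardware side: a frame arrives from the wire. */
void frame_arrives(rx_dma_sim_t *d)
{
    assert(d->owned_by_cpu <= NUM_RX_BLOCKS);
    if (d->owned_by_cpu == NUM_RX_BLOCKS) {
        d->dropped++;  /* Rx DMA stalled: frame dropped, no interrupt */
        return;
    }
    d->owned_by_cpu++; /* DMA fills a free block */
    eth_isr(d);
}

/* Thread side: read one frame and release its block back to the DMA.
 * Returns true if a frame was read, false if there was nothing to do. */
bool eth_thread_step(rx_dma_sim_t *d)
{
    if (d->owned_by_cpu == 0u) {
        d->thread_notified = false; /* nothing pending, wait for next signal */
        return false;
    }
    d->owned_by_cpu--;
    return true;
}
```

In this model, a burst of 10 frames with 4 blocks produces 4 interrupts and 6 dropped frames, no matter how fast the burst arrives. Once eth_thread_step() releases a block, the next arriving frame is accepted and raises an interrupt again, so the interrupt rate is bounded by how fast the thread drains the blocks rather than by the incoming packet rate.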
Hi Franc, thanks for the feedback, that's useful to know.
Øyvind, thanks for your support. As suggested we tried our own DOS attack (with the interrupt fix in place) and everything stood up fine. We have also been soaking our system over the weekend and it's error free.