Strategy for handling sys_error() when using blocking BSD sockets

Before we started using the relatively new BSD socket API in TCPNet, all socket-related activity in our application (such as TCPNet's own HTTP server) was performed in a single task. As a precaution, we modified the TCPNet sys_error() function so that, on a critical error code, it calls os_tsk_delete_self() to terminate the networking task. Immediately before doing so, it sets a flag to tell the main application task that the networking task needs to be started again, which restarts the TCP/IP stack from fresh. This is very simple to implement for a single TCP/IP task, and it lets the box come back online if the stack throws a wobbly and enters sys_error(). A sys_error() call should never happen with careful design, but it has been seen to happen once in a blue moon, and I would prefer a system that deals with it gracefully rather than just spinning in the while(1) at the end.
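For anyone following along, the flag handshake described above could look roughly like the sketch below. The names net_request_restart() and net_restart_pending() are made up for illustration and are not part of TCPNet or RTX. The RTX call is only mentioned in a comment so that the snippet compiles on a host.

```c
#include <stdbool.h>

/* Hypothetical flag shared between the networking task and the main task. */
static volatile bool net_restart_requested = false;

/* Assumed to be called from the modified sys_error() just before the
   networking task terminates itself with os_tsk_delete_self(). */
void net_request_restart(void)
{
    net_restart_requested = true;
}

/* Polled by the main application task. Returns true once per request,
   at which point the main task would recreate the networking task. */
bool net_restart_pending(void)
{
    if (!net_restart_requested) {
        return false;
    }
    net_restart_requested = false;
    return true;
}
```

The task that sets the flag is deleted straight afterwards, so there is only one writer and one reader, and a plain volatile flag is probably enough here.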

It's not so simple with multiple tasks that use blocking BSD sockets. Such a task might spend much of its time blocked in bsd_suspend(). If the main networking task dies and TCPNet is reset, any BSD worker tasks that were blocked on a socket call at the time are potentially left dangling forever in an inconsistent state. The solution I have implemented so far is to maintain a list of the IDs of all running tasks that touch the TCP/IP stack (including all worker tasks that use BSD sockets). In the event of a call to sys_error(), I rattle through that list and kill all of the referenced tasks. Then the main networking task and the TCP/IP stack are restarted. This strategy seems to work so far, but it requires every task that uses a BSD socket to register itself on startup and unregister itself before self-terminating.
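A minimal sketch of the register / unregister list described above, under some assumptions: all names (net_task_register() and friends, NET_TASK_MAX) are hypothetical, task_id_t stands in for the RTX OS_TID type, and 0 is assumed never to be a valid task ID. Locking around the list is left out; on the target it would need protecting, for example with an RTX mutex.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int task_id_t;      /* stand-in for OS_TID (assumption) */

#define NET_TASK_MAX 8u              /* hypothetical upper bound */

static task_id_t net_tasks[NET_TASK_MAX];   /* 0 marks a free slot */

/* Called by a task on startup. Returns 0 on success, -1 if the ID is
   invalid or the list is full. Registering twice is harmless. */
int net_task_register(task_id_t id)
{
    size_t i;
    if (id == 0u) {
        return -1;
    }
    for (i = 0; i < NET_TASK_MAX; i++) {
        if (net_tasks[i] == id) {
            return 0;
        }
    }
    for (i = 0; i < NET_TASK_MAX; i++) {
        if (net_tasks[i] == 0u) {
            net_tasks[i] = id;
            return 0;
        }
    }
    return -1;
}

/* Called by a task just before it self-terminates. */
void net_task_unregister(task_id_t id)
{
    size_t i;
    for (i = 0; i < NET_TASK_MAX; i++) {
        if (net_tasks[i] == id) {
            net_tasks[i] = 0u;
        }
    }
}

/* Copies every registered ID except `self` into `out`, clears the whole
   list, and returns how many IDs were copied. Pass out_len >= NET_TASK_MAX
   so that nothing is dropped. */
size_t net_task_drain(task_id_t self, task_id_t *out, size_t out_len)
{
    size_t i, n = 0;
    assert(out != NULL);
    for (i = 0; i < NET_TASK_MAX; i++) {
        task_id_t id = net_tasks[i];
        net_tasks[i] = 0u;
        if (id != 0u && id != self && n < out_len) {
            out[n++] = id;
        }
    }
    return n;
}
```

In sys_error(), the drained IDs would each be passed to os_tsk_delete() before the networking task flags the restart and deletes itself. Clearing the list at the same time means the restarted tasks simply register again from scratch.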

I wondered if anyone had other suggestions about how to go about this. For example, is it possible to obtain a list of current task IDs from RTX itself, rather than implementing our own register / unregister process? Someone asked about this before, here: http://www.keil.com/forum/20013/. The basic answer unfortunately seems to be 'no'. I suppose that even if it were possible, a mechanism would still be needed to determine which of those tasks touch TCPNet.

It is important that the system can cope with the TCP/IP stack throwing a wobbly without resorting to a complete restart of the box.

Any thoughts and ideas appreciated.