When a high priority task performs an os_dly_wait for 1 tick followed by a low priority task performing an os_dly_wait (one tick also) all tasks stop running and the ARM eventually executes a watchdog reset. I can't believe such a simple OS operation could fail like this. Anyone ever seen such a thing?
I have not seen this.
Could you post minimal code, please?
I have never seen os_dly_wait() behave improperly when my code is behaving properly. Are you SURE you are not doing something other than what you described? (You are, in that the watchdog is resetting the processor, but you did not specify how you are hitting the watchdog.)
The watchdog is kicked by the highest priority task, but only 1) if the watchdog task is running, of course, and 2) if every other task is incrementing its assigned counter, indicating it's 'alive'. There are 6 tasks which wait for an event or mailbox delivery, and each also specifies a timeout. On timeout each task increments its watchdog counter. All tasks but two are quiescent (just experiencing timeouts). Task 1 is busy performing I/O which requires delays for completion (hence the os_dly_wait there). Task 2 tries to alloc a box to be used for a mailbox entry to send to task 1. There is no memory available, so task 2 performs an os_dly_wait and tries again. The system locks up. If task 1 does not perform an os_dly_wait and simply loops waiting for a hardware timer to count up, there is no problem: when task 1 completes, it releases the memory box and waits for another mailbox delivery. Task 2 wakes up, is now able to get the memory, delivers the mail, and all is well.
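For readers following along, the 'alive counter' scheme described above might be sketched roughly like this. This is only an illustration in plain C, not the poster's actual code: the names task_report_alive and all_tasks_alive and the array layout are hypothetical, and the real application would call them from RTX tasks and kick the hardware watchdog only when the check passes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Six monitored tasks, as in the description above (assumed layout). */
#define NUM_MONITORED_TASKS 6

/* One counter per monitored task; each task increments only its own. */
static volatile uint32_t alive_count[NUM_MONITORED_TASKS];

/* Counter values seen by the watchdog task at its previous check. */
static uint32_t last_seen[NUM_MONITORED_TASKS];

/* Hypothetical helper: a task calls this when it wakes up,
   for example when its event or mailbox wait times out. */
void task_report_alive(unsigned id)
{
    if (id < NUM_MONITORED_TASKS) {
        alive_count[id]++;
    }
}

/* Hypothetical helper: the watchdog task calls this periodically.
   Returns true only if every counter has changed since the previous
   call, i.e. every monitored task has run at least once. */
bool all_tasks_alive(void)
{
    bool ok = true;
    for (unsigned i = 0; i < NUM_MONITORED_TASKS; i++) {
        uint32_t now = alive_count[i];
        if (now == last_seen[i]) {
            ok = false;      /* task i made no progress */
        }
        last_seen[i] = now;  /* remember for the next check */
    }
    return ok;
}
```

With a scheme like this, a single task that stops waking up (or a scheduler tick that stops arriving) is enough to withhold the kick and let the watchdog reset the processor.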
That made it clear enough for me to know your software is not correct. Look there first and not at the os_dly_wait().
Thanks for replying. Minimal code would be hard to provide. I plan to build a new project with a simple program where there are two tasks; task 1 has a higher priority than task 2 with both performing os_dly_waits and try to see if I can make it fail. I have provided more information on my application later in this thread. Thanks
Thanks but could you elaborate on why you think my software is not correct? I'm going to try to build a short, simple project with just 2 tasks doing operations like I describe. Hope that sheds some light on what interactions/system resources might be contributing to this lock up. If you have any suggestions, please let me know.
Thanks again.
Post the code. The 2 tasks seem quite simple. I will do my best to provide pointers.
In case you're interested, after many hours of debugging this is my final assessment of the RTX OS problem:
1. When all my tasks are in a wait condition (either waiting on an event, waiting on a mailbox post or in an os_dly_wait state) the os_idle_demon OS idle task runs. Due to some combination of events or task states os_idle_demon can be started by the OS with global interrupts disabled. I placed code in os_idle_demon to read the CPSR and check for the IRQ bit. I found it SET (interrupts disabled). If I then clear it, the system runs without any lock up.
2. If I remove the CPSR check in os_idle_demon and instead add a new task (lowest priority) to my application which behaves like os_idle_demon (an empty 'for' loop) this idle task gets control instead of os_idle_demon when all my other tasks are in a wait state. And I find in this task that global interrupts are never disabled as they were in os_idle_demon.
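The CPSR check from point 1 comes down to testing one bit. On classic ARM cores the I bit (bit 7) of the CPSR masks IRQs when it is set. The sketch below shows only the bit test and the bit clear on a CPSR value that has already been read. The helper names are made up for illustration, and actually reading or writing the CPSR needs toolchain-specific inline assembly or intrinsics in a privileged mode, which is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR I bit: when set, IRQ interrupts are disabled. */
#define CPSR_I_BIT (1u << 7)

/* Hypothetical helper: true if the given CPSR value has IRQs masked. */
bool cpsr_irq_disabled(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) != 0u;
}

/* Hypothetical helper: the same CPSR value with IRQs unmasked.
   Writing the result back to the real CPSR is target-specific. */
uint32_t cpsr_with_irq_enabled(uint32_t cpsr)
{
    return cpsr & ~CPSR_I_BIT;
}
```

For example, a value of 0x9F (SYS mode, I bit set) reports IRQs disabled, while 0x1F (SYS mode, I bit clear) does not.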
Hope this helps someone else sometime.
My guess is that interrupts are disabled when you call os_sys_init(). The idle_demon will inherit this state. Make sure the interrupts are enabled when you call os_sys_init(), or if this is not possible enable interrupts as the first statement before the endless for loop for the idle task.
I am also guessing that you are running 3.40 and not the earlier OS...
Hello Mark, I am quite interested in the potential issue you found, and I would like to know whether there has been any development on this topic. In fact, if demonstrated, this issue could explain a weird misbehaviour in my application which leads to a watchdog reset as well. Were you able to build a small project which shows the issue?
Thanks.
Regards,
Sebastiano
The problem turned out to be mine. In the process of porting a legacy application to RTX I had to start the application in SYS mode with interrupts disabled. Once I started all my tasks, they each adopted the RTX 'initial CPSR' which had interrupts enabled. Everything ran very well until all my tasks were blocked and the OS's idle task (os_idle_demon) was passed control. When this task runs, it uses the CPSR which existed when os_sys_init was called and, of course, interrupts were disabled. So os_idle_demon is now running in a loop with no chance of the RTX clock tick relieving it of duty. Therefore the watchdog timeout.
Hope this is of some use to you.