
RTX and watchdog monitoring

I'm wondering what method you use to monitor the health of the RTOS and trigger a watchdog reset in case of failure.

Currently, I feed a hardware watchdog from the lowest-priority task. But in some cases that's not enough: RTX itself can be OK while one task is stuck, buffers can no longer be allocated, or other failures occur.

In your opinion, what is the best method?

Thanks in advance.

  • what you can do is kick your watchdog only if each and every task has set its dedicated bit to 1; if not, the system resets (a sketch of the checking task follows the example code):

    #define SERVICE_WATCHDOG                                                           \
        lock_mutex(APPLICATION_WATCHDOG_FLAG_DATA);                                    \
        g_application_alive_signals |= 1 << (os_tsk_self() - g_first_task_id);         \
        unlock_mutex(APPLICATION_WATCHDOG_FLAG_DATA);
    

    and

    
    __task void user_task(void)
    {
        for (;;)
        {
            /* ... the task's normal work goes here ... */
            SERVICE_WATCHDOG
        }
    }
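
    On the checking side, something like the following minimal sketch could feed the hardware watchdog only when every monitored task has signed off. The names ALL_TASKS_MASK, feed_hardware_watchdog() and the 100-tick period are assumptions for illustration; lock_mutex()/unlock_mutex() and g_application_alive_signals are the same ones used by the macro above.

    __task void watchdog_check_task(void)
    {
        for (;;)
        {
            os_dly_wait(100);                        /* check interval in RTX ticks (placeholder) */

            lock_mutex(APPLICATION_WATCHDOG_FLAG_DATA);
            if (g_application_alive_signals == ALL_TASKS_MASK)
            {
                feed_hardware_watchdog();            /* every task checked in: feed the hardware watchdog */
                g_application_alive_signals = 0;     /* require a fresh sign-off next round */
            }
            unlock_mutex(APPLICATION_WATCHDOG_FLAG_DATA);
        }
    }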
    

  • I have seen something like that.
    But what about a task that waits indefinitely for an event? How do you manage that?

  • Let the wait function wake up on a timeout and service your watchdog, as in the sketch below!
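
    For example, with the RTX v4 event API the wait can be bounded so the loop always comes around. This is only a sketch: the 0x0001 flag and 100-tick timeout are placeholders, and SERVICE_WATCHDOG is the macro from the earlier reply.

    __task void event_task(void)
    {
        OS_RESULT res;

        for (;;)
        {
            res = os_evt_wait_or(0x0001, 100);   /* wait for the event flag, but at most 100 ticks */

            if (res == OS_R_EVT)
            {
                /* ... handle the event ... */
            }

            /* Reached on every iteration, event or timeout, so the task
               keeps signing off even while no events arrive */
            SERVICE_WATCHDOG
        }
    }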

  • I prefer a timeout, letting the task sign off that it is still alive. The task may then also accumulate the time since the last event, and stop kicking if it has gone through too many iterations of the wait function without detecting any event.

    The task/ISR that is sending events may also update a counter each time it does. If the listener notices the counter incrementing while it is only getting timeouts, it knows something is wrong and can stop kicking.

    The task receiving events may likewise increment a counter for each processed event, so the sender can check that the backlog doesn't constantly grow. A sketch combining these checks follows.
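
    A minimal sketch of the consumer side, combining the timeout limit and the sender's counter. EVENT_TIMEOUT_TICKS, MAX_SILENT_ITERATIONS and g_events_sent are illustrative names, not RTX APIs; os_evt_wait_or() is the RTX v4 wait call and SERVICE_WATCHDOG is the macro from above.

    #define EVENT_TIMEOUT_TICKS    100
    #define MAX_SILENT_ITERATIONS  50

    extern volatile unsigned int g_events_sent;   /* bumped by the sending task/ISR */

    __task void consumer_task(void)
    {
        unsigned int last_seen = 0;
        unsigned int silent_iterations = 0;

        for (;;)
        {
            if (os_evt_wait_or(0x0001, EVENT_TIMEOUT_TICKS) == OS_R_EVT)
            {
                silent_iterations = 0;
                /* ... process the event ... */
            }
            else if (g_events_sent != last_seen)
            {
                /* The sender posted events but we only see timeouts:
                   something is wrong, stop signing off and let the watchdog bite */
                continue;
            }
            else
            {
                silent_iterations++;
            }

            last_seen = g_events_sent;

            /* Only sign off while events are still arriving within the limit */
            if (silent_iterations < MAX_SILENT_ITERATIONS)
            {
                SERVICE_WATCHDOG
            }
        }
    }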

  • But it may be perfectly correct to constantly have zero blocks free. One example is an LRU cache: the cache will fill until all entries are used.

    After it becomes full, new allocations require a concurrent block release. A thread that looks at the number of free blocks will never even see the LRU block swap, since the LRU code most probably doesn't release any block - it just overwrites the old contents and relinks the block. And even if there is a quick release followed by an allocation, a polling thread may never catch that tiny interval when one block is free.

    So if you want code to keep track of block allocations, you may have to track threads that constantly fail to allocate, instead of relying on the current number of allocated blocks. It isn't the number of used blocks that is the problem, but a thread being starved of a required resource. A sketch of that approach follows.
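
    As a rough illustration of that idea, an allocation wrapper could count consecutive failures instead of watching the free count. _alloc_box() is the RL-RTX fixed-block allocator; MAX_CONSECUTIVE_ALLOC_FAILURES, g_alloc_failures and g_allocation_starved are made-up names for the sketch, and the failure counter would be per-thread in a real system.

    #define MAX_CONSECUTIVE_ALLOC_FAILURES  20

    static unsigned int g_alloc_failures;
    static volatile int g_allocation_starved;

    void *checked_alloc(void *box_mem)
    {
        void *block = _alloc_box(box_mem);

        if (block != NULL)
        {
            g_alloc_failures = 0;        /* made progress: the pool isn't starving us */
        }
        else if (++g_alloc_failures >= MAX_CONSECUTIVE_ALLOC_FAILURES)
        {
            /* Repeated failures to get a block: flag starvation (e.g. stop signing
               off to the watchdog) rather than watching the raw free-block count,
               which can legitimately sit at zero. */
            g_allocation_starved = 1;
        }

        return block;
    }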