
Hard fault in os_mbx_wait (rt_mbx_wait) with RTX

Hi,

We are using RTX on an STM32 Cortex-M3.

We have a task that crashes with a hard fault in the os_mbx_wait function.
While investigating the hard fault, we compiled the RTX source code into the project and found that the program crashes in the function rt_mbx_wait:

extern OS_RESULT rt_mbx_wait (OS_ID mail_box, void **message, U16 timeout);

Further investigation shows that when os_mbx_wait is called and the service call is made, the stack contains the correct values, i.e. the arguments to the function: a global mailbox and a timeout value.

When the hard fault occurs, we can see that rt_mbx_wait does execute and that the message pointer and the timeout value are correct, but the mailbox value is no longer valid: it has changed from the global mailbox to 0x00000001 (always this same value).

We then compared the trace data of a correct run with that of a run ending in a hard fault, and found the following.
In the correct run, the trace shows:
os_mbx_wait
rt_mbx_wait
os_mbx_wait
rt_mbx_wait
os_mbx_wait
rt_mbx_wait
......

When the hard fault occurs, the trace shows:
os_mbx_wait
rt_mbx_wait
os_mbx_wait
rt_mbx_wait
rt_mbx_wait - !!!!!!!!!!

So rt_mbx_wait was called without a preceding os_mbx_wait call.

Other information about our project:
1. We get the error only in tasks that use a mailbox timeout (20 ms), not in tasks that use an infinite timeout.
2. The hard fault is very rare; it happens only after hundreds or thousands of calls.
3. We think it is related to the context switch. We have several tasks in our project, and when we lower this task's priority below the other heavy-workload tasks, the error does not occur.
4. We found a similar case, without a solution, here: http://www.keil.com/forum/18639/

Any thoughts?

Thanks,

  • Further information from digging into the error:

    1. We are in os_mbx_wait with a 2-tick timeout.
    2. On the first tick everything is OK: we enter the os_systick handler, go from there to the task switch, and exit to the idle task because no task needs to run.
    3. On the second tick we enter the os_systick handler, go from there to the task switch, and need to resume the pending task.
    4. In the task switch, instead of returning to the task code, we return into the rt_mbx_wait code.
    5. Then we get a hard fault.

  • One more thing:
    when we leave the task switch and should return to thread mode (the task), we immediately get an svc_handler call.
    That call then goes to rt_mbx_wait, and we get a hard fault.

  • Check the priority levels of the SVC, PendSV and SysTick interrupts. The expected values are 14 for SVC and 15 for SysTick and PendSV. The STM32 uses 4 bits for the priority levels.

  • Franc,

    Is there anything in the user manual about the required priorities?

  • The RTX kernel assigns the interrupt priorities in os_sys_init() automatically as described here: www.keil.com/.../rlarm_ar_hints_cortex.htm.

    However, the symptoms described in one of the previous posts indicate that the SVC instruction is being interrupted at an early stage of its execution. That is dangerous and causes a system crash.

  • So, an update:
    We checked the priorities and saw that SVC, PendSV and SysTick all have priority 15.
    This comes from the RTX kernel: the function rt_svc_init in the file rt_HAL_CM.h gives SVC priority 15 (weird?! a bug?).

    So we changed it manually to 14 by writing 0xE0000000 to NVIC_SYS_PRI2, and checked the priorities in the debugger:
    System service call (SVC) - 14
    Pend System service (PENDSV) - 15
    System Tick Timer (SYSTICK) - 15

    We tested the change overnight and still got the error.

    It goes like this:

    The system is running several tasks.
    One task is in os_mbx_wait with a 2-tick timeout.

    After 2 ticks, Systick_Handler calls switch_task.
    Its last instruction is BX LR (return to thread mode).
    Instead of returning to the task, at the line after the os_mbx_wait call,
    we end up in SVC_HANDLER (maybe an interrupt occurred? nothing shows up in the debug trace) and rt_mbx_wait is called again! This causes a hard fault, because the function is called with a mailbox pointer of 0x00000001.

  • "Once you go black, you never wanna go back".

    Paraphrased:

    "Once you go FreeRTOS, you never wanna go back".

  • OK, we found a possible explanation.

    As mentioned before, we are using priority group 0 (NVIC_PriorityGroup_0).
    It turns out that NVIC_PriorityGroup_0 corresponds to a PRIGROUP value of 7.

    The documentation you pointed us to says:
    "Allowed values for PRIGROUP are from 0 to 6. The PRIGROUP value 7 will cause RTX to fail"

    So we think we are using the wrong priority group, and that changing it to NVIC_PriorityGroup_1 will prevent the hard fault.

    Thanks,

    We will post an update with the results.

  • Well, good news.

    It worked.

    Thanks to all readers and helpers.