This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

osMessageGet causes HardFault when IRQ disabled

Hello All, the following code is generating a HardFault. I'm sure I've missed something stupid:

void Thread_Debug2(void const *argument)
{
        // locals
        Debug_Struct * pDbgStrct = 0;
        osStatus Status = osOK;
        uint32_t Placements2 = 0;

        // Start
        while(1)
        {
                // disable irq
                __disable_irq();

                // get message, here the hardfault occurs
                pDbgStrct = (Debug_Struct *)osMessageGet(MsgQueue_Debug1, 0).value.v;

                // enable irq
                __enable_irq();

                // free memory
                Status = osPoolFree(MemPool_Debug1, pDbgStrct);

                // check status
                if(Status == osOK)
                {
                        Placements2++;
                }
        }
}

In my opinion, the osMessageGet should be able to get a message from a message queue even when no interrupt is activated...

  • Why do you disable the interrupts - because you think that osMessageGet() isn't thread-safe? What if osMessageGet() itself makes use of some interesting critical section?

    I don't very much like:

    pDbgStrct = (Debug_Struct *)osMessageGet(MsgQueue_Debug1, 0).value.v;
    


    Isn't it good practice to validate the return value before you start making use of it? Shouldn't you always look at the status first?

    www.keil.com/.../group___c_m_s_i_s___r_t_o_s___message.html

    Next thing is if your queue is valid.

  • Yes, you may not make OS calls from thread level with interrupts disabled. The OS is always called through the SVC mechanism while in thread mode and if you have interrupts disabled this will always hardfault (you already know this).

    As far as being stupid, no it is not unreasonable to think that this might be possible, but if you study the OS Code, you would find out why this is not a good idea at all. Since all OS function calls are thread safe, I would hope that you would not need to disable interrupts in the way you did. IF you really need the special case you show (and yes, this is not something that people would normally try to do) you need to do something different. The main advantage to using the os functions is that you can "wait" for things to happen while not prohibiting another task to run. In the example you show, you are essentially polling for data instead of "politely" waiting for data. Polling is often an easy way to accomplish things and verifiable as correct. Of course in your example you are not only disabling interrupts and polling, there is never a time that your task ever blocks to allow other tasks to run. Also it is ALWAYS freeing a message even if it did not receive one.

    My guess is that Thread_Debug2 should not disable interrupts. It should do an osMessageGet() with an infinite timeout. When it gets a message, IF you really do need to disable interrupts to process the message, you can disable them - but so far it does not look like it is needed. Remember, until your task frees the message, no one else can use it AND both the osMessageGet() and the Free() are thread / IRQ safe.

    one difficulty to overcome working while interrupts disabled would be -

    (Task switching may need to be initiated from thread mode, which requires a switch to Handler Mode, which cannot happen if interrupts are disabled)

  • Wouldn't the Hard Fault, and the processor registers point you at exactly what it's objecting too?

    If you understand what's faulting, it's a lot easier to determine why it's faulting.

    Are you blowing out the stack?

  • Hello Again,

    Thanks to all of your replies. I've updated the thread/function:

    void Thread_Debug2(void const *argument)
    {
            // locals
            Debug_Struct * pDbgStrct = 0;
            osEvent evt;
            uint32_t i = 0;
            volatile uint32_t Dummy = 0;
    
            // Start
            while(1)
            {
                    // check for message
                    evt = osMessageGet(MsgQueue_Debug1, 0);
    
                    // message received?
                    if(evt.status == osEventMessage)
                    {
                            // message received, get data
                            pDbgStrct = (Debug_Struct *)evt.value.p;
    
                            // do something with the data
                            Dummy = pDbgStrct->Temp1 + pDbgStrct->Temp2 + pDbgStrct->Temp3;
    
                            // free the memory block
                            osPoolFree(MemPool_Debug1, pDbgStrct);
                    }
    
                    // do some other things (no os delay)
                    for(i = 0; i < 123456;i++)
                    {
                            __NOP();
                    }
            }
    }
    

    You might noticed, that I've posted another Question the other day about a mailbox overflow

    http://www.keil.com/forum/60328

    This is my main problem so I've started to search otherwise and found the problem above. I thought, that this might be some reason of my problem, but no it isn't...

    Has anyone a Idea what can go wrong when receiving messages like above? Down here is the function where I put the messages:

    void Thread_Debug1(void const *argument)
    {
            // locals
            Debug_Struct * pDbgStrct = 0;
            osStatus Status = osOK;
            uint32_t i = 0;
    
            // Start
            while(1)
            {
                    // allocate memory
                    pDbgStrct = (Debug_Struct *)osPoolCAlloc(MemPool_Debug1);
    
                    // check for memory block
                    if(pDbgStrct != 0)
                    {
                            // fill structure
                            pDbgStrct->Temp1 = 0xAB;
                            pDbgStrct->Temp2 = 0xCDEF;
                            pDbgStrct->Temp3 = 0x01234567;
    
                            // put it in queue
                            Status = osMessagePut(MsgQueue_Debug1, (uint32_t)pDbgStrct, 0);
    
                            // check status
                            if(Status == osOK)
                            {
                                    // message successful placed
                            }
                            else
                            {
                                    // message not placed in queue, free memory
                                    osPoolFree(MemPool_Debug1, pDbgStrct);
                            }
                    }
    
                    // do some other things (no os delay)
                    for(i = 0; i < 123456;i++)
                    {
                            __NOP();
                    }
            }
    }
    

  • If you are looking for problems with the message queue, then you really should make sure you fully verify return codes.

    The function can return:

    osOK: no message is available in the queue and no timeout was specified.
    osEventTimeout: no message has arrived during the given timeout period.
    osEventMessage: message received, value.p contains the pointer to message.
    osErrorParameter: a parameter is invalid or outside of a permitted range.
    

    You only check if "osEventMessage" or "not osEventMessage".
    Correct would be to check for "osOK" or "osEventMessage" - the two expected in your design - and notify you with some error message if you get anything else. Your code will not tell you anything if you get "osErrorParameter".

    Next thing - in what way do you hope to test your code by having polled retrieval, and then do:

    // do some other things (no os delay)
    i = 0; i < 123456;i++) {
        __NOP();
    }
    


    The above sequence is a busy-loop. And it will starve any thread with lower priority. It isn't even nice with other threads of same priority.

    Notice that both your sender and your receiver wants to waste all the CPU capacity they can lay their hands on in your busy loops. That's not cooperative programming. Both threads are directly hostile to anything but themselves.

    The goal with threads, is to have each thread perform as little work as possible before flagging to the OS that the processor capacity should be donated to something else. So a thread should only consume CPU if it has actual work that needs to be done. And it should perform this work at a priority that is reasonable compared to all other work your program may need to perform.

    You can get lots of problems with your message queues if you don't write your program in a way that all consumers are given a chance to retrieve data at least as fast as the producers are able to insert data.

    If you really do want to debug, you can make use of a separate counter where the producer increments the counter when it has added a message - and the consumer(s) decrements the counter when having retrieved a message. Possibly yet another counter for the allocation/deallocation of the actual message data. This obviously requires that the counters are volatile - and might require the use of critical sections since few processors has atomic increment/decrement of memory variables. But any code just has to verify the return codes of all function codes and appropriate "else" sections if you get a "bad" status code. You can't just code for "happy path". Code for the worst, but hope for the best.

    Your code has no way to report if you fail to allocate a message etc.

    One of the first design decisions when starting a project should be to decide how your program can inform you about deviations from "happy path". You can spend huge amounts of time debugging strange problems, if you haven't engineered in error detection and reporting in your program. Programming is both engineering and creativity. While the creative part might be fluffy and abstract, the engineering part is rule-based and a required rule is to care about errors. Best practices are called best practices because "best" is very good. And it's very good to do what helps you and at the same time helps your customers. Short-cuts saving your a couple of lines of code are seldom ending up being time savers.

  • Hello, it's me again.

    Thanks for your detailed reply.

    About the "Happy Path" ;-) : In this case it is only the osEventMessage status I need, in any other case I have to do something else so that's why I've made this. I totally agree with you, that you should check for every status and react on it. (I'll use a switch to determine the problem).

    The polling retrieval: I just want to simulate there some calculations who took cpu time. To understand: in this Thread I want to poll for a message from the queue and do other things. I don't create a whole thread just to wait for messages.

    Anyway, I think I've found my main problem about the MailboxOverflow...

    Try something below with a RTX Timer Callback Queue Size of 1

    Timers defined and created

    void Thread_Debug3(void const *argument)
    {
            // Locals
            volatile uint32_t Dummy = 0;
            osStatus  Status;
    
            // reset variables
            a = 0;
            b = 0;
    
            // 1s timer
            Status = osTimerStart (tid_Timer_1_ms, 1); // 1ms periodic timer
            if (Status != osOK)
            {
                    return; // Timer could not be started
            }
            Status = osTimerStart (tid_Timer_2_ms, 1); // 1ms periodic timer
            if (Status != osOK)
            {
                    return; // Timer could not be started
            }
            Status = osTimerStart (tid_Timer_3_ms, 1); // 1ms periodic timer
            if (Status != osOK)
            {
                    return; // Timer could not be started
            }
            Status = osTimerStart (tid_Timer_4_ms, 1); // 1ms periodic timer
            if (Status != osOK)
            {
                    return; // Timer could not be started
            }
    
            // Start
            while(1)
            {
                    osDelay(10);
    
                    Dummy = a + b;
                    a = 0;
                    b = 0;
            }
    }
    

    Guess what happens: MailboxOverflow!

    But WHY is the big Question. Normally I would assume that the osError will be OS_ERROR_TIMER_OVF instead of OS_ERROR_MBX_OVF

    Because of 4 Timers who are all ready on the same time will fill up the timer callback queue, but there has to be a OS_ERROR_TIMER_OVF and not the MailboxOverflow! At least when reading the comment

    case OS_ERROR_TIMER_OVF:
                            /* User Timer Callback Queue overflow detected. */
    

    So, is there a RTX problem or is something wrong with my code again?

  • "I totally agree with you, that you should check for every status and react on it. (I'll use a switch to determine the problem)."

    There is seldom a need to handle every possible error code you may get from a function call. It's enough to handle all "ok" status codes and any "error" status codes that you have work-arounds/retries/... for. Then you can have a wild-card else/default statement for any additional status code. But this code must be able to either log the problem and fully reset the device. Or log the error (time+where+status), release any allocated resources and then continue with the execution.

    The main thing is just:
    - what is happy path
    - what are expected errors that you can you recover from
    - what can you not recover from - and do you then automatically reset the device? Or intentionally lock up the device until a user performs some reset operation?
    - how to report

    And anything that isn't happy path or expected errors should be seen as unrecoverable errors without need to subdivide individual error codes further (but you want the error code logged). The only thing then is if the unexpected error should actually have been expected. Or if it happened because of corruption somewhere else.

  • Well in my case (speaking of the mailbox overflow) I think that it's really a RTX Problem. Are you able to reproduce the error?

    I mean you setup 4 Timers, having a Timer Queue length of 1, normally you would expect a Timer Queue Overflow, but that doesn't happen. What happens is a Mailbox Overflow and then you search all your mailboxes about some alloc/put/get errors and handle them (handle every return values too) but find nothing...

    And then with some luck, you find the faulty line in the code and it isn't in your code?

    How can you react to something that should handled in the RTX librarys, or at least, giving you the correct error case.

    Anyway, thanks to all replies to solve this problem!