Hi everyone, we found out that some RL-RTX routines, like _free_box(), and potentially some others, silently re-enable global interrupts. To me it is clearly a bug that becomes obvious if you try calling such a routine from within a critical section. All of the sudden the interrupts get re-enabled on the exit from this function.
When we reported the problem, the Keil's developers responded with the following statement:
"_free_box() function is thread safe and can be called from main or irq mode. Because the SVC mode in Cortex-M can be interrupted by a higher priority interrupt, we protect the critical part of code with interrupt disable and enable.
The customer shall disable only his peripheral interrupts not a global interrupt"
I can't agree with it - as much as RTOS needs to protect its critical part, we need to do the same in our applications. The status of global interrupts must be saved before disabling them, and restored upon exit. Has anyone else found any issues with global interrupts in RL-RTX function calls?
Thanks, Sergey
I thought it was good practice for a library routine to save the previous interrupt state instead of assuming that the library owns it.
I've just received confirmation from Keil tech support that _calloc_box() and _alloc_box() do the same - enable interrupts on exit because they are "thread-safe". I only wish they mentioned this "side effect" in the manual.
I think Keil will just have to do a lot of manual updating in a hurry. Forgetting this little "minor" side effect is not really acceptable.
For an interrupt that happens very seldom, it may be very, very hard to catch this kind of error in production code. Without the documentation, the users would have to either assume that all api has this behaviour, or have to explicitly try to test the behaviour by disabling interrupts before the call and check the interrupt state upon return.
For an interrupt that happens very seldom, it may be very, very hard to catch this kind of error in production code.
on the verge of impossible, I think. clients are likely to treat this as a "glitch", and reset the module, maybe not even reporting the problem and even if they do, it is not very likely that the problem would be reproduced and even if it is, the logic and possible side effects might made it just incredibly hard to figure out what went wrong. A disappointing feature - I did not expect this.
I had always assumed this was a well know issue because it has been that way since ARTX. Personally I have modified the RTX to handle nested tsk_lock(). This can all be done at the user level in RTX_Config.c
(Unfortunately I am still using 3.24 of RTX because I like to be able to make nested calls to tsk_lock(). Since I am make about a dozen releases of firmware within the next few months, I an not going to change the version of my OS until I have a month or 2 without a new release.)
From a code review, I seem to not be affected by this bug/misfeature/documentation oops, but the documentation really has to mention this.
This also shows the need for having 100% access to all source in important embedded projects. It isn't possible to do a satisfactory code review if the code calls undocumented black-box functions where we have to rely on just trust/hope. I hope Keil will reconsider their view on supplying the network stack as a library.
Robert, if I understand you correctly, you modified tsk_lock(), but it doesn't really solve the problem with global interrupts. Say, in our case I have custom memory management routines that might be called from an ISR. Which means that those routines must be interrupt-safe, not only thread-safe. How do you handle this situation?