Inducing RTX failure

Hello all,

I do not wish to repeat myself, as I have addressed this issue in a recent thread, but it is too important to ignore: lives could be at stake, which certainly makes it worth a separate thread. I believe I have managed to write a program that causes RTX to fail on an LPC2468/2478 (the simulator does not reproduce the failure). Quite a few people have reported problems with RTX on this forum, so hopefully they can download my stripped-down test program here dl.getdropbox.com/.../LPC2468_RTX_Demo_min.zip and try their own variants. I have of course informed Keil support about this issue and am currently waiting for feedback. I would very much appreciate any feedback you might have.

Tamir

  • It sounds like you guys are making great progress. Thank you for continuing to post information as you go. I appreciate it and I am sure others on the forum do as well.

    -Eric

  • The sporadic RTX failures you have been seeing are most likely caused by an undocumented "feature" of the NXP LPC2xxx VIC (described below) that RTX was not aware of.

    VIC behavior: After an interrupt is disabled (writing to VICIntEnClr) the interrupt is not immediately blocked but can still happen for a few cycles (time needed for VIC to process the request). Special tests were performed which confirm this behavior.

    This "feature" was not taken into account by the RTX kernel. Therefore, in some rare and very timing-specific situations, a blocked interrupt could still be executed, which eventually led to the RTX failure. Such situations are very rare (they show up sooner when the system time tick interrupt occurs more often) and even less likely when the MAM is disabled, because then an instruction fetch takes longer than the few cycles that the VIC requires. This also explains why the problem was not detected sooner and why it almost disappeared when the MAM was disabled.

    The updated RTX kernel now takes the described VIC behavior into account which should eliminate the reported problems (at the cost of a few additional CPU cycles).

    BTW: Interrupt controller behavior similar to that described for the NXP VIC also applies to ST's STR7 EIC. In fact the EIC is even worse in this respect, since it takes even longer to process the interrupts. That is why this behavior had already been seen there and the RTX kernel already handled it. For the NXP VIC, on the other hand, this was considered unnecessary.

    In general, ARM7/9 cores do not have interrupt controllers, so silicon vendors added their own external implementations, and this leads to the behavior described above. The new ARM Cortex-M cores are much better in this respect: they have an advanced Nested Vectored Interrupt Controller (NVIC) tightly integrated with the core. This has many benefits (faster interrupt response, late-arriving interrupt handling, tail chaining ...) and also eliminates the problems seen with the VIC and EIC.
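
The race described in this reply can be illustrated with a small host-side model. This is only a sketch under stated assumptions: `vic_model_t`, `vic_step`, `disable_naive`, `disable_guarded` and the value of `VIC_LATENCY_CYCLES` are all hypothetical names and numbers invented for illustration. It does not touch real LPC2xxx registers and it is not the actual RTX patch, whose details are not given in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed latency in cycles; the real figure is not documented. */
#define VIC_LATENCY_CYCLES 3u

/* Hypothetical software model of the controller plus the core IRQ mask. */
typedef struct {
    uint32_t int_enable;      /* models the VIC enable bits */
    uint32_t pending_clear;   /* bits written to VICIntEnClr, not yet effective */
    uint32_t clear_countdown; /* cycles left until the clear takes effect */
    bool     core_irq_masked; /* models the I bit in the core's CPSR */
    uint32_t raw_requests;    /* interrupt sources currently asserted */
    uint32_t serviced;        /* sources whose handler has run */
} vic_model_t;

/* Models a write to VICIntEnClr: the clear is only scheduled, not applied. */
static void vic_write_int_en_clr(vic_model_t *v, uint32_t mask)
{
    v->pending_clear |= mask;
    v->clear_countdown = VIC_LATENCY_CYCLES;
}

/* Advances the model by one cycle. */
static void vic_step(vic_model_t *v)
{
    /* A request is taken if it is still enabled and the core accepts IRQs. */
    uint32_t active = v->raw_requests & v->int_enable;
    if (active != 0u && !v->core_irq_masked) {
        v->serviced |= active;
        v->raw_requests &= ~active;
    }
    /* The scheduled clear becomes effective once the latency has elapsed. */
    if (v->pending_clear != 0u) {
        assert(v->clear_countdown > 0u);
        if (--v->clear_countdown == 0u) {
            v->int_enable &= ~v->pending_clear;
            v->pending_clear = 0u;
        }
    }
}

/* Naive disable: write the clear register and carry on immediately. */
static void disable_naive(vic_model_t *v, uint32_t mask)
{
    vic_write_int_en_clr(v, mask);
}

/* Guarded disable: mask IRQs in the core while the clear propagates. */
static void disable_guarded(vic_model_t *v, uint32_t mask)
{
    bool was_masked = v->core_irq_masked;
    v->core_irq_masked = true;
    vic_write_int_en_clr(v, mask);
    for (uint32_t i = 0u; i < VIC_LATENCY_CYCLES; ++i) {
        vic_step(v);
    }
    v->core_irq_masked = was_masked;
}
```

In this model, a request that is asserted when `disable_naive` is called is still serviced on the next `vic_step`, while the same request after `disable_guarded` is not. The guarded variant pays for this with a few extra cycles, which matches the cost mentioned above.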

  • Hello Robert,

    Franc provided me with a patch that seems to work fine. I guess we need to thank you all for putting so much effort into this. When can we expect a new official release of RL-ARM containing this fix?

    Tamir

  • Is it really undocumented? Isn't that behaviour common to all ARM chips that have an external interrupt controller, and one of the reasons why code either has to wait a fixed number of cycles or deactivate interrupts in the core instead of in the interrupt controller?

  • Tamir,

    The new RL-ARM which includes this fix will be released soon (in a few weeks).

    Per,

    Yes, this behavior seems to be common to ARM7/9 devices with external interrupt controllers. However, the number of cycles varies between interrupt controller implementations, and I haven't seen any documentation about this.

  • "However the number of cycles varies between interrupt controller implementations and I haven't seen any documentation about this."
    Neither have I. And it isn't easy to guesstimate the required number either. Something that seems to work after extensive testing can still be one clock off, just waiting for that other interrupt to come and catch you with your pants down :(

  • Hi Robert, thank you very much for your explanation. My ARM7 uses an EIC and I have encountered a similar issue, see here: http://www.keil.com/forum/docs/thread15796.asp

    To fix the problem, our solution is to execute a short for loop (as a delay) after disabling the EIC interrupt. Does this method work?

  • A number of chip vendors have recommended inserting a couple of NOPs after disabling interrupts. Any combination of instructions that takes at least the required number of cycles should do fine.

    The only issue is that the exact number of cycles isn't always known, because the manufacturers haven't published it in any datasheet.
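
A minimal sketch of the delay approach discussed in the last two replies, assuming a GCC or Clang toolchain (for the `__asm__ volatile` syntax). The names `mock_int_en_clr`, `SETTLE_NOPS` and `disable_and_settle` are hypothetical, the register is a plain variable standing in for the controller's disable register, and the NOP count is a guess, since the required number of cycles is not published.

```c
#include <stdint.h>

/* Assumed settle time in NOPs; pick it with margin, the real value is unknown. */
#define SETTLE_NOPS 8u

/* Stand-in for the controller's interrupt-disable register (not a real address). */
static volatile uint32_t mock_int_en_clr;

/*
 * Writes the disable mask, then executes at least `nops` NOP instructions
 * before returning. Returns the number of NOPs executed.
 */
static uint32_t disable_and_settle(uint32_t mask, uint32_t nops)
{
    uint32_t executed = 0u;

    mock_int_en_clr = mask;
    while (executed < nops) {
        __asm__ volatile ("nop");   /* volatile keeps the compiler from dropping it */
        ++executed;
    }
    return executed;
}
```

A call such as `disable_and_settle(1u << 4, SETTLE_NOPS)` would sit wherever the code currently writes the disable register. The loop overhead only adds cycles, which is harmless here because the delay is a lower bound, but as noted above a count found by testing can still be one clock short.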