Hello,
I am having an issue with RL-ARM RTX where I get a data abort in the os_get_first function.
The reason that I get the data abort is that my os_rdy table has had its p_lnk pointer loaded with an invalid address. It appears that somehow a non-existent task has worked its way onto my os_rdy table (that is to say that at one point os_rdy.p_lnk = 0, and then the kernel did this: os_rdy.p_lnk = os_rdy.p_lnk->p_lnk).
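To make the failure mode concrete, here is a minimal sketch of a singly linked ready list in plain C. The struct layout and the function names get_first_unguarded and get_first_guarded are hypothetical stand-ins, not the actual RTX source; only the p_lnk field name is taken from the description above. The unguarded version performs the same update as quoted (head->p_lnk = head->p_lnk->p_lnk), which reads through a null pointer when the list is empty.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task control block; NOT the real RTX layout. */
struct tcb {
    struct tcb *p_lnk;    /* next task in the ready list, or NULL */
    int         task_id;
};

/* Unguarded pop: the same update as described in the post.
 * If head->p_lnk is NULL, first->p_lnk reads through address 0,
 * which on an ARM7 target can surface as a data abort. */
struct tcb *get_first_unguarded(struct tcb *head)
{
    struct tcb *first = head->p_lnk;
    head->p_lnk = first->p_lnk;   /* faults when first == NULL */
    first->p_lnk = NULL;
    return first;
}

/* Guarded pop: returns NULL instead of dereferencing an empty list. */
struct tcb *get_first_guarded(struct tcb *head)
{
    struct tcb *first = head->p_lnk;
    if (first == NULL) {
        return NULL;              /* list is empty: nothing to unlink */
    }
    head->p_lnk = first->p_lnk;
    first->p_lnk = NULL;
    return first;
}
```

A guard like this would only hide the symptom. The open question in this thread is how the list came to be empty (or corrupted) at a point where the kernel expected a ready task.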
I have found forum posts of other people having this problem -- unfortunately no solutions were offered and I am unable to reply to their threads: http://www.keil.com/forum/docs/thread12032.asp http://www.keil.com/forum/docs/thread12671.asp http://www.keil.com/forum/docs/thread7618.asp
This condition occurs very rarely -- on the order of once every 24 hours. It always seems to occur shortly after an interrupt that makes use of the isr_mbx_send and isr_evt_set functions -- but this might be coincidence.
I am running RV MDK V3.70 and RL-ARM V3.70. My MCU is the LPC2468.
Any advice would be greatly appreciated!
Thanks, Eric
Well, yes and no. I don't know what the direct cause of the problem is, but I did manage to make it go away by doing one of two things.
Either 1: removing my isr os calls from interrupts (namely isr_evt_set and isr_sem_send). Are you using either of these functions?
OR, 2: consolidating some of our tasks so that we only had eight running. How many tasks do you have running?
Either of these fixes completely removed the problem for us. Strangely enough, setting all of our vectored interrupts to the same priority made the problem happen much more frequently.
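For anyone wondering what option 1 can look like in practice, below is a rough sketch of one common pattern: the interrupt handler only sets a flag, and a task polls it from task context. This is an assumption about how such a change could be structured, not the code actually used in our project. The names event_pending, irq_handler_body and task_poll_event are hypothetical, and no RTX calls appear in the sketch.

```c
#include <stdbool.h>

/* Hypothetical flag shared between one interrupt handler and one task. */
static volatile bool event_pending = false;

/* Body of the interrupt handler, used in place of an isr_xxx kernel call.
 * It performs a single store and touches no kernel structures. */
void irq_handler_body(void)
{
    event_pending = true;
}

/* Called periodically from task context.
 * Returns true once for each observed event and clears the flag.
 * Events that arrive between two polls are coalesced into one. */
bool task_poll_event(void)
{
    if (!event_pending) {
        return false;
    }
    event_pending = false;
    return true;
}
```

The trade-off is latency: the task only notices the event at its next poll, so this fits slow signals better than anything that needs an immediate wake-up.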
I should note that it is possible our code was corrupting the OS. However, all of the os structures looked uncorrupted at the instant of the problem. Keil has not had any luck tracking this down themselves. But I recommend calling in and starting a case number. You can have them reference my case: 438460.
Please keep me posted as you debug this. I am very interested in finding the cause of this problem.
-Eric
yes, i am using isr_evt_set and other isr_xxx functions often. completely removing them will take some time but i will try it.
i have something like 10-15 tasks depending on user's actions.
i will post if i can verify that something is a cause of this
You might try the latest RL-ARM and RVMDK (3.80). I see in the release notes that they fixed some problems with isr_xxx functions.
Unfortunately, version 3.80 did not fix our problems.
What microcontroller are you using?
Eric,
I have a situation in which the ready list contains entries that point to themselves! very frustrating and problematic for the product/client. I am running out of patience and what is infinitely worse - time...
see here:
http://www.keil.com/forum/docs/thread15337.asp
I really don't know if this is the result of data corruption. It is just too slick: always the same. The problem is easy to reproduce on the product when the tick rate is 50 microseconds, but I have written a separate test program that does not have it! So it is either timing related, or data corruption, etc.
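One way to catch this kind of corruption closer to its source is a consistency check that walks the ready list and reports entries that point to themselves, or any longer loop. The sketch below is plain C with hypothetical names (rdy_node, check_ready_list, the list_state values); it is not part of RTX, and it assumes the list is a NULL-terminated singly linked chain through p_lnk. It uses the usual slow/fast pointer walk to detect loops.

```c
#include <stddef.h>

/* Hypothetical node type; only the p_lnk field name follows the thread. */
struct rdy_node {
    struct rdy_node *p_lnk;   /* next entry, or NULL at the end of the list */
};

enum list_state {
    LIST_OK = 0,         /* NULL-terminated, no loops found */
    LIST_SELF_LINK = 1,  /* some entry has p_lnk pointing at itself */
    LIST_CYCLE = 2       /* a longer loop exists in the chain */
};

/* Walks the list starting at head->p_lnk.
 * The fast pointer advances two entries per iteration, the slow pointer one;
 * if they ever meet, the chain loops back on itself. */
enum list_state check_ready_list(const struct rdy_node *head)
{
    const struct rdy_node *slow = head->p_lnk;
    const struct rdy_node *fast = head->p_lnk;

    while (fast != NULL) {
        if (fast->p_lnk == fast) {
            return LIST_SELF_LINK;
        }
        fast = fast->p_lnk;
        if (fast == NULL) {
            break;
        }
        if (fast->p_lnk == fast) {
            return LIST_SELF_LINK;
        }
        fast = fast->p_lnk;
        slow = slow->p_lnk;
        if (fast != NULL && fast == slow) {
            return LIST_CYCLE;
        }
    }
    return LIST_OK;
}
```

Such a check would have to run with interrupts and the scheduler held off, otherwise the list can change while it is being walked. Calling it at a few fixed points and halting on the first failure may narrow down where the list goes bad.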
i have removed all isr_xxx functions but the problem wasn't solved so i put them back in my code.
i was using a 1ms tick time. it would normally crash within 1-5 days. i reduced tick time to 50us and it crashes within 1-10 seconds. always at the same place in os_get_first.
i don't have a support thread going yet because our support expired this month. dunno if we are going to renew it yet.
This sounds a lot like what Tamir is finding.
Is there any way that you can get this to happen in the simulator and share a sample project? It would be really interesting to have Keil take a look at this.
Eric, Ryan,
I found that a tick rate of 10 milliseconds yields a much more stable system. I did not see any crashes so far with this tick rate, but nevertheless this issue must be fixed and it will - Keil are busy with it right now as far as I know.
What chips (and their revisions) are you using? I thought you were using an LPC2468, Eric? but what hardware revision?
Please read:
http://www.keil.com/forum/docs/thread15346.asp
yea, i'm using lpc2468. an older version.
i had set my clock as the errata says, but didn't see the mam setting. my startup file was modified from some example that came with my EA lpc2468 dev board.
so, now i've set mam to 1 instead of 2 and it's not crashing with the short tick. i'll have to let it run for days now and see for sure.
i was wrong about this. it is still crashing, just less often. about the same effect as using a larger tick time.
This is a very worrying report, Ryan. Have you tried my test program, which can be found here http://www.keil.com/forum/docs/thread15346.asp ? Does it still crash on your hardware? I'm using an LPC2478, and after reconfiguring the PLL as described in the LPC2468 errata I have not observed any crashes (maybe I didn't run it long enough).
I am using an LPC2468 Revision B. My clock rates are well below the maximums listed in the errata and I am not using the MAM interface.
Franc --
I left you a response here regarding why I think this may not be a hardware issue: http://www.keil.com/forum/docs/thread15346.asp