Hello,
I have a major problem with RTX, and Keil don't seem to be able to help: they want a simple scenario that reproduces the problem, but I cannot give them the hardware, of course (maybe I can make it go wrong on an evaluation board). I'm using RTX as the backbone of a product that needs to run for extended periods of time without a reboot (weeks...). The problem is that RTX stops executing arbitrary tasks at arbitrary moments - they remain 'ready' but do not get serviced. Today I discovered a task entering 'WAIT_MUT' while not using ANY mutex. My question: are there any tips for using RTX correctly? I am growing totally frustrated and tired of this; what am I supposed to tell the client?! I'm using the latest, and very expensive, RL-ARM without any results whatsoever. Can you share your experience with me?
Thank you for your attention,
Tamir
Per,
The point you made about the DMA transfers is indeed an issue. I never meant this to be anything more than a possible small help in case things are that far out of control (believe me, they were until a couple of days ago - nervous clients, nervous boss, nervous keyboard...). I don't think Keil are going to do this with RTX (there are other, more pressing issues...) - let's leave it as an intellectual exercise.
I regularly look at checksumming as one of the available tools to detect problems, but prefer to use it in situations where it can be included in the release build. As previously mentioned, it is best to test the same build that is expected to ship. Changing a single byte in RAM or flash can be enough to make the debug build pass all tests (even if buggy) while the release build fails - possibly in a routine the customer will only trigger once every three months.
The reason I posted was that Hans-Bernhard Broeker's post was aimed at pointing out that checksumming can't validate something as correct. But that is a separate issue from using it as a tool to detect something broken. A bigger issue with checksumming (at least when used in release builds) is deciding what action to perform in case of a checksum error. Auto-repair, reboot, deadlock, warn, ...
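To make the "detect something broken" use concrete, here is a minimal sketch of a region check that could live in a release build. It assumes a standard bitwise CRC-32 (reflected polynomial 0xEDB88320); the names `crc32_update`, `guarded_region_t` and `region_is_intact` are made up for illustration and are not part of RTX or RL-ARM. It only reports a mismatch; what to do about it (repair, reboot, warn) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table.
 * Pass crc = 0 for a fresh calculation. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0u);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical descriptor for a block that should never change at run time
 * (a constant table, a configuration block, a code image in flash). */
typedef struct {
    const uint8_t *start;    /* first byte of the guarded block       */
    size_t         len;      /* length of the block in bytes          */
    uint32_t       expected; /* CRC recorded while the block was good */
} guarded_region_t;

/* Returns true if the block still matches its recorded CRC.
 * A match does not prove the content is correct, only that it is unchanged. */
bool region_is_intact(const guarded_region_t *region)
{
    assert(region != NULL);
    return crc32_update(0u, region->start, region->len) == region->expected;
}
```

A low-priority task could call `region_is_intact()` periodically for each guarded block and hand any failure to whatever error policy the product uses.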
I read this topic with a lot of interest, as it reminds me of the lack of debug support RTX provides.
Statistical data such as per-task execution time percentages, the state of a mutex, the number of free memory blocks, the number of free semaphores, etc. could be a VERY interesting improvement for the RTX library!
I am using RTX also. I was running a test over the weekend and came in to find the system had died for no apparent reason. I ran it overnight again and again; sometimes it would still be running and other times it had died (basically running just a single task and an ISR in this case). To make a long story short, it was the ULINK debugger I had left connected to the board. The damn thing cannot remain connected to the PC via USB when running overnight tests, even though it was not being used and my PC was off. The USB was unplugged from the ULINK and the 'problem' disappeared. (BTW: ULINK2 must be completely removed from the board).
Hmm, I don't think this phenomenon has anything to do with RTX itself - we do keep a ULINK2 connected without a problem. Are you absolutely SURE that the problem has disappeared? Some systems here ran for a week without a problem, others died after 1 or 2 days.
Well, the only test I was running was with an SSP as SPI master interfacing to a single slave and exchanging an identical command/request sequence using the Modbus ASCII protocol. No changes were made to the code when I removed the USB connection from the ULINK. It hasn't died since I did this. The frames exchanged were upwards of 2.7 million after a successful weekend run. Before the ULINK was removed I was consistently failing at a fraction of the frames reported. I cannot say that this is your problem, but it is another angle you need to consider...
As I noted above, my issue is resolved already. It was all about accessing FlashFS's SD card driver from exception mode, as well as a path that locked a mutex from exception mode. Once those were removed, no more hang-ups were experienced.
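For anyone hitting the same thing, one common way to keep file system calls and mutex locks out of exception mode is to have the ISR only queue the data and let a task do the blocking work. Below is a minimal sketch of that pattern, not the actual fix used here. It assumes a single ISR as the only producer, a single task as the only consumer, and a single-core target; the names `log_queue_t`, `log_queue_push_from_isr` and `log_queue_pop_in_task` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_QUEUE_SIZE 8u   /* holds at most LOG_QUEUE_SIZE - 1 entries */

/* Single-producer / single-consumer ring buffer.
 * head is written only by the ISR, tail only by the task, so no mutex
 * is needed (assumption: single core, one ISR and one task). */
typedef struct {
    volatile uint8_t head;
    volatile uint8_t tail;
    uint16_t         items[LOG_QUEUE_SIZE];
} log_queue_t;

/* Call from the ISR: never blocks, never touches a mutex or the file system.
 * Returns false (sample dropped) if the queue is full. */
bool log_queue_push_from_isr(log_queue_t *q, uint16_t sample)
{
    assert(q != NULL);
    uint8_t next = (uint8_t)((q->head + 1u) % LOG_QUEUE_SIZE);
    if (next == q->tail) {
        return false;
    }
    q->items[q->head] = sample;
    q->head = next;
    /* Assumption: this is where the ISR would wake the task, e.g. with an
     * RTX isr_ event call; check the RL-ARM manual for the exact function. */
    return true;
}

/* Call from the task: after a successful pop the task may lock its mutex
 * and write to the SD card, because it runs in task context. */
bool log_queue_pop_in_task(log_queue_t *q, uint16_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->tail == q->head) {
        return false;   /* queue is empty */
    }
    *out = q->items[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) % LOG_QUEUE_SIZE);
    return true;
}
```

The point of the design is that the ISR side never blocks and never calls into the kernel's mutex or file system code; everything that can wait happens in the task.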
Hi all,
I am using the AT91SAM7S128 with MDK 3.20 and ULINK2 and am having a similar problem.
My RTX clock interval is set to 1 ms and the UART task has a higher priority (2) than the other tasks.
I have 64 boards running exactly the same firmware. A PC is always polling these 64 boards for data. Randomly, one or two boards will stop running after a random time (from hours to days). However, some boards run for weeks without any problems.
I caught this problem once with the debugger and found that it stops in os_idle_demon and will not switch to other tasks that are ready for service.
Regards,
Xiao
See here:
http://www.keil.com/forum/docs/thread15346.asp
Thank you for the link. It gave me a lot of information.
Is there any workaround for this problem right now? Do we have to pay an extra couple of grand for the new release?
Regards, Xiao
Keil are working on a new release of RL-ARM. If you have a license, it's free. I do have a prototype fix which seems to work fine, but I do not think Keil will appreciate me distributing it right now. Wait a little longer for an official release.
Hi,
I am confused about this license. I do have a license. The free upgrade is only good for one year from the date of purchase, am I right?
I've been using this software for more than one year.
This is a question for Keil support, I am afraid.