Hello,
I am porting the RTX Kernel to a Cortex-M3 device and ran into a difficulty.
I have set up 2 tasks to toggle 2 LEDs to see if my tasks are running as expected, as below.
/*----------------------------------------------------------------------------
 * Task 4 'blink_P2': Blink LED P2
 *---------------------------------------------------------------------------*/
__task void blink_P2 (void) {
  os_itv_set (40);
  for (;;) {
    os_itv_wait ();
    Toggle_P2();
  }
}

/*----------------------------------------------------------------------------
 * Task 5 'blink_P3': Blink LED P3
 *---------------------------------------------------------------------------*/
__task void blink_P3 (void) {
  os_itv_set (40);
  for (;;) {
    os_itv_wait ();
    Toggle_P3();
  }
}
If the time delay is set the same for both tasks then there is no problem: both tasks toggle their LEDs every 40 ms. This works.
However, if I change the time delay on one task (for example, the second task to 50 ms), then both tasks take several seconds to toggle the LEDs.
I have previously ported the RTX kernel to an ARM7 core without difficulty, but I cannot see the problem on the Cortex-M3.
Can someone advise please ?
thanks!
Are any other tasks running? What are the task priorities? Can you show us the task-creation section of your code?
Note: the wait functions take values in system ticks, not milliseconds.
Did you properly configure your systick timer?
Note: to post code you can use < pre> and </ pre> (without the spaces).
M
Thanks for the reply Marc,
Yes, only 2 tasks are set up. The init task creates the 2 tasks as below and exits.
/*----------------------------------------------------------------------------
 * Task 4 'init': Initialize
 *---------------------------------------------------------------------------*/
__task void init (void) {
  GPIO_INIT();
  t_blink_P2 = os_tsk_create (blink_P2, 0);   /* start task 'blink_P2' */
  t_blink_P3 = os_tsk_create (blink_P3, 1);   /* start task 'blink_P3' */
  os_tsk_delete_self ();
}
In the RTX_Conf_CM.c file it is as default except for :
// </h>
// <h>SysTick Timer Configuration
// =============================
// <o>Timer clock value [Hz] <1-1000000000>
// Set the timer clock value for selected timer.
// Default: 6000000 (6MHz)
#ifndef OS_CLOCK
#define OS_CLOCK 16000000
#endif

// <o>Timer tick value [us] <1-1000000>
// Set the timer tick value for selected timer.
// Default: 10000 (10ms)
#ifndef OS_TICK
#define OS_TICK 1000
#endif
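With these settings, the relationship between OS_CLOCK, OS_TICK and the values passed to the wait functions can be sketched like this (systick_reload() and ms_to_ticks() are illustrative helpers, not RTX API):

```c
#include <stdint.h>

#define OS_CLOCK 16000000u   /* timer clock in Hz, as configured above */
#define OS_TICK  1000u       /* tick period in microseconds (1 ms) */

/* Reload value the kernel would program into SysTick: clocks per tick - 1 */
uint32_t systick_reload(void) {
    return (uint32_t)((uint64_t)OS_CLOCK * OS_TICK / 1000000u) - 1u;
}

/* Convert a delay in milliseconds to kernel ticks for os_itv_set()/os_dly_wait() */
uint32_t ms_to_ticks(uint32_t ms) {
    return ms * 1000u / OS_TICK;
}
```

With a 1 ms tick, os_itv_set(40) really does mean 40 ms; with the 10 ms default it would mean 400 ms.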
thanks Mike
Hi Robert McNamara,
I use 1 ms timer ticks all the time. I find it feels more "normal" and "natural" than 10 ms ticks.
I think this is for a Cortex-M3 MCU. What about an ARM7 MCU? And how fast is the CCLK? I mean, could you please teach me how to decide on a proper timer tick?
I heard that the current/recent Linux kernel is tickless, but RTX only checks the task status when a tick comes. Am I right?
When Task1 executes os_tsk_pass(), will Task1 pass control to the next task of the same priority immediately, or does it need to wait for a tick?
When Task1 executes os_tsk_pass() and there is no task of the same priority in the ready queue, when will Task1 pass control to the other tasks?
Sorry for asking questions in this thread; I am no longer a frequent visitor of this Keil forum, but I remember that it is not easy to encounter Robert McNamara here.
Note that ticks are used by operating systems - generally, not just RTOSes - to switch between tasks of the same priority that are in the "runnable" state.
When an event makes a lower-priority task runnable, nothing happens. When an event makes a higher-priority task runnable, the higher priority is a reason for the OS to switch tasks.
When a high-priority task doesn't pause itself with a wait call, it doesn't matter what lower-priority tasks want. The exception is if the OS has support for handling priority inversion - noticing that a high-prio task is blocked on a lock owned by a low-prio task. Then it can temporarily give the low-prio task a high priority just to let it get CPU time to release the lock, and hence allow the real high-prio task to get the resource and run.
When a task "passes", it just says: I don't have anything important to do right now - maybe someone else wants to step in for a while. But the task can't "pass" if a higher-prio task is already runnable - the lower-prio task wouldn't be running, so it would not be able to call any functions.
And if the only runnable task is a lower-prioritized one, then a "pass" doesn't mean much either - there is no other task at its priority to hand over to. "pass" doesn't mean to give up the "runnable" state. It just tries to give up the "running" state in case another task of same priority is waiting in "runnable".
When all tasks have different priority, then time slice lengths aren't really important. You don't get a time-sliced system but instead an event-driven system. Tasks are selected when they have high enough prio and wants to run. All other tasks are put to sleep. And the only reason a task runs is that it has highest prio, or all higher prioritized tasks are stuck waiting for something - i.e. not runnable.
But event-driven systems only handle tasks of different priorities. When tasks have the same priority, you no longer have a mathematical formula to tell which thread must run. So an OS can then support time slicing, where runnable tasks of the same priority get some CPU time one by one in sequence, based on the time slice settings - the granularity the OS scheduler is using.
The normal way to handle a task that passes the end of its slice to another task (of the same prio) is to have a clock that runs faster than the interval between time-sliced task switches. With a clock that runs at the same frequency as the time slicing, a task that gives up its slice after 50% of the time results in the next task only receiving half a slice before the swap timer ticks. Having the swap timer run at 10 times the speed means that the first thread gets 10 ticks. If it gives up the CPU after 50%, the next task need not get just 5 ticks (the remaining 50%) but can get a full set of 10 ticks.
With one tick per switch, a two-task program where one task always gives up its time after 90% could result in two identically prioritized tasks receiving 90% and 10% of the CPU capacity.
What timing methods RTX uses for different processor cores is something Keil has to answer - exactly how they handle tasks that pass up their time slice early, i.e. what amount of time the next task will get. If it is "the remaining time" (tm), then really bad things can happen.
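The effect can be illustrated with a toy simulation (not RTX code, just a model of the two policies): task A always yields after 90% of a slice, task B never yields, and we compare a scheduler that grants the next task a fresh slice against one that only gives it "the remaining time".

```c
#include <stdint.h>

/* Toy model of two equal-priority time-sliced tasks.
   full_slice_for_next = 1: the task after a "pass" gets a fresh full slice.
   full_slice_for_next = 0: it only gets the remainder of the current slice. */
void simulate(int full_slice_for_next, int rounds, int *a_units, int *b_units) {
    const int slice = 10;   /* slice length in fine-grained time units */
    const int a_use = 9;    /* task A gives up the CPU after 90% of a slice */
    *a_units = 0;
    *b_units = 0;
    for (int i = 0; i < rounds; i++) {
        *a_units += a_use;                                   /* A runs, then passes */
        *b_units += full_slice_for_next ? slice : (slice - a_use);
    }
}
```

With the "remaining time" policy the split ends up 90%/10%; with fresh slices it is roughly 47%/53%, which is much closer to fair.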
Hi Per,
Many thanks for your help. Now I think I understand the os_tsk_pass() better.
"pass" doesn't mean to give up the "runnable" state. It just tries to give up the "running" state in case another task of same priority is waiting in "runnable".
Hi all,
The documentation about the RTX scheduler isn't bad and addresses many of these topics:
http://www.keil.com/support/man/docs/rlarm/rlarm_ar_schedopt.htm
As for the systick, I normally use 10 ms because, as far as I understand, this is the default time for the Cortex-M3 SysTick interval. infocenter.arm.com/.../index.jsp
If you consider the CPU clock I'm sure you could come up with a logical systick interval.
For example, say a 50 MHz CPU clock: this would mean 1 ms * 50 MHz = 50,000 clocks per millisecond.
(I think) most Cortex-M instructions take 1-4 clocks, which makes for ~25,000 instructions per tick. Depending on what your tasks do, this could work fine or might not be enough.
Long story short, the systick interval is probably best determined on a case-by-case basis.
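The arithmetic above can be written out like this (a rough sizing aid only; the ~2 clocks per instruction is the same averaged assumption as in the example):

```c
#include <stdint.h>

/* Clocks available in one systick period, given CPU clock and tick in us */
uint32_t clocks_per_tick(uint32_t cpu_hz, uint32_t tick_us) {
    return (uint32_t)((uint64_t)cpu_hz / 1000u * tick_us / 1000u);
}

/* Rough instruction budget per tick, assuming an average clocks-per-instruction */
uint32_t approx_instructions(uint32_t clocks, uint32_t avg_clocks_per_instr) {
    return clocks / avg_clocks_per_instr;
}
```

For the 50 MHz / 1 ms example that gives 50,000 clocks and about 25,000 instructions per tick.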
Hi Marc,
Thanks for your help.
So, what you suggest is: A. normally 10 ms, or B. Cortex-M instructions take ~2 clocks on average, so use the .lst file of Task_X to calculate how many instructions Task_X contains, then get the number of clocks that Task_X may need.
Is my above understanding correct?
(I am not a native English speaker, sorry for my bad English writing.)
Hi John,
I'm not sure you can make the decision that methodically, but I think you understand what I was saying.
10 ms is the hardware default for the Cortex-M3 systick timer. This does not necessarily mean it is the best choice.
For example, if you have a task that executes on an event and you don't want that execution to be interrupted, you had better make sure the time slice of the RTOS is long enough for a complete loop execution. (Otherwise you may have to use locks... which can add more complexity.)
Basically, my opinion is that the OS time slice should generally be long enough for a complete cycle of a task in its longest case. (Note the time slice is a number multiplied by the systick, e.g. 5 * 10 ms = 50 ms.)
Of course this is very subjective and I can imagine cases where this doesn't matter or wouldn't apply.
Understanding your system's timing is the most important thing, because you will be better able to anticipate and debug issues when they are encountered.
It's important to separate real-time critical work from background work.
Real-time critical work should be able to finish within the slice period.
Background work can normally span any number of slices - it's even likely that background tasks will not be able to use their full slices, because they get interrupted by "real" work.
About the amount of time needed to perform a task - it's way easier to just measure the computation time needed than to try to count cycles, especially since you may also have a background noise of interrupts running. So measuring the maximum time consumed at maximum interrupt load and then adding a bit of a safety margin is better than just taking the sum of all the instructions.
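One way to measure it directly on a Cortex-M3 is the DWT cycle counter (a sketch; register addresses are the standard ARMv7-M debug registers, and cycles_to_us() is just a helper for reading out the result):

```c
#include <stdint.h>

/* DWT cycle counter on Cortex-M3 (ARMv7-M debug registers) */
#define SCB_DEMCR  (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

void cyccnt_start(void) {
    SCB_DEMCR  |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT  = 0;
    DWT_CTRL   |= 1u;           /* CYCCNTENA: start counting core cycles */
}

uint32_t cyccnt_stop(void) {
    return DWT_CYCCNT;          /* cycles elapsed since cyccnt_start() */
}

/* Convert a measured cycle count to microseconds for a given core clock */
uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz) {
    return (uint32_t)((uint64_t)cycles * 1000000u / cpu_hz);
}
```

Wrap the task body between cyccnt_start() and cyccnt_stop() while generating worst-case input and interrupt load, then add your safety margin.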
Per is absolutely correct.
The only thing I would point out in addition is that because (at least in my experience) it can be difficult to force a task (a critical "real work" task) into its longest path, taking a look at the number of instructions can be of additional value.
Also, as Per pointed out, you can eliminate this issue by designing non-interruptible operations in tasks with high priorities.
Thanks for all the help.
Sorry, I think I cannot catch the key point.
Assuming that,
There are 4 tasks, task_A, task_B, task_C, task_D.
task_A for critical CAN receiving/processing/transmission, highest priority; needs 25 ms.
task_B for critical UART receiving/processing, high priority; needs 18 ms.
task_C for key input checking, low priority; needs 1 ms.
task_D for LCD displaying, lowest priority; needs 10 ms.
If I set the Round-Robin time slice to 15 ms: when there are no CAN and UART events, task_A and task_B are in wait_for_some_ticks() or wait_for_HW_event(). So task_A and task_B are not runnable, and the system is working on task_C and task_D.
When a CAN event happens, sooner or later task_A wakes up from its waiting state; since task_A has the highest priority, task_A runs for as long as it likes. So the Round-Robin time slice does not matter?
I know I must be missing some very important piece, but I just cannot discover it.
I am no longer a KEIL/ARM user, but would like to know more about RTOS/scheduling.
I think you are correct. With 4 tasks all with different priorities the round-robin time slice doesn't really matter as you are designing a preemption based system.
task_A for critical CAN receiving/processing/transmission, high priority; needs 25 ms.
task_B for critical UART receiving/processing, high priority; needs 18 ms.
task_C for key input checking, low priority; needs 1 ms.
task_D for LCD displaying, low priority; needs 10 ms.
If I set the Round-Robin time slice to 19 ms and the events happen in the following order, UART -1ms-> 1st-CAN -XXms-> UART -YYms-> 2nd-CAN, then task_A will be blocked for (18-1) ms, then execute for only 19 ms (not able to finish its 1st CAN job), then be blocked for 18 ms once again; when the 2nd CAN event comes, there are two CAN events that need to be handled.
But if I set the Round-Robin time slice to 26 ms, it would be much better.
Note that in many situations, you can split a task into a real-time critical part and a less critical background task.
So the question here is whether the CAN thread really does need 25 ms, or if it can be optimized into a high-prio part that makes the real-time-critical decisions, and a background task that may figure out new questions to ask or what to do with logs/statistics of received data.
Another thing is that the program would normally use interrupts to pick up received data and store it in an input buffer, to make sure that the hardware buffers do not overflow.
Both the 18 ms for the UART and the 25 ms for the CAN task sound like quite long times, suggesting that some of the work could be split out from the real-time tasks.
Another thing here is that you have to take into account the worst-case loss of CPU capacity when your interrupt handlers are running at peak, thereby stealing CPU cycles from your time slices. Will your program consume 1%, 10% or more of the CPU capacity just for interrupt processing at peak load?
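A back-of-the-envelope estimate of that interrupt overhead can be sketched like this (all numbers in the example are hypothetical):

```c
#include <stdint.h>

/* Worst-case CPU share eaten by interrupts: ISRs per second times cycles per
   ISR, expressed as a percentage of the core clock. */
uint32_t isr_load_percent(uint32_t isr_per_sec, uint32_t cycles_per_isr,
                          uint32_t cpu_hz) {
    return (uint32_t)((uint64_t)isr_per_sec * cycles_per_isr * 100u / cpu_hz);
}
```

For example, 10,000 interrupts/s at 500 cycles each on a 50 MHz core already costs 10% of the CPU - capacity your time slices never see.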
Now, my understanding is:
1. Pre-emptive scheduling
2. Round-Robin scheduling
3. Round-Robin Pre-emptive scheduling
4. Co-operative multi-tasking
are quite different.
And the order of the events and the intervals between events are also important.
I checked the pages below:
www.keil.com/.../rlarm_ar_technical_data.htm
www.keil.com/.../rlarm_ar_timing_spec.htm
They talk about things like interrupt lockout time / interrupt latency. I assume that this is for when the RTOS is doing something that needs to be atomic: the RTOS quickly/temporarily disables and re-enables interrupts, so interrupts are locked out and we get some interrupt latency. Am I right?
The cycle counts for [Set event (switch task)] mean the guaranteed response time of event processing. Am I right?
2. Round-Robin scheduling
3. Round-Robin Pre-emptive scheduling
4. Co-operative multi-tasking
You normally only have preemptive (timer-based) and cooperative round-robin scheduling.
Cooperative is when the task calls a specific "pass" function, or calls any of the wait functions. Sometimes other OS functions may also generate a task switch, in case the OS thinks enough time has passed.
Old 16-bit Windows used cooperative scheduling, which was the reason everything locked up if one program did something really time-consuming without calling one of the magic functions in which Windows would be able to perform a task switch.
Cooperative scheduling is easier to implement and also has the advantage that you don't get into trouble from task switches happening at random locations in the code. So unless you interact with an interrupt routine, you don't need critical sections.
With preemptive scheduling, and when sharing variables between a task and an ISR, you almost always need critical sections to avoid problems with concurrent updates. And these critical sections do affect your latencies - both in relation to interrupts and in relation to switching to a higher-prioritized task after it has become runnable.
So the documentation for the RTOS can give worst-case information about the OS primitives. But depending on its use of critical sections or similar, the application itself can add significantly to the worst-case figures.
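Such a critical section around a variable shared between a task and an ISR can be sketched as below. On the real target you would use the CMSIS intrinsics __disable_irq()/__enable_irq() (or, I believe, RTX's tsk_lock()/tsk_unlock() to hold off only the scheduler); here interrupt masking is modeled by a flag so the idea can be shown self-contained.

```c
#include <stdint.h>

/* Stand-ins for __disable_irq()/__enable_irq(), modeled with a flag */
static volatile int irq_enabled = 1;
static void disable_irq(void) { irq_enabled = 0; }
static void enable_irq(void)  { irq_enabled = 1; }

static volatile uint32_t rx_count;    /* shared: incremented by the ISR */

void uart_isr(void) {
    rx_count++;                       /* single writer, inside the ISR */
}

uint32_t read_and_clear_rx_count(void) {
    uint32_t n;
    disable_irq();                    /* keep the ISR out of the read-modify-write */
    n = rx_count;
    rx_count = 0;
    enable_irq();
    return n;
}
```

The time spent between disable_irq() and enable_irq() is exactly the interrupt latency the thread adds on top of the OS's own worst-case figures, so keep it short.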
Many thanks for your help.
With RTX I believe the most common scheduling option is pre-emptive round robin.
i.e. dedicated time slices for tasks with the same priority but the ability of high priority tasks to pre-empt lower priority tasks.
But note that preemptive isn't just the ability of high-priority tasks to pre-empt lower-priority tasks. Preemption works even when all threads have the same priority - it's just that the scheduler can switch threads asynchronously, anywhere. This is in contrast to co-operative task switching, where the scheduler can only perform a task switch when a task has called an OS function that explicitly supports task switching.
So RTX would use preemptive scheduling (a break at any position in thread code, except inside critical sections). And RTX would use priority, i.e. tasks are switched in a fixed sequence (round robin) when they have the same priority, but switched depending on the arrival of events when the threads have different priorities.
I think this may be a semantics thing, but I believe that in the context of RTX terminology, preemption among tasks of the same priority is referred to as "round-robin", while the term "preemption" is used (I believe) exclusively for priority preemption.
Correct me if I'm wrong, but this is how I've always understood their docs. (If I remember correctly, this is not the terminology used by FreeRTOS.)
Keil has always used a bit of creative freedom when naming functions.
That can, alas, create lots of confusion when established naming conventions already exist.
velOSity RTOS www.ghs.com/.../velosity.html
Unlike other real-time operating systems that disable interrupts in every kernel service call, velOSity's state-of-the-art architecture guarantees the absolute minimum interrupt latency by never disabling interrupts in any service call.
==========>
I am quite confused about that "never disabling interrupts". Does that mean it does not disable interrupts in any critical section?
www.padauk.com.tw/products.php
All series of PADAUK products are developed with FPPA technology. FPPA (Field Programmable Processor Array) is the technology to realize truly parallel processing, a multi-core concept in one die. In the FPPA architecture, all the processing cores can run their own programs independently. Moreover, due to its supreme architecture, the power consumption stays low.
One of the English Datasheets: www.padauk.com.tw/.../getfile.php
It seems that some people prefer to use hardware power to solve the parallel-processing problem, so they built an 8-core MCU (Field Programmable Processor Array); with such an FPPA MCU there is no latency for multi-tasking.
I don't think it means that. I think that in kernel calls (e.g. SVC calls) they have designed the OS to be re-entrant (i.e. you can interrupt the kernel function with a hardware interrupt).
Just my guess, I know nothing about this OS.
Keil RTX does the same: hardware interrupts are never disabled by the system,
but you can disable them yourself wherever you like.