CMSIS++, or rather POSIX++, is a POSIX-like, portable, vendor-independent, hardware abstraction layer intended for C++/C embedded applications, designed with special consideration for the industry standard ARM Cortex-M processor series. Originally intended as a proposal for the next generation CMSIS, CMSIS++ can probably be more accurately defined as "C++ CMSIS", and POSIX++ as "C++ POSIX".
The CMSIS++ cornerstone is the RTOS, and in this respect CMSIS++ RTOS can be analysed from two perspectives: the CMSIS++ RTOS APIs, with a modern design and the CMSIS++ RTOS reference implementation with a clean and efficient code.
In the first phase of the project, the CMSIS++ RTOS APIs were designed, with POSIX threads in mind, but from a C++ point of view.
The native CMSIS++ RTOS interface is the C++ API, with a C API implemented as a wrapper, and an ISO C++ Threads API implemented also on top of the native C++ API.
Initially, the C++ API was validated by implementing it as a wrapper on top of the popular open source project FreeRTOS. Full functionality was achieved, and the entire system passed the ARM CMSIS RTOS validation suite.
With the native C++ API validated, while still using the safety net provided by an existing scheduler, the next step toward a grand design was to implement, in a portable way, the synchronisation objects defined by the CMSIS++ RTOS.
The result was a highly portable implementation, that requires a very simple interaction with the scheduler, basically a thread suspend() and resume().
Using this model, all RTOS objects were implemented (semaphores, mutexes, condition variables, message queues, memory pools, event flags, clocks and timers); full functionality was achieved, and again the entire system passed the ARM CMSIS RTOS validation suite.
To be noted that in this configuration, when running on top of an existing RTOS, it is perfectly possible to select which implementation to use, at individual object level; in other words it is perfectly possible to run with some objects implemented by the host RTOS and some objects using the reference portable implementation. This is generally useful when some of the objects defined by CMSIS++ are not available in the host RTOS; for example in the current version of FreeRTOS there were no memory pools or condition variables, and these objects were supplied by the reference implementation.
The last piece to complete the puzzle was the scheduler. The CMSIS++ RTOS specifications do not mandate for a specific scheduling policy, and, when running on top of an existing RTOS, any scheduling policy can be used.
However, the CMSIS++ RTOS reference scheduler takes the beaten path and implements a priority based, round robin, cooperative and optionally preemptive scheduler.
In other words, threads are assigned priorities, higher priority threads are scheduled first, equal priority threads are scheduled in a round robin way, and scheduling points are entered either explicitly at any wait() or yield(), or are optionally triggered by periodic interrupts, like the system clock ticks, or by user interrupts.
The scheduler was designed to be as portable as possible, and to run on any reasonable architecture, with any word size.
As such, the scheduler's main responsibility is to manage the list of threads ready for execution and to switch their execution contexts in an orderly manner.
Although not mandatory for its functionality, the scheduler also keeps track of all registered threads, and provides iterators to walk these lists.
For a better modularity, the scheduler itself does not keep track of threads waiting for various events; this is delegated to the various synchronisation objects, that are expected to implement their own policy of suspending and resuming execution of threads waiting for common resources.
However, the reference synchronisation objects use similar lists to keep track of the waiting threads, and, to simplify the implementation, the scheduler provides base classes for these lists.
Regardless how carefully a portable scheduler is designed and implemented, there will always be a last mile where the platform differences become important.
To accommodate for these differences, the scheduler needs to be ported on a specific platform. The port includes the specific definitions, mainly the way of creating and switching thread contexts, but also handling interrupts, accessing timers and clocks, etc.
There are currently two such CMSIS++ RTOS scheduler ports available and fully functional:
These ports are actually not part of the CMSIS++ package itself, which is highly portable, but are part of separate µOS++ packages.
This 32-bits ARM thumb port is specifically designed to run on Cortex-M devices. It currently supports ARMv6-M and ARMv7-M architectures, with or without FPU. Support for ARMv8-M will be added when needed.
The implementation uses the ARM specific features, like PendSV, which greatly simplify things.
For example, the context switching is performed by a rather simple function:
void __attribute__ ((section(".after_vectors"), naked, used, optimize("s"))) PendSV_Handler (void) { // The naked attribute and the push/pop are used to fully control // the function entry/exit code; be sure other registers are not // used in the assembly parts. asm volatile ("push {lr}"); // The whole mystery of context switching, in one sentence. :-) port::scheduler::restore_from_stack ( port::scheduler::switch_stacks ( port::scheduler::save_on_stack ())); asm volatile ("pop {pc}"); }
Apart from saving/returning, this function does exactly what it is expected to do:
The two save/restore functions are among the very few in the Cortex-M port that require assembly code:
inline stack::element_t* __attribute__((always_inline)) save_on_stack (void) { register stack::element_t* sp_; asm volatile ( // Get the thread stack " mrs %[r], PSP \n" " isb \n" #if defined (__VFP_FP__) && !defined (__SOFTFP__) // Is the thread using the FPU context? " tst lr, #0x10 \n" " it eq \n" // If so, push high vfp registers. " vstmdbeq %[r]!, {s16-s31} \n" // Save the core registers r4-r11,r14. // Also save EXC_RETURN to be able to test // again this condition in the restore sequence. " stmdb %[r]!, {r4-r9,sl,fp,lr} \n" #else // Save the core registers r4-r11. " stmdb %[r]!, {r4-r9,sl,fp} \n" #endif : [r] "=r" (sp_) /* out */ : /* in */ : /* clobber. DO NOT add anything here! */ ); return sp_; } inline void __attribute__((always_inline)) restore_from_stack (stack::element_t* sp) { // Without enforcing optimisations, an intermediate variable // would be needed to avoid using R4, which collides with // the R4 in the list of ldmia. // register stack::element_t* sp_ asm ("r0") = sp; asm volatile ( #if defined (__VFP_FP__) && !defined (__SOFTFP__) // Pop the core registers r4-r11,r14. // R14 contains the EXC_RETURN value // and is restored for the next test. " ldmia %[r]!, {r4-r9,sl,fp,lr} \n" // Is the thread using the FPU context? " tst lr, #0x10 \n" " it eq \n" // If so, pop the high vfp registers too. " vldmiaeq %[r]!, {s16-s31} \n" #else // Pop the core registers r4-r11. " ldmia %[r]!, {r4-r9,sl,fp} \n" #endif // Restore the thread stack register. " msr PSP, %[r] \n" " isb \n" : /* out */ : [r] "r" (sp) /* in */ : /* clobber. DO NOT add anything here! */ ); }
The generated code (for Cortex-M3) is remarkably neat and tidy:
08000198 <PendSV_Handler>: 8000198: b500 push {lr} 800019a: f3ef 8009 mrs r0, PSP 800019e: f3bf 8f6f isb sy 80001a2: e920 0ff0 stmdb r0!, {r4, r5, r6, r7, r8, r9, sl, fp} 80001a6: f000 fe07 bl 8000db8 <os::rtos::port::scheduler::switch_stacks(unsigned long*)> 80001aa: e8b0 0ff0 ldmia.w r0!, {r4, r5, r6, r7, r8, r9, sl, fp} 80001ae: f380 8809 msr PSP, r0 80001b2: f3bf 8f6f isb sy 80001b6: bd00 pop {pc}
One of the initial CMSIS++ RTOS design requirements was to give the user full control over the memory allocation.
The implementation fulfilled this requirement, allowing any possible memory allocation scheme, from the simplicity of using fully static allocation to the extreme of using separate custom allocators for each object requiring dynamic memory.
The objects requiring dynamic memory are:
All these objects have a last allocator parameter in their constructors that defaults to the system allocator memory::allocator<T>.
For example one of the thread constructors is:
using Allocator = memory::allocator<stack::allocation_element_t>; thread (const char* name, func_t function, func_args_t args, const attributes& attr = initializer, const Allocator& allocator = Allocator ());
By default the memory::allocator<T> is defined as:
template<typename T> using allocator = new_delete_allocator<T>;
but the user can define it as any standard C++ allocator, and so the behaviour of all objects requiring dynamic memory can be customised at once.
Even more, each such object has a separate template version, that takes a last allocator parameter, so at the limit each such object can be allocated using a separate allocator.
Given the magic of C++, using such allocators is straightforward:
template<typename T> class my_allocator; thread_allocated<my_allocator> thread { "th", func, nullptr }; message_queue_allocated<my_allocator> queue1 { "q1", 7, sizeof(msg_t) }; message_queue_typed<msg_t, my_allocator> queue2 { "q2", 7 }; memory_pool_allocated<my_allocator> pool1 { "p1", 7, sizeof(blk_t) }; memory_pool_typed<blk_t, my_allocator> pool2 { "p2", 7 };
Static allocation is handled using exactly the same method, but different templates:
thread_static<2500> thread { "th", func, nullptr }; message_queue_static<7, msg_t> queue { "q" }; memory_pool_static<7, blk_t> pool { "p" };
Writing RTOS unit tests was always tricky and the results debatable. This does not mean it should not be attempted; actually, if done properly, these tests can be very useful.
To improve testability, the synthetic POSIX platform was implemented. It allows to run most RTOS tests within a very convenient environment like macOS or GNU/Linux.
Another greatly helpful tool used to run the RTOS tests is the GNU ARM Eclipse QEMU, which emulates the STM32F4DISCOVERY board well enough for most tests to be relevant.
Actually most of the times the tests were performed either on macOS or on QEMU, and only rarely, usually at the end, as a final validation, the tests were also performed on physical hardware.
The main test was the ARM CMSIS RTOS validation suite, that exercises quite thoroughly the interface published in the cmsis_os.h file.
This test is automatically performed by the test scripts on the STM32F4DISCOVERY board running under GNU ARM Eclipse QEMU and on the synthetic POSIX platform.
The result of a run is:
CMSIS-RTOS Test Suite Jun 23 2016 16:03:42 TEST 01: TC_ThreadCreate PASSED TEST 02: TC_ThreadMultiInstance PASSED TEST 03: TC_ThreadTerminate PASSED TEST 04: TC_ThreadRestart PASSED TEST 05: TC_ThreadGetId PASSED TEST 06: TC_ThreadPriority PASSED TEST 07: TC_ThreadPriorityExec PASSED TEST 08: TC_ThreadChainedCreate PASSED TEST 09: TC_ThreadYield PASSED TEST 10: TC_ThreadParam PASSED TEST 11: TC_ThreadInterrupts PASSED TEST 12: TC_GenWaitBasic PASSED TEST 13: TC_GenWaitInterrupts PASSED TEST 14: TC_TimerOneShot PASSED TEST 15: TC_TimerPeriodic PASSED TEST 16: TC_TimerParam PASSED TEST 17: TC_TimerInterrupts PASSED TEST 18: TC_SignalMainThread PASSED TEST 19: TC_SignalChildThread PASSED TEST 20: TC_SignalChildToParent PASSED TEST 21: TC_SignalChildToChild PASSED TEST 22: TC_SignalWaitTimeout PASSED TEST 23: TC_SignalParam PASSED TEST 24: TC_SignalInterrupts PASSED TEST 25: TC_SemaphoreCreateAndDelete PASSED TEST 26: TC_SemaphoreObtainCounting PASSED TEST 27: TC_SemaphoreObtainBinary PASSED TEST 28: TC_SemaphoreWaitForBinary PASSED TEST 29: TC_SemaphoreWaitForCounting PASSED TEST 30: TC_SemaphoreZeroCount PASSED TEST 31: TC_SemaphoreWaitTimeout PASSED TEST 32: TC_SemParam PASSED TEST 33: TC_SemInterrupts PASSED TEST 34: TC_MutexBasic PASSED TEST 35: TC_MutexTimeout PASSED TEST 36: TC_MutexNestedAcquire PASSED TEST 37: TC_MutexPriorityInversion PASSED TEST 38: TC_MutexOwnership PASSED TEST 39: TC_MutexParam PASSED TEST 40: TC_MutexInterrupts PASSED TEST 41: TC_MemPoolAllocAndFree PASSED TEST 42: TC_MemPoolAllocAndFreeComb PASSED TEST 43: TC_MemPoolZeroInit PASSED TEST 44: TC_MemPoolParam PASSED TEST 45: TC_MemPoolInterrupts PASSED TEST 46: TC_MsgQBasic PASSED TEST 47: TC_MsgQWait PASSED TEST 48: TC_MsgQParam PASSED TEST 49: TC_MsgQInterrupts PASSED TEST 50: TC_MsgFromThreadToISR PASSED TEST 51: TC_MsgFromISRToThread PASSED TEST 52: TC_MailAlloc PASSED TEST 53: TC_MailCAlloc PASSED TEST 54: TC_MailToThread PASSED TEST 55: TC_MailFromThread PASSED TEST 56: TC_MailTimeout PASSED TEST 57: TC_MailParam PASSED TEST 58: TC_MailInterrupts PASSED TEST 59: TC_MailFromThreadToISR PASSED TEST 60: TC_MailFromISRToThread PASSED Test Summary: 60 Tests, 60 Executed, 60 Passed, 0 Failed, 0 Warnings. Test Result: PASSED
This test exercises the scheduler and the thread synchronisation primitives. It creates 10 threads that compete for a mutex, simulate random activities and compute statistics on how many times each thread acquired the mutex, to validate the fairness of the scheduler.
The test is automatically performed by the scripts on the STM32F4DISCOVERY board running under GNU ARM Eclipse QEMU and on the synthetic POSIX platform.
A typical result of the run is:
Mutex stress & uniformity test. Built with GCC 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589]. Seed 3761791254 [ 5s] t0:39 t1:42 t2:37 t3:41 t4:38 t5:37 t6:36 t7:41 t8:40 t9:34 sum=385, avg=39, delta in [-5,3] [-12%,8%] [ 10s] t0:74 t1:82 t2:79 t3:84 t4:79 t5:84 t6:77 t7:76 t8:80 t9:75 sum=790, avg=79, delta in [-5,5] [-5%,6%] [ 15s] t0:114 t1:120 t2:116 t3:128 t4:117 t5:122 t6:114 t7:116 t8:116 t9:115 sum=1178, avg=118, delta in [-4,10] [-2%,8%] [ 20s] t0:155 t1:161 t2:152 t3:163 t4:153 t5:160 t6:154 t7:159 t8:154 t9:154 sum=1565, avg=157, delta in [-5,6] [-2%,4%] [ 25s] t0:196 t1:199 t2:194 t3:206 t4:193 t5:198 t6:194 t7:200 t8:197 t9:194 sum=1971, avg=197, delta in [-4,9] [-1%,5%] [ 30s] t0:233 t1:236 t2:241 t3:245 t4:231 t5:236 t6:233 t7:237 t8:234 t9:237 sum=2363, avg=236, delta in [-5,9] [-1%,4%] [ 35s] t0:270 t1:281 t2:277 t3:284 t4:266 t5:273 t6:279 t7:278 t8:273 t9:277 sum=2758, avg=276, delta in [-10,8] [-3%,3%] Done.
This test exercises the synchronisation primitives available from interrupt service routines and the effectiveness of the critical sections. It creates a high frequency hardware timer which posts to a semaphore, and a thread counts if the posts arrived in time or were late, in other words if the scheduler was or not able to wakeup the thread fast enough.
The test runs on the physical STM32F4DISCOVERY board.
A typical result of the run shows that on this platform the scheduler can stand about 250.000 context switches per second:
Semaphore stress test. Built with GCC 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589]. Iteration 0 Seed 832262406 42000 cy 1 kHz 21000 cy 2 kHz 10500 cy 4 kHz 5250 cy 8 kHz 2625 cy 16 kHz 1312 cy 32 kHz 656 cy 64 kHz 328 cy 128 kHz 164 cy 256 kHz 1 late 82 cy 512 kHz 777 late 41 cy 1024 kHz 998 late 20 cy 2100 kHz 999 late 10 cy 4200 kHz 999 late
CMSIS++ is still a young project, and many things need to be addressed, but the core component, the RTOS, is pretty well defined and functional.
For now it may not be perfect (as it tries to be), but it definitely provides a more standard set of primitives, closer to POSIX, and a wider set of APIs than many other existing RTOSes, covering both C++ and C applications; at the same time it does its best to preserve compatibility with the original ARM CMSIS APIs.
Any contributions to improve CMSIS++ will be highly appreciated.
CMSIS++ is an open source project, maintained by Liviu Ionescu. The project is released under the terms of the MIT license.
The main source of information for CMSIS++ is the project web.
The Git repositories and all public releases are available from GitHub; specifically the stress tests are available from the tests folder.
The code for ARM CMSIS RTOS validator is available from GitHub.
The code for the Cortex-M scheduler port is available from GitHub.
The code for the synthetic POSIX scheduler port is available from GitHub.
For questions and discussions, please use the CMSIS++ section of the GNU ARM Eclipse forum.
For bugs and feature requests, please use the GitHub issues.