


CMSIS++, or rather POSIX++, is a POSIX-like, portable, vendor-independent hardware abstraction layer intended for C++/C embedded applications, designed with special consideration for the industry standard ARM Cortex-M processor series. Originally intended as a proposal for the next generation CMSIS, CMSIS++ can probably be more accurately defined as "C++ CMSIS", and POSIX++ as "C++ POSIX".


CMSIS++ RTOS: APIs vs reference implementations


The CMSIS++ cornerstone is the RTOS, and in this respect CMSIS++ RTOS can be analysed from two perspectives: the CMSIS++ RTOS APIs, with a modern design, and the CMSIS++ RTOS reference implementation, with clean and efficient code.

In the first phase of the project, the CMSIS++ RTOS APIs were designed, with POSIX threads in mind, but from a C++ point of view.

The native CMSIS++ RTOS interface is the C++ API, with a C API implemented as a wrapper, and an ISO C++ Threads API implemented also on top of the native C++ API.


The CMSIS++ RTOS C++ API as a wrapper on top of an existing RTOS


Initially, the C++ API was validated by implementing it as a wrapper on top of the popular open source project FreeRTOS. Full functionality was achieved, and the entire system passed the ARM CMSIS RTOS validation suite.


The CMSIS++ RTOS reference synchronisation objects (semaphores, queues, etc)


With the native C++ API validated, while still using the safety net provided by an existing scheduler, the next step toward a grand design was to implement, in a portable way, the synchronisation objects defined by the CMSIS++ RTOS.

The result was a highly portable implementation, that requires a very simple interaction with the scheduler, basically a thread suspend() and resume().

Using this model, all RTOS objects were implemented (semaphores, mutexes, condition variables, message queues, memory pools, event flags, clocks and timers); full functionality was achieved, and again the entire system passed the ARM CMSIS RTOS validation suite.

Note that in this configuration, when running on top of an existing RTOS, the implementation can be selected at individual object level; in other words, it is perfectly possible to run with some objects implemented by the host RTOS and some objects using the portable reference implementation. This is generally useful when some of the objects defined by CMSIS++ are not available in the host RTOS; for example, the current version of FreeRTOS has no memory pools or condition variables, so these objects were supplied by the reference implementation.


The CMSIS++ RTOS reference scheduler


The last piece to complete the puzzle was the scheduler. The CMSIS++ RTOS specifications do not mandate a specific scheduling policy, and, when running on top of an existing RTOS, any scheduling policy can be used.

However, the CMSIS++ RTOS reference scheduler takes the beaten path and implements a priority based, round robin, cooperative and optionally preemptive scheduler.

In other words, threads are assigned priorities, higher priority threads are scheduled first, equal priority threads are scheduled in a round robin way, and scheduling points are entered either explicitly at any wait() or yield(), or are optionally triggered by periodic interrupts, like the system clock ticks, or by user interrupts.
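
The ordering rule described above can be illustrated with a minimal sketch. It is not the CMSIS++ implementation; `demo_thread`, `ready_list` and `run_once` are hypothetical names, and "higher number means higher priority" is an assumption made only for this example. A thread is linked after all threads of higher or equal priority, which is enough to get priority ordering with round robin among equals.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical stand-in for a thread control block; not the CMSIS++ class.
struct demo_thread
{
  std::string name;
  int priority; // assumption: higher value = higher priority
};

// Ready list kept sorted by descending priority.
class ready_list
{
public:
  // Link the thread after all threads of higher or equal priority,
  // so equal-priority threads are served in FIFO (round robin) order.
  void
  link (demo_thread* th)
  {
    assert (th != nullptr);
    auto it = threads_.begin ();
    while (it != threads_.end () && (*it)->priority >= th->priority)
      ++it;
    threads_.insert (it, th);
  }

  // Remove and return the thread that should run next, or nullptr.
  demo_thread*
  unlink_head ()
  {
    if (threads_.empty ())
      return nullptr;
    demo_thread* th = threads_.front ();
    threads_.pop_front ();
    return th;
  }

private:
  std::list<demo_thread*> threads_;
};

// One scheduling point: pick the next thread and, as if it yielded
// immediately, link it back at the end of its priority group.
inline std::string
run_once (ready_list& rl)
{
  demo_thread* th = rl.unlink_head ();
  assert (th != nullptr);
  rl.link (th);
  return th->name;
}
```

With two threads at priority 2 and one at priority 1, repeated calls to `run_once()` alternate between the two higher priority threads; the lower priority thread runs only when both of them are suspended.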


The scheduler portable code


The scheduler was designed to be as portable as possible, and to run on any reasonable architecture, with any word size.

As such, the scheduler's main responsibility is to manage the list of threads ready for execution and to switch their execution contexts in an orderly manner.

Although not mandatory for its functionality, the scheduler also keeps track of all registered threads, and provides iterators to walk these lists.

For a better modularity, the scheduler itself does not keep track of threads waiting for various events; this is delegated to the various synchronisation objects, that are expected to implement their own policy of suspending and resuming execution of threads waiting for common resources.

However, the reference synchronisation objects use similar lists to keep track of the waiting threads, and, to simplify the implementation, the scheduler provides base classes for these lists.


The scheduler port-specific code


Regardless of how carefully a portable scheduler is designed and implemented, there will always be a last mile where the platform differences become important.

To accommodate these differences, the scheduler needs to be ported to each specific platform. The port includes the platform-specific definitions, mainly the way of creating and switching thread contexts, but also handling interrupts, accessing timers and clocks, etc.

There are currently two such CMSIS++ RTOS scheduler ports available and fully functional:

  • a 32-bit ARM Thumb port, running on Cortex-M devices;
  • a 64-bit synthetic POSIX port, running as a user process on macOS and GNU/Linux.


These ports are actually not part of the CMSIS++ package itself, which is highly portable, but are part of separate µOS++ packages.


The Cortex-M scheduler port


This 32-bit ARM Thumb port is specifically designed to run on Cortex-M devices. It currently supports ARMv6-M and ARMv7-M architectures, with or without FPU. Support for ARMv8-M will be added when needed.

The implementation uses the ARM specific features, like PendSV, which greatly simplify things.

For example, the context switching is performed by a rather simple function:


void
__attribute__ ((section(".after_vectors"), naked, used, optimize("s")))
PendSV_Handler (void)
{
  // The naked attribute and the push/pop are used to fully control
  // the function entry/exit code; be sure other registers are not
  // used in the assembly parts.
  asm volatile ("push {lr}");

  // The whole mystery of context switching, in one sentence. :-)
  port::scheduler::restore_from_stack (
      port::scheduler::switch_stacks (
          port::scheduler::save_on_stack ()));

  asm volatile ("pop {pc}");
}


Apart from saving/returning, this function does exactly what it is expected to do:

  • save_on_stack() - saves the context of the current thread on the thread stack and returns the stack address;
  • switch_stacks() - saves the above stack address in the current thread control block, selects the next thread waiting to run and returns the address of its stack context;
  • restore_from_stack() - restores the context of the new thread from the stack.
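
The middle step is plain portable code, and its logic can be modelled in a few lines. This is a simplified sketch, not the µOS++ source: `demo_tcb`, `current_thread` and `demo_switch_stacks` are hypothetical names, and the `next` pointer stands in for whatever the real scheduler does to select the next ready thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified thread control block; the real one holds more.
struct demo_tcb
{
  std::uintptr_t* saved_sp; // where the thread's saved context starts
  demo_tcb* next;           // stand-in for "next thread from the ready list"
};

// Stand-in for the scheduler's notion of the running thread.
static demo_tcb* current_thread = nullptr;

// Model of switch_stacks(): remember where the outgoing thread saved its
// context, select the incoming thread, and return its saved stack pointer.
inline std::uintptr_t*
demo_switch_stacks (std::uintptr_t* sp)
{
  assert (current_thread != nullptr);
  current_thread->saved_sp = sp;
  current_thread = current_thread->next;
  assert (current_thread != nullptr);
  return current_thread->saved_sp;
}
```

The value passed in is what save_on_stack() returned, and the value returned is what restore_from_stack() consumes, which is why the three calls nest so naturally in the handler.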


The two save/restore functions are among the very few in the Cortex-M port that require assembly code:


inline stack::element_t*
save_on_stack (void)
{
  register stack::element_t* sp_;

  asm volatile
  (
      // Get the thread stack
      " mrs %[r], PSP                       \n"
      " isb                                 \n"

#if defined (__VFP_FP__) && !defined (__SOFTFP__)

      // Is the thread using the FPU context?
      " tst lr, #0x10                       \n"
      " it eq                               \n"
      // If so, push high vfp registers.
      " vstmdbeq %[r]!, {s16-s31}           \n"
      // Save the core registers r4-r11,r14.
      // Also save EXC_RETURN to be able to test
      // again this condition in the restore sequence.
      " stmdb %[r]!, {r4-r9,sl,fp,lr}       \n"

#else

      // Save the core registers r4-r11.
      " stmdb %[r]!, {r4-r9,sl,fp}          \n"

#endif

      : [r] "=r" (sp_) /* out */
      : /* in */
      : /* clobber. DO NOT add anything here! */
  );

  return sp_;
}

inline void
restore_from_stack (stack::element_t* sp)
{
  // Without enforcing optimisations, an intermediate variable
  // would be needed to avoid using R4, which collides with
  // the R4 in the list of ldmia.

  // register stack::element_t* sp_ asm ("r0") = sp;

  asm volatile
  (

#if defined (__VFP_FP__) && !defined (__SOFTFP__)

      // Pop the core registers r4-r11,r14.
      // R14 contains the EXC_RETURN value
      // and is restored for the next test.
      " ldmia %[r]!, {r4-r9,sl,fp,lr}       \n"
      // Is the thread using the FPU context?
      " tst lr, #0x10                       \n"
      " it eq                               \n"
      // If so, pop the high vfp registers too.
      " vldmiaeq %[r]!, {s16-s31}           \n"

#else

      // Pop the core registers r4-r11.
      " ldmia %[r]!, {r4-r9,sl,fp}          \n"

#endif

      // Restore the thread stack register.
      " msr PSP, %[r]                       \n"
      " isb                                 \n"

      : /* out */
      : [r] "r" (sp) /* in */
      : /* clobber. DO NOT add anything here! */
  );
}


The generated code (for Cortex-M3) is remarkably neat and tidy:


08000198 <PendSV_Handler>:
8000198: b500       push {lr}
800019a: f3ef 8009 mrs r0, PSP
800019e: f3bf 8f6f isb sy
80001a2: e920 0ff0 stmdb r0!, {r4, r5, r6, r7, r8, r9, sl, fp}
80001a6: f000 fe07 bl 8000db8 <os::rtos::port::scheduler::switch_stacks(unsigned long*)>
80001aa: e8b0 0ff0 ldmia.w r0!, {r4, r5, r6, r7, r8, r9, sl, fp}
80001ae: f380 8809 msr PSP, r0
80001b2: f3bf 8f6f isb sy
80001b6: bd00       pop {pc}


Static vs dynamic memory allocation


One of the initial CMSIS++ RTOS design requirements was to give the user full control over the memory allocation.

The implementation fulfilled this requirement, allowing any possible memory allocation scheme, from the simplicity of using fully static allocation to the extreme of using separate custom allocators for each object requiring dynamic memory.


The objects requiring dynamic memory are:

  • threads, for the stacks
  • message queues, for the queues (arrays of messages)
  • memory pools, for the pools (arrays of blocks)


All these objects have a last allocator parameter in their constructors that defaults to the system allocator memory::allocator<T>.

For example one of the thread constructors is:


using Allocator = memory::allocator<stack::allocation_element_t>;

thread (const char* name, func_t function, func_args_t args,
        const attributes& attr = initializer, const Allocator& allocator =
              Allocator ());


By default the memory::allocator<T> is defined as:


template<typename T>
  using allocator = new_delete_allocator<T>;


but the user can define it as any standard C++ allocator, and so the behaviour of all objects requiring dynamic memory can be customised at once.

Moreover, each such object has a separate template version that takes an allocator as its last template parameter, so, at the limit, each object can be allocated using a separate allocator.

Given the magic of C++, using such allocators is straightforward:


template<typename T>
  class my_allocator;

thread_allocated<my_allocator> thread { "th", func, nullptr };

message_queue_allocated<my_allocator> queue1 { "q1", 7, sizeof(msg_t) };
message_queue_typed<msg_t, my_allocator> queue2 { "q2", 7 };

memory_pool_allocated<my_allocator> pool1 { "p1", 7, sizeof(blk_t) };
memory_pool_typed<blk_t, my_allocator> pool2 { "p2", 7 };


Static allocation is handled using exactly the same method, but different templates:


thread_static<2500> thread { "th", func, nullptr };

message_queue_static<7, msg_t> queue { "q" };

memory_pool_static<7, blk_t> pool { "p" };




Writing RTOS unit tests was always tricky and the results debatable. This does not mean it should not be attempted; actually, if done properly, these tests can be very useful.

To improve testability, the synthetic POSIX platform was implemented. It allows most RTOS tests to run within a very convenient environment like macOS or GNU/Linux.

Another greatly helpful tool used to run the RTOS tests is the GNU ARM Eclipse QEMU, which emulates the STM32F4DISCOVERY board well enough for most tests to be relevant.

In practice, most of the time the tests were performed either on macOS or on QEMU; only rarely, usually at the end as a final validation, were they also performed on physical hardware.


The ARM CMSIS RTOS validation suite


The main test was the ARM CMSIS RTOS validation suite, that exercises quite thoroughly the interface published in the cmsis_os.h file.

This test is automatically performed by the test scripts on the STM32F4DISCOVERY board running under GNU ARM Eclipse QEMU and on the synthetic POSIX platform.

The result of a run is:


CMSIS-RTOS Test Suite   Jun 23 2016   16:03:42

TEST 01: TC_ThreadCreate                  PASSED
TEST 02: TC_ThreadMultiInstance           PASSED
TEST 03: TC_ThreadTerminate               PASSED
TEST 04: TC_ThreadRestart                 PASSED
TEST 05: TC_ThreadGetId                   PASSED
TEST 06: TC_ThreadPriority                PASSED
TEST 07: TC_ThreadPriorityExec            PASSED
TEST 08: TC_ThreadChainedCreate           PASSED
TEST 09: TC_ThreadYield                   PASSED
TEST 10: TC_ThreadParam                   PASSED
TEST 11: TC_ThreadInterrupts              PASSED
TEST 12: TC_GenWaitBasic                  PASSED
TEST 13: TC_GenWaitInterrupts             PASSED
TEST 14: TC_TimerOneShot                  PASSED
TEST 15: TC_TimerPeriodic                 PASSED
TEST 16: TC_TimerParam                    PASSED
TEST 17: TC_TimerInterrupts               PASSED
TEST 18: TC_SignalMainThread              PASSED
TEST 19: TC_SignalChildThread             PASSED
TEST 20: TC_SignalChildToParent           PASSED
TEST 21: TC_SignalChildToChild            PASSED
TEST 22: TC_SignalWaitTimeout             PASSED
TEST 23: TC_SignalParam                   PASSED
TEST 24: TC_SignalInterrupts              PASSED
TEST 25: TC_SemaphoreCreateAndDelete      PASSED
TEST 26: TC_SemaphoreObtainCounting       PASSED
TEST 27: TC_SemaphoreObtainBinary         PASSED
TEST 28: TC_SemaphoreWaitForBinary        PASSED
TEST 29: TC_SemaphoreWaitForCounting      PASSED
TEST 30: TC_SemaphoreZeroCount            PASSED
TEST 31: TC_SemaphoreWaitTimeout          PASSED
TEST 32: TC_SemParam                      PASSED
TEST 33: TC_SemInterrupts                 PASSED
TEST 34: TC_MutexBasic                    PASSED
TEST 35: TC_MutexTimeout                  PASSED
TEST 36: TC_MutexNestedAcquire            PASSED
TEST 37: TC_MutexPriorityInversion        PASSED
TEST 38: TC_MutexOwnership                PASSED
TEST 39: TC_MutexParam                    PASSED
TEST 40: TC_MutexInterrupts               PASSED
TEST 41: TC_MemPoolAllocAndFree           PASSED
TEST 42: TC_MemPoolAllocAndFreeComb       PASSED
TEST 43: TC_MemPoolZeroInit               PASSED
TEST 44: TC_MemPoolParam                  PASSED
TEST 45: TC_MemPoolInterrupts             PASSED
TEST 46: TC_MsgQBasic                     PASSED
TEST 47: TC_MsgQWait                      PASSED
TEST 48: TC_MsgQParam                     PASSED
TEST 49: TC_MsgQInterrupts                PASSED
TEST 50: TC_MsgFromThreadToISR            PASSED
TEST 51: TC_MsgFromISRToThread            PASSED
TEST 52: TC_MailAlloc                     PASSED
TEST 53: TC_MailCAlloc                    PASSED
TEST 54: TC_MailToThread                  PASSED
TEST 55: TC_MailFromThread                PASSED
TEST 56: TC_MailTimeout                   PASSED
TEST 57: TC_MailParam                     PASSED
TEST 58: TC_MailInterrupts                PASSED
TEST 59: TC_MailFromThreadToISR           PASSED
TEST 60: TC_MailFromISRToThread           PASSED

Test Summary: 60 Tests, 60 Executed, 60 Passed, 0 Failed, 0 Warnings.
Test Result: PASSED


The mutex stress test


This test exercises the scheduler and the thread synchronisation primitives. It creates 10 threads that compete for a mutex, simulate random activities and compute statistics on how many times each thread acquired the mutex, to validate the fairness of the scheduler.

The test is automatically performed by the scripts on the STM32F4DISCOVERY board running under GNU ARM Eclipse QEMU and on the synthetic POSIX platform.

A typical result of the run is:


Mutex stress & uniformity test.
Built with GCC 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589].
Seed 3761791254
[  5s] t0:39   t1:42   t2:37   t3:41   t4:38   t5:37   t6:36   t7:41   t8:40   t9:34   sum=385, avg=39, delta in [-5,3] [-12%,8%]
[ 10s] t0:74   t1:82   t2:79   t3:84   t4:79   t5:84   t6:77   t7:76   t8:80   t9:75   sum=790, avg=79, delta in [-5,5] [-5%,6%]
[ 15s] t0:114  t1:120  t2:116  t3:128  t4:117  t5:122  t6:114  t7:116  t8:116  t9:115  sum=1178, avg=118, delta in [-4,10] [-2%,8%]
[ 20s] t0:155  t1:161  t2:152  t3:163  t4:153  t5:160  t6:154  t7:159  t8:154  t9:154  sum=1565, avg=157, delta in [-5,6] [-2%,4%]
[ 25s] t0:196  t1:199  t2:194  t3:206  t4:193  t5:198  t6:194  t7:200  t8:197  t9:194  sum=1971, avg=197, delta in [-4,9] [-1%,5%]
[ 30s] t0:233  t1:236  t2:241  t3:245  t4:231  t5:236  t6:233  t7:237  t8:234  t9:237  sum=2363, avg=236, delta in [-5,9] [-1%,4%]
[ 35s] t0:270  t1:281  t2:277  t3:284  t4:266  t5:273  t6:279  t7:278  t8:273  t9:277  sum=2758, avg=276, delta in [-10,8] [-3%,3%]
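
The summary columns of each line can be reproduced with a short sketch. The names are hypothetical and the round-to-nearest average is an assumption inferred from the sample output (385 acquisitions over 10 threads is reported as avg=39); the percentage columns are left out because their exact rounding cannot be determined from the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Summary of one report line.
struct uniformity
{
  long sum;
  long avg;       // rounded to nearest (assumption)
  long delta_min; // smallest per-thread count minus avg
  long delta_max; // largest per-thread count minus avg
};

inline uniformity
compute_uniformity (const std::vector<long>& counts)
{
  assert (!counts.empty ());
  const long n = static_cast<long> (counts.size ());
  const long sum = std::accumulate (counts.begin (), counts.end (), 0L);
  const long avg = (sum + n / 2) / n;
  const auto mm = std::minmax_element (counts.begin (), counts.end ());
  return uniformity { sum, avg, *mm.first - avg, *mm.second - avg };
}
```

Fed with the ten counts of the 5s line, this gives sum=385, avg=39 and a delta range of [-5,3], matching the report.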


The semaphore stress test


This test exercises the synchronisation primitives available from interrupt service routines and the effectiveness of the critical sections. It creates a high frequency hardware timer which posts to a semaphore, and a thread counts whether the posts arrived in time or were late, in other words whether the scheduler was able to wake up the thread fast enough.

The test runs on the physical STM32F4DISCOVERY board.

A typical run shows that on this platform the scheduler can sustain about 250,000 context switches per second:


Semaphore stress test.
Built with GCC 5.3.1 20160307 (release) [ARM/embedded-5-branch revision 234589].

Iteration 0
Seed 832262406
  42000 cy    1 kHz
  21000 cy    2 kHz
  10500 cy    4 kHz
   5250 cy    8 kHz
   2625 cy   16 kHz
   1312 cy   32 kHz
    656 cy   64 kHz
    328 cy  128 kHz
    164 cy  256 kHz    1 late
     82 cy  512 kHz  777 late
     41 cy 1024 kHz  998 late
     20 cy 2100 kHz  999 late
     10 cy 4200 kHz  999 late
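
The two numeric columns are related by simple arithmetic, sketched below. The 42 MHz timer clock is an assumption inferred from the first row (42000 cycles at 1 kHz), and `stress_steps` is a hypothetical name; the real test may compute its steps differently.

```cpp
#include <cassert>
#include <vector>

// Assumption: the timer runs at 42 MHz, i.e. 42000 cycles per millisecond.
constexpr unsigned long TIMER_CYCLES_PER_MS = 42000UL;

struct demo_step
{
  unsigned long cycles; // timer period, in timer clock cycles
  unsigned long khz;    // resulting interrupt frequency, truncated
};

// Halve the timer period on each step and derive the frequency.
inline std::vector<demo_step>
stress_steps (unsigned count)
{
  std::vector<demo_step> steps;
  unsigned long cycles = TIMER_CYCLES_PER_MS;
  for (unsigned i = 0; i < count; ++i)
    {
      assert (cycles > 0);
      steps.push_back ({ cycles, TIMER_CYCLES_PER_MS / cycles });
      cycles /= 2;
    }
  return steps;
}
```

Thirteen steps reproduce the periods and frequencies in the listing, from 42000 cycles (1 kHz) down to 10 cycles (4200 kHz); the first late post appears at 164 cycles, which is where the figure of roughly 250,000 switches per second comes from.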




CMSIS++ is still a young project, and many things need to be addressed, but the core component, the RTOS, is pretty well defined and functional.

For now it may not be perfect (as it tries to be), but it definitely provides a more standard set of primitives, closer to POSIX, and a wider set of APIs than many other existing RTOSes, covering both C++ and C applications; at the same time it does its best to preserve compatibility with the original ARM CMSIS APIs.

Any contributions to improve CMSIS++ will be highly appreciated.


More info


CMSIS++ is an open source project, maintained by Liviu Ionescu. The project is released under the terms of the MIT license.

The main source of information for CMSIS++ is the project web site.

The Git repositories and all public releases are available from GitHub; specifically the stress tests are available from the tests folder.

The code for the ARM CMSIS RTOS validator is available from GitHub.

The code for the Cortex-M scheduler port is available from GitHub.

The code for the synthetic POSIX scheduler port is available from GitHub.

For questions and discussions, please use the CMSIS++ section of the GNU ARM Eclipse forum.

For bugs and feature requests, please use the GitHub issues.

In previous blogs we covered an introduction to System Trace Macrocell (STM) concepts and terminology, and the STM Programmers' model with an example of how to generate efficient trace data. Once the STM is generating a trace stream, we may wish to view it within our Debugger.


DS-5 implements an "Events View" which serves this purpose.



Configuring Your Target


First, it is necessary to make sure that the platform configuration for your target is configured (via DTSL options) to collect trace from the STM, otherwise the view will not be configurable. From the Debug Configurations user interface, we can find the DTSL Options "Edit..." button underneath the target selection list.


Each platform may look slightly different. First, select a valid trace sink via the "Trace Buffer" tab - most platforms default to "None" and may have many options such as "DSTREAM" or "ETB."


There is usually a dialog tab marked "STM" or a checkbox which enables trace from a particular STM, per the following screenshot:




Configure the Events view


Once connected we can configure our Events view. By default, it looks fairly empty. This view must be configured for each Master and Channel combination we want to see in the view. We see an informational item on what the view will decode (which Masters and Channels) and the source (in this case, DSTREAM: STM).


The view is organized in pages, and the VCR-like controls will walk us back and forth within the decoded trace:


To configure the view, find the Settings menu (next to the view minimize/maximize buttons) and select the "Events Settings..." item.


We will then be presented with a dialog. First, select the trace source to be shown in the view. In the example we show collecting trace on the DSTREAM unit (via TPIU) and that we want to see the trace output from device "STM." This makes up the "DSTREAM: STM" trace configuration.


For each Master, a Channel can be defined, and the expected decode of that channel further changed from "Text" to "Binary." We see that we are enabling Master 64 and Channel 0 as Text and channel 1-65535 as Binary. The example code provided only uses Channel 0 and Channel 1, but here we see that we can have a different setting for each master and each channel.

The mapping of Master number to a source device is implementation-specific. For the Juno ARM Development Platform, it is listed in the SoC Technical Reference Manual (specifically for r0, r1, and r2).



Note the Import and Export buttons, which can be used to load in a pre-configured set of configurations, or save them out for later re-use, as different system environments and applications will have different settings.


Viewing Trace Output


Once we've collected trace, we will see the STM output in the Events view. Notice that the Master and Channel are reported and that the Timestamp increments.


We see, from our example code, our "Cambridge" string (the first character 'C' is Marked) and our Prime number and count following:



In this blog, the second in a series, we explore the programmers' model for the ARM System Trace Macrocell. A previous blog covered basic concepts of the STM architecture and implementation. Example code is provided, which is minimally targeted at the Juno ARM Development Platform.


STM Programmers’ Model


Memory Map


The STM Architecture defines a memory map that is split into two regions. The first is a configuration interface (4KiB in size), which contains all the registers used to configure the behavior of the STM and to access the Basic Stimulus Ports, if implemented.


A second region of memory contains the Extended Stimulus Ports and can be up to 16MiB in size. How this is represented in the system memory map is down to the design of the SoC -- all Masters (CPUs and devices) may access the same address, or all Masters may access a dedicated and independent address.


All registers in the STM Architecture are defined as being located at an offset relative to the base address of their constituent region. On the Juno SoC, the base address of the configuration (or "APB") interface is 0x2010_0000 and the base address of the Extended Stimulus (or "AXI") region is 0x2800_0000, with this address being common to all Masters.




There are two key steps to configuring the STM via the APB interface. The first is that the STM needs to be configured with a valid Trace ID, since it outputs the instrumentation data over the CoreSight trace subsystem.


This value is exported over the ATB bus interface and is required not only for the transactions to be valid, but to discern between STM trace data and, for example, trace data from another CoreSight component such as an Embedded Trace Macrocell (ETM).


When using an external debugger (such as ARM DS-5) to collect the trace, it is possible to have the debugger set up the Trace ID as part of the connection sequence. The responsibility for this truly depends on your use case; if an external debugger is involved then it may be configuring other CoreSight components and giving them Trace IDs. You do not want the STM Trace ID and the Trace ID for another component to be the same, but you also do not want the debugger to conflict with your application STM configuration.


If you have an external debugger connected you can modify your instrumentation software to compensate; there is no harm whatsoever in having the debugger set the same trace ID as your instrumentation software.


We show an example function stmTRACEID() which performs this operation:


/*
 * stmTRACEID(stm, traceid)
 *
 * Set STM's TRACEID (which goes out over ATB bus ATBID)
 *
 * Note it is illegal per CoreSight to set the trace ID
 * to 0x00 or one of the reserved values (0x70 onwards)
 * (see IHI0029D D4.2.4 Special trace source IDs).
 *
 * Returns the trace ID that was set, or 0 if the request was rejected.
 */
unsigned int stmTRACEID(struct STM *stm, unsigned int traceid)
{
  if ((traceid > 0x00) && (traceid < 0x70)) {
    unsigned int tcsr;

    traceid = traceid & TRACEID_MASK;

    /* Read-modify-write: replace only the TRACEID field of STMTCSR */
    tcsr = (stm->APB->STMTCSR & ~(TRACEID_MASK << TRACEID_SHIFT));
    stm->APB->STMTCSR = (tcsr | (traceid << TRACEID_SHIFT));

    return traceid;
  }

  return 0;
}


The second requirement is to enable the stimulus ports in question. This is actually an optional part of STM Architecture that offers configuration registers to enable and disable the generation of trace packets when a particular stimulus port is accessed. It is possible to enable and disable stimulus ports with a certain granularity, but this will be completely dependent on the design of the instrumented code and the system it runs on. This example code enables all Extended stimulus ports such that any stimulus write to any stimulus port will generate a packet.


/*
 * Set STMSPSCR.PORTCTL to 0x0 to ensure port selection is not
 * used. STMSPSCR.PORTSEL is ignored and STMSPER and STMSPTER
 * bits apply equally to all groups of ports.
 *
 * Whether the STM has 32 or 65536 ports, they'll all be
 * enabled.
 */
stm->APB->STMSPSCR = 0x00000000;
stm->APB->STMSPER = 0xffffffff;
stm->APB->STMSPTER = 0xffffffff;


Once configured, we can then enable the STM with appropriate register access:
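
A minimal sketch of such an access, assuming the global enable is the EN field in bit 0 of STMTCSR. `demo_stm_apb` and `demo_stm_enable` are hypothetical names that model a single register in ordinary memory so the logic can be tried on a host; they are not the example code's `struct STM`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the part of the APB register block used here.
struct demo_stm_apb
{
  volatile std::uint32_t STMTCSR; // Trace Control and Status Register
};

// Assumption: the global enable is STMTCSR.EN, bit 0.
constexpr std::uint32_t STMTCSR_EN = 1u << 0;

inline void
demo_stm_enable (demo_stm_apb* apb)
{
  assert (apb != nullptr);
  // Read-modify-write, so the TRACEID and other fields are preserved.
  apb->STMTCSR = apb->STMTCSR | STMTCSR_EN;
}
```

The read-modify-write matters because the Trace ID set earlier lives in the same register; a plain store of the enable bit would clear it.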




This is the bare minimum setup for an STM. There are obviously other configuration options such as Compression, Timestamping, and Synchronization that may or may not be configured dependent on the application.


Which Stimulus Port?


Each of the 65536 possible Extended Stimulus Ports maps to an STPv2 Channel. A trace decoder can then look for trace belonging to this channel to retrieve the instrumentation and differentiate it from other instrumentation sources.


The layout in memory of the stimulus ports means that for each packet, a data item is written to a particular address and offset within the STM stimulus port address space. Recall that each Extended Stimulus Port is a 256-byte region of memory. The address of the start of the stimulus port, and therefore all the registers which will generate trace for that "channel" within the AXI interface, can be calculated.


channel_address  = STM_AXI_BASE + (0x100 * channel_number)
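
The calculation can be written as a small helper. This sketch uses the Juno AXI base address quoted earlier; `stm_register_address` is a hypothetical name, and the result is returned as an integer so that it can be checked on a host without touching the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Juno base address of the Extended Stimulus (AXI) region, from the text.
constexpr std::uintptr_t STM_AXI_BASE = 0x28000000u;
// Each Extended Stimulus Port occupies 256 bytes.
constexpr std::uintptr_t STM_PORT_SIZE = 0x100u;

// Address of the register at 'offset' within the port of a given channel.
inline std::uintptr_t
stm_register_address (std::uint32_t channel, std::uintptr_t offset)
{
  assert (channel < 65536u);
  assert (offset < STM_PORT_SIZE);
  return STM_AXI_BASE + (STM_PORT_SIZE * channel) + offset;
}
```

For example, channel 1 starts at 0x2800_0100, and the last possible channel, 65535, starts at 0x28FF_FF00, which is where the 16MiB upper bound on the region comes from.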


We present code which provides two examples of access methods. The first uses logical operations to exploit the address decode logic defined by the STM Architecture, and returns a pointer which can be used to perform the memory write.

The finer points of the address decode used by the STM are documented in the STM Architecture, section 3.3. The code for stm.c:stmPortAddress() in the example code shows a method of calculating the address and offset using a flag-based API.

The second uses a C struct defining the layout of each stimulus port offset as an array. In this manner, assigning a value to a particular structure member would generate the appropriate store. Additionally, using C macros can simplify and increase readability of the actual stimulus port access.


struct stmPort {
  STM_NA G_reserved[16];


  STM_NA I_reserved[16];
};

/*
 * STM AXI Stimulus Interface
 *
 * The STM Architecture defines up to 65536 stimulus ports, all of which are
 * implemented on the STM and STM-500 from ARM, Ltd.
 */
struct stmAXI {
    /*
     * access the port array based on the limit in
     * (stmAPB->STMDEVID & 0x1fff) so nothing we
     * can define at compile time..
     */
    struct stmPort port[0];
};

/*
 * STMn(port, class)
 *
 * Write an n-bit value to a stimulus port of a particular type (e.g. G_DMTS)
 */
#define STM8(a, p, type)  (*((volatile unsigned char *) &((a)->port[p].type)))
#define STM16(a, p, type) (*((volatile unsigned short *) &((a)->port[p].type)))
#define STM32(a, p, type) (*((volatile unsigned int *) &((a)->port[p].type)))
#define STM64(a, p, type) (*((volatile unsigned long *) &((a)->port[p].type)))
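
For orientation, here is a sketch of how a complete port layout could be declared. The offsets for G_DM (0x08), G_D (0x18) and G_FLAGTS (0x60) are the ones quoted in this article; the remaining members follow the register order as I understand it from the STM Architecture specification and should be checked against it. `demo_slot` and `demo_stm_port` are hypothetical names, not the types used in the example code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each stimulus register is modelled as an 8-byte slot (assumption).
struct demo_slot
{
  volatile std::uint32_t word[2];
};

// G_ = guaranteed delivery, I_ = invariant timing.
struct demo_stm_port
{
  demo_slot G_DMTS;        // 0x00 data, marked, timestamped
  demo_slot G_DM;          // 0x08 data, marked
  demo_slot G_DTS;         // 0x10 data, timestamped
  demo_slot G_D;           // 0x18 data
  demo_slot G_reserved[8]; // 0x20-0x5f
  demo_slot G_FLAGTS;      // 0x60 flag, timestamped
  demo_slot G_FLAG;        // 0x68 flag
  demo_slot G_TRIGTS;      // 0x70 trigger, timestamped
  demo_slot G_TRIG;        // 0x78 trigger
  demo_slot I_DMTS;        // 0x80
  demo_slot I_DM;          // 0x88
  demo_slot I_DTS;         // 0x90
  demo_slot I_D;           // 0x98
  demo_slot I_reserved[8]; // 0xa0-0xdf
  demo_slot I_FLAGTS;      // 0xe0
  demo_slot I_FLAG;        // 0xe8
  demo_slot I_TRIGTS;      // 0xf0
  demo_slot I_TRIG;        // 0xf8
};

// The checks below cover only the facts stated in the article.
static_assert (sizeof(demo_stm_port) == 0x100, "a port is 256 bytes");
static_assert (offsetof(demo_stm_port, G_DM) == 0x08, "G_DM offset");
static_assert (offsetof(demo_stm_port, G_D) == 0x18, "G_D offset");
static_assert (offsetof(demo_stm_port, G_FLAGTS) == 0x60, "G_FLAGTS offset");
```

Pinning the layout with static assertions means a mistake in the padding shows up at compile time rather than as trace packets of the wrong type.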


We can re-create "printf debug" functionality by passing formatted strings to a function which outputs them as data over the requested STM channel:

The example function stm.c:stmSendString() outputs a string as instrumentation using macros STMn() (where n is {8,16,32,64}) which resolve to a C struct access as defined above.

/*
 * void stmSendString(stm, channel, string)
 *
 * We specifically write a byte to ensure that we get a D8 packet,
 * although that limits the function to 8-bit encodings.
 *
 * It doesn't matter what we use for the last write (if we see
 * a null character) -- G_FLAGTS has no data except the flag and
 * the timestamp, so a 32-bit access will be just fine..
 */

void stmSendString(struct STM *stm, unsigned int channel, const char *string)
{
    /*
     * Send a string to the STM extended stimulus registers
     * The first character goes out as D8M (Marker) packet
     * The last character is followed by a Timestamp packet
     * This is the Annex C example from the STPv2 spec
     */
    struct stmAXI *axi = stm->AXI;

    int first = 1;

    while(*string != '\0')
    {
        /*
         * If the character is a linefeed, then don't output
         * it -- just reset our 'first' state to 1 so that
         * the next character (the start of the next line)
         * is marked
         */
        if (*string == '\n') {
            STM32(axi, channel, G_FLAGTS) = *string++;
            first = 1;
        } else {
            /*
             * Continue to output characters -- if it's the
             * first character in a string, or just after a
             * linefeed (handled above), mark it.
             */
            if (first) {
                STM8(axi, channel, G_DM) = (*string++);
                first = 0;
            } else {
                STM8(axi, channel, G_D) = (*string++);
            }
        }
    }

    /*
     * Flag the end of the string
     * Access size doesn't matter as we have no data for flag
     * packets
     */
    STM32(axi, channel, G_FLAGTS) = 0x0;
}


Effective use of the STM

Annex C of the STPv2 specification gives an example of encoding an ASCII string as a data item, using the metadata functionality of the extended stimulus ports. Each string is delimited by a Marked packet at its start and a FLAG_TS packet at its end, in place of a linefeed or NUL character. Within a stimulus port, the offset for one type of Marked Data packet is 0x08 (G_DM), for a (plain) Data packet 0x18 (G_D), and for a Flag packet with Timestamp 0x60 (G_FLAGTS), so sending the string breaks down into individual writes to those addresses. For the NUL-terminated string “Cambridge”, we would expect those writes to produce a Marked 'C', plain Data packets for the remaining characters, and a closing timestamped Flag in the trace stream.
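
The sequence of writes for such a string can be sketched on a host by recording them instead of storing to the STM. The offsets are the ones quoted above; `demo_write` and `frame_string` are hypothetical names, and linefeed handling is left out for brevity.

```cpp
#include <cassert>
#include <vector>

// Offsets within one Extended Stimulus Port, as quoted in the text.
constexpr unsigned G_DM = 0x08;     // marked data
constexpr unsigned G_D = 0x18;      // plain data
constexpr unsigned G_FLAGTS = 0x60; // flag with timestamp

// One recorded stimulus write; a stand-in for a real store to the STM.
struct demo_write
{
  unsigned offset;
  unsigned char value;
};

// Record the writes that frame a NUL-terminated string: first character
// marked, the rest as plain data, then a timestamped flag.
inline std::vector<demo_write>
frame_string (const char* s)
{
  assert (s != nullptr);
  std::vector<demo_write> out;
  bool first = true;
  for (; *s != '\0'; ++s)
    {
      out.push_back ({ first ? G_DM : G_D, static_cast<unsigned char> (*s) });
      first = false;
    }
  out.push_back ({ G_FLAGTS, 0 }); // flag packets carry no data
  return out;
}
```

For "Cambridge" this records ten writes: 'C' to G_DM, the eight remaining characters to G_D, and a final write to G_FLAGTS.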



This allows a trace decoder to adequately identify individual lines within text output, and additionally gives the trace decoder a method of determining when the string was output in time by way of the Timestamp. For binary data, a similar construct may be used with Marked data or Flag metadata surrounding the elements of an instrumentation message.


It might become obvious that outputting ASCII strings over a trace bus with a single packet per character is possibly not the most efficient way to use the STM. Since each data item is encapsulated in the STPv2 protocol, there is some overhead. The example string "Cambridge" sent as D8 packets and surrounded by D8M and FLAG_TS could be, rather than 9 bytes long (1 byte per character), somewhat more than 20 bytes. Packet headers are easily accounted for, but a timestamp may be quite large (up to 7 bytes, not inclusive of the FLAG_TS packet header) and may vary in size. This also does not take into account reporting of Channel and Master information. There are many ways of encoding a string within larger packet types using marker and flag 'framing' to differentiate between strings, but in the end "printf", whether over a USART or an STM interface, is simply not an efficient method of instrumentation.


In fact, in industrial applications, instrumentation is usually binary data formatted to be compact and useful and not a console output. This is especially true of use cases such as the network packet processing instrumentation where the relevant data needn't be prefixed or human readable, and indeed may be far too vast for a human to spend time reading -- the point of said instrumentation would be statistical analysis.


The onus, therefore, is upon the trace decoder to make sense of that packetized binary data. With any instrumentation data, an appropriate format for that data can be designed – ASCII strings or binary structures – and this will very much inform how the Stimulus Ports are used. Simply, you will need to at least define the usage of channels and the metadata packets before you start writing instrumentation code. By modulating the access size and the use of the extended stimulus ports' abilities to add metadata, extremely efficient output of binary instrumentation data can be effected.


Annex C also gives an example of formatting binary data in such a manner that can be constructed using the stimulus port accessor methods (as previously described). Let us imagine an application which calculates prime numbers. When it finds a prime number, it outputs the prime number itself, and the position or index of the prime, as 32-bit stimulus accesses to the STM. For example, 41 is the 13th prime number, so it outputs "41" and "13."


Stimulus Port Register   Data


A trace decoder can then look for pairs of 32-bit data items, with the second followed by a Marker packet augmented with a Timestamp. From the difference in timestamps between packets, we could work out how long it took to generate that prime number.
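On the decoder side, the pairing logic might look something like the sketch below. It assumes the STPv2 stream has already been unpacked into a simple array; `struct stm_packet`, `struct prime_record` and `pair_primes` are hypothetical names for illustration, not part of any real trace decoder API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of one STPv2 data packet. */
struct stm_packet {
    uint32_t value;      /* data payload */
    int      marked;     /* non-zero for a marked packet */
    uint64_t timestamp;  /* meaningful only for the marked, timestamped packet */
};

struct prime_record {
    uint32_t prime;      /* the prime number itself */
    uint32_t index;      /* its position in the sequence of primes */
    uint64_t timestamp;  /* when the pair was emitted */
};

/* Pair each unmarked packet with the marked packet that follows it.
   Returns the number of records written to `out`. */
static size_t pair_primes(const struct stm_packet *pkts, size_t n,
                          struct prime_record *out, size_t out_cap)
{
    size_t count = 0;
    assert(pkts != NULL && out != NULL);
    for (size_t i = 0; i + 1 < n && count < out_cap; ) {
        if (!pkts[i].marked && pkts[i + 1].marked) {
            out[count].prime = pkts[i].value;
            out[count].index = pkts[i + 1].value;
            out[count].timestamp = pkts[i + 1].timestamp;
            count++;
            i += 2;
        } else {
            i++;  /* resynchronise on unexpected input */
        }
    }
    return count;
}
```

Subtracting the `timestamp` fields of two consecutive records then gives an estimate of how long the later prime took to generate.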




For the 3 sets of data shown, this takes up six 32-bit words (24 data bytes), not including overhead. Unless our first prime number is very, very large, we would not need to encode the number or the count in a 32-bit data packet. Since each value is packetized independently (the STM will never merge two packets), the choice of accessor could be made conditional on the size of the output data (by counting leading zeros), or the data could be automatically emitted as a smaller packet using the optional STM Compression features.


The trace decoder would still be able to look for pairs of data packets (with a Marker+Timestamp), but we would make more efficient use of bits in the resultant trace. Below we show how an efficient trace output could be achieved when counting primes, where we increase the packet payload size as we reach the limit of the previous type (again, the first field is a "prime," the second, marked field is a "count" of which prime). The data below shows the reporting of 5 sets of data, using 15 data bytes (again, not including overhead).




We can see that since the first prime can be encoded in 8 bits, we can use a D8 packet. Since its position can be encoded as 8 bits, we can also use a D8 packet. The next prime is 257, which requires >8 bits to encode, but the position does not, so we see D16+D8MTS. And so on. Eventually we will see D32 and possibly D64 packets if we calculate enough primes, but only if we need that number of bits to encode the value.
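A size-conditional accessor along these lines could be sketched as follows. The function names are invented for illustration, the size check uses simple range comparisons rather than a count-leading-zeros instruction, and `reg` is assumed to be the address of one stimulus port register (these sit 8 bytes apart, so they are 8-byte aligned).

```c
#include <assert.h>
#include <stdint.h>

/* Smallest access size, in bits, that can hold `value` without truncation. */
static unsigned stm_access_bits(uint64_t value)
{
    if (value <= UINT8_MAX)  return 8;
    if (value <= UINT16_MAX) return 16;
    if (value <= UINT32_MAX) return 32;
    return 64;
}

/* Store `value` to `reg` using the narrowest access that fits, so that the
   STM nominally emits the smallest data packet. The 64-bit case only makes
   sense where the implementation supports 64-bit stimulus accesses. */
static void stm_write_compact(volatile void *reg, uint64_t value)
{
    assert(((uintptr_t)reg & 7u) == 0);
    switch (stm_access_bits(value)) {
    case 8:  *(volatile uint8_t  *)reg = (uint8_t)value;  break;
    case 16: *(volatile uint16_t *)reg = (uint16_t)value; break;
    case 32: *(volatile uint32_t *)reg = (uint32_t)value; break;
    default: *(volatile uint64_t *)reg = value;           break;
    }
}
```

With this, 41 would go out as an 8-bit access and 257 as a 16-bit access, matching the D8 and D16 packets described above.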




We now know fundamentally how to program the STM and generate stimulus which implements our instrumentation. Next we'll discuss how to configure DS-5 to collect the instrumentation as Trace, in Configuring DS-5 for the System Trace Macrocell.

This article aims to introduce the ARM System Trace Macrocell (STM), outlining what it is, its basic operation, and why one might want to use it. Example code will be provided, minimally targeted at the Juno ARM Development Platform, in a later blog in the series.



Introduction to instrumentation

When writing code it is often useful to add informational statements that give an insight into control flow and data management as well as aiding in observation of the actual code at runtime. As such, instrumentation is an important component of code running on a live system. The proliferation of "printf" debug statements, whereby data is output to a console, is testament to this.


Sending text data to a USART or similar peripheral via printf is perhaps the most common method of instrumentation. It does have its drawbacks: the data rate of most USARTs is relatively low, and at the same time the overhead of maintaining such communication is relatively high, involving the use of FIFOs and interrupt servicing. It is also sometimes complicated to access a serial port connection on a production system, which may be located remotely. With this in mind, a USART can be considered a non-ideal choice for use cases involving high-performance code or the collection of remote instrumentation data.


An alternative method may be to use network devices, such as Ethernet. These devices typically afford much higher bandwidth rates than USARTs, and are ideal for the collection of remote data. However, this does involve manually encapsulating the data in protocols such as TCP/IP, which can dramatically increase the overhead of servicing the peripheral. Therefore the overhead of instrumentation can be higher.


Using USARTs, Ethernet or other generic data peripherals can have detrimental effects on instrumented code. As an example, imagine a system which performs network packet data processing. If we use a USART, we may find that the data processing is throttled, because the rate at which instrumentation data can be sent is limited by the USART bandwidth. If we instead use Ethernet as a transport for instrumentation, we might find that the instrumentation of packet data processing contains data on the process of instrumentation itself.


It is considered desirable for instrumented code to run at close to the performance and run-time profile of non-instrumented code. That has the implication that instrumentation has as little management overhead as possible, and does not markedly interfere with operation of the non-instrumentation code. One way to solve these problems is with a device which is designed for the purpose of instrumentation.



The System Trace Macrocell

A System Trace Macrocell (STM) grants software developers the ability to instrument code utilizing the CoreSight Trace subsystem as a transport. CoreSight is a central part of most ARM SoCs, and is intended to operate at similar clock rates to the rest of the components of the system. The STM itself operates in a non-invasive fashion, requires very little overhead besides memory-mapped peripheral writes, and does not (directly) generate interrupts.


ARM defines a System Trace Macrocell Programmers' Model Architecture Specification (currently version 1.1, referred to here as "STM Architecture") and licenses the current CoreSight STM-500 product as an implementation of that architecture.


Further information on CoreSight Trace can be found in Eoin McCann's 3-part blog on CoreSight.


The STM instruments using the MIPI System Trace Protocol version 2.0 (STPv2), which is available to MIPI Members. The protocol itself defines a method for both instrumentation data and metadata to be encapsulated in a trace stream, composed of varying sized data elements (from 4- to 64-bit). The instrumentation is otherwise free-form and neither the protocol nor the STM place any limitations on the data content of the stream. These aspects of the STM free the software developer from having to be concerned with instrumentation overheads and available bandwidth.






Instrumentation via STM can be identified as being output via a particular "Master," in order to differentiate the various sources within a system. A simple implementation might attribute all instrumentation with a single Master identifier. A more complex design might attribute each individual core with a unique Master identifier, making it clear which core was running the software responsible for generating a particular datum of instrumentation.


Any device that can generate a memory system write can generate instrumentation, for example DMA peripherals and GPUs.


The number of masters within a system and their identifiers are part of the implementation of the system, and may or may not map directly to, for example, AXI IDs. Check the design documentation for your chosen SoC for details on which components are able to generate stimulus via memory writes, and what their STPv2 Master ID is.




Each STM implementation has access to up to 65536 instrumentation channels. Each of these channels is clearly defined in the trace stream, allowing for multiple types of instrumentation to be intermixed within a single system or single application. For instance, channel 0 could be used to encode ASCII text, while channel 10 could output packet headers in a binary format.  Alternatively, one channel could be allocated to each Process within a system.



Metadata: Marks, Flags, Timestamps and Triggers


STM metadata is highly flexible, allowing one to arbitrarily Mark any trace data packet. A marked datum is typically used to identify the start of data or something of interest in the trace stream. A Flag can be used in a similar way; however, no data is associated with a Flag.

Each packet can be supplemented with a Timestamp, which takes an external clock signal and converts it into an incrementing count in the trace stream. In this manner a trace stream can be synchronized with other trace in the system, such as Instruction Trace from an ETM, or can simply provide timing information to the trace decoder.


STPv2 defines the format of the timestamp to be flexible. The STM-500 outputs timestamps in a natural binary format, with the ability to encode a delta to conserve bandwidth.

Triggers are special in that they are both output to the trace stream and can have an effect on the rest of the trace subsystem. The result of a Trigger can be routed to other components in the system. In this manner code can be instrumented and also generate additional trace from other Trace Macrocells within the system at pertinent points. This is particularly useful for post-mortem analysis use cases.




Stimulus Ports

Channels are formed on the STM by way of “stimulus ports.” These are groups of registers within the SoC memory map that, when accessed, generate the desired trace output. The STM Architecture defines both “Basic” and “Extended” Stimulus Ports. A Basic Stimulus Port is simple; data is written to the port, and that data is then output.


Extended Stimulus Ports allow the data to be augmented with useful metadata, along with the importance of that data (Guaranteed or Invariant, discussed later). The Extended Stimulus Ports consist of a grouping of 16 registers in a 256-byte memory-mapped region, separate from the STM configuration registers.
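As a rough sketch of this layout, and assuming that each channel corresponds to one 256-byte extended stimulus port with the ports laid out back to back from an SoC-specific base address, the address of a register for a given channel could be computed as follows. `STM_PORT_SIZE` and `stm_port_address` are illustrative names, not taken from any ARM header.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one extended stimulus port region, as stated above. */
#define STM_PORT_SIZE 256u

/* Hypothetical helper: address of the register at `offset` within the
   stimulus port for `channel`. `stim_base` is the SoC-specific address of
   the first stimulus port; check your SoC documentation for its value. */
static inline uintptr_t stm_port_address(uintptr_t stim_base,
                                         uint32_t channel,
                                         uint32_t offset)
{
    assert(offset < STM_PORT_SIZE);
    return stim_base + (uintptr_t)channel * STM_PORT_SIZE + offset;
}
```

Under that assumption, channel 10 would start 2560 bytes (10 × 256) above the base of channel 0.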


Depending on the address offset of the register within a group, a different STPv2 packet is output. The offsets are defined in the STM Architecture, Section 3.1 (Table 3-1), a summary of which is shown:



Address Offset   Short name   Description
0x00             G_DMTS       Data, marked with timestamp, guaranteed
0x08             G_DM         Data, marked, guaranteed
0x10             G_DTS        Data, with timestamp, guaranteed
0x18             G_D          Data, guaranteed
0x60             G_FLAGTS     Flag with timestamp, guaranteed
0x68             G_FLAG       Flag, guaranteed
0x70             G_TRIGTS     Trigger with timestamp, guaranteed
0x78             G_TRIG       Trigger, guaranteed


The size of the data payload of each packet is determined by the size of the access made to the stimulus port offset. For example, an 8-bit store to offset 0x18 would nominally generate a 'D8' packet, while a 32-bit store to offset 0x18 would nominally generate a 'D32' packet, and so on.
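To make the offsets and the access-size rule concrete, here is a minimal C sketch of two accessor functions. The enum values are the guaranteed register offsets from the STM Architecture summary; the function names are made up for illustration, and `port` is assumed to point at the start of one extended stimulus port, whose real address is SoC-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Guaranteed register offsets within one extended stimulus port. */
enum stm_g_offset {
    STM_G_DMTS   = 0x00,  /* data, marked with timestamp */
    STM_G_DM     = 0x08,  /* data, marked */
    STM_G_DTS    = 0x10,  /* data, with timestamp */
    STM_G_D      = 0x18,  /* data */
    STM_G_FLAGTS = 0x60,  /* flag with timestamp */
    STM_G_FLAG   = 0x68,  /* flag */
    STM_G_TRIGTS = 0x70,  /* trigger with timestamp */
    STM_G_TRIG   = 0x78   /* trigger */
};

/* An 8-bit store: nominally produces a D8-type packet. */
static inline void stm_write8(volatile void *port, enum stm_g_offset off,
                              uint8_t value)
{
    *((volatile uint8_t *)port + off) = value;
}

/* A 32-bit store: nominally produces a D32-type packet. */
static inline void stm_write32(volatile void *port, enum stm_g_offset off,
                               uint32_t value)
{
    volatile uint8_t *addr = (volatile uint8_t *)port + off;
    assert(((uintptr_t)addr & 3u) == 0);  /* keep the store naturally aligned */
    *(volatile uint32_t *)addr = value;
}
```

For example, `stm_write8(port, STM_G_D, 'A')` would nominally yield a D8 packet, while `stm_write32(port, STM_G_DTS, value)` would nominally yield a D32 packet with a timestamp.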


To reiterate, we can "Mark" and "Timestamp" our data, and also output metadata only via the "Flag" and "Trigger" mechanisms (these types of instrumentation have no data payload).


Since ARM's STM and STM-500 IP do not implement the Basic Stimulus registers, we will not cover them here. ARM partners implementing an STM may choose to implement them per the STM Architecture. If, when designing an SoC, there is a requirement for simpler instrumentation, then it is possible that an Instrumentation Trace Macrocell (ITM) could be implemented, which can provide similar functionality, although with a different programmers' model and trace output format. Please check your SoC documentation.

Fundamental Data Size


The STM implementation defines a “Fundamental Data Size.” This is essentially the maximum size of an access to the stimulus port registers, as determined by the implementation of the connection between the STM and the rest of the system.


For STM-500, as implemented in revision r1 of the Juno SoC, the fundamental data size is 64-bit, so a 64-bit stimulus should generate a D64 packet. Take note of this value, as it can change the way a trace decoder is written for application instrumentation that may run on multiple platforms.


Some SoCs implement an earlier version of the STM, the r0 revision of Juno being one example. The Fundamental Data Size is defined as 32-bit for that implementation.


An STM with a Fundamental Data Size of 64 bits may also be connected in such a way that it does not have a 64-bit wide data path, for example there may be a 'downsizer' between the instrumentation source and STM.


If a 64-bit memory system write is performed and either of the above is true, the actual trace output behavior is not defined by the STM Architecture. Care should be taken to account for these aspects, as they can change the way instrumentation is extracted within a trace decoder.

Guaranteed and Invariant Stimulus


The STM Architecture specifies two types of transaction, accessible through the stimulus port interface at separate offsets within the port – Guaranteed and Invariant. A write to the stimulus port "guaranteed" registers must be emitted by the STM as a trace packet; additionally, if a timestamp is requested (DnTS, FLAG_TS) and timestamping is enabled in the STM configuration registers (STMTCSR), then the timestamp will be generated.


Writes to the Invariant registers allow the STM to make a determination as to whether the full scope of instrumentation will be output. This is useful for instrumentation types that may be implemented as “lossy” – for instance, the output of the state of a loop counter where intermediate loop counts can be inferred, or where timestamping is not fundamental to the instrumentation. Invariant stimulus may, when emitted, "drop" timestamps for the sake of trace bandwidth. Important instrumentation (for instance, an error or another pertinent event) may still use Guaranteed stimulus.
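In code, choosing between the two types could come down to selecting the register offset. The sketch below assumes that each Invariant register sits 0x80 above its Guaranteed counterpart within the port (for example I_D at 0x98 versus G_D at 0x18); please verify this against Table 3-1 of the STM Architecture before relying on it. `STM_INVARIANT_BIT` and `stm_offset` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed distance between a Guaranteed register and its Invariant twin. */
#define STM_INVARIANT_BIT 0x80u

/* Hypothetical helper: pick the Guaranteed or Invariant variant of a
   register, given its Guaranteed offset (0x00 to 0x78). */
static inline uint32_t stm_offset(uint32_t g_offset, int guaranteed)
{
    assert(g_offset < STM_INVARIANT_BIT);
    return guaranteed ? g_offset : (g_offset | STM_INVARIANT_BIT);
}
```

A lossy loop-counter trace might then use `stm_offset(0x18, 0)`, while an error report would use `stm_offset(0x18, 1)` so that it must be emitted.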




Now that we have a good idea of what the STM is and how the architecture is defined, we can use the STM to generate instrumentation by Programming ARM's System Trace Macrocell, the second part of this blog series.

Sensors Expo:  Sensors and ARM Cortex Processors: Working Together:

Sensors are present in many electronic devices.  Sensors capture a wide variety of signals, and this information then needs to be collected by a microprocessor for processing and passed on to the end application.

Data acquired by a sensor is normally transferred as an analog or digital signal to a microprocessor.   The transfer speed can be critical to prevent overruns or lost data thus requiring fast processors capable of processing data quickly within tight time frames or windows.  Such transfers can be polled or interrupt driven as desired.

Digital signal transfers can include protocols such as UART, CAN, I2C, I2S, SPI and parallel.  Analog signals usually use an A/D converter.  These peripherals usually reside inside the microprocessor or as external ICs.  The possibilities are nearly endless, offering great flexibility in your design.

Once the microprocessor has the data it is often desirable to process the data in some way.  This can include filtering, scaling or for more sophisticated applications: Digital Signal Processing (DSP).

Processor Features:  ARM Cortex processors have become the de facto standard for sensor data acquisition and processing.  ARM's free DSP libraries run on all Cortex-M processors from Cortex-M0 through Cortex-M7.  ARM Cortex-M4 and Cortex-M7 processors are especially useful, with DSP extensions such as MAC (multiply-accumulate), SIMD (Single Instruction Multiple Data) and various other DSP instructions.  Third party suppliers offer DSP libraries for the Cortex-A series.  The Cortex-A series offers the NEON DSP extension.  An FPU (Floating Point Unit) is available on many Cortex processors.

Interrupt Controller:  The Cortex-M NVIC (Nested Vectored Interrupt Controller) provides a flexible and versatile interrupt and exception handling mechanism.  Individual peripherals and GPIO pins can have their own interrupt vector, which provides fast response times.  The NVIC is easy to configure using the CMSIS-Core standard APIs.

ARM Processor Types:  Different sensor applications can have very different processing requirements.  Ranging from the tiny 12,000 gate Cortex-M0, through the M3, M4 and M7 series through the real-time Cortex-R family to the powerful Cortex-A family, there is an ARM processor scalable for every sensor application.  Migrating up and down the ARM roadmap to choose the most applicable processor for your application is easy.  Each can run various operating systems or none at all (bare metal). Using an RTOS has definite advantages that make your project easier to design, understand and debug.

Various ARM licensees such as Atmel, NXP, STMicroelectronics, Cypress and many others offer Cortex processors with many peripherals to transfer data to and from sensors and some sensors have integrated an ARM processor on the same silicon as their sensors for efficiency and low cost solutions.

Debugging:  Cortex-M processors include many debug features to facilitate faster software development.  Serial Wire Viewer (SWV) is a component of ARM CoreSight debugging technology that can be used to display sensor data values graphically while the processor runs. SWV is non-intrusive and is easy to use. SWV also displays exceptions and interrupts in real-time and updates while the program runs.  Many quality debuggers feature SWV operation.  This makes software development easier and faster.

ETM trace (Embedded Trace Macrocell) instruction trace displays the instructions executed and also provides Performance Analysis and Code Coverage.  Many Cortex processors have ETM.

Low power: ARM processors are legendary for their low power consumption.  In situations where power is critical and data events are not occurring, you can utilize various sleep modes.  An interrupt generated by an external event or peripheral can be used to "wake up" the processor, have it perform the desired task and then put it back to sleep with the WFI ("wait for interrupt") or WFE ("wait for event") instructions.

Sensors Expo:  At the ARM/Keil booth at Sensors Expo we will display a wide variety of low cost evaluation boards from many manufacturers. Most of these contain interesting sensors.  We have demonstrations of working systems using the Keil MDK toolchain.  These are mostly turn-key "out-of-the-box" systems that you will get running in a short time.  You can use the free evaluation version (to 30 K) of Keil MDK with these boards.

We can explain the various ARM Cortex processors and their relation to each other.  Upcoming technologies such as the recently announced ARMv8-M architecture, which provides needed security for IoT by using ARM TrustZone technology, can be explained. We will have copies of the ARM Roadmap and a limited number of the famous Keil mouse pads to give away.

Sensors Expo:  June 21-23, 2016. McEnery Convention Center, San Jose, California

This blog is an update of one I wrote a couple of years ago, referencing the latest FVP models provided with DS-5 (v5.24 at time of writing) and the latest pre-built Linaro distributions. It is intended for users new to DS-5 and/or users on Windows platforms, as the Linaro distributions assume a Linux host. Note that the pre-built images do not contain kernel debug information. If you wish to enable kernel awareness, you will need to rebuild appropriately. Application debug and other Linux aware features do not require this.


You should first download the appropriate pre-built software stack and file system to match your needs. For the below I downloaded and the OpenEmbedded LAMP filesystem. Unzip these files to your host machine.


Open the DS-5 Eclipse GUI, and select Run → Debug Configurations, to set up the debug session. Select DS-5 Debugger from the list on the left hand side, and click on New launch configuration. You can give this configuration any suitable name. Locate the Base_AEMv8Ax1 (or Base_AEMv8x4) FVP (use Filter platforms text box to help), and drill down to the Debug ARMAEMv8-A level. If you have kernel debug symbols available I recommend selecting from the Linux Kernel and/or Device Driver Debug branch (more on this later).



We now need to use the Model parameters to instantiate the model appropriately for the Linaro images. Within the packages you downloaded above, you will find a script which is a Linux host script for launching this model stand alone with these files. We will use this as the basis for the parameters that DS-5 will pass. You can simply copy and paste the below to a text editor, fix the paths to the appropriate files to match their location on your host, to then paste to the Model parameters field. For more information on these options, see the FVP documentation.


--parameter bp.secure_memory=0

--parameter cluster0.NUM_CORES=1

--parameter cache_state_modelled=0

--parameter bp.pl011_uart0.untimed_fifos=1

--parameter bp.secureflashloader.fname="\\path\to\bl1.bin"

--parameter bp.flashloader0.fname="\\path\to\\fip.bin"

--data cluster0.cpu0="\\path\to\\Image"@0x80080000

--data cluster0.cpu0="\\path\to\\fvp-base-gicv2-psci.dtb"@0x83000000

--data cluster0.cpu0="\\path\to\\ramdisk.img"@0x84000000

--parameter bp.ve_sysregs.mmbSiteDefault=0

--parameter bp.virtioblockdevice.image_path="\\path\to\\<filesystem>.img"

--parameter bp.smsc_91c111.enabled=true

--parameter bp.hostbridge.userNetworking=true

--parameter bp.hostbridge.userNetPorts="5555=5555,8080=8080,22=22"


Go to the Debugger tab, and select Connect Only. If you have debug symbols available for the image, I recommend loading the symbols via the Execute Debugger Commands panel. Note that the kernel runs at Exception Level EL1, and so the symbols need to be loaded to this level. To do this, use the command:


add-symbol-file "\\path\to\vmlinux" EL1N:0


You should now be able to launch the model (by clicking on Debug). Click the run button, and the model should boot directly into Linux.




Features such as Kernel Awareness (if kernel symbols are loaded), Remote System Explorer (RSE) View, and Application Debug will be available, just as per my previous blog. I would also highlight some general improvements we have made to the GUI since that blog was written. Note that to use RSE, you first need to set a password for root ("passwd root" on the Linux command line), then create an RSE Linux connection to "localhost", configured for ssh files.



For the impatient


If you have ever had to deal with the CMSIS RTOS API and did not enjoy it, or if it felt like a straitjacket compared to your native RTOS, well, rest assured, you're not alone. The good news is that your experience matters and you can help improve the CMSIS RTOS API. Go to GitHub Issues and comment on any of the existing issues, or open new ones.




The story


ARM, thumbs up for the CMSIS RTOS idea!


First of all I have to confess that I was a big supporter of the general idea of a common CMSIS RTOS API, from the moment I first read about it. However, my disappointment when the specs came out was as big as my expectations had been.


Some CMSIS RTOS API considerations


From my point of view, the main problems with the CMSIS RTOS API are:


  • no POSIX compliance
  • not C++ friendly


Please note that I did not ask for C++ APIs; the plain C APIs should be perfectly fine. I would just have preferred the APIs to be designed by someone who thinks in C++, not in C (and as such knows how to avoid the usual mess that unstructured C programs bring, especially in the embedded world); unfortunately ARM seems to have no C++ specialists in their design teams.


The CMSIS++ proposal


Given this situation, and seeing that ARM had no plans for a C++ redesign, by the end of 2015 I started to think of CMSIS++ as a C++ POSIX-compliant proposal for a future generation of CMSIS. In March 2016 the project was publicly announced on the ARM Connected site.


Some CMSIS RTOS API issues


The initial CMSIS++ attempt was to simply rewrite the original CMSIS RTOS API in C++. However, while starting to walk on this path, I encountered many problems, and noticed many differences from the POSIX and ISO C/C++ specs. At a certain point I realised that the current design was broken beyond repair, and that a reset was required, otherwise the approach would not work.


Restarting from scratch, the focus moved from CMSIS to POSIX and ISO.


During the design and development phases, I kept a log of issues that I identified and addressed in the CMSIS++ proposal.


Some are difficulties in understanding the CMSIS RTOS API, due to documentation issues, some are functional issues that make using the original API not very convenient, and some are suggestions for missing features.


The POSIX compliance issues are:


  • Use POSIX error codes (#65)
  • Use explicit separate calls for different waiting functions, like lock(), try_lock(), timed_lock() (#45)
  • Add normal (non-recursive) mutex (#53)
  • Add a mechanism to wait for a thread to terminate (#50)
  • For message queues, make the message size user configurable (#70)
  • For message queues, add message priorities (#72)
  • Make osSemaphoreWait() return errors, not counts (#56)
  • Deprecate or remove the unused thread_id parameter in osMessageQCreate()/osMailQCreate() prototypes (#61)
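For readers unfamiliar with the POSIX style referred to in the first two items, a minimal sketch using plain POSIX threads (not CMSIS RTOS or CMSIS++ code) could look like the following. The `pthread_mutex_*` calls and error codes are standard POSIX; `lock_with_timeout` is a made-up helper, used only to show separate calls for separate waiting behaviours and errors returned as POSIX codes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Try the lock without blocking; if it is busy, wait up to `timeout_s`
   seconds. Returns 0 on success or a POSIX error code such as ETIMEDOUT. */
static int lock_with_timeout(pthread_mutex_t *m, time_t timeout_s)
{
    struct timespec deadline;
    int rc;

    assert(m != NULL);
    rc = pthread_mutex_trylock(m);        /* never blocks */
    if (rc != EBUSY) {
        return rc;                        /* 0, or a genuine error */
    }
    /* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        return errno;
    }
    deadline.tv_sec += timeout_s;
    return pthread_mutex_timedlock(m, &deadline);
}
```

The caller only ever has to compare the result against 0 or a named error code, rather than decoding a combined status/value return.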


Other functional issues are:


  • Avoid the heavy use of macros (to define objects and to refer to them) (#36)
  • Do not mandate the use of a dynamic allocator (for stack, queues, etc) (#37)
  • Add support for critical regions (interrupts & scheduler) (#38)
  • Avoid mixing time durations (in milliseconds) with timer counts in ticks (#39)
  • Add a separate RTC system clock (#40)
  • Add os_main() to make the use of a main thread explicit (#41)
  • Add support for a synchronised public memory allocator (#42)
  • Avoid returning aggregates (like osEvent) (#43)
  • Extend the range for osKernelSysTick() (#44)
  • Make the scheme to assign names to objects more consistent (#46)
  • Add missing destructor functions to all objects (#47)
  • Extend the range of priority levels (#49)
  • Add a mechanism to enumerate all registered threads (#51)
  • Allow to explicitly define the semaphore max count (#55)
  • Add a method to wait for a memory pool block to become available (#57)
  • Fix non-portable message type in osMessagePut() (#60)
  • Fix osMessagePut()/osMailPut() inconsistent error when called from ISR (#62)
  • Mail queues, as separate objects, are redundant (#63)
  • Add typedefs for all different types used in prototypes (#66)
  • For all objects, add reset functions to return the object to initial status (#67)
  • For mutex, add a method to get the owner thread (#68)
  • For memory pools, add more accessors to get pool status (#69)
  • For message queues, add more accessors to get queue status (#71)


The documentation issues are:


  • Explain that thread functions can return (#48)
  • Explain the mutex behaviour (recursive vs normal) (#52)
  • Clarify the specs for binary vs counting semaphores (#54)
  • Fix the data type used in osMessageQDef() example (#58)
  • Fix misplaced thread id parameter for message queue (#59)




Somehow acknowledging the initial design problems, ARM announced that it was working on CMSIS RTOS API v2. To my pleasant surprise, ARM seems to have deprecated the initial macro based object creation mechanism (probably one of the most annoying features of the RTOS API v1).


In the new proposal ARM also gave up returning aggregate objects, extended the priorities range, added explicit normal/recursive mutex objects, renamed some objects and generally kept very few features from the initial specification, so a design reset seems possible.


However, based on the CMSIS++ experience, there are still more design decisions required to bring the new RTOS v2 closer to POSIX and ISO, for example using the POSIX error codes, using the POSIX explicit separate calls for different waiting functions (like lock(), try_lock(), timed_lock()), etc.


Feedback welcomed


So, if you would like to express your support for POSIX compatibility, or generally to have a better CMSIS RTOS API, please go to GitHub Issues and comment on any of the existing issues (especially those marked with Help Wanted), or open new tickets with your own suggestions.


More info


CMSIS is an ARM technology, now also available as a GitHub project.


CMSIS 5 announcement.


CMSIS++ is an open source project, maintained by Liviu Ionescu.

The main source of information for CMSIS++ is the project web.

April 2016 indeed came with some nice surprises related to project downloads (probably not exactly in the spirit of the previous post related to the project future): the SourceForge statistics revealed more than 3.200.000 files downloaded since the project revival, in the second half of 2013:



The counters include all downloaded files, with the Eclipse updates (each of which accounts for multiple files) taking the largest share, but also counting the packed archives, and the additional tool packages (Windows Build Tools, OpenOCD, QEMU, etc).


On the other hand, it should be noted that, considering the project migration to GitHub in Sep. 2015, the actual figures might be even higher (but this cannot be quantified, since GitHub does not provide any download statistics).


Otherwise the project is fine, with many amazing features planned for 2016 (mainly related to extending the use of software packages, both in the Packs Manager and in a future project wizard to use the packs content, and the integration with CMSIS++).


Many thanks to all those who use, appreciate and support the project!

Considering the recent download trend, after long and difficult consideration, the decision was finally made that it is no longer worth maintaining the GNU ARM Eclipse project, and all future development activities will inevitably come to an end. The decision is effective immediately, as of today, April 1st, 2016.


All Eclipse users are sincerely advised to switch to Keil MDK or IAR Embedded Workbench, definitely the best development environments available in the industry.


The unfortunate Linux and OS X users are advised to seriously consider migrating back to Windows 10, the best operating system ever.


Very Truly Yours,



This blog was written by paulblack to explain some of the new features in DS-5 Development Studio v5.24.


In DS-5 v5.24, we’ve made changes to some of the debugger views. These changes are mostly designed to reduce the amount of non-essential information that the debugger shows, whilst providing easier access to information that is less frequently needed.  This results in a clearer debugger display and a significant boost in debugger performance, caused by reading less information from the target each time the display is updated.


We’ve also added a new script management system, which enhances the existing DS-5 scripting capabilities. It’s now possible to manage scripts in multiple named lists, provide multiple entry points in each script, and to create custom script configuration dialogs with named configuration profiles. This extends the powerful configuration methods that DS-5 already supports for debug and trace configuration and enables significant advances in DS-5 functionality, flexibility and usability. A small selection of example scripts is included in DS-5 v5.24. In future releases we’ll be significantly expanding the range of scripts to give enhanced debugger functionality.


Of course, DS-5 Ultimate Edition continues to provide support for the very latest ARM IP. In this release we have added support for Cortex-R8, Cortex-A32 and Cortex-A35, as well as support for generic ARMv8-M cores. We have also packaged the LDRAlite MISRA conformance tool, extended our Streamline templates and added two new example projects with bare-metal start-up code for Cortex-A72 and Cortex-R8. With the other changes that we have made in this release, DS-5 is now faster, more functional and easier to use than ever before.


This blog focuses on the enhancements we have made in the debugger views (changes to the Registers view and the Debug Control view, with the addition of the new Stack view) and introduces our new script management system. For further information, please refer to the DS-5 Changelog.


Changes to DS-5 views


Debug Control View

In DS-5 v5.23, the Debug Control view showed core information for bare-metal connections and thread information for Linux kernel, RTOS support and Linux application debug. In v5.24 we’ve provided a new button at the left of the Debug Control view which lets you toggle between core and thread views whenever both types of information are available:


If you switch to the thread-based view, the current thread will be displayed along with closed lists for the active (scheduled) and non-scheduled threads:


However, all threads (scheduled and non-scheduled) can be displayed by opening the appropriate thread list:


Stack information has been moved out of the Debug Control view into a dedicated Stack view. You can open this view from the ‘Window->Show View’ Eclipse menu or you can right-click on a thread/core and select ‘Show in Stack’ from the pop-up menu:


By default, the Stack view displays only the top 5 stack frames, but a single click will fetch additional stack frames:


A button at the top of the Stack view lets you configure the default number of stack frames to be displayed:


Registers View

In DS-5 v5.24 we’ve added two new buttons to the top of the Registers view. The first of these buttons opens an intelligent search box (also displayed using the keyboard shortcut Ctrl-F) which helps you to quickly and easily find any register. Double-click the register or register group that you want to display in the Registers view:


The second of the new buttons (also available in the Variables and Expressions views) toggles all register values in and out of hexadecimal format:



This button has no effect on registers which are displayed in hexadecimal by default.


We’ve also provided an easy way for you to create custom register lists. You can create and manage custom register lists from the drop-down list box in the Registers view:


You can provide a descriptive name for each custom register list and add the registers that interest you:


When you select a custom register list from the drop-down selection, only the registers that you’ve added to the list will be displayed in the Registers view:


Use-Case Scripts

In DS-5 v5.24 we’ve introduced a new script management system to complement and extend the existing DS-5 scripting capabilities. The essential infrastructure and functionality are in place and we’ve provided a small selection of sample scripts to demonstrate the power, flexibility and ease of use of our new script manager. This is a brief introduction to the key concepts.


Our sample scripts can be found in the Scripts view. We call them “use-case” scripts because each script is focused on a particular “use-case”. Scripts contain configurable blocks of functionality aimed at specific tasks.


Each script can have multiple entry points, effectively multiple public functions. This lets you group related blocks of functionality into a single script. For each of our sample scripts we’ve provided documentation, including details of any configuration items. This documentation is taken from the script itself:


Each entry point can be associated with multiple configuration profiles. This is very similar to the configuration profiles that DS-5 already uses for debug connections. Right-clicking a profile lets you enter the configuration dialog:


The configuration dialog will look very familiar if you’ve already used the DS-5 DTSL configuration dialogs for debug and trace sessions. All of the controls in the dialog are soft-configured from the use-case script, so creating custom configuration panels is easy. The syntax used to create custom control dialogs is simple and easy to understand and is the same syntax that DS-5 already uses for DTSL configuration. It’s possible to create multiple named configuration profiles for different use-cases:


Buttons at the top of the Scripts view let you create a new script, run a script entry point using a named configuration profile, edit an existing script, delete a script, refresh the scripts view, import scripts, create a new scripts directory and configure a profile. When you run a use-case script, the script is copied to the “Recent” area of the Scripts view. Output from use-case scripts appears in the Commands view.


DS-5 v5.24 is available to download now; we hope you enjoy using it! For any questions, comments or feedback, please post below.

Considering the substantial interest that the initial CMSIS++ announcement stirred up, and the suggestions received, the project license was changed from the copyleft LGPL to the permissive MIT License.


This means you'll be able to use the CMSIS++ files in any commercial or open source projects without any limitations except preserving the included copyright and permission notice.

At embedded world 2016, ARM announced the ARM Cortex-A32, the smallest and lowest-power ARMv8-A processor, providing ultra-efficient 32-bit compute for the next generation of embedded products.


We also announced the latest high-performance real-time ARM Cortex-R8 processor, based on the ARMv7-R architecture. ARM Cortex-R8 introduces new features to meet the demands of next-generation storage device controllers and mobile communications with a particular focus on the upcoming 5G cellular wireless standard.


The ARM Compiler team has been working alongside the processor team to deliver the best compiler to support the new ARM Cortex-A32 and Cortex-R8. We're pleased to bring you ARM Compiler 6.4, which not only supports both new cores, but also benefits existing users with further performance improvements, full support for the Cortex-R family, and enhanced support for ARMv8-M and ARMv7-M.


Full support for all ARM Cortex families

ARM Compiler 6.0 was first introduced in April 2014, giving birth to the new LLVM-based ARM Compiler 6 series. The first release was limited to the ARMv8-A architecture and was mainly focused on partners working on cutting-edge technology. In July 2015, ARM Compiler 6.02 completed the support for ARM Cortex-A processors by adding ARMv7-A. ARM Compiler 6.3, released last November, extended the list of supported devices to the ARM Cortex-M family.


Today, with ARM Compiler 6.4 supporting ARM Cortex-R, we are pleased to announce that all three Cortex families (Cortex-A, -R and -M) are now fully supported!


The best-in-class optimizing ARM Compiler and the new Cortex-R8 processor are the perfect combination to meet the performance requirements of next-generation real-time applications.


Enhancing security with TrustZone® for ARMv8-M and XOM

Last November, ARM announced the introduction of the new ARMv8-M architecture for the next generation ARM Cortex-M processor family. ARM recognized the importance of security for ARM Cortex-M devices and made TrustZone available as a feature within the ARMv8-M architecture. TrustZone® creates two separate secure and non-secure worlds with the capability to quickly and efficiently switch between one and the other with a fine-grained control implemented at the hardware level.

ARM Compiler 6.4 fully supports the new TrustZone security extensions and it’s the ideal tool to develop secure software to protect your embedded or Internet of Things device.

Explore how to write secure code with ARM Compiler in the documentation.

Another important security feature implemented in some embedded devices is Execute-only memory (XOM). Execute-only memory allows only instruction fetches and blocks any attempt to read and/or write the protected area. The main benefit is the ability to protect your Intellectual Property by preventing executable code from being read by users. For example, it’s possible to place the secure firmware in execute-only memory and load user code separately; the user code won’t be able to read the protected firmware, enhancing the security of your device.

With ARM Compiler 6.4 it is possible to use the option -mexecute-only to generate code without any data access to the code sections: read more about this feature on Infocenter.


ARM Compiler 6.4 is available to download today from the new website. Alternatively, ARM Compiler 6.4 will be integrated in the next release of DS-5, v5.24.


Do you have any questions? Feel free to reply to this blog post or send me an email; any feedback is welcome!




TASKING projects for examples in the STM32Cube embedded software libraries


TASKING - Support for C compiler for ARM Cortex-M Series (VX-toolset)


Download the correct archive for the STM32 series and STM32Cube version. Unpack the archive and copy the resulting tree with evaluation boards into the STM32Cube Projects directory.


Some projects have known issues that need small changes, or depend on third-party libraries. A readme in the root of the archive lists these issues and describes possible workarounds. An additional archive is supplied which contains copies of the relevant STM32Cube sources with those workarounds applied. Optionally unpack this archive and copy the resulting tree over the sources in the STM32Cube directory.



CMSIS++ is a portable, vendor-independent hardware abstraction layer intended for C++/C embedded applications, designed with special consideration for the industry standard ARM Cortex-M processor series. Read "CMSIS++" as "the next generation CMSIS", "CMSIS v2.0", or, more accurately, "C++ CMSIS".


Major features and benefits


Written in C++ but with C wrappers for full C support


The original ARM/Keil name stands for Cortex Microcontroller Software Interface Standard, and the CMSIS++ design inherits the good things from ARM CMSIS, but goes one step further and ventures into the world of C++; as such, CMSIS++ is not a C++ wrapper running on top of the ARM CMSIS APIs, but a set of newly designed C++ APIs, with C APIs supported as wrappers on top of the native C++ APIs.


Close adherence to standards (POSIX and ISO)


The first iteration of CMSIS++ was a direct rewrite in C++ of ARM CMSIS, but later most of the definitions were adjusted to match the POSIX IEEE Std 1003.1, 2013 Edition and the ISO/IEC 14882:2011 (E) – Programming Language C++ standards.

As such, CMSIS++ RTOS API is no longer a wrapper over Keil RTX (as ARM CMSIS unfortunately was), but a wrapper over standard threads and synchronisation objects.


Compatibility with existing ARM CMSIS


Although fully written in C++, the current CMSIS++ RTOS API, implemented on top of the FreeRTOS scheduler, and accessible via the C wrapper, was the first non-Keil RTOS that passed the recently released CMSIS RTOS validation suite.




There are many components in the original CMSIS, but the major ones that benefit from C++ are RTOS and Drivers. Since everything revolves around the RTOS API, the C++ RTOS API was the first CMSIS++ API defined and is presented here in more detail.

Under the CMSIS++ RTOS APIs umbrella there are actually several interfaces, two in C++, two in C and one internal, in C++. The relationships between them are presented below:



The native RTOS C++ API


This is the native RTOS interface, implemented in C++, and providing access to the entire RTOS functionality.

The classes are grouped under the os::rtos namespace, and, to access them, C++ applications need to include the <cmsis-plus/rtos/os.h> header.

Objects can be instantiated from native classes in the usual C++ way, and can be allocated statically, dynamically on the caller stack or dynamically on the heap.

Inspired by the POSIX threads usage model, all CMSIS++ native objects can be instantiated in two ways:

  • a simple, minimalistic, default way, with a default constructor, or, if not possible, a constructor with a minimum number of arguments.
  • a fully configurable, maximal way, by using a set of specific attributes, passed as the first argument to a separate constructor.


For example, to create a thread with default settings, only the pointer to the thread function and a pointer to the function arguments need to be specified, while a thread with custom settings can also have a custom priority, a static stack, and possibly other custom settings.
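The two construction styles can be sketched in plain C++ as follows. This is only an illustrative model, assuming a hosted C++11 compiler: `ThreadAttributes`, `DemoThread`, `idle` and all their members are hypothetical names, not the CMSIS++ classes, and no real thread is started.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical attribute bundle, loosely modelled on pthread_attr_t.
struct ThreadAttributes
{
  int priority = 0;                 // 0 means "use the default priority"
  void* stack_address = nullptr;    // nullptr means "allocate dynamically"
  std::size_t stack_size_bytes = 0; // 0 means "use the default size"
};

using ThreadFunction = void* (*)(void*);

// Hypothetical thread-like class; it only records its configuration.
class DemoThread
{
public:
  // Minimal form: only the function and its argument.
  DemoThread(ThreadFunction function, void* args)
    : DemoThread(ThreadAttributes{}, function, args)
  {
  }

  // Fully configurable form: the attributes are the first argument.
  DemoThread(const ThreadAttributes& attr, ThreadFunction function, void* args)
    : attr_(attr), function_(function), args_(args)
  {
    assert(function != nullptr);
  }

  int priority() const { return attr_.priority; }
  bool has_static_stack() const { return attr_.stack_address != nullptr; }

  // Call the function synchronously; a real RTOS would schedule it instead.
  void* run() const { return function_(args_); }

private:
  ThreadAttributes attr_;
  ThreadFunction function_;
  void* args_;
};

// Trivial thread function used to exercise both constructors.
inline void* idle(void*) { return nullptr; }
```

With this model, `DemoThread simple { idle, nullptr };` uses the defaults, while filling a `ThreadAttributes` object and passing it first selects a custom priority and a caller-provided stack.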


Here is a short example with a thread that counts 5 seconds and quits:


#include <cmsis-plus/rtos/os.h>
#include <cmsis-plus/diag/trace.h>

using namespace os;

// Define the thread function.
// Native threads can have only one pointer parameter.
void*
func(void* args)
{
  for (int i = 0; i < 5; i++)
    {
      trace::printf("%d sec\n", i);

      // Sleep for one second.
    }
  return nullptr;
}

// In CMSIS++, os_main() is called from main()
// after initialising and starting the scheduler.
int
os_main(int argc, char* argv[])
{
  // Create a new native thread, with pointer to function and no arguments.
  // The thread is automatically destroyed at the end of the os_main() function.
  rtos::Thread th { func, nullptr };

  // Wait for the thread to terminate.
  th.join();

  return 0;
}


The native CMSIS++ thread is basically a POSIX thread, with some additional functionality (see the os::rtos::Thread reference page for more details).

Similarly, synchronisation objects can be created with the usual C++ approach; for example, a piece of code that uses a mutex to protect a counter looks like this:


#include <cmsis-plus/rtos/os.h>

using namespace os;

// Protected resource (a counter).
typedef struct {
  int count;
} res_t;

// Alloc the resource statically.
res_t res;

// Define a native mutex to protect the resource.
rtos::Mutex mx;

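// Increment the counter inside the locked region (function name assumed).
void
count(void)
{
  mx.lock();
  // Not much here, real applications are more complicated.
  res.count++;
  mx.unlock();
}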


The ISO C++ Threads API


The CMSIS++ ISO C++ Threads API is an accurate implementation of the ISO C++ 11 standard threads specifications.

With the ISO standard threads defined as wrappers over POSIX threads, and with the CMSIS++ native threads functionally compatible with POSIX threads, the implementation of the CMSIS++ ISO threads was quite straightforward.

The classes are grouped under the os::estd namespace, and, to access them, C++ applications have to include headers from the cmsis-plus/iso folder, like <cmsis-plus/iso/thread>. The namespace std:: and the standard header names (like <thread>) could not be used, to avoid clashes with system definitions when building CMSIS++ applications on POSIX host systems. The e in estd stands for embedded, so the namespace is dedicated to embedded standard definitions.
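The effect of using a separate namespace can be illustrated with a small host-side sketch. It assumes a hosted C++11 toolchain; `demo::estd::thread`, the two aliases and `both_usable` are hypothetical stand-ins, not the CMSIS++ definitions, and the stub class does not start any thread.

```cpp
#include <thread>
#include <type_traits>

namespace demo
{
  namespace estd
  {
    // Hypothetical embedded counterpart of std::thread (interface stub only).
    class thread
    {
    public:
      thread() = default;
      bool joinable() const { return false; }
    };
  }
}

// Both names are visible at once; each resolves to its own type.
using host_thread = std::thread;
using embedded_thread = demo::estd::thread;

static_assert(!std::is_same<host_thread, embedded_thread>::value,
              "the two thread types are distinct and do not clash");

// Use the system type and the embedded-style type in the same scope.
bool both_usable()
{
  host_thread h;     // default-constructed, not joinable
  embedded_thread e; // stub, never joinable
  return !h.joinable() && !e.joinable();
}
```

Because the embedded definitions live in their own namespace, the same source file can be compiled on a POSIX host where the system `<thread>` is also present.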


A similar example using the standard C++ threads:


#include <cmsis-plus/iso/thread>
#include <cmsis-plus/iso/chrono>
#include <cmsis-plus/diag/trace.h>

using namespace os;
using namespace os::estd; // Use the embedded version of 'std::'.

// Define the thread function.
// Thanks to the magic of C++ tuples, standard threads
// can have any number of arguments, of any type.
void*
func(int max_count, const char* msg)
{
  for (int i = 0; i < max_count; i++)
    {
      trace::printf("%d sec, %s\n", i, msg);

      // Sleep for one second. <chrono> is very convenient,
      // notice the duration syntax.
      this_thread::sleep_for (1s);
    }
  return nullptr;
}

// In CMSIS++, os_main() is called from main()
// after initialising and starting the scheduler.
int
os_main(int argc, char* argv[])
{
  // Create a new standard thread, and pass two arguments.
  // The thread is automatically destroyed at the end of the os_main() function.
  thread th { func, 5, "bing" };

  // Wait for the thread to terminate.
  th.join();

  return 0;
}
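For comparison, the same argument forwarding can be tried on a desktop host with the system `std::thread`. This is a minimal sketch, assuming a hosted C++11 toolchain with thread support; `record` and `run_with_arguments` are illustrative names that do not come from CMSIS++.

```cpp
#include <string>
#include <thread>

// Thread function with three parameters of different types.
// The result is written through a pointer, since std::thread
// discards the return value of the thread function.
void record(int max_count, const char* msg, std::string* out)
{
  for (int i = 0; i < max_count; i++)
    {
      *out += msg;
    }
}

// Start a thread with three arguments, wait for it, return the result.
std::string run_with_arguments(int max_count, const char* msg)
{
  std::string out;
  std::thread th { record, max_count, msg, &out };

  // Wait for the thread to terminate before reading the result.
  th.join();
  return out;
}
```

Calling `run_with_arguments(3, "ab")` appends the message three times in the new thread and returns the accumulated string once the thread has been joined.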


Most of the goodies of the C++ 11 standard can be used, for example RAII mutex locks, condition variables, lambdas:


#include <cmsis-plus/iso/mutex>
#include <cmsis-plus/iso/condition_variable>

using namespace os;
using namespace os::estd;

// Protected resource (a counter and a limit).
typedef struct {
  int count;
  int limit;
} res_t;

// Alloc the resource statically.
res_t res { 0, 10 };

// Define a standard mutex to protect the resource.
mutex mx;
// Define a condition variable to notify listeners and detect limits.
condition_variable cv;

// Increment count and notify possible listeners (function name assumed).
void
count(void)
{
  unique_lock<mutex> lck(mx); // Enter the locked region.
  res.count++;
  cv.notify_all();
  // No need to explicitly unlock, done automatically.
}

// Return only when count reaches the limit (function name assumed).
void
wait_for_limit(void)
{
  unique_lock<mutex> lck(mx); // Enter the locked region.
  cv.wait(lck, []{ return (res.count >= res.limit); });
}
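The locking and waiting pattern described by the comments above can be exercised on a desktop host with the system `std::` classes. The sketch below assumes a hosted C++11 toolchain with thread support; the function names `count_one`, `wait_for_limit` and `run_until_limit` are illustrative, and the choice of `notify_all()` is an assumption.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Protected resource (a counter and a limit).
struct res_t
{
  int count;
  int limit;
};

res_t res { 0, 10 };
std::mutex mx;
std::condition_variable cv;

// Increment count and notify possible listeners.
void count_one()
{
  std::unique_lock<std::mutex> lck(mx); // Enter the locked region.
  res.count++;
  cv.notify_all();
  // The lock is released automatically when lck goes out of scope.
}

// Return only when count reaches the limit.
void wait_for_limit()
{
  std::unique_lock<std::mutex> lck(mx);
  // wait() releases the lock while blocked and re-checks the
  // predicate after every notification.
  cv.wait(lck, [] { return res.count >= res.limit; });
}

// Illustrative driver: one thread counts, the caller waits.
int run_until_limit()
{
  res.count = 0;
  std::thread producer([] {
    for (int i = 0; i < res.limit; i++)
      {
        count_one();
      }
  });
  wait_for_limit();
  producer.join();
  return res.count;
}
```

The predicate form of `wait()` protects against spurious wake-ups, so the waiting side returns only after the counter has actually reached the limit.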




Although fully written in C++, CMSIS++ also provides a C API, to be used by C applications. Yes, that's correct, plain C applications can use CMSIS++ without any problems. The only differences are that function names are a bit longer and some of the C++ magic (like running the constructors and the destructors) needs to be done by hand; otherwise the entire functionality is available.

The C API is defined in the <cmsis-plus/rtos/os-c-api.h> header.
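How a C API can run C++ constructors and destructors "by hand" can be sketched with placement new and an explicit destructor call. This is a generic illustration under stated assumptions, not the CMSIS++ implementation: `Counter`, `demo_counter_t` and the `demo_counter_*` functions are hypothetical names, and a real C header would declare the storage with a fixed size instead of `sizeof(Counter)`.

```cpp
#include <cassert>
#include <new>

// A small C++ class standing in for a native RTOS object.
class Counter
{
public:
  Counter() : count_(0) { ++live_objects; }
  ~Counter() { --live_objects; }
  int increment() { return ++count_; }

  static int live_objects; // number of currently constructed objects

private:
  int count_;
};

int Counter::live_objects = 0;

extern "C"
{
  // C-visible storage, sized and aligned to hold one Counter.
  typedef struct demo_counter_s
  {
    alignas(Counter) unsigned char storage[sizeof(Counter)];
  } demo_counter_t;

  // Run the C++ constructor by hand, in caller-provided storage.
  void demo_counter_create(demo_counter_t* c)
  {
    assert(c != nullptr);
    new (c->storage) Counter();
  }

  // Forward a call to the C++ object living in the storage.
  int demo_counter_increment(demo_counter_t* c)
  {
    return reinterpret_cast<Counter*>(c->storage)->increment();
  }

  // Run the C++ destructor by hand; the storage itself is not freed.
  void demo_counter_destroy(demo_counter_t* c)
  {
    reinterpret_cast<Counter*>(c->storage)->~Counter();
  }
}
```

The caller owns the memory and must pair every create call with a destroy call, which is the manual work that a C++ application gets for free from scoped objects.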


The same simple example that counts 5 seconds and quits, in C would look like:


#include <cmsis-plus/rtos/os-c-api.h>
#include <cmsis-plus/diag/trace.h>

// Define the thread function.
// Native threads can have only one pointer parameter.
void*
func(void* args)
{
  for (int i = 0; i < 5; i++)
    {
      trace_printf("%d sec\n", i);

      // Sleep for one second.
    }
  return NULL;
}

// In CMSIS++, os_main() is called from main()
// after initialising and starting the scheduler.
int
os_main(int argc, char* argv[])
{
  // Manually allocate space for the thread.
  os_thread_t th;

  // Initialise a new native thread, with function and no arguments.
  os_thread_create(&th, NULL, func, NULL);

  // Wait for the thread to terminate.
  os_thread_join(&th, NULL);

  // Manually destroy the thread.

  return 0;
}


The ARM CMSIS RTOS C API (compatibility layer)


Furthermore, the CMSIS++ C wrapper also implements the original ARM CMSIS API. This is a full and accurate implementation, since this API already passed the ARM CMSIS RTOS validation suite.

To access this API, include the <cmsis_os.h> header provided in the CMSIS++ package.


#include <cmsis_os.h>
#include <cmsis-plus/diag/trace.h>

// Define the thread function.
// ARM CMSIS threads can have only one pointer parameter.
void
func(void* args)
{
  for (int i = 0; i < 5; i++)
    {
      trace_printf("%d sec\n", i);

      // Sleep for one second.
      osDelay(osKernelSysTickFrequency);
    }
  // ARM CMSIS threads can return, but there is
  // no way to know when this happens.
}

// The unusual way of defining a thread, specific to CMSIS RTOS API.
// It looks like a function, but it is not, it is a macro that defines
// some internal structures.
osThreadDef(func, 0, 1, 0);

// In CMSIS++, os_main() is called from main()
// after initialising and starting the scheduler.
int
os_main(int argc, char* argv[])
{
  // Initialise a new ARM CMSIS thread, with function and no arguments.
  osThreadCreate(osThread(func), NULL);

  // Since ARM CMSIS has no mechanism to wait for a thread to terminate,
  // a more complicated synchronisation scheme must be used.
  // In this test just sleep for a little longer.
  osDelay(6 * osKernelSysTickFrequency);

  return 0;
}


The CMSIS++ RTOS Reference


The entire CMSIS++ RTOS interface is fully documented on a separate site, available from the project web site.


More CMSIS++ components


In addition to the RTOS APIs, CMSIS++ also includes:

  • CMSIS++ Drivers - a C++ rewrite of CMSIS Drivers, with extensions;
  • CMSIS++ POSIX I/O - a layer bringing together access to terminal devices, files and sockets, via a unified and standard API, using open(), close(), read(), write() as main functions;
  • CMSIS++ Startup - a portable startup code, replacing non-portable vendor assembly code;
  • CMSIS++ Core - C++ API for the ARM Cortex-M processors core and peripherals;
  • CMSIS++ Diagnostics - a C++/C API providing support for diagnostics and instrumentation.




CMSIS++ is still a young project, and many things need to be addressed, but the core component, the RTOS API, is pretty well defined and awaiting comments.

For now it may not be perfect (as it tries to be), but it definitely provides a more standard set of primitives, closer to POSIX, and a wider set of APIs, covering both C++ and C applications; at the same time it does its best to preserve compatibility with the original ARM CMSIS APIs.


Any contributions to improve CMSIS++ will be highly appreciated.


More info


CMSIS++ is an open source project, maintained by Liviu Ionescu.

The main source of information for CMSIS++ is the project web site.

The Git repositories and all public releases are available from GitHub.

For questions and discussions, please use the CMSIS++ section of the GNU ARM Eclipse forum.

For bugs and feature requests, please use the GitHub issues.




Christopher Seidl

CMSIS Version 5

Posted by Christopher Seidl Mar 2, 2016

CMSIS Version 5 - Status


CMSIS Version 5 will focus on improvements and further industry adoption. The license will be changed to the permissive Apache 2.0 license, to enable contributions from interested third parties.


Support for the new ARMv8-M architecture will be added as well as improvements for ARM Cortex-A/Cortex-M based hybrid devices (with a clear focus on Cortex-M interaction).


The CMSIS-RTOS API and RTX reference implementation will get several enhancements:

  • Dynamic object creation, flag events, C and C++ API, additional thread and timer functions
  • Secure and Non-Secure support, multi-processor support


CMSIS-Pack will get additions for generic example projects, project templates, and multiple download portals. It will also adopt the Flash loader technology from IAR Systems.


CMSIS Version 5 - Access


As announced at embedded world, the development repository of CMSIS Version 5 is now available on GitHub:


ARM invites all interested parties to contribute and/or to provide feedback for the CMSIS project using GitHub.


