Trace Cortex-M software with the Instrumentation Trace Macrocell (ITM)

Selected Cortex-M processors include the Instrumentation Trace Macrocell (ITM) to help understand system behaviour. Although it can provide other types of trace, the ITM is commonly associated with printf() output and event tracing from applications and operating systems. Historically, Fast Model systems have used semihosting or UART models to provide character and file I/O when running software on models. Starting with version 11.1, Fast Models for Cortex-M provide the option of using the ITM for output and event tracing. This lets software development proceed the same way on models and boards.

ITM benefits

The primary benefit of the ITM support in Fast Models is the ability to use the same software images for virtual prototypes and FPGA prototypes. In Cortex-M projects, software engineers often move back and forth between models and boards. They value the ability to use the same software for both as this reduces complexity when switching platforms and saves time. The additional maintenance related to source code modifications, based on the type of target, is undesirable for developers.

Using the ITM is also faster than using a UART, since writing to the ITM is an internal 32-bit register write in the Cortex-M processor. Writing to a UART takes additional time on the bus and has more impact on system performance. The difference matters little on models, because Fast Models provide functional software execution and do not model cycle-level timing details. Similarly, semihosting works well on models, but is slower on an FPGA target because the CPU is stopped by the semihosting exception while a data transfer takes place. Semihosting is also toolchain dependent: the procedure for using it differs between debuggers and compilers.

Using the ITM benefits Cortex-M software developers who don’t use the standard C library. Many projects have memory constraints and prefer to write all code from scratch for full control. In these situations, it would require significant effort to implement semihosting since it works in tandem with the standard C library.
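As a rough illustration of output without the standard C library, the sketch below writes a string one character at a time to a stimulus port. The function name itm_write_string and the idea of passing the port as a pointer are assumptions made for this example (passing the pointer lets the routine be exercised off-target). On a Cortex-M4, the pointer would refer to stimulus port 0 at 0xE0000000, and the ITM would need to be enabled first.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: send a NUL-terminated string through one ITM
 * stimulus port without using stdio. 'port' is assumed to point at a
 * stimulus port register, e.g. (volatile uint32_t *)0xE0000000 on target.
 * Returns the number of characters written.
 */
static size_t itm_write_string(volatile uint32_t *port, const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        /* A stimulus port reads as non-zero when it can accept data. */
        while (*port == 0u) {
            /* busy-wait until the port is ready */
        }
        /* A byte-wide write produces a packet with a 1-byte payload. */
        *(volatile uint8_t *)port = (uint8_t)*s;
        s++;
        count++;
    }
    return count;
}
```

On target, a call such as itm_write_string((volatile uint32_t *)0xE0000000, "hello\n") would be the assumed usage.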

Some Cortex-M systems also have no UART, but can use printf() via the ITM for output.

Keil MDK users have enjoyed easy access to the ITM with the Debug viewer in µVision. Now the same software can be run on Fast Models. Let’s see how it can be done with a Cortex-M4 example.

Using the ITM with a Cortex-M4 Example

Two things are required to use the ITM with Fast Models.

First, a model parameter named TRACE_LVL must be set to true. The default value is true, so no action is required unless it has been modified. The Fast Models reference manual provides the details of all model parameters.

Second, a model trace interface (MTI) plugin is needed to capture the ITM packets. In hardware, the packets are sent over a wire using serial wire output (SWO); in Fast Models, the plugin takes that role.

MTI plugins are created using C++, and examples are provided in the MTI area of the Fast Models examples directory. This article demonstrates a plugin for using the ITM, and provides the source code if any modifications are needed.

To demonstrate the ITM, a small system with a Cortex-M4 and memory is used. The example is shown on Ubuntu 16.04 with GNU gcc 5.4 as the C++ compiler, but everything works the same on all platform and compiler combinations supported by Fast Models.


The LISA code for the minimal system is below. To create the system, use File -> New Project in the Fast Models canvas (sgcanvas) and use the code below as the .lisa file for the project.

// This file was generated by System Generator Canvas
// --------------------------------------------------
component m4
{
    composition
    {
        armcortexm4ct : ARMCortexM4CT("TRACE_LVL"=1);
        ramdevice : RAMDevice("size"=0x80000000);
        pvbusdecoder : PVBusDecoder();
        masterclock : MasterClock();
        clockdivider : ClockDivider();
    }
    connection
    {
        masterclock.clk_out => clockdivider.clk_in;
        pvbusdecoder.pvbus_m_range[0..0x5fffffff] => ramdevice.pvbus;
        clockdivider.clk_out => armcortexm4ct.clk_in;
        armcortexm4ct.pvbus_m => pvbusdecoder.pvbus_s;
    }
}

After using the Build button to compile the example system, run the isim_system executable with the “list trace sources” plugin and redirect the output to a file to see all of the possible trace sources:

$ Linux64-Debug-GCC-5.4/isim_system --plugin $PVLIB_HOME/plugins/Linux64_GCC-5.4/  > trace.sources

Open the trace.sources file and search for ITM. The number of trace sources is a bit overwhelming, but the ITM source is part of the Cortex-M4 model.

Source ITM (Instrumentation Trace Macrocell.)
    Field ITM_PACKET_TYPE type:MTI_ENUM size:1 (ITM and DWT packets type.)
        0x0 = Synchronization packet
        0x1 = Protocol : Overflow packet
        0x2 = Protocol : Local timestamp packets
        0x3 = Protocol : Global timestamp packets
        0x4 = Protocol : Extension packet
        0x5 = Source   : Instrumentation packet
        0x6 = Hardware source : Event counter wrapping (DWT)
        0x7 = Hardware source : Exception tracing
        0x8 = Hardware source : PC sampling
        0x9 = Hardware source : DWT Data trace PC value
        0xa = Hardware source : DWT Data trace address value
        0xb = Hardware source : DWT Data trace DATA value
        0xd = (description unreadable in the captured output)
    Field PACKET_HEADER type:MTI_UNSIGNED_INT size:1 (ITM Packet Header.)
    Field PACKET_PAYLOAD type:MTI_UNSIGNED_INT size:4 (ITM Packet Payload.)

When the ITM is used for printf() functionality, the ITM_PACKET_TYPE is 0x5 (Source : Instrumentation packet). The PACKET_HEADER for a source trace packet has the form below. For complete information about the packet formats, refer to the CoreSight technical reference manual.

BBBBB0SS
BBBBB = ITM stimulus port/register number (software source address), header bits 7:3
SS = size of the payload (never 00), header bits 1:0:
0x1 : 1 byte
0x2 : 2 bytes
0x3 : 4 bytes

The PACKET_PAYLOAD contains the data written to the ITM port. For messages, this will be the printed character or it can be other values the software would like to transmit.
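To make the header layout concrete, here is a minimal sketch that splits a header byte into the stimulus port number and the payload size. The type and function names are hypothetical and are not part of the Fast Models or MTI API. The sketch assumes the PACKET_HEADER field carries the architectural instrumentation packet header, with bit 2 clear and the size in bits 1:0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for a decoded instrumentation packet header. */
typedef struct {
    unsigned port;        /* stimulus port number, 0..31 (header bits 7:3) */
    unsigned size_bytes;  /* payload size in bytes: 1, 2 or 4 */
} itm_sw_header_t;

/*
 * Decode a header of the form BBBBB0SS.
 * Returns false if the byte is not an instrumentation packet header,
 * i.e. the size field is 00 or bit 2 is set.
 */
static bool itm_decode_sw_header(uint8_t header, itm_sw_header_t *out)
{
    unsigned ss = header & 0x3u;

    if (ss == 0u || (header & 0x4u) != 0u) {
        return false;
    }
    out->port = (unsigned)header >> 3;
    out->size_bytes = (ss == 3u) ? 4u : ss;  /* 1 -> 1, 2 -> 2, 3 -> 4 */
    return true;
}
```

For printf() output on stimulus port 0 with byte writes, the header is 0x01 and the character is the low byte of PACKET_PAYLOAD.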

Attached to this article is a sample MTI plugin for the ITM. It was created by modifying the SimpleTrace example plugin at $PVLIB_HOME/examples/MTI/SimpleTrace, which traces the PC as software executes.

The provided Makefile and Visual Studio project file will compile the plugin in the same way as the example plugins.

The example ITM trace plugin adds a parameter named trace-file, which provides the base filename for capturing the ITM output in text files. One text file is created per stimulus port, so the output from each port goes into a different file. By default, the output is sent to stdout, which is fine for printf() type output using a single stimulus port.

Once the plugin is compiled with the appropriate compiler, it is ready to use with the example system. To see any ITM activity with the plugin, a software application that uses the ITM is needed.

Example Software

To utilize the ITM, appropriate software must be created. There are multiple ways to do this, but one simple way is to use the standard C library and direct the printf() output to the ITM.

This is done in a comparable way to retargeting output to a UART and is demonstrated here using the Arm compiler. The steps involved are:

  • Disable semihosting using the compiler pragma
  • Implement a custom fputc() function to direct characters from printf() to the ITM
  • Configure and enable the ITM before calling the first printf()
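The first step above might look like the fragment below. It is a sketch rather than the code from the attached application: it assumes the usual __use_no_semihosting symbol of the Arm C library, with the pragma form for Arm Compiler 5 and the assembler directive form for Arm Compiler 6. With semihosting disabled, the library typically also expects the application to supply the low-level routines it would otherwise implement through semihosting; the empty _sys_exit() and _ttywrch() bodies here are placeholders.

```c
/* Tell the Arm C library not to pull in semihosting support. */
#if defined(__CC_ARM)
/* Arm Compiler 5 (armcc) */
#pragma import(__use_no_semihosting)
#elif defined(__ARMCC_VERSION) && (__ARMCC_VERSION >= 6010050)
/* Arm Compiler 6 (armclang) */
__asm(".global __use_no_semihosting\n\t");
#endif

/* Placeholder: called when the program exits; just park the core. */
void _sys_exit(int return_code)
{
    (void)return_code;
    for (;;) {
        /* never returns */
    }
}

/* Placeholder: last-resort character output used by the library. */
void _ttywrch(int ch)
{
    (void)ch;
}
```

If the linker still reports a semihosting symbol after this, some other library function that relies on semihosting is being used and needs retargeting as well.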

The programmer’s model for the ITM register set can be found in each Cortex-M technical reference manual. The architecture reference manuals for ARMv6-M and ARMv7-M are also useful sources for additional ITM information. For the Cortex-M4, the information is found in the ITM section of the technical reference manual.

The implementation of fputc() is shown in the code below.

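#include <stdint.h>
#include <stdio.h>

/* Stimulus port n, accessed as a byte, halfword or word */
#define ITM_Port8(n)    (*((volatile unsigned char  *)(0xe0000000 + 4*(n))))
#define ITM_Port16(n)   (*((volatile unsigned short *)(0xe0000000 + 4*(n))))
#define ITM_Port32(n)   (*((volatile unsigned long  *)(0xe0000000 + 4*(n))))

#define ITM_TER   (*((volatile unsigned long *) 0xe0000e00))
#define ITM_TCR   (*((volatile unsigned long *) 0xe0000e80))

#define DEMCR           (*((volatile unsigned long *)(0xe000EDFC)))
#define TRCENA          0x01000000

int32_t ITM_SendChar (int32_t ch) {
    if ((ITM_TCR & (1UL << 0)) &&         /* ITM enabled */
        (DEMCR & TRCENA) &&               /* trace enabled */
        (ITM_TER & (1UL << 0))) {         /* stimulus port 0 enabled */
        while (ITM_Port32(0) == 0);       /* wait until the port is ready */
        ITM_Port8(0) = (uint8_t)ch;       /* byte write: 1-byte payload */
    }
    return (ch);
}

/* Called by printf() for every character */
int fputc(int ch, FILE *f) {
    (void) f;
    (void) ITM_SendChar(ch);
    return ch;
}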


To configure and enable the ITM, use the code below and call ITM_init() before the first printf() call.

// Debug Exception and Monitor Control register
#define DEMCR   (*((volatile unsigned long *) 0xe000edfc))
#define TRCENA  0x01000000  // enable trace

// Stimulus Port registers
#define ITM_STIM0 (*((volatile unsigned long *) 0xe0000000))
#define ITM_STIM1 (*((volatile unsigned long *) 0xe0000004))
#define ITM_STIM2 (*((volatile unsigned long *) 0xe0000008))
#define ITM_STIM3 (*((volatile unsigned long *) 0xe000000c))

// Trace enable registers
#define ITM_TER   (*((volatile unsigned long *) 0xe0000e00))

// Privilege register: registers that can be accessed by unprivileged code
#define ITM_TPR   (*((volatile unsigned long *) 0xe0000e40))

// Trace Control register
#define ITM_TCR   (*((volatile unsigned long *) 0xe0000e80))

// Lock Access register
#define ITM_LAR   (*((volatile unsigned long *) 0xe0000fb0))

// unlock value
#define ITM_LAR_ACCESS  0xc5acce55

void ITM_init(void)
{
    ITM_LAR = ITM_LAR_ACCESS; // unlock
    ITM_TCR = 0x1;            // global enable for ITM
    ITM_TPR = 0x1;            // first 8 stim registers have unpriv access
    ITM_TER = 0xf;            // enable 4 stim ports
    DEMCR |= TRCENA;          // global enable DWT and ITM
}
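Because the initialization enables four stimulus ports, the ports beyond port 0 are available for event-style tracing. The sketch below shows one possible way to send a 32-bit value on a chosen port. The itm_view_t type and the itm_send_word name are hypothetical; the registers are passed as pointers only so that the routine can be exercised off-target. On a Cortex-M4 they would point at 0xE0000000 (stimulus ports), 0xE0000E00 (ITM_TER) and 0xE0000E80 (ITM_TCR).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the ITM registers used by this sketch. */
typedef struct {
    volatile uint32_t *stim;  /* base of the stimulus port registers */
    volatile uint32_t *ter;   /* trace enable register (ITM_TER) */
    volatile uint32_t *tcr;   /* trace control register (ITM_TCR) */
} itm_view_t;

/*
 * Send a 32-bit value on the given stimulus port.
 * Returns false if the ITM or the port is not enabled, so that
 * disabled trace costs only a couple of register reads.
 */
static bool itm_send_word(const itm_view_t *itm, unsigned port, uint32_t value)
{
    if (port > 31u) {
        return false;                        /* no such stimulus port */
    }
    if ((*itm->tcr & 1u) == 0u) {
        return false;                        /* ITM globally disabled */
    }
    if ((*itm->ter & (1u << port)) == 0u) {
        return false;                        /* this port is disabled */
    }
    while (itm->stim[port] == 0u) {
        /* wait until the port can accept data */
    }
    itm->stim[port] = value;                 /* word write: 4-byte payload */
    return true;
}
```

A trace plugin that writes one file per stimulus port would then separate such values from the printf() text on port 0.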

The complete application is included in the file attached to this article.

When the application is run on the Fast Model system with the ITMtrace plugin, the output from the printf() statements appears on stdout. On the surface, it looks exactly the same as semihosting, but it actually uses the Fast Models ITM support.

$ Linux64-Debug-GCC-5.4/isim_system  -a itm-sw/startup_Cortex-M4.axf  --plugin ITMtrace/  

ITMtrace: plugin instance name: TRACE.ITMtrace
ITMtrace: attached to component: m4.armcortexm4ct
Cortex-M4 bare-metal startup example
Calculating using the software floating point library (no FPU)
Float result should be 80.406250
Float result is        80.406250
Insertion sort took 38 clock ticks
Shell sort took 32 clock ticks
Quick sort took 41 clock ticks


With Fast Models 11.1, Cortex-M models support the Instrumentation Trace Macrocell (ITM). This enables the same software to run on Fast Models as on other targets such as FPGA boards or final silicon. Embedded software engineers benefit from being able to use the same software images regardless of the target system, which makes it easier to move back and forth between models and boards.

If you have not used Fast Models for software development, give it a try by requesting an evaluation license using the button below.

Fast Models Downloads