Most engineers developing embedded software on ARM-based microcontrollers (MCUs) are familiar with the use of ARM® CoreSight™ instruction trace as a profiling and performance analysis tool. Instruction trace is great because it is totally non-intrusive and provides perfect information down to instruction level (the clue is in the name). However, only high-end microcontrollers implement the CoreSight Embedded Trace Macrocell (ETM) required to generate instruction trace.

The reality is that not all market applications have instruction trace as a hard requirement, so in the end the decision to implement an ETM on an MCU device comes down to target price, area and power. But that doesn’t mean you have to develop your firmware in the dark if you haven’t got an ETM available. The Data Watchpoint & Trace unit (DWT) and Instrumentation Trace Macrocell (ITM), available on nearly all Cortex®-M3 and Cortex-M4 processors, are enough to give you the visibility you need into your embedded system to enable fast diagnosis.

 

Turning data into actionable information

 

Typical requirements of real-time systems relate to predictability and responsiveness (let’s leave code size and energy efficiency out of the equation for now). These requirements are commonly linked to latency, in particular interrupt handling time, and to throughput, so being able to measure these metrics is a must-have for many of us. However, this data alone is of little use in helping you find where things have gone wrong. To home in on the root cause of a problem in your software, you want to know which parts of the code dominate CPU cycles, how RTOS tasks are sequenced and so on. That’s where DWT and ITM come in handy.

 

The DWT unit is capable of non-intrusively sampling the program counter, which can be used to generate software profile reports when no instruction trace is available. Additionally, it can report on exception handling time for different interrupts, watch variables and track CPU performance counters entirely overhead-free. The ITM, on the other hand, can transport user-generated messages with minimal probe effect. Such messages, much like ‘printf’ text outputs, can be used to externalize states of the software. For instance, the Keil® RTX RTOS uses ITM to trace task switches.
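To make the link between sampled program counter values and a profile report concrete, here is a minimal host-side sketch in C. It assumes the PC samples have already been decoded from the trace stream and that a table of function address ranges is available (from a linker map, say). The type `func_range_t`, the function `profile_pc_samples` and all addresses are hypothetical names chosen for illustration, not part of any ARM tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: one function and its address range. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address of the function */
    uint32_t    end;    /* one past the last address of the function */
    unsigned    hits;   /* number of PC samples that fell in this range */
} func_range_t;

/*
 * Attribute each sampled PC to the function whose range contains it.
 * Returns the number of samples that matched no function.
 */
size_t profile_pc_samples(func_range_t *funcs, size_t nfuncs,
                          const uint32_t *pcs, size_t npcs)
{
    size_t unmatched = 0;

    assert(funcs != NULL && pcs != NULL);

    for (size_t i = 0; i < npcs; i++) {
        int found = 0;
        for (size_t f = 0; f < nfuncs; f++) {
            if (pcs[i] >= funcs[f].start && pcs[i] < funcs[f].end) {
                funcs[f].hits++;
                found = 1;
                break;
            }
        }
        if (!found) {
            unmatched++;
        }
    }
    return unmatched;
}
```

Dividing each `hits` count by the total number of samples gives an estimate of the share of CPU time spent in that function. The estimate is statistical, so it improves as more samples are collected.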

 

Figure: Common Cortex-M3 CoreSight debug and trace system

 

In isolation all these features are useful, but it is only when you integrate them into a single framework that you can fully exploit the information generated by the target to quickly and efficiently identify areas for improvement in the software.

 

DS-5 Streamline meets MCUs

 

The DS-5 Streamline analyzer has become an extremely popular performance analysis tool for Linux and Android systems by integrating different metrics gathered from the target and enabling an understanding of how the system works as a whole. Starting with DS-5 version 5.16, Streamline is also able to generate and visualize reports for microcontrollers running an RTOS. This new functionality offers very low overhead data collection and enables Cortex-M3 and Cortex-M4 developers to:

  • Easily benchmark applications and measure time between events
  • Understand the relationship between RTOS tasks, and measure processor time consumed by each of them
  • Spot spikes in performance counters such as CPU activity, CPI and exception handling time, and relate them to the software running on the target at the time
  • Select a period of time to generate profiling reports that apply to only the selected period, removing unwanted noise from the analysis
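As an illustration of how task switch events translate into per-task processor time, the sketch below adds up the interval between consecutive switches for whichever task was running. It is a simplified model written for this article, not Streamline’s actual implementation. The type `switch_event_t`, the function `accumulate_task_time` and the tick unit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded trace record: which task starts running, and when. */
typedef struct {
    uint32_t timestamp;  /* time of the switch, in arbitrary ticks */
    uint8_t  task_id;    /* task that starts running at this time */
} switch_event_t;

/*
 * Add the time between consecutive switch events to the task that was
 * running during that interval. ticks[] is indexed by task ID and must
 * have 256 entries; end_time closes the interval of the last event.
 * Unsigned subtraction keeps the result correct across a single
 * timestamp wrap-around.
 */
void accumulate_task_time(const switch_event_t *ev, size_t n,
                          uint32_t end_time, uint32_t ticks[256])
{
    assert(ev != NULL && ticks != NULL);

    for (size_t i = 0; i < n; i++) {
        uint32_t next = (i + 1 < n) ? ev[i + 1].timestamp : end_time;
        ticks[ev[i].task_id] += next - ev[i].timestamp;
    }
}
```

Restricting the event array to a selected time window gives the same per-task breakdown for that window only, which is the idea behind the period selection described in the last bullet.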

 

Figure: Streamline for MCU 5.16 Beta – Timeline view

 

Figure: Streamline for MCU 5.17 Beta – Interrupt handling statistics

 

 

How can I trace task activity on my RTOS?

 

To visualize task activity in Streamline’s Timeline view, your system must send ITM messages tracking these events. The Keil RTX RTOS already has this capability built in, and can be used as a reference implementation for other RTOSs. Below is a simple recipe for formatting ITM messages for Streamline. OS-related messages are expected to be transmitted on ITM port 31.

 

Task Name message format

Line | Bits 31-16 | Bits 15-8 | Bits 7-0
0    | Task Start Function (bits 31-0)
1    | 0          | Create    | Task ID

 

Scheduler Switch message format

Line | Bits 31-8 | Bits 7-0
0    | 0         | Task ID

 

Field               | Description
Task Start Function | Pointer to the address of the start function for this task, to be used as the task name
Create              | Non-zero if the task is newly created or is being reported again.
Task ID             | Number associated with this task. Must not be 0. 255 indicates idle time.
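The two message layouts above can be sketched as a pair of packing helpers in C. Only the bit positions come from the tables. The names `task_name_msg_t`, `pack_task_name`, `pack_scheduler_switch` and `OS_TASK_ID_IDLE` are hypothetical, and encoding Create as the value 1 is an assumption (the format only asks for non-zero).

```c
#include <assert.h>
#include <stdint.h>

/* Task ID 255 indicates idle time. */
#define OS_TASK_ID_IDLE 255u

/* The two lines of a Task Name message. */
typedef struct {
    uint32_t line0;  /* bits 31-0: address of the task's start function */
    uint32_t line1;  /* bits 31-16: zero, 15-8: Create, 7-0: Task ID */
} task_name_msg_t;

/* Build a Task Name message. create is treated as a boolean flag. */
task_name_msg_t pack_task_name(uint32_t start_fn_addr, uint8_t task_id, int create)
{
    task_name_msg_t msg;

    assert(task_id != 0);  /* Task ID must not be 0 */

    msg.line0 = start_fn_addr;
    msg.line1 = ((uint32_t)(create ? 1u : 0u) << 8) | (uint32_t)task_id;
    return msg;
}

/* Build a Scheduler Switch message: bits 31-8 zero, Task ID in bits 7-0. */
uint32_t pack_scheduler_switch(uint8_t task_id)
{
    assert(task_id != 0);  /* Task ID must not be 0 */
    return (uint32_t)task_id;
}
```

On the target, each packed value would then be written to ITM stimulus port 31. The sketch deliberately leaves out the write itself: the access width and the check that the port is ready to accept data are not covered by the tables, so the Keil RTX source remains the reference for those details.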

 

Availability

 

Streamline for MCU is initially available as a Beta feature in DS-5 version 5.16. To use it you will need to configure a debug connection to your Cortex-M3 or Cortex-M4 device with ETB or TPIU trace collection. In future releases expect to see many enhancements, such as exception handling statistics, out-of-the-box support for more RTOSs, support for a single-pin trace interface and more. Stay tuned.