All experienced embedded system designers know that interrupt latency is one of the key characteristics of a microcontroller, and that it is crucial for many applications with real-time requirements. However, descriptions of interrupt latency in microcontroller literature often oversimplify exactly what the quoted 'interrupt latency' figure includes.
This blog will cover the basics of interrupt latency, and what users need to be aware of when selecting a microcontroller for applications with low interrupt latency requirements.
The term interrupt latency refers to the number of clock cycles required for a processor to respond to an interrupt request. It is typically measured as the number of clock cycles from the assertion of the interrupt request up to the cycle where the first instruction of the interrupt handler is executed (figure 1).
Figure 1: Definition of interrupt latency
In many cases, when the clock frequency of the system is known, the interrupt latency can also be expressed as a time delay; for example, 12 cycles at 48 MHz corresponds to 0.25 µs.
In many processors, the exact interrupt latency depends on what the processor is executing at the time the interrupt occurs. For example, in many processor architectures the processor starts to respond to an interrupt request only when the currently executing instruction completes, which can add a number of extra clock cycles. As a result, the interrupt latency is often quoted as a best-case and a worst-case value. This variation results in jitter in the interrupt response, which can be problematic in certain applications like audio processing (introducing signal distortion) and motor control (causing harmonics or vibration).
Ideally, a processor should therefore respond to interrupts with both low latency and low jitter.
The interrupt latency itself is not the full story. A microcontroller marketing leaflet highlighting an extremely low interrupt latency doesn't necessarily mean that the microcontroller can satisfy the real-time requirements of a product. A real embedded system might have many interrupt sources, and normally each interrupt source has an associated priority level. Many processor architectures support the nesting of interrupts: during the execution of a low priority interrupt service routine (ISR), a high priority service can pre-empt it; the low priority ISR is suspended, and resumes when the high priority ISR completes (figure 2).
Figure 2: Nested Interrupt support
Many embedded systems require nested interrupt handling, and while a high priority ISR is running, the servicing of low priority interrupt requests is delayed. The interrupt latency is therefore normally a lot worse for low priority interrupts, as would be expected.
The nested interrupt handling requirement means that the interrupt controller in the system needs to be flexible in interrupt management, and should ideally provide all the essential interrupt prioritization and masking capabilities. In some cases this could be handled in software, but that increases the software overhead of interrupt processing (and code size) and increases the effective latency of servicing interrupts. This is discussed in more detail later.
The Nested Vectored Interrupt Controller (NVIC) in the Cortex-M processor family is an example of an interrupt controller with extremely flexible interrupt priority management. It enables programmable priority levels and automatic nested interrupt support, along with support for multiple interrupt masking, whilst still being very easy for programmers to use.
For the Cortex-M0 and Cortex-M0+ processors, the NVIC design supports up to 32 interrupt inputs plus a number of built-in system exceptions (figure 3). For each interrupt input, there are four programmable priority levels (figure 4). For the Cortex-M3 and Cortex-M4 processors, the NVIC supports up to 240 interrupt inputs, with from 8 up to 256 programmable priority levels (also shown in figure 4). Bear in mind that in practice the number of interrupt inputs and the number of priority levels are likely to be driven by the application requirements, and are defined by silicon designers based on the needs of the chip design.
Figure 3: The NVIC in the Cortex-M processor family supports multiple interrupt and exception sources
Figure 4: Priority levels in Cortex-M processors
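For illustration, here is a minimal sketch of setting up two such priority levels with the standard CMSIS-Core functions. The interrupt numbers (TIMER0_IRQn, UART0_IRQn) are hypothetical device-specific names that would normally come from the vendor's device header:

```c
#include "device.h"  /* hypothetical vendor header that pulls in CMSIS-Core */

void interrupt_setup(void)
{
    /* On Cortex-M, a lower numeric value means a higher priority */
    NVIC_SetPriority(TIMER0_IRQn, 0);   /* high priority: can pre-empt the UART ISR */
    NVIC_SetPriority(UART0_IRQn,  3);   /* low priority */

    NVIC_EnableIRQ(TIMER0_IRQn);
    NVIC_EnableIRQ(UART0_IRQn);
}
```

With this setup, nesting is automatic: if the timer interrupt arrives while the UART ISR is running, the hardware pre-empts it with no additional software needed.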
In addition to the interrupt requests from peripherals, the NVIC design supports internal exceptions, for example, an exception input from a 24-bit timer called SysTick, which is often used by an OS. There are also additional system exceptions to support OS operations, and a Non-Maskable Interrupt (NMI) input. The NMI and HardFault (one of the system exceptions) have fixed priority levels.
The interrupt latency of all of the Cortex-M processors is extremely low. The latency, listed in table 1, is the exact number of cycles from the assertion of the interrupt request up to the cycle where the first instruction of the interrupt handler is ready to execute, in a system with zero wait state memory:
Table 1: Interrupt latency of Cortex-M processors with zero wait state memory systems
The interrupt latency listed in table 1 is based on a number of simplifying assumptions, including a zero wait state memory system.
To make the Cortex-M devices easy to use and program, and to support the automatic handling of nested exceptions or interrupts, the interrupt response sequence includes a number of stack push operations. This enables all of the interrupt handlers to be written as normal C subroutines, and enables the ISR to start real work immediately without the need to spend time saving the current context.
The stacking operation of the Cortex-M3/M4 processor is shown in figure 5. The diagram shows that registers R0 to R3 and R12 (along with LR, PC and xPSR) are pushed onto the stack within the 12 cycle interrupt latency. If the processing inside the ISR needs five registers or fewer, there is no need for additional stacking.
Figure 5: Interrupt entry sequence (stacking) on the Cortex-M3 processor
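As an illustration, here is a minimal sketch of an ISR written as a plain C function. The handler name follows common startup-code conventions, but the peripheral register layout and base address are invented for the example:

```c
#include <stdint.h>

/* Hypothetical peripheral register layout, for illustration only */
typedef struct { volatile uint32_t INTCLR; } TIMER_TypeDef;
#define TIMER0 ((TIMER_TypeDef *) 0x40000000UL)   /* hypothetical base address */

volatile uint32_t tick_count;        /* shared with the main application */

/* A plain C function: the startup code places its address in the vector
   table, and on entry the hardware has already stacked R0-R3, R12, LR,
   PC and xPSR, so no assembly wrapper or special keyword is needed. */
void TIMER0_IRQHandler(void)
{
    TIMER0->INTCLR = 1U;             /* clear the (hypothetical) interrupt flag */
    tick_count++;                    /* real work can start immediately */
}
```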
‘So if I choose a processor with the lowest interrupt latency then that must be good, right?’ Unfortunately it is not as simple as that. The interrupt latency figure only provides one aspect of interrupt handling performance, and does not give the complete picture:
In a number of processor architectures, additional software wrapper code is needed for interrupt handlers, for example to save register contents, to identify which interrupt to serve, and to branch to the actual handler code.
All of these steps can add additional, often significant, delays to the processing of interrupts. For example, in the 8051 (which is still widely used today) there are multiple register banks, so it is possible to avoid pushing registers to the stack in software by switching register banks; however, a branch/jump instruction is still needed to reach the beginning of the ISR, as the comparison in table 2 shows:
| 8051 interrupt entry | Cortex-M interrupt entry |
| --- | --- |
| 1) Interrupt latency | 1) Interrupt latency |
| 2) SJMP/LJMP to handler | 2) Starting real handler code |
| 3) PUSH PSW | |
| 4) ORL PSW, #00001000b | |
| 5) Starting real handler code | |

Table 2: Interrupt latency comparison between the 8051 and Cortex-M processors
As a result, whilst an 8051 microcontroller might have a lower interrupt latency on paper, the overall interrupt latency, once the software overhead is included, is much worse than that of a Cortex-M based microcontroller.
As with any program code, ISRs take time to execute. The higher the performance of the processor, the quicker the interrupt request is serviced and the longer the system can stay in sleep mode, reducing power consumption. When measured from the time an interrupt request is asserted to the time the interrupt processing actually completes, the Cortex-M processors can be much better than other microcontrollers due to these higher performance characteristics (figure 6).
Figure 6: Interrupt latency when considering processing performance
Related to the total number of clock cycles of ISR execution, the maximum interrupt throughput (capacity) of the system can also be very important in many heavily loaded systems. The maximum number of requests per second depends on the system clock speed as well as the number of clock cycles required for each interrupt to be processed.
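As a rough illustration with hypothetical numbers: if each interrupt request costs 12 cycles of entry latency plus 88 cycles of ISR execution and exit, i.e. 100 cycles in total, then a 48 MHz processor can sustain at most 48,000,000 / 100 = 480,000 requests per second; that ceiling drops quickly if software wrapper overhead inflates the per-request cycle count.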
Figure 7: Cortex-M based microcontrollers have a much higher interrupt handling capacity
In traditional 8-bit/16-bit systems, the run time for ISRs can be many more cycles than with Cortex-M based microcontrollers because of lower performance. When combined with the higher maximum clock speed of many Cortex-M based microcontrollers, the maximum interrupt processing capacity can be much higher than other microcontroller products.
The jitter of interrupt response time refers to the variation (or value range) of interrupt latency cycles. In many systems, the interrupt latency depends on what the CPU is doing when the interrupt takes place. For example, in an architecture like the 8051, if the processor is executing a multi-cycle instruction, the interrupt entry sequence cannot start until the instruction is finished, which can be a few cycles later. This results in a variation in the number of interrupt latency cycles, commonly referred to as jitter.
Figure 8: Cortex-M processors are designed to have limited jitter in interrupt response
In many applications the jitter doesn't matter. However, in some applications, like audio or motor control, it can result in distortion of audio signals, or vibration and noise in motors.
In Cortex-M processors, if a multi-cycle instruction is being executed when an interrupt arrives, in most cases the instruction is abandoned and restarted after the ISR is completed. If the Cortex-M3/Cortex-M4 processor receives an interrupt request during a multiple load/store (memory access) instruction, the current state of the multiple transfer is automatically stored as part of the PSR (Program Status Register); when the ISR completes, the multiple transfer resumes from where it was stalled, using the saved information in the PSR. This mechanism provides high processing performance while at the same time maintaining low jitter in the interrupt response time.
Over the years, marketing literature from various microcontroller vendors has contained incomplete or misleading information on interrupt latency. For example, sometimes machine cycles are used (instead of clock cycles) when quoting interrupt latencies, and in some cases the quoted interrupt latency does not include the software overhead. It's important to fully investigate the details to understand the total interrupt response time.
The Cortex-M processors incorporate some additional optimizations during interrupt handling to reduce overheads even further:
When an ISR is completed, and if there is another ISR waiting to be served, the processor will switch to the other ISR as soon as possible by skipping some of the unstacking and stacking operations which are normally needed (figure 9). This is called Tail Chaining, and can take just six cycles in the Cortex-M3 and Cortex-M4 processors. This also makes the processor much more energy-efficient by avoiding unnecessary memory accesses.
Figure 9: Tail chaining
If a high priority interrupt request arrives during the stacking stage of a lower priority interrupt, the high priority interrupt will be serviced first (figure 10). This 'late arrival' behaviour ensures high priority interrupts are serviced quickly, and avoids another level of stacking during the nested interrupt handling process. In addition, it saves power (due to fewer memory accesses) and stack space.
Figure 10: Late arrival
If an interrupt request arrives just as another ISR is exiting and the unstacking process is underway, the unstacking sequence is stopped and the ISR for the new interrupt is entered as soon as possible (figure 11). Again, this avoids unnecessary unstacking and stacking, and reduces power consumption and latency.
Figure 11: Pop pre-emption
In some architectures there are multiple register banks, and an ISR can use a different, sometimes dedicated, register bank to avoid the overhead of stacking and unstacking. For example, the 8051 provides four register banks. In the original 8051 the banked registers were memory based, but newer accelerated 8051 designs use hardware registers.
Figure 12: Banked registers
Banked registers can reduce the overhead of context saving and restoring in limited circumstances. However, they often result in larger silicon area and higher power consumption, and do not scale to support the many priority levels that a flexible nested interrupt system requires. In some cases, like the 8051, additional software overhead is needed to switch the register bank(s). The Arm Cortex-M processors do not use banked registers, which provides much better energy efficiency and competitive performance when comparing interrupt driven systems across microcontroller processor architectures.
The Cortex-M processors include comprehensive debug features. The Cortex-M3 and Cortex-M4 processors also offer exception trace support, which allows the capture and examination of exception/interrupt history and timing information in a debugger.
Figure 13: Exception trace in Cortex-M3 and Cortex-M4 processors
The trace information can be captured using a single pin trace interface called Serial Wire Viewer (SWV), or a multi-bit trace port interface, which has higher trace bandwidth for supporting full instruction trace with an ETM (Embedded Trace Macrocell). The trace information can be very useful for debugging.
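In practice the debugger usually configures the trace hardware, but as a rough sketch using the standard CMSIS-Core register definitions, exception trace can also be enabled from software along these lines (the trace pin routing remains tool- and device-specific):

```c
#include "device.h"   /* hypothetical vendor header that pulls in CMSIS-Core */

/* Enable exception trace packets from the DWT; the debug tools must still
   set up the SWO/trace pins and capture the output. */
void enable_exception_trace(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* enable the trace subsystem (DWT/ITM) */
    DWT->CTRL        |= DWT_CTRL_EXCTRCENA_Msk;       /* emit exception entry/exit/return packets */
}
```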
The interrupt latency of Cortex-M processors can be affected by wait states in the on-chip bus system, which can result in a small amount of jitter. The Cortex-M0 and Cortex-M0+ processors have an optional feature to force the interrupt response to have zero jitter. This is done by forcing the interrupt latency to the worst case (i.e. interrupt latency plus the wait state effect). This feature is typically not used in microcontrollers (which just process the interrupt request as quickly as possible), but it is used in some special SoC designs that demand zero jitter in interrupt responses.
Sleep-on-Exit is a programmable feature which, when enabled, puts the processor into sleep mode on exiting an ISR if no other interrupt request needs to be serviced. This is very useful for interrupt driven applications, and saves power because it avoids spending clock cycles in thread state (e.g. the “main()” code), and reduces the amount of stacking and unstacking normally needed for interrupt entry and exit. It also has the side benefit of a shorter interrupt response time, because stacking is not needed. For example, on the Cortex-M0, the wake up from Sleep-on-Exit is only 11 cycles.
Figure 14: Sleep-on-Exit can reduce interrupt latency (first instruction in ISR is SEV)
Note that this technique is particularly useful for interrupt driven applications.
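For reference, here is a minimal sketch of enabling Sleep-on-Exit with the standard CMSIS-Core definitions (the device header name is a placeholder):

```c
#include "device.h"   /* hypothetical vendor header that pulls in CMSIS-Core */

int main(void)
{
    /* ... clock, peripheral and NVIC setup would go here ... */

    SCB->SCR |= SCB_SCR_SLEEPONEXIT_Msk;  /* re-enter sleep automatically on ISR exit */
    __WFI();                              /* go to sleep; from here on the ISRs do all the work */

    for (;;) { }                          /* not normally reached */
}
```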
There are two instructions for entering sleep modes: WFI (Wait for Interrupt) and WFE (Wait for Event). WFE enters sleep mode conditionally, and the processor can be woken by events including interrupt requests and the external event input signal (RXEV).
A WFE sleep can be exited quickly without invoking the interrupt/exception entry sequence. This can shorten the wake-up time to just a few cycles; for example, the Cortex-M0 processor can take just four cycles to wake up from sleep mode:
Figure 15: Wake up from WFE using event input (RXEV)
In this operation the processor resumes from where it stalled, just after the WFE instruction. Instead of using the RXEV input, a peripheral interrupt combined with a different programmable feature called SEV-ON-PEND can be used to generate the event and wake up the processor, without the need to execute an ISR.
Once again, note that this technique is most useful for interrupt/event driven applications, and is only useful when it is known that there is a single interrupt/event source being waited for. If there are other interrupt sources, the program code in thread mode must still check the reason for waking up from sleep mode.
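A minimal sketch of the SEV-ON-PEND pattern using the standard CMSIS-Core definitions; TIMER0_IRQn is a hypothetical device-specific interrupt number:

```c
#include "device.h"   /* hypothetical vendor header that pulls in CMSIS-Core */

void wait_for_timer_event(void)
{
    SCB->SCR |= SCB_SCR_SEVONPEND_Msk;   /* a newly pended interrupt generates a wake-up event */
    NVIC_DisableIRQ(TIMER0_IRQn);        /* disabled in the NVIC, so no ISR will actually run */

    __WFE();                             /* sleep; note WFE returns immediately if the event
                                            register was already set, so some code sequences
                                            use a second __WFE() */

    NVIC_ClearPendingIRQ(TIMER0_IRQn);   /* clear the pending request before continuing */
}
```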
The NVIC in the Cortex-M processors provides very flexible interrupt management and many useful features. One key technical advantage of the NVIC is its low interrupt latency. Combined with the high performance of the Cortex-M processors, this means all interrupt requests can be processed quickly, providing high interrupt processing throughput. The interrupt latency on the Cortex-M processors is deterministic, and carries none of the hidden software overhead that can be observed in many other architectures.
The Cortex-M processors are designed to be easy to use. For example, the NVIC programmer's model is very simple, and interrupt handlers can be programmed as normal C functions. At the same time, it is very powerful: all interrupts have programmable priority levels, and nested interrupts are supported automatically. Furthermore, the NVIC supports vectored interrupt operation, so there is no need for software to determine which interrupt to serve, and additional optimizations like tail chaining help reduce interrupt processing overhead while making the processor more energy efficient.
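To illustrate the vectored operation, here is a simplified, hypothetical fragment of the kind of vector table found in Cortex-M startup code; the hardware fetches the handler address directly from this table, so no software dispatch code is needed:

```c
#include <stdint.h>

extern void Reset_Handler(void);
extern void NMI_Handler(void);
extern void HardFault_Handler(void);
extern uint32_t __stack_top;     /* initial stack pointer, provided by the linker script */

/* A simplified fragment: a real startup file lists every system exception
   and device interrupt, and the section name must match the linker script. */
void (* const vector_table[])(void) __attribute__((section(".isr_vector"))) =
{
    (void (*)(void)) &__stack_top,   /* entry 0: initial main stack pointer */
    Reset_Handler,                   /* entry 1: reset */
    NMI_Handler,                     /* entry 2: NMI (fixed priority) */
    HardFault_Handler,               /* entry 3: HardFault (fixed priority) */
    /* ... remaining system exceptions, then device-specific interrupts ... */
};
```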
[CTAToken URL = "https://developer.arm.com/products/processors/cortex-m" target="_blank" text="Read more about Cortex-M processors" class ="green"]
I might be able to provide an answer to one of the questions.
As I've been working with STM32F103 and experienced some jitter I have not been able to get rid of (yet), I'm very interested in how to get rid of the jitter completely.
During my experiments, I found out that ...
1: The STR instruction takes only one clock cycle.
2: The time between two successive toggles on a GPIO pin is two clock cycles.
(I believe that this is due to a latency on either the AHB or APB2)
I measured the "latency" to be 2 clock cycles, which tells me that it would fit the APB2; but somewhere I read about a two-cycle latency on the AHB, and unfortunately I can't seem to find this info at the moment.
-That means: If you set/reset a bit on an output-pin and this or another pin was just written, there will be a delay of one clock cycle before the pin actually changes.
I do not know if the write happens on "odd clock cycles" or "even clock cycles" or just requires at least one clock cycle in between toggles, but it seems the latter.
I think that at least one of your lost 3 clock cycles is caused by this problem; perhaps two.
-The last jitter clock cycle might be a Branch-Prediction Cache miss.
I was able to synchronize my interrupt to the timer's output pin fairly well, but not good enough.
What I did was read the timer's counter value and then wait the number of clock cycles subtracted from a constant value (the time it would take to reach the one and only branch in my interrupt).
The result of this stunt was fairly good, but it did not get rid of the jitter completely.
-It did help about 2/3, though, which is significant.
Note: I also made the main-loop at task-time spend the exact same number of clock cycles as far as it was possible - though almost every instruction can be interrupted at any time. In addition, I made sure there was no tail-chaining going on (I also tried to force tail-chaining).
I have not found a complete solution to the problem, but I do believe the solution exists.
Hi Joseph Yiu,
thank you for the detailed explanation about the interrupt latency of Cortex-M processors.
At this point I have one further question: what is the latency of the NVIC itself? As far as I have understood the discussion above, the 12 cycles concern only the latency of the processor core - measured from the time the NVIC forwards the interrupt request to the CPU core to the execution of the first instruction of the ISR.
I tried to retrace this on a real processor system using an STM32F207 (Cortex-M3) on an MCBSTM32F200 evaluation board. In detail, I programmed a timer to periodically generate an interrupt and at the same time toggle an output pin. In the ISR of the timer a further pin is set, thus the latency of the interrupt can simply be measured using an oscilloscope - the time between the edge on the timer's output pin and the edge on the pin set in the ISR. The program is executed from the internal flash, while the stack is located in the internal SRAM, therefore the I-Bus and S-Bus can be accessed in parallel by the Cortex-M3. For maximum accuracy, inline ASM is used in the ISR for setting the pin. As shown in the figure below, the first instruction in the ISR is the STR which sets the output pin - the required registers are initialized in the main function.
The result of the measurement is shown in the figure below, where the yellow signal is the timer's output pin, the violet line is the pin set in the ISR, and the green signal is the system clock (25 MHz is used here, so the internal memories work with zero wait states). As you can see, the jitter of the latency is 3 cycles, which is caused by a bit-banding operation done in the main loop. The cursors mark the well discussed 12 cycles of latency, but now the question is what causes the further 4 cycles? The STR instruction itself takes one cycle, so there are still 3 cycles left which cannot be caused by memory wait states. Hence I think this is the time the NVIC needs to forward the peripheral's interrupt request to the CPU core - but I did not find anything about this in the literature, so I'm not sure.
Best regards,
Florian Eibensteiner
Yes, the processor will stack the registers (in the context of ISR1) and then start executing the high priority interrupt.
regards,
Joseph
What happens in the following situation:
The processor is executing a low priority interrupt ISR1 - after the stacking, ISR1 is using a couple of registers, and a high priority interrupt arrives.
Am I correct that the processor will stack registers from ISR1 and then switch to the high priority interrupt?
Sorry, as far as I know we don't have a similar article for Cortex-A processors today.