All experienced embedded system designers know that interrupt latency is one of the key characteristics of a microcontroller, and are aware that it is crucial for many applications with real-time requirements. However, the descriptions of interrupt latency in microcontroller literature often oversimplify exactly what is included in the ‘interrupt latency’ figure.
This blog will cover the basics of interrupt latency, and what users need to be aware of when selecting a microcontroller with low interrupt latency requirements.
The term interrupt latency refers to the number of clock cycles required for a processor to respond to an interrupt request. It is typically measured as the number of clock cycles from the assertion of the interrupt request up to the cycle where the first instruction of the interrupt handler is executed (figure 1).
Figure 1: Definition of interrupt latency
In many cases, when the clock frequency of the system is known, the interrupt latency can also be expressed in terms of time delay, for example, in µsec.
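As a minimal sketch of that conversion, the helper below turns a cycle count into microseconds for a given clock frequency. The function name `latency_us` and the 48 MHz clock in the usage note are illustrative assumptions, not figures for any particular device.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a latency expressed in clock cycles into microseconds.
 * clock_hz is the processor clock frequency in Hz and must be non-zero. */
double latency_us(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz != 0u);
    return (double)cycles * 1e6 / (double)clock_hz;
}
```

For instance, `latency_us(12, 48000000)` evaluates to 0.25, so 12 cycles at an assumed 48 MHz correspond to 0.25 µsec.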
In many processors, the exact interrupt latency depends on what the processor is executing at the time the interrupt occurs. For example, in many processor architectures, the processor starts to respond to an interrupt request only when the currently executing instruction completes, which can add a number of extra clock cycles. As a result, the interrupt latency can have a best case and a worst case value. This variation results in jitter in the interrupt response, which can be problematic in certain applications such as audio processing (where it introduces signal distortion) and motor control (where it can cause harmonics or vibration).
Ideally, a processor should have the following characteristics:
The interrupt latency itself is not the full story. A microcontroller marketing leaflet highlighting an extremely low interrupt latency doesn’t necessarily mean that the microcontroller can satisfy the real-time requirements of a product. A real embedded system might have many interrupt sources, and normally each interrupt source has an associated priority level. Many processor architectures support the nesting of interrupts, which means that during the execution of a low priority interrupt service routine (ISR), a high priority interrupt can pre-empt it: the low priority ISR is suspended, and resumes when the high priority ISR has completed (figure 2).
Figure 2: Nested Interrupt support
Many embedded systems require nested interrupt handling, and while a high priority ISR is running, service of low priority interrupt requests is delayed. Thus the interrupt latency is normally a lot worse for low priority interrupts, as would be expected.
The nested interrupt handling requirement means that the interrupt controller in the system needs to be flexible in interrupt management, and ideally provide all the essential interrupt prioritization and masking capability. In some cases this could be handled in software, but this can increase the software overhead of the interrupt processing (and code size) and increase the effective latency of serving interrupts. This is discussed in more detail later.
The Nested Vectored Interrupt Controller (NVIC) in the Cortex-M processor family is an example of an interrupt controller with extremely flexible interrupt priority management. It provides programmable priority levels, automatic nested interrupt support, and multiple forms of interrupt masking, whilst still being very easy for the programmer to use.
For the Cortex-M0 and Cortex-M0+ processors, the NVIC design supports up to 32 interrupt inputs plus a number of built-in system exceptions (figure 3). For each interrupt input, there are four programmable priority levels (figure 4). For the Cortex-M3 and Cortex-M4 processors, the NVIC supports up to 240 interrupt inputs, with between 8 and 256 programmable priority levels (also shown in figure 4). Bear in mind that in practice the number of interrupt inputs and the number of priority levels are driven by the application requirements, and are defined by silicon designers based on the needs of the chip design.
Figure 3: The NVIC in the Cortex-M processor family supports multiple interrupt and exception sources
Figure 4: Priority levels in Cortex-M processors
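On Cortex-M processors, the implemented priority bits occupy the most significant end of an 8-bit priority field, and a numerically lower value means a more urgent interrupt. The following is a minimal sketch of both rules. The helper names are hypothetical, the code only models the arithmetic and does not touch any NVIC register, and it ignores the optional group/sub-priority split available on the Cortex-M3 and Cortex-M4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Place a priority level (0 = most urgent) in the top prio_bits bits of the
 * 8-bit priority field: 2 bits give four levels, 8 bits give 256 levels. */
uint8_t encode_priority(uint8_t level, unsigned prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}

/* A pending request pre-empts the running handler only if its priority
 * value is numerically lower, i.e. more urgent. */
bool preempts(uint8_t pending_prio, uint8_t active_prio)
{
    return pending_prio < active_prio;
}
```

With two priority bits, the four levels encode as 0x00, 0x40, 0x80 and 0xC0; a request at 0x40 pre-empts a handler running at 0x80, while a request at the same level waits.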
In addition to the interrupt requests from peripherals, the NVIC design supports internal exceptions, for example, an exception input from a 24-bit timer called SysTick, which is often used by the OS. There are also additional system exceptions to support OS operations, and a Non-Maskable Interrupt (NMI) input. The NMI and HardFault (one of the system exceptions) have fixed priority levels.
The interrupt latency of all of the Cortex-M processors is extremely low. The latency count is listed in table 1, and is the exact number of cycles from the assertion of the interrupt request up to the cycle where the first instruction of the interrupt handler is ready to be executed, in a system with zero wait state memory:
Table 1: Interrupt latency of Cortex-M processors with zero wait state memory systems
The interrupt latency listed in table 1 makes a number of simple assumptions:
To make the Cortex-M devices easy to use and program, and to support the automatic handling of nested exceptions or interrupts, the interrupt response sequence includes a number of stack push operations. This enables all of the interrupt handlers to be written as normal C subroutines, and enables the ISR to start real work immediately without the need to spend time on saving current context.
The stacking operation of the Cortex-M3/M4 processor is shown in figure 5. The diagram shows that registers R0 to R3 and R12 are pushed onto the stack within the 12 cycle interrupt latency. If the processing inside the ISR needs five registers or fewer, there is no need for additional stacking.
Figure 5: Interrupt entry sequence (stacking) on the Cortex-M3 processor
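The frame that the hardware pushes can be pictured as a C structure. Besides R0 to R3 and R12, the basic frame also holds LR, the return address and the program status register, eight words in total. The type name `exception_frame_t` is an assumption made for illustration, and the extra floating-point state that a Cortex-M4 with FPU may stack is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* return address */
    uint32_t xpsr;  /* program status register */
} exception_frame_t;

/* Number of bytes of stack consumed by one basic frame. */
size_t exception_frame_bytes(void)
{
    return sizeof(exception_frame_t);
}
```

Each level of nesting therefore costs at least 32 bytes of stack, which is worth including when sizing the stack for a system with several priority levels.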
‘So if I choose a processor with the lowest interrupt latency then that must be good, right?’ Unfortunately it is not as simple as that. Interrupt latency figures often cover only one aspect of interrupt handling performance, and do not give the complete picture:
In a number of processor architectures, additional software wrapper code is needed for interrupt handlers to:
All of these can result in additional, often significant, delays in the processing of interrupts. For example, the 8051, which is still widely used today, has multiple register banks, so it is possible to avoid writing software to push registers onto the stack by switching register banks. Even so, a branch/jump instruction is still needed to reach the beginning of the ISR:
8051:
1) Interrupt latency
2) SJMP/LJMP to handler
3) PUSH PSW
4) ORL PSW, #00001000b
5) Starting real handler code
Cortex-M:
1) Interrupt latency
2) Starting real handler code
Table 2: Interrupt latency comparison between 8051 and Cortex-M processors
As a result, whilst an 8051 microcontroller might have a lower interrupt latency on paper, the overall interrupt latency, when the software overhead is included, is much worse than that of a Cortex-M based microcontroller.
As with any program code, ISRs take time to execute. The higher the performance of the processor, the sooner the interrupt request is fully serviced, and the longer the system can stay in sleep mode, which reduces power consumption. Measured from the time an interrupt request is asserted to the time the interrupt processing actually completes, the Cortex-M processors can be much better than other microcontrollers because of their higher performance (figure 6).
Figure 6: Interrupt latency when considering processing performance
Alongside the total number of clock cycles needed to execute the ISR, the maximum throughput (capacity) of the system can also be very important in heavily loaded systems. The maximum number of requests per second depends on the system clock speed as well as the number of clock cycles required for each interrupt to be processed.
Figure 7: Cortex-M based microcontrollers have a much higher interrupt handling capacity
In traditional 8-bit/16-bit systems, the run time for ISRs can be many more cycles than with Cortex-M based microcontrollers because of lower performance. When combined with the higher maximum clock speed of many Cortex-M based microcontrollers, the maximum interrupt processing capacity can be much higher than other microcontroller products.
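The capacity argument comes down to a single division. The sketch below computes an upper bound on the interrupt rate, assuming the processor does nothing but service interrupts; the function name and the numbers in the usage note are illustrative assumptions rather than measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on interrupts per second when the processor does nothing else.
 * cycles_per_irq = entry latency + ISR body + exception return. */
uint32_t max_irq_rate(uint32_t clock_hz, uint32_t cycles_per_irq)
{
    assert(cycles_per_irq != 0u);
    return clock_hz / cycles_per_irq;
}
```

With an assumed 48 MHz clock and 120 cycles per interrupt the bound is 400,000 interrupts per second; at 12 MHz and 480 cycles per interrupt it drops to 25,000, which shows how clock speed and cycle count multiply together.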
The jitter of interrupt response time refers to the variation (or value range) of the interrupt latency in cycles. In many systems, the interrupt latency depends on what the CPU is doing when the interrupt takes place. For example, in an architecture like the 8051, if the processor is executing a multicycle instruction, the interrupt entry sequence cannot start until the instruction is finished, which can be a few cycles later. This results in a variation in the number of interrupt latency cycles, which is commonly referred to as jitter.
Figure 8: Cortex-M processors are designed to have limited jitter in interrupt response
In many applications the jitter doesn’t matter. However, in some applications, like audio or motor control, the jitter can result in distortion of audio signals, or in vibration and noise in motors.
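Jitter as defined here can be computed directly from a set of measured latencies. The helper below is a minimal sketch; its name and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Jitter = worst-case latency minus best-case latency over n samples. */
uint32_t latency_jitter(const uint32_t *cycles, size_t n)
{
    assert(cycles != NULL && n > 0u);
    uint32_t best = cycles[0];
    uint32_t worst = cycles[0];
    for (size_t i = 1; i < n; ++i) {
        if (cycles[i] < best) {
            best = cycles[i];
        }
        if (cycles[i] > worst) {
            worst = cycles[i];
        }
    }
    return worst - best;
}
```

For example, samples of 12, 14, 13, 12 and 15 cycles give a jitter of 3 cycles.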
In Cortex-M processors, if a multiple cycle instruction is being executed when an interrupt arrives, in most cases the instruction is abandoned and restarted when the ISR has completed. If the Cortex-M3/Cortex-M4 processor receives an interrupt request during a multiple load/store (memory access) instruction, the current state of the multiple transfer is automatically stored as part of the PSR (Program Status Register), and when the ISR completes, the multiple transfer can resume from where it was stalled by using the saved information in the PSR. This mechanism provides high performance processing while maintaining low jitter in the interrupt response time.
Over the years, the marketing literature from various microcontroller vendors has contained incomplete or misleading information on interrupt latency. For example, machine cycles (instead of clock cycles) are sometimes used for quoting interrupt latencies, and in some cases the quoted interrupt latency does not include software overhead. It’s important to investigate the details fully to understand the total interrupt handling work and time.
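The difference between the two units is easy to underestimate. The sketch below converts a figure quoted in machine cycles into clock cycles. The ratio of 12 clocks per machine cycle used in the usage note is that of the original 8051 core; many later derivatives use a smaller ratio, so both arguments should be read from the datasheet of the specific device. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a latency quoted in machine cycles into clock cycles. */
uint32_t machine_to_clock_cycles(uint32_t machine_cycles,
                                 uint32_t clocks_per_machine_cycle)
{
    assert(clocks_per_machine_cycle != 0u);
    return machine_cycles * clocks_per_machine_cycle;
}
```

A latency quoted as 3 machine cycles (an illustrative value) corresponds to 36 clock cycles on a core with 12 clocks per machine cycle, before any software overhead is added.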
The Cortex-M processors incorporate some additional optimizations during interrupt handling to reduce overheads even further:
When an ISR is completed, and if there is another ISR waiting to be served, the processor will switch to the other ISR as soon as possible by skipping some of the unstacking and stacking operations which are normally needed (figure 9). This is called Tail Chaining, and can be just six cycles in the Cortex-M3 and Cortex-M4 processors. This also makes the processor much more energy-efficient by avoiding unnecessary memory accesses.
Figure 9: Tail chaining
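A rough way to see the benefit is to compare the cost of switching between two back-to-back ISRs with and without tail chaining. This is a simplified model with hypothetical names: the 12-cycle entry and 6-cycle tail chain are the Cortex-M3/M4 figures quoted in this article, while the exit cost is a parameter that must be taken from the processor documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cycles spent between the end of one ISR and the start of the next pending
 * ISR. Without tail chaining the processor unstacks and then stacks again. */
uint32_t isr_switch_cycles(bool tail_chaining,
                           uint32_t exit_cycles,
                           uint32_t entry_cycles,
                           uint32_t tail_chain_cycles)
{
    return tail_chaining ? tail_chain_cycles : exit_cycles + entry_cycles;
}
```

If the exit cost is assumed to equal the 12-cycle entry cost purely for illustration, the switch takes 6 cycles with tail chaining instead of 24 without it.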
If a high priority interrupt request arrives during the stacking stage of a lower priority interrupt, the high priority interrupt is serviced first. This ensures high priority interrupts are serviced quickly, and avoids another level of stacking during nested interrupt handling. It also saves energy (because there are fewer memory accesses) and stack space.
Figure 10: Late arrival
If an interrupt request arrives just as another ISR is exiting and the unstacking process is underway, the unstacking sequence is stopped and the ISR for the new interrupt is entered as soon as possible (figure 11). Again, this avoids unnecessary unstacking and stacking, and reduces power consumption and latency.
Figure 11: Pop pre-emption
In some architectures there are multiple register banks, and an ISR can use a different, sometimes dedicated, register bank to avoid the overhead of stacking and unstacking. For example, the 8051 provides four register banks. In the original 8051 the banked register implementation was memory based, but newer accelerated 8051 designs use register hardware.
Figure 12: Banked registers
Banked registers can reduce the overhead of saving and restoring context in limited circumstances. However, they often result in larger silicon area and higher power consumption, and do not scale to the many levels of nesting that a flexible interrupt system requires. In some cases, as with the 8051, additional software overhead is needed to switch the register bank(s). The Arm Cortex-M processors do not use banked registers, which gives much better energy efficiency and competitive performance when interrupt driven systems are compared with those built on other microcontroller processor architectures.
The Cortex-M processors support comprehensive debug support features. The Cortex-M3 and Cortex-M4 processors also offer exception trace support which allows the capture and examination of the exception/interrupt history and timing information in a debugger.
Figure 13: Exception trace in Cortex-M3 and Cortex-M4 processors
The trace information can be captured using a single pin trace interface called Serial Wire Viewer (SWV), or a multi-bit trace port interface, which has higher trace bandwidth for supporting full instruction trace with an ETM (Embedded Trace Macrocell). The trace information can be very useful for debugging.
The interrupt latency of Cortex-M processors can be affected by wait states of the on-chip bus system, which can result in a small amount of jitter. The Cortex-M0 and Cortex-M0+ processors have an optional feature to force the interrupt response time to have zero jitter. This is done by forcing the interrupt latency to always be the worst case (i.e. interrupt latency + wait state effect). This feature is typically not used in microcontrollers (which just process the interrupt request as quickly as possible), but is used in some special SoC designs that demand zero jitter in interrupt responses.
Sleep-on-Exit is a programmable feature which, when enabled, puts the processor into sleep mode when exiting an ISR if no other interrupt request needs to be serviced. This is very useful for any interrupt driven application, and can save power because it avoids the extra clock cycles in the thread (e.g. “main()” code) state, and reduces the amount of stacking and un-stacking normally needed for interrupt entry and exit. It also has a side effect (and benefit) of a shorter interrupt response time because stacking is not needed. For example, on the Cortex-M0, the wake up from Sleep-on-Exit is only 11 cycles.
Figure 14: Sleep-on-Exit can reduce interrupt latency (first instruction in ISR is SEV)
Note that this technique is particularly useful for interrupt driven applications.
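Sleep-on-Exit is controlled by the SLEEPONEXIT bit (bit 1) of the System Control Register (SCR), which sits at address 0xE000ED10 on Cortex-M processors. The sketch below takes the register as a pointer so that the bit manipulation can be tried on a host machine; the function name is hypothetical. On a real target, CMSIS-based code would typically write `SCB->SCR |= SCB_SCR_SLEEPONEXIT_Msk;` instead.

```c
#include <assert.h>
#include <stdint.h>

#define SCR_SLEEPONEXIT (1u << 1)  /* SLEEPONEXIT bit of the SCR */

/* Set SLEEPONEXIT in the register pointed to by scr, leaving the other
 * bits unchanged. On target, scr would be (volatile uint32_t *)0xE000ED10. */
void enable_sleep_on_exit(volatile uint32_t *scr)
{
    assert(scr != 0);
    *scr |= SCR_SLEEPONEXIT;
}
```

After the bit is set, the thread code usually executes a single WFI, and from then on the processor only wakes to run ISRs.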
There are two instructions for entering sleep modes: WFI (Wait for Interrupt) and WFE (Wait for Event). WFE enters sleep mode conditionally, and can wake up by events including:
The WFE sleep can be woken up quickly without invoking the interrupt/exception sequence. This can shorten the wake up time to just a few cycles. For example, in the Cortex-M0 processor, it can take just four cycles to wake up from sleep mode:
Figure 15: Wake up from WFE using event input (RXEV)
In this operation the processor resumes from where it was stalled, just after the WFE instruction. Instead of using an RXEV input, a peripheral interrupt with a different feature called SEV-ON-PEND (also a programmable feature) can be used to generate the event and wake up the processor, without the need to execute an ISR.
Once again, note that this technique is most useful for interrupt/event driven applications, and is only useful on its own when it is known that there is only one interrupt/event source being waited for. If there are other interrupt sources, the program code in thread mode must still check the reason for waking up from sleep mode.
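The two software steps involved can be sketched as follows: enabling SEV-ON-PEND (the SEVONPEND bit, bit 4 of the System Control Register) and, after waking, finding out which interrupt is pending. The function names are hypothetical and the code works on plain values so it can be run on a host; on a target, the pending mask would be read from the NVIC Interrupt Set-Pending Register (`NVIC->ISPR[0]` in CMSIS).

```c
#include <assert.h>
#include <stdint.h>

#define SCR_SEVONPEND (1u << 4)  /* SEVONPEND bit of the SCR */

/* Make a newly pending interrupt generate a wake-up event for WFE. */
void enable_sev_on_pend(volatile uint32_t *scr)
{
    assert(scr != 0);
    *scr |= SCR_SEVONPEND;
}

/* Return the lowest-numbered pending interrupt in a 32-bit pending mask,
 * or -1 if the wake-up was not caused by any of these interrupts. */
int first_pending_irq(uint32_t pending_mask)
{
    for (int irq = 0; irq < 32; ++irq) {
        if (pending_mask & (1u << irq)) {
            return irq;
        }
    }
    return -1;
}
```

Thread code would typically loop on WFE, call a check like `first_pending_irq` after each wake-up, and clear the pending bit of the source it handles before sleeping again.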
The NVIC in the Cortex-M processors provides very flexible interrupt management and many useful features. One key aspect of the NVIC technical advantages is the low interrupt latency. When this is combined with the high performance of the Cortex-M processors, all interrupt requests can be processed quickly and thus provide high interrupt processing throughput. The interrupt latency on the Cortex-M processors is deterministic, and doesn’t have any hidden software overhead, which can be observed in many other architectures.
The Cortex-M processors are designed to be easy to use. For example, the NVIC programmer’s model is very simple, and the interrupt handlers can be programmed as normal C functions. At the same time, it is very powerful. All interrupts have programmable interrupt priority levels and support nested interrupts automatically. Furthermore, the NVIC supports vectored interrupt operations so that there is no need to use software to determine which interrupt to serve, and additional optimizations like tail chaining help reduce interrupt processing overhead and make the processor more energy efficient at the same time.
Sorry for my late response, and thank you for your detailed answer; this was the missing piece I was looking for.
So if the 12-cycle latency also includes the NVIC, the 15 cycles I have measured can be explained as follows:
- 1 cycle is needed to hand over the interrupt request from the timer hardware to the NVIC
- 12 cycles are required for saving the registers and loading the ISR, as depicted in your explanation above
- 1 cycle is needed to execute the STR instruction in the execution stage itself
- and finally the bus access also takes one cycle (this is the extra cycle hidden by the LSU, but visible at the I/O pin).
As a reminder my system setup for the latency measurement is as follows
- program code is loaded from internal flash (zero-wait-state)
- data is stored in the internal SRAM (zero-wait-state), thus fetching instruction and saving registers can be done in parallel
- GPIO controller is connected to the fast AHB, thus accessing data output register should take only 1 cycle in this case
- except for the timer's ISR, only a "while(1);" is executed.
- no Debugger is connected to the system.
I forgot to mention, the write to the GPIO register also has latency.
While the STR takes only one cycle in the processor, the actual transfer can be multiple cycles (especially if the GPIO is on APB, which needs two cycles per transfer).
There is a write buffer in Cortex-M3/M4 bus interface to hide that delay, but the delay is visible on the I/O pin status change.
The NVIC itself doesn't add extra latency, as we measured the interrupt latency (12 cycles) including the NVIC. If the main program is only a while(1) and the memory is zero wait-state, the interrupt latency must be only 12 cycles, except if the system bus is handling an access from the debugger (this can cause the jitter you saw). There is also nothing that can cause jitter in the interrupt detection logic inside the NVIC.
For what you see, the extra delay could be caused by :
- an input sampling logic inside GPIO, and
- a clock domain synchroniser between the GPIO and the NVIC
Since the microcontroller might support different clock domains between GPIO and the processor, there could be a need for a cross clock domain synchroniser on the GPIO interrupt line from GPIO to NVIC. This synchroniser can contain a double D flip-flop path and hence at least 2 cycle delay.
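A tiny software model of such a two flip-flop synchroniser shows where the two cycles come from. This is purely an illustration: the type and function names are made up here, and real hardware may have further stages.

```c
#include <assert.h>
#include <stdbool.h>

/* State of a two-stage synchroniser: two D flip-flops in series. */
typedef struct {
    bool ff1;
    bool ff2;
} sync2_t;

/* One rising clock edge: the second flop samples the first, then the first
 * samples the asynchronous input. Returns the synchronised output. */
bool sync2_clock(sync2_t *s, bool in)
{
    assert(s != 0);
    s->ff2 = s->ff1;
    s->ff1 = in;
    return s->ff2;
}

/* Count the clock edges until an input held high appears at the output. */
unsigned sync2_delay_cycles(void)
{
    sync2_t s = { false, false };
    unsigned edges = 0;
    do {
        ++edges;
    } while (!sync2_clock(&s, true));
    return edges;
}
```

In this model the request only reaches the output on the second clock edge after it is asserted, which matches the minimum 2 cycle delay mentioned above.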
Hope this helps.
You are right, different bus frequencies, peripheral or special register accesses can cause latency cycles. Thus in my experiments I tried to minimize these effects by using the same clock frequency for the CPU, the enabled peripherals and all buses in the system, so I have only one clock domain. I further changed the code in the main loop to "while(1);", so the CPU actually executes only a branch instruction and makes no access to memory or any other peripheral when the interrupt arrives. The result can be seen in the following picture; as expected, there is no jitter:
I think the 3 additional cycles are caused by the NVIC, because the timer's interrupt is generated internally so no
GPIO sampling logic should be required. For me it seems to be plausible that the NVIC needs 2 cycles to forward
the interrupt request of the timer but I'm not sure because I have no information about the architecture of the NVIC.
As Jens mentioned, one possibility is that the STR instruction, or rather the impact of the STR instruction, takes
two cycles. Therefore I tried to determine the timing of a STR instruction by using the following source code:
Here, some registers are initialized so that an output-pin can be toggled by a single STR instruction. In doing so,
one pulse on the output-pin should reflect the timing of a STR which can be taken into account for the ISR's latency measurement.
The experiment gives the following result, where the green signal is the system clock and the violet signal is measured at the output pin:
What we see is that the STR takes exactly one cycle and the pin is toggled without any latency caused by buses or bridges.
However, the question now is whether the LSU, or more precisely the buffer in the LSU, decouples the timing of the STRs like a pipeline,
in which case the assumption that an STR toggles a pin in one cycle is not valid.
Yes, as Jens mentioned, peripheral bus bridges can have latency, and even if the flash memory and SRAM are zero wait state, you can still be affected by the latency cycles of a bus bridge if the interrupt arrives while you are reading data over a peripheral bus.
Similar wait states could also occur when you are accessing Private Peripheral Bus components or NVIC/SCS registers inside the processor.
Furthermore, the interrupt generation logic inside the GPIO can have latency. In some cases, they have sampling logic inside and the interrupt is triggered after the sampling logic.
(I don't know if this is the case for STM32).
Another interesting cause of jitter is that some microcontrollers have a peripheral bus running at a slower clock frequency (e.g. 1/2 of CPU speed). In this case, accesses to the peripheral can have jitter caused by the clock frequency bridging.
This affects not just interrupt jitter, but also jitter in normal accesses.