I need specific references for the hardware interrupt latency of the ARMv8 Cortex-A53, meaning the latency from when an interrupt is triggered to when the ISR is initially invoked, not including operating system, kernel, or application latency.
I understand. Allow me to explain the use case. We have a hard real-time constraint from a customer of no more than 1 microsecond latency on the ARMv8 Cortex-A53. Of course, the customer can reach this goal with an FPGA/DSP design, but the end customer wants us to demonstrate that it can be accomplished on an LS1043A ARMv8 Cortex-A53 MCU without an FPGA/DSP, since they are encountering undesirable latency with their existing FPGA/DSP solution. That latency has a number of causes, and they would prefer to move their application off the FPGA/DSP and onto the MCU in general.
Several options have been proposed to measure this latency. One is to service the interrupt using affinity on a dedicated core. The ISR is bound to a single core, all other interrupts are disabled on that core to prevent scheduling issues, and no context switches or kernel tasks are allowed there. Only the interrupt from the device can be serviced on that core, so there should be no cache misses caused by context switching and the like. Another option is to try to keep the ISR in the L2 cache and service the interrupt from there, although I'm not exactly sure how this latter option would be implemented at the moment.
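For the dedicated-core option, a minimal sketch of the affinity step on Linux might look like the following. It assumes the standard /proc/irq/<n>/smp_affinity interface, root privileges, and that the core has already been kept free of other work (for example with the isolcpus kernel parameter). The function names and the IRQ and CPU numbers are hypothetical, and this only routes the interrupt; it does not by itself guarantee any latency figure.

```c
#include <stdio.h>

/* Format the hex CPU bitmask that /proc/irq/<irq>/smp_affinity expects.
 * This sketch only handles CPUs 0..31. Returns 0 on success, -1 on error. */
int irq_affinity_mask(unsigned cpu, char *buf, size_t len)
{
    if (cpu >= 32 || buf == NULL)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Route a single IRQ to a single core by writing the mask to procfs.
 * Needs root. The irq number is whatever the device was assigned
 * (see /proc/interrupts); nothing here is specific to the LS1043A. */
int pin_irq_to_cpu(unsigned irq, unsigned cpu)
{
    char path[64];
    char mask[16];

    if (irq_affinity_mask(cpu, mask, sizeof mask) != 0)
        return -1;
    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", mask) > 0;
    if (fclose(f) != 0)   /* the kernel may reject the mask at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, pin_irq_to_cpu(42, 3) would write the mask "8" to /proc/irq/42/smp_affinity, where 42 and 3 are placeholder values.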
So, the question would be, given the ISR is dedicated to a single core and the only interrupt on that core is servicing the ISR without any context switch, can we get below 1 microsecond latency with an ARMv8 Cortex-A53 architecture?
To what extent will we still have memory system effects and what would the memory system latency look like under this scenario? Realizing it depends on the complexity of the ISR, let’s suppose a basic read operation from the device.
This is what we are ultimately trying to measure, both on vanilla Linux and on PREEMPT_RT with full preemption, using a dedicated core and/or the ISR in the L2 cache. Specifically, we want to measure from the start of the interrupt to the point where the interrupt handler is initially invoked. Another possibility is to measure from U-Boot, using a UART, to check only the hardware interrupt latency. With a hard interrupt latency constraint of 1 microsecond, is this possible?
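One way the software side of such a measurement could be timestamped is with the ARMv8 generic timer, sketched below. The helper names are hypothetical, and the 25 MHz counter frequency used in the example afterwards is an assumed value for illustration, not a confirmed LS1043A figure; the real value should be read from CNTFRQ_EL0. A timestamp taken at the first instruction of the handler cannot see the moment the device asserted the interrupt, so the trigger side would still need an external reference such as a GPIO toggle observed on a scope.

```c
#include <stdint.h>

/* Convert generic-timer ticks to nanoseconds. The division is split
 * so the intermediate product does not overflow for large tick counts. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t counter_hz)
{
    if (counter_hz == 0)
        return 0;
    return (ticks / counter_hz) * 1000000000ull
         + (ticks % counter_hz) * 1000000000ull / counter_hz;
}

#if defined(__aarch64__)
/* Read the virtual counter; the ISB keeps the read from being
 * speculated ahead of earlier instructions. */
static inline uint64_t read_cntvct(void)
{
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
}

/* Read the counter frequency in Hz as programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(v));
    return v;
}
#endif
```

With the assumed 25 MHz counter, ticks_to_ns(1, 25000000) is 40, so each tick is 40 ns and a 1 microsecond budget spans only 25 ticks. The timer resolution is therefore a noticeable fraction of the quantity being measured.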
It's a good question and I get where you are coming from, but I'm not sure there is a simple answer that we can give from the CPU point of view other than "empirically measure it on your platform". There are many different ARM chipset vendors with very different memory systems and cache sizes, and how the OS is configured will also have an impact, so there isn't a neat single answer here.
Based on my past experience with Linux and phone chipsets my gut feel is that 1us seems rather tight for a realtime deadline on an A-profile core; assuming a 1GHz CPU frequency that's only 1000 cycles and you don't need much to go wrong in terms of cache or TLB misses to violate that.
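The cycle arithmetic above can be sketched as a small budget calculation. The function names are hypothetical, and the fixed overhead and miss penalty used in the example afterwards are placeholder numbers chosen for illustration, not measured Cortex-A53 or LS1043A values.

```c
#include <stdint.h>

/* Core cycles available within a deadline at a given clock frequency.
 * Fine for realistic inputs; freq_hz * deadline_ns must fit in 64 bits. */
uint64_t cycle_budget(uint64_t freq_hz, uint64_t deadline_ns)
{
    return freq_hz * deadline_ns / 1000000000ull;
}

/* Number of memory stalls of a given cost that fit in the budget once a
 * fixed overhead (exception entry, register save, etc.) is subtracted. */
uint64_t misses_that_fit(uint64_t budget_cycles,
                         uint64_t fixed_cycles,
                         uint64_t miss_penalty_cycles)
{
    if (miss_penalty_cycles == 0 || fixed_cycles >= budget_cycles)
        return 0;
    return (budget_cycles - fixed_cycles) / miss_penalty_cycles;
}
```

At 1 GHz and a 1 us deadline, cycle_budget(1000000000, 1000) gives 1000 cycles. If, purely as an assumption, 200 cycles went to fixed overhead and each miss to external memory cost 150 cycles, misses_that_fit(1000, 200, 150) gives 5, which shows how few cache or TLB misses the deadline can absorb.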
If you have some freedom in how you partition your CPUs then you may get close to this. If you dedicate a single core to running only the critical part of the interrupt handler, so that it stays inside the L1 cache and uTLB, and don't run anything else on that core, then you can artificially encourage "good" cache hit rates. That means not using that core for anything else, and even then it will depend on how much shared data you have (memory coherency means any shared cache lines may be pulled out of the L1 by other CPUs needing that cache line).
Keeping things hot in the L2 may be possible, but it will depend on the size of the L2 versus the total size of the code and data running on all CPUs. It's a cache shared across all of the cores, so other CPUs doing other work may push the critical code and data out of the L2. You are therefore only getting a statistical improvement, not guaranteed real-time performance.