I would like to know which perf counters describe remote memory loads from other cores or nodes. The Arm Cortex-A Series Programmer's Guide for Armv8-A (https://cs140e.sergio.bz/docs/ARMv8-A-Programmer-Guide.pdf; it would be nice to know the official link to that document too) says (at 11-7):
> For multi-core and multi-cluster systems, before performing a load from external memory, the caches of L2 or L1 caches of cores within the cluster or of other clusters might also be checked
What perf counters describe such loads?
Hi FabianSchuetze,
The Arm Architecture Reference Manual for A-profile architecture chapter D13.12.3.2 Common microarchitectural events lists many events, which might be of interest. For example:
etc.
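As a rough illustration of how one of these events could be counted on Linux: the common event numbers can be passed to perf_event_open(2) as raw events (PERF_TYPE_RAW). The sketch below assumes an arm64 Linux system whose PMU implements REMOTE_ACCESS (0x31), and that perf access is permitted (see /proc/sys/kernel/perf_event_paranoid). The helper names are hypothetical. On other architectures the same raw number selects an unrelated event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Arm common event numbers (Linux names them ARMV8_PMUV3_PERFCTR_*). */
#define ARM_EVT_REMOTE_ACCESS    0x31u
#define ARM_EVT_LL_CACHE_MISS_RD 0x37u

/* Fill a perf_event_attr for a raw PMU event number, user space only. */
static void init_raw_attr(struct perf_event_attr *attr, uint64_t event)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;  /* config is handed to the CPU PMU driver */
    attr->config = event;        /* e.g. 0x31 = REMOTE_ACCESS on Arm */
    attr->disabled = 1;          /* created stopped; enabled explicitly */
    attr->exclude_kernel = 1;    /* count user-space activity only */
    attr->exclude_hv = 1;
}

/* Open a counter for the calling thread on any CPU; returns an fd or -1. */
static int open_raw_counter(uint64_t event)
{
    struct perf_event_attr attr;
    init_raw_attr(&attr, event);
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Count `event` while fn(arg) runs. Returns 0 on success, -1 on failure. */
static int count_event(uint64_t event, void (*fn)(void *), void *arg,
                       uint64_t *out)
{
    assert(fn != NULL && out != NULL);
    int fd = open_raw_counter(event);
    if (fd < 0)
        return -1;  /* no PMU access, unsupported event, paranoid setting... */
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    ssize_t n = read(fd, out, sizeof(*out));  /* plain u64 count */
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

From the shell, a similar measurement would be something like `perf stat -e r31,r37 ./your_program` (the program name is a placeholder). If the PMU does not implement an event, opening it may fail or the counter may simply stay at zero.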
You will need to verify whether your system actually implements those events, though.
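One way to do that check, assuming you can read the PMCEID0_EL0 and PMCEID1_EL0 registers (for example from a kernel module or bare-metal code, since user-space reads normally trap): bit n of PMCEID0_EL0 reports whether common event n is implemented, and bit n of PMCEID1_EL0 does the same for event 0x20 + n. The helpers and the event table below are a hypothetical sketch; the selection of events is an assumption about which counters relate to the question, not the list from the reply above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Decode the low 32 bits of PMCEID0_EL0 / PMCEID1_EL0:
 *   PMCEID0 bit n -> common event n         (0x00..0x1F)
 *   PMCEID1 bit n -> common event 0x20 + n  (0x20..0x3F)
 * Events outside 0x00..0x3F are not handled by this sketch.
 */
static bool common_event_implemented(uint64_t pmceid0, uint64_t pmceid1,
                                     unsigned event)
{
    if (event < 0x20)
        return (pmceid0 >> event) & 1u;
    if (event < 0x40)
        return (pmceid1 >> (event - 0x20)) & 1u;
    return false;
}

/* A few common events that may relate to loads served beyond the local core. */
static const struct {
    unsigned id;
    const char *name;
} memory_events[] = {
    {0x16, "L2D_CACHE"},
    {0x17, "L2D_CACHE_REFILL"},
    {0x19, "BUS_ACCESS"},
    {0x2A, "L3D_CACHE_REFILL"},
    {0x2B, "L3D_CACHE"},
    {0x31, "REMOTE_ACCESS"},  /* access to another socket (multi-socket) */
    {0x36, "LL_CACHE_RD"},
    {0x37, "LL_CACHE_MISS_RD"},
    {0x38, "REMOTE_ACCESS_RD"},
};

/* Print which of the events above are reported; return how many are present. */
static size_t report_memory_events(uint64_t pmceid0, uint64_t pmceid1)
{
    size_t found = 0;
    for (size_t i = 0; i < sizeof memory_events / sizeof memory_events[0]; i++) {
        assert(memory_events[i].id < 0x40);  /* range decoded above */
        bool ok = common_event_implemented(pmceid0, pmceid1, memory_events[i].id);
        printf("0x%02X %-18s %s\n", memory_events[i].id, memory_events[i].name,
               ok ? "implemented" : "not implemented");
        found += ok;
    }
    return found;
}
```

On Linux/arm64 the kernel usually does this filtering for you: events the PMU reports as implemented typically appear by name (for example `remote_access` or `ll_cache_miss_rd`) under /sys/bus/event_source/devices/<pmu>/events/, where the PMU directory name varies by system.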
Thank you so much for the reply and the explanation, vstehle.