Cortex A9 L1d cache profiling

Hi, I am trying to profile L1d cache utilisation on a Cortex-A9 device (Zynq-7020).

To do so, I am using the Performance Monitoring Unit, specifically these two events (ARMv7 Architecture Reference Manual, C12.8):

  • 0x03, Level 1 data cache refill  (CACHEREFILL)
  • 0x04, Level 1 data cache access  (CACHEACCESS)
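For readers who want to see how these two events could be programmed, here is a minimal sketch. It assumes privileged access to the ARMv7 PMU registers through CP15 and puts the refill event on counter 0 and the access event on counter 1. All helper names (pmu_start_l1d, pmu_read, and so on) are hypothetical and are not necessarily what startEventMonitoring() and readEventMonitoring() do. The non-ARM branch is only a small register model so the sequence can be checked on a host machine.

```c
#include <stdint.h>

#define EVT_L1D_REFILL 0x03u   /* Level 1 data cache refill */
#define EVT_L1D_ACCESS 0x04u   /* Level 1 data cache access */
#define PMCR_E (1u << 0)       /* PMCR.E: enable all counters */
#define PMCR_P (1u << 1)       /* PMCR.P: reset all event counters */
#define NUM_SIM_COUNTERS 6u    /* the Cortex-A9 PMU has six event counters */

#if defined(__arm__)
/* Target build: ARMv7 PMU registers via CP15 (privileged mode only). */
static inline void pmu_select(uint32_t idx) {
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5\n\tisb" :: "r"(idx));  /* PMSELR */
}
static inline void pmu_set_event(uint32_t evt) {
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" :: "r"(evt));         /* PMXEVTYPER */
}
static inline uint32_t pmu_get_count(void) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(v));           /* PMXEVCNTR */
    return v;
}
static inline void pmu_enable_mask(uint32_t mask) {
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(mask));        /* PMCNTENSET */
}
static inline void pmu_control_set(uint32_t bits) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(v));           /* read PMCR */
    v |= bits;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0\n\tisb" :: "r"(v));    /* write PMCR */
}
#else
/* Host build: a tiny register model, an assumption made purely for testing. */
static uint32_t sim_sel, sim_enable, sim_pmcr;
static uint32_t sim_evtype[NUM_SIM_COUNTERS], sim_count[NUM_SIM_COUNTERS];
static inline void pmu_select(uint32_t idx) { sim_sel = idx % NUM_SIM_COUNTERS; }
static inline void pmu_set_event(uint32_t evt) { sim_evtype[sim_sel] = evt; }
static inline uint32_t pmu_get_count(void) { return sim_count[sim_sel]; }
static inline void pmu_enable_mask(uint32_t mask) { sim_enable |= mask; }
static inline void pmu_control_set(uint32_t bits) {
    sim_pmcr |= bits & PMCR_E;          /* P is write-only, so it is not stored */
    if (bits & PMCR_P) {
        for (uint32_t i = 0; i < NUM_SIM_COUNTERS; i++) {
            sim_count[i] = 0;
        }
    }
}
#endif

/* Hypothetical helper: refills on counter 0, accesses on counter 1. */
static void pmu_start_l1d(void) {
    pmu_select(0); pmu_set_event(EVT_L1D_REFILL);
    pmu_select(1); pmu_set_event(EVT_L1D_ACCESS);
    pmu_control_set(PMCR_E | PMCR_P);   /* enable the PMU and zero the event counters */
    pmu_enable_mask(0x3u);              /* switch on counters 0 and 1 */
}

/* Hypothetical helper: read one event counter by index. */
static uint32_t pmu_read(uint32_t idx) {
    pmu_select(idx);
    return pmu_get_count();
}
```

With this layout, pmu_read(0) would return the refill count and pmu_read(1) the access count after the measured region.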

To get the miss rate from that, I calculate:

100*CACHEREFILL/CACHEACCESS

This is also mentioned in the Xilinx document UG1145 (chapter: PS profile counters).
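The formula above can be sketched in C as follows. The function name is hypothetical. The calculation is done in floating point on the assumption that the counters are read as 32-bit values: with unsigned integer arithmetic, 100*CACHEREFILL can wrap for large counts, and the division truncates small rates to 0.

```c
#include <stdint.h>

/* L1d miss rate in percent: 100 * refills / accesses.
 * Returns 0.0 when no accesses were counted, to avoid dividing by zero. */
static double l1d_miss_rate_percent(uint32_t refills, uint32_t accesses)
{
    if (accesses == 0u) {
        return 0.0;
    }
    return 100.0 * (double)refills / (double)accesses;
}
```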

Finally, I disable compiler optimisations and iterate through a growing array like so:

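#include <stdint.h>
#include "xil_cache.h"   /* Xil_DCacheInvalidate() from the Xilinx standalone BSP */

#define NMAX 1000000

volatile uint8_t A[NMAX];

/* PMU helpers, defined elsewhere in my project */
void startEventMonitoring(void);
void readEventMonitoring(void);

void profile(void) {
  for (int n = 20; n < NMAX; n++) {
    Xil_DCacheInvalidate();     /* start every run with a cold L1d */

    startEventMonitoring();
    volatile int result = 0;
    for (int i = 0; i < n; i++) {
      result += A[i];           /* one array read per iteration */
    }
    readEventMonitoring();
  }
}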

But the numbers for the cache miss rate do not make sense :)

For example, when n=990000 the array is much larger than the L1d cache (32 KB), but the reported miss rate is only about 0.3%.

In that case CACHEACCESS is 9971321 and CACHEREFILL is 31190.
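As a sanity check on these two figures, the arithmetic can be sketched as below. This assumes 1-byte elements (as in A[]), an array that starts on a cache-line boundary, and the 32-byte L1 line length of the Cortex-A9. The helper names are hypothetical, and the code only does arithmetic on the reported counter values.

```c
#include <stdint.h>

#define L1D_LINE_BYTES 32u  /* Cortex-A9 L1 cache line length */

/* Counted L1d accesses per inner-loop iteration. */
static double accesses_per_iteration(uint32_t accesses, uint32_t n)
{
    return (n == 0u) ? 0.0 : (double)accesses / (double)n;
}

/* Array bytes traversed per counted refill (1-byte elements assumed). */
static double bytes_per_refill(uint32_t n, uint32_t refills)
{
    return (refills == 0u) ? 0.0 : (double)n / (double)refills;
}

/* Number of cache lines a sequential scan of n bytes touches,
 * assuming the array starts on a line boundary. */
static uint32_t lines_touched(uint32_t n)
{
    return (n + L1D_LINE_BYTES - 1u) / L1D_LINE_BYTES;
}
```

For n=990000 these give roughly 10.07 counted accesses per iteration, about 31.7 array bytes per refill, and 30938 lines touched, against 31190 reported refills.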

I have checked that the cache is indeed invalidated by the Xilinx BSP, that the PMU counters are indeed 0 at the beginning of each iteration, that nothing is optimised away, etc.

What am I not understanding correctly?

Thank you,
Alex