Cortex-A9 L1d cache profiling

Hi, I am trying to profile L1d cache utilisation on a Cortex-A9 device (Zynq-7020).

To do so, I am using the Performance Monitoring Unit (PMU), specifically these two event counters (ARMv7-A/R Architecture Reference Manual, section C12.8):

  • 0x03, Level 1 data cache refill  (CACHEREFILL)
  • 0x04, Level 1 data cache access  (CACHEACCESS)

From these two counts, I calculate the miss rate in percent as:

miss rate = 100 * CACHEREFILL / CACHEACCESS

This is the same formula given in the Xilinx document UG1145 (chapter "PS profile counters").
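The formula above as a small C helper, a minimal sketch: the function name `l1d_miss_rate_percent` is hypothetical, and the two inputs are assumed to be the raw 32-bit counts read back for events 0x03 and 0x04.

```c
#include <stdint.h>

/* Hypothetical helper: L1d miss rate in percent, computed as
 * 100 * CACHEREFILL / CACHEACCESS from the two PMU event counts
 * (event 0x03 = refills, event 0x04 = accesses). */
double l1d_miss_rate_percent(uint32_t cache_refill, uint32_t cache_access)
{
    if (cache_access == 0) {
        return 0.0; /* no accesses counted: avoid dividing by zero */
    }
    /* Convert to double before dividing so integer division
     * does not truncate the ratio to 0. */
    return 100.0 * (double)cache_refill / (double)cache_access;
}
```

With the counts reported for n=990000 (31190 refills, 9971321 accesses), this gives roughly 0.313%.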

Finally, I disable compiler optimisations and read an increasing number of elements of an array, like so:

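#include <stdint.h>
#include "xil_cache.h"   /* Xil_DCacheInvalidate() */

#define NMAX 1000000

volatile uint8_t A[NMAX];

void startEventMonitoring(void); /* resets and starts the PMU counters (not shown) */
void readEventMonitoring(void);  /* stops and reads the PMU counters (not shown) */

void sweep(void) {
  for (int n = 20; n < NMAX; n++) {
    Xil_DCacheInvalidate();

    startEventMonitoring();
    volatile int result = 0;
    for (int i = 0; i < n; i++) {
      result += A[i];
    }
    readEventMonitoring();
  }
}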

...but the resulting miss rates do not make sense to me :)

For example, with n=990000 the loop reads 990000 bytes, far more than the 32 kB L1d cache, yet the reported miss rate is only about 0.3%.

In that case, CACHEACCESS is 9971321 and CACHEREFILL is 31190.
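To put these two counts in context, here is a small C sketch that expresses them per array element read and per cache line spanned by the first n bytes of `A`. It assumes the Cortex-A9's 32-byte L1 data cache line and that `A` starts on a line boundary; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: Cortex-A9 L1 data cache line length of 32 bytes. */
#define L1D_LINE_BYTES 32u

typedef struct {
    double accesses_per_element; /* CACHEACCESS / n */
    double refills_per_line;     /* CACHEREFILL / lines spanned by n bytes */
} counter_ratios;

/* n is the number of uint8_t elements read, i.e. n bytes. */
counter_ratios ratios_for_sweep(uint32_t n, uint32_t cache_access,
                                uint32_t cache_refill)
{
    counter_ratios r = {0.0, 0.0};
    if (n == 0) {
        return r; /* nothing read: leave both ratios at 0 */
    }
    /* Number of 32-byte lines covered by n bytes, rounded up. */
    uint32_t lines = (n + L1D_LINE_BYTES - 1) / L1D_LINE_BYTES;
    r.accesses_per_element = (double)cache_access / (double)n;
    r.refills_per_line = (double)cache_refill / (double)lines;
    return r;
}
```

For the n=990000 run this comes to roughly 10 counted accesses per element and roughly one refill per 32-byte line.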

I have checked that the cache is actually invalidated by the Xilinx BSP call, that the PMU counters are 0 at the beginning of each iteration, that nothing is optimised away, etc.

What am I misunderstanding?

Thank you,
Alex