Hi, I am trying to profile L1d cache utilisation on a Cortex-A9 device (Zynq-7020).
To do so, I am using the Performance Monitoring Unit, specifically these two counters (ARMv7 Technical Reference Manual, C12.8):
- CACHEREFILL: event 0x03, L1 data cache refill
- CACHEACCESS: event 0x04, L1 data cache access
To get the miss rate (in %) from these, I calculate

miss rate = 100 * CACHEREFILL / CACHEACCESS

as also described in the Xilinx document UG1145, chapter "PS profile counters".
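A minimal sketch of that formula in C, assuming the two raw counter values have already been read into 32-bit variables. `miss_rate_percent` is a hypothetical helper name, not something from UG1145 or the Xilinx BSP. Doing the multiplication in `double` avoids the unsigned 32-bit overflow that an integer `100*CACHEREFILL` would hit once the refill count exceeds about 42.9 million.

```c
#include <stdint.h>

/* Hypothetical helper: L1D miss rate in percent from two raw PMU counts.
 * The arithmetic is done in double so that 100 * refill cannot overflow
 * a uint32_t, which integer arithmetic would for refill > 42949672. */
static double miss_rate_percent(uint32_t refill, uint32_t access)
{
    if (access == 0u) {
        /* No accesses counted: report 0 rather than divide by zero. */
        return 0.0;
    }
    return 100.0 * (double)refill / (double)access;
}
```

With the counts reported for n = 990000, `miss_rate_percent(31190, 9971321)` evaluates to roughly 0.313.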
Finally, I disable compiler optimisations and iterate through a growing array like so:
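```c
#include <stdint.h>
#include "xil_cache.h"   /* Xil_DCacheInvalidate() */

#define NMAX 1000000

/* my own PMU helpers, defined elsewhere */
void startEventMonitoring(void);
void readEventMonitoring(void);

volatile uint8_t A[NMAX];

int main(void)
{
    for (int n = 20; n < NMAX; n++) {
        Xil_DCacheInvalidate();   /* start every run with a cold L1d */
        startEventMonitoring();
        volatile int result = 0;
        for (int i = 0; i < n; i++) {
            result += A[i];
        }
        readEventMonitoring();
    }
    return 0;
}
```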
But the cache miss rates I get do not make sense :)
For example, when n = 990000 the part of the array being read is far larger than the 32 kB L1d cache, yet the reported miss rate is only 0.3% (?)
In that case CACHEACCESS is 9971321 and CACHEREFILL is 31190.
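A quick arithmetic check of these two figures, written as a small C sketch. It assumes the Cortex-A9's 32-byte L1 data cache line and a line-aligned start of `A`; the helper names are hypothetical. With one-byte elements read sequentially, a cold cache should refill about once every 32 elements, and dividing CACHEACCESS by n shows how many counted L1D accesses each loop iteration contributes (an unoptimised loop body may also load and store `i`, `n` and the `volatile result`). This only checks the arithmetic; it does not prove where the extra accesses come from.

```c
#include <stdint.h>

/* Assumption: Cortex-A9 L1 data cache line length of 32 bytes. */
#define L1D_LINE_BYTES 32u

/* Expected number of line refills when n_bytes one-byte elements are read
 * sequentially from a cold cache, assuming a line-aligned start address. */
static uint32_t expected_refills(uint32_t n_bytes)
{
    return (n_bytes + L1D_LINE_BYTES - 1u) / L1D_LINE_BYTES;
}

/* Average number of counted L1D accesses per inner-loop iteration. */
static double accesses_per_iteration(uint32_t access, uint32_t n)
{
    return n ? (double)access / (double)n : 0.0;
}
```

For n = 990000, `expected_refills(990000)` gives 30938, close to the measured 31190, and `accesses_per_iteration(9971321, 990000)` is about 10.07. If those assumptions hold, the 0.3% figure corresponds to roughly 1 miss per 32 array loads, diluted by about ten counted accesses per iteration.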
I have checked that the cache is indeed invalidated by the Xilinx BSP, that the PMU counters are 0 at the beginning of each iteration, that nothing is optimised away, etc.
What am I not understanding correctly?
Thank you,
Alex
I think you have to switch to monitor mode. At least, it is needed if you use HW watchpoints/breakpoints.
Of course, you need to skip the BKPT instruction before returning.