Hi, I am trying to profile L1d cache utilisation on a Cortex-A9 device (Zynq-7020).
To do so, I am using the Performance Monitoring Unit, specifically these two counters (ARMv7 Technical Reference Manual, C12.8):
To get the miss rate from that, I calculate:
100*CACHEREFILL/CACHEACCESS
The same formula is given in the Xilinx document UG1145, chapter "PS profile counters".
Finally, I disable compiler optimisations and iterate through a growing array like so:
    #define NMAX 1000000
    volatile uint8_t A[NMAX];

    for (int n = 20; n < NMAX; n++) {
        Xil_DCacheInvalidate();
        startEventMonitoring();
        volatile int result = 0;
        for (int i = 0; i < n; i++) {
            result += A[i];
        }
        readEventMonitoring();
    }
But the numbers for the cache miss rate do not make sense :)
For example, when n=990000 the array is much larger than the L1d cache (32 kB), yet the reported miss rate is only 0.3% (?)
In that case CACHEACCESS is 9971321 and CACHEREFILL is 31190
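To rule out an arithmetic slip on my side, here is a minimal sketch of the miss-rate calculation as a standalone C function. The name miss_rate_percent is just a hypothetical helper of mine, not part of the Xilinx BSP or the PMU API; it only applies the 100*CACHEREFILL/CACHEACCESS formula to two raw counter values.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a BSP or PMU function): computes the
 * L1d miss rate in percent from two raw counter values, using
 * 100 * CACHEREFILL / CACHEACCESS.
 * Returns 0.0 when no accesses were counted, to avoid dividing by zero.
 */
double miss_rate_percent(uint64_t cache_refill, uint64_t cache_access)
{
    if (cache_access == 0) {
        return 0.0;
    }
    /* Convert to double before dividing so the fraction is not truncated. */
    return 100.0 * (double)cache_refill / (double)cache_access;
}
```

Calling miss_rate_percent(31190, 9971321) with the counter values above gives roughly 0.31, so the 0.3% figure really is what the counters say and not a rounding or integer-division mistake.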
I have checked that the cache is indeed invalidated by the Xilinx BSP, that the PMU counters are indeed 0 at the beginning of each iteration, that nothing is optimised away, etc.
What am I not understanding correctly?
Thank you,
Alex