Hi,
I have a question about the Cortex-R52+, specifically about the behaviour of its data cache.
The setup is the following:
I want to track the number of data cache misses generated, and to do so I am using the following event counters:
The test I am running is quite simple: a matrix multiply that should cause ~260K data cache misses, because the strided (column-wise) accesses to matrix B should always cause a refill. The inner loop loads from B 64 × 64 × 64 = 262,144 times. You can find the code below:
#define N_MATRIX 64

void matrix_multiply(int *A, int *B, int *C)
{
    for (int i = 0; i < N_MATRIX; i++) {
        for (int j = 0; j < N_MATRIX; j++) {
            int sum = 0;
            /* B is read column-wise: consecutive k are N_MATRIX ints (256 bytes) apart */
            for (int k = 0; k < N_MATRIX; k++) {
                sum += A[i * N_MATRIX + k] * B[k * N_MATRIX + j];
            }
            C[i * N_MATRIX + j] = sum;
        }
    }
}
What I observe instead is a total of only ~130K data cache misses, which does not add up given the current cache settings.
Here is the output of the test on hardware:
cycles: 3448716 - loads: 524552 - stores: 4219 - dcache refills: 134416 - dcache accesses: 528818
Hence my question: is the cache implementing an eviction algorithm other than LRU, one that would cause far fewer misses than expected?
Do you have any insight into this, and into which eviction algorithm may be implemented in this case?
Thanks and best regards,
Alessandro