I am studying cache effects using a simple micro-benchmark.
I think that if N is bigger than the cache size, the cache takes a miss on the first read of every cache line. (Show 1.)
On my board (Arndale-5250), the cache line size is 64 bytes, so I expect about N/8 cache misses in total, and cachegrind shows that. (Show 2.)
However, the Streamline tool displays a different result: it reports only 21,373 cache misses. (Show 3.)
I suspect hardware prefetching, but I can't find any counter in Streamline that lets me check this.
I really don't understand why Streamline reports so many fewer cache misses than cachegrind. Could someone give me a reasonable explanation?
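#include <stdio.h>

#define N 10000000      /* number of array elements */

static int A[N];        /* array scanned by the benchmark */

int main(void) {
    int i;
    double temp = 0.0;

    /* Read every element once, in order. */
    for (i = 0; i < N; i++) {
        temp = A[i] * A[i];
    }
    return 0;
}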