I don't understand the difference in cache miss counts between cachegrind and Streamline

I am studying cache effects using a simple micro-benchmark.

I think that if N is bigger than the cache size, the cache misses on the first read of every cache line (see 1).

On my board (Arndale-5250), the cache line size is 64 bytes, so I expect about N/8 cache misses in total, and cachegrind shows that (see 2).

However, the Streamline tool displays a different result: it reports only 21,373 cache misses (see 3).

I suspect hardware prefetching, but I can't find any counter in Streamline that would let me check this.

I really don't understand why Streamline reports so many fewer cache misses than cachegrind. Could someone give me a reasonable explanation?


1. Here is a simple micro-benchmark program.

  #include <stdio.h>

  #define N 10000000

  static int A[N];

  int main(){
    int i;
    double temp=0.0;

    for (i=0 ; i<N ; i++){
      temp = A[i]*A[i];
    }

    return 0;
  }

2. The following is cachegrind's output:

result

3. The following is Streamline's output:

result2
  • Have you checked the disassembly that the compiler generates for your benchmark? There is a good chance that the compiler spots that it is "totally pointless" because you never read the result and hence optimizes out most of the behavior. Try ...

    #include <stdio.h>
    #define N 10000000
    
    static volatile int A[N];
    
    int main(){
    
      int i;  
      int temp = 0;
    
      for (i=0 ; i<N ; i++){
        temp += A[i]*A[i];
      } 
    
      return temp;
    }
    

     

    HTH,
    Pete
