Arm Development Studio forum I don't understand cache miss count between cachegrind vs. streamline

I don't understand cache miss count between cachegrind vs. streamline

Seong Jin Cho over 11 years ago

I am studying cache effects using a simple micro-benchmark.

I think that if N is bigger than the cache size, the first read of each cache line will miss. (See 1.)

On my board (Arndale-5250), the cache line size is 64 bytes, so I expect about N/8 cache misses in total, and cachegrind shows that. (See 2.)

However, Streamline displays a different result: only 21,373 cache misses. (See 3.)

I suspect hardware prefetching, but I can't find a counter in Streamline that lets me check this.

I really don't know why Streamline reports so many fewer cache misses than cachegrind. Could someone give me a reasonable explanation?

1. Here is a simple micro-benchmark program.

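#include <stdio.h>

#define N 10000000

/* Zero-initialized static array, swept once from start to end. */
static int A[N];

int main(void)
{
    int i;
    double temp = 0.0;

    /* temp is overwritten on every iteration and never read afterwards,
       so an optimizing build may remove this loop entirely. */
    for (i = 0; i < N; i++) {
        temp = A[i] * A[i];
    }
    return 0;
}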

2. The following is cachegrind's output:

3. The following is Streamline's output: