I am studying cache effects using a simple micro-benchmark.
I expect that when N is large enough that the array does not fit in the cache, a miss occurs on the first read of every cache line. (Show 1.)
On my board (Arndale-5250) the cache line size is 64 bytes, so I expect N/8 cache misses in total, and cachegrind reports that. (Show 2.)
However, the Streamline tool shows a different result: it reports only 21,373 cache misses. (Show 3.)
I suspect the hardware prefetcher, but I can't find any counter in Streamline that would let me check this.
I don't understand why Streamline reports so many fewer cache misses than cachegrind. Could someone give me a reasonable explanation?
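```c
#include <stdio.h>
#define N 10000000
static int A[N];
int main(void){
    int i;
    double temp = 0.0;
    /* Sequential pass over A: each cache line is read several times in a
       row, so at most one miss per line is expected. temp is never used
       afterwards, so an optimizing compiler may delete this loop. */
    for (i = 0; i < N; i++){
        temp = A[i] * A[i];
    }
    return 0;
}
```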
Dear Peter,
Thank you very much for your answer.
Here is the assembly code for the micro-benchmark program; the objdump output is the same.
```asm
    .syntax unified
    .arch armv7-a
    .eabi_attribute 27, 3
    .eabi_attribute 28, 1
    .fpu vfpv3-d16
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 2
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .thumb
    .file "test_arm.c"
    .section .text.startup,"ax",%progbits
    .align 2
    .global main
    .thumb
    .thumb_func
    .type main, %function
main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    push {r4, r5}
    movw r4, #38528
    ldr r2, .L5                 @ r2 = &A
    movt r4, 152                @ r4 = (152 << 16) + 38528 = 10000000 (N)
    movs r3, #0                 @ i = 0
    ldr r5, .L5+4               @ r5 = &A + 39999552, so r5 + 448 = &temp
.L2:
    ldr r0, [r2, r3, lsl #2]    @ load A[i]
    ldr r1, [r2, r3, lsl #2]    @ load A[i] again
    adds r3, r3, #1             @ i++
    cmp r3, r4
    add r1, r0, r1              @ r1 = A[i] + A[i]
    str r1, [r5, #448]          @ temp = r1
    bne .L2
    movs r0, #0
    pop {r4, r5}
    bx lr
.L6:
    .align 2
.L5:
    .word .LANCHOR0
    .word .LANCHOR1
    .size main, .-main
    .comm B,80000000,8
    .bss
    .align 2
    .LANCHOR0 = . + 0
    .LANCHOR1 = . + 39999552
    .type A, %object
    .size A, 40000000
A:
    .space 40000000
    .type temp, %object
    .size temp, 4
temp:
    .space 4
    .ident "GCC: (crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03) 4.7.3 20130226 (prerelease)"
    .section .note.GNU-stack,"",%progbits
```