I am studying cache effects using a simple micro-benchmark.
I think that if N is larger than the cache size, the cache misses on the first read of every cache line. (Show 1.)
On my board (Arndale-5250), the cache line size is 64 bytes, so I expect N/8 cache misses in total, and cachegrind shows that. (Show 2.)
However, the Streamline tool displays a different result: it reports only 21,373 cache misses. (Show 3.)
I suspect hardware prefetching, but I can't check any value for it through the counters in the Streamline tool.
I really don't know why Streamline reports far fewer cache misses than cachegrind. Could someone give me a reasonable explanation?
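As a sanity check on the expected count, here is a minimal sketch of the arithmetic. It assumes one sequential pass, no hardware prefetching, and an array that starts on a cache-line boundary; `expected_line_misses` is a hypothetical helper written for this post, not part of the benchmark.

```c
#include <assert.h>
#include <stddef.h>

/* Expected first-touch misses for one sequential pass over an array.
 * Assumptions: no prefetching, array aligned to a cache-line boundary. */
size_t expected_line_misses(size_t n_elems, size_t elem_bytes, size_t line_bytes)
{
    assert(line_bytes > 0);
    size_t total_bytes = n_elems * elem_bytes;
    /* Round up: a partially used last line still costs one miss. */
    return (total_bytes + line_bytes - 1) / line_bytes;
}
```

With N = 10000000 and 64-byte lines, this gives 625,000 for 4-byte elements (N/16) and 1,250,000 for 8-byte elements (N/8), so the N/8 figure would correspond to an array of 8-byte elements such as double.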
#include <stdio.h>
#define N 10000000
static int A[N];

int main(){
    int i;
    double temp=0.0;
    for (i=0 ; i<N ; i++){
        temp = A[i]*A[i];
    }
    return 0;
}
Have you checked the disassembly that the compiler generates for your benchmark? There is a good chance that the compiler spots that it is "totally pointless" because you never read the result and hence optimizes out most of the behavior. Try ...
#include <stdio.h>
#define N 10000000
static volatile int A[N];

int main(){
    int i;
    int temp = 0;
    for (i=0 ; i<N ; i++){
        temp += A[i]*A[i];
    }
    return temp;
}
HTH, Pete
Dear Peter.
I really appreciate your answer.
Here is the assembly code for the micro-benchmark program; the output of "objdump" is the same.
    .syntax unified
    .arch armv7-a
    .eabi_attribute 27, 3
    .eabi_attribute 28, 1
    .fpu vfpv3-d16
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 2
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .thumb
    .file "test_arm.c"
    .section .text.startup,"ax",%progbits
    .align 2
    .global main
    .thumb
    .thumb_func
    .type main, %function
main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    @ link register save eliminated.
    push {r4, r5}
    movw r4, #38528
    ldr r2, .L5
    movt r4, 152
    movs r3, #0
    ldr r5, .L5+4
.L2:
    ldr r0, [r2, r3, lsl #2]
    ldr r1, [r2, r3, lsl #2]
    adds r3, r3, #1
    cmp r3, r4
    add r1, r0, r1
    str r1, [r5, #448]
    bne .L2
    movs r0, #0
    pop {r4, r5}
    bx lr
.L6:
    .align 2
.L5:
    .word .LANCHOR0
    .word .LANCHOR1
    .size main, .-main
    .comm B,80000000,8
    .bss
    .align 2
    .LANCHOR0 = . + 0
    .LANCHOR1 = . + 39999552
    .type A, %object
    .size A, 40000000
A:
    .space 40000000
    .type temp, %object
    .size temp, 4
temp:
    .space 4
    .ident "GCC: (crosstool-NG linaro-1.13.1-4.7-2013.03-20130313 - Linaro GCC 2013.03) 4.7.3 20130226 (prerelease)"
    .section .note.GNU-stack,"",%progbits
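To help read the constants in this listing, here is a small sketch that decodes them. The helper names `movw_movt` and `check_listing_constants` are hypothetical and exist only for this illustration; the numbers are copied from the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* movw writes the low 16 bits of a register, movt the high 16 bits. */
uint32_t movw_movt(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

void check_listing_constants(void)
{
    /* movw r4, #38528 ; movt r4, 152  ->  loop bound held in r4 */
    assert(movw_movt(38528, 152) == 10000000u);
    /* .LANCHOR1 = . + 39999552 ; str r1, [r5, #448]  ->  byte offset of temp */
    assert(39999552u + 448u == 40000000u);
}
```

So r4 holds N = 10000000 and the store goes to temp, which sits directly after the 40000000-byte array A. Each pass through .L2 issues two loads of the same element of A and one store, so the loop body is still present in this build.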