I see something odd when I count both the Instruction Cache Miss event and event 0x1 (viz. “Stall because instruction buffer cannot deliver an instruction. This could indicate an Instruction Cache miss or an Instruction MicroTLB miss. This event occurs every cycle in which the condition is present”). If I disable branch prediction, the stall count goes up dramatically (from 171 to 434 stalls), but the cache miss count barely changes (only from 12 to 13 misses).
PS: I count the MicroTLB misses too, and the count is always 0, so they should not be the cause of the stalls.
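For reference, here is a minimal sketch in C of how the two counters might be pointed at these events. It only builds the value for the Performance Monitor Control Register (PMNC) and does not touch hardware. The field positions (EvtCount0 in bits [27:20], EvtCount1 in bits [19:12], and the E/P/C control bits) are my reading of the ARM1136 layout and are an assumption to check against the TRM; the names `pmnc_value`, `EVT_*` and `PMNC_*` are made up for illustration.

```c
#include <stdint.h>

/* Event numbers as used in this thread. */
enum {
    EVT_ICACHE_MISS   = 0x0, /* instruction cache miss */
    EVT_IBUF_STALL    = 0x1, /* instruction buffer cannot deliver an instruction */
    EVT_BR_MISPREDICT = 0x6  /* branch mispredicted */
};

/* Assumed PMNC control bits; verify against the TRM before use. */
#define PMNC_ENABLE     (1u << 0) /* E: enable the counters */
#define PMNC_RESET_CNT  (1u << 1) /* P: reset both count registers */
#define PMNC_RESET_CCNT (1u << 2) /* C: reset the cycle counter */

/* Build a PMNC value that selects one event per count register,
 * resets the counters and enables counting. */
static uint32_t pmnc_value(uint8_t evt0, uint8_t evt1)
{
    return ((uint32_t)evt0 << 20)  /* EvtCount0, assumed bits [27:20] */
         | ((uint32_t)evt1 << 12)  /* EvtCount1, assumed bits [19:12] */
         | PMNC_RESET_CCNT | PMNC_RESET_CNT | PMNC_ENABLE;
}
```

Since there are only two event counters, a measurement that also needs mispredictions and MicroTLB misses has to be repeated with a different event pair each run, for example `pmnc_value(EVT_ICACHE_MISS, EVT_IBUF_STALL)` followed by `pmnc_value(EVT_BR_MISPREDICT, EVT_IBUF_STALL)`.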
Could anyone help me figure out why?
Thanks a lot!!
PS: I have confirmed from the technical manual that event 0x0 counts L1 cache misses only. It also states that a mispredicted branch causes event 0x1 to increment by 1. But the mismatch still exists: the stall count goes up far more than the cache miss count.
Is there anyone who could help me explain this?
Hello,
I think event 0x1 does not mean instruction cache miss stalls but instruction buffer stalls.
Cache misses, MicroTLB misses and branch mispredictions would all be candidate causes of event 0x1.
I guess the increase in the event 0x1 count comes from stalls on taken branches.
If branch prediction is disabled, branches are assumed to be not taken, so whenever a branch is actually taken, event 0x1 stalls occur.
Best regards,
Yasuhiko Koumoto.
Hmm... but I count the branch mispredictions and MicroTLB misses as well. The numbers mentioned above (171 and 434 stalls) are the results after subtracting those two counts.
I am referring to branch execution time.
If branch prediction is disabled, branches take much longer to execute.
If a branch takes more than one cycle, the extra cycles are added to the instruction stalls.
Do you mean that if branch prediction is disabled, the taken branches cause mispredictions and increment the counter?
I think so but I am not sure.
I partly agree. But according to the manual, a misprediction increments event 0x1 by only 1 (although the flush penalty is 4 cycles), and I believe all mispredictions are counted in event 0x6 (viz. branch mispredicted). Thus I have subtracted event 0x6 from event 0x1, and the numbers I presented are the differences, which I assume to be the stall cycles caused by cache misses.
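To make the subtraction concrete, here is a rough sketch in C of the arithmetic. It assumes, as stated above, that each mispredicted branch adds exactly 1 to event 0x1, and that whatever remains after removing mispredictions and MicroTLB misses is attributable to cache misses. The function names are made up for illustration.

```c
#include <stdint.h>

/* Stall cycles (event 0x1) left over after removing the causes that
 * other events already account for. Clamps at 0 rather than wrapping. */
static uint32_t residual_stalls(uint32_t ibuf_stalls,
                                uint32_t mispredicts,
                                uint32_t itlb_misses)
{
    uint32_t known = mispredicts + itlb_misses;
    return ibuf_stalls > known ? ibuf_stalls - known : 0;
}

/* Average residual stall cycles per I-cache miss, scaled by 100 so that
 * no floating point is needed. Returns 0 when there are no misses. */
static uint32_t stalls_per_miss_x100(uint32_t residual, uint32_t icache_misses)
{
    return icache_misses ? (residual * 100u) / icache_misses : 0;
}
```

With the residuals from this thread, `stalls_per_miss_x100(171, 12)` gives 1425 (about 14.25 cycles per miss) with prediction enabled, and `stalls_per_miss_x100(434, 13)` gives 3338 (about 33.4 cycles per miss) with prediction disabled. The miss penalty itself should not depend on branch prediction, which is why the mismatch looks suspicious.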
I checked the ARM1136 TRM and the ARM Architecture Reference Manual v6, but I cannot find the statement that "the mis prediction would increment event 0x1 only by 1". Where does it come from?
It's in ARM11 performance monitor unit, page 5.
Thank you! I can confirm it.
I guess that behavior applies only when branch prediction is enabled.
I'm sorry but I have no more comments.
Yasuhiko Koumoto.
Thank you anyway!!