BTB operation.
I'm writing some tests on the TIGER platform and there is something I don't understand about how the BTB operates. The test runs 40 loops, and after loops 10, 20 and 30 I add some extra code. For each loop I check how many cycles the core needs between the last instruction of the loop (a conditional branch that is always taken) and the first instruction of the loop, which effectively shows whether that last branch was predicted taken.
1. After 10 loops I add some extra nops.
2. After 20 loops I add a BTB invalidation command.
3. After 30 loops I write all 512 entries of the BTB with some new branches.
The problem is that the cycle count behaves the same way in all cases, i.e.:
First 10 loops: loops 1-2 have a penalty of about 13 cycles; loops 3-10 take 1 or 2 cycles.
Second 10 loops (after the extra nops): loops 11-12 have a penalty of about 13 cycles; loops 13-20 take 1 or 2 cycles.
Third 10 loops (after the BTB invalidation): loops 21-22 have a penalty of about 13 cycles; loops 23-30 take 1 or 2 cycles.
Fourth 10 loops (after rewriting the BTB entries): loops 31-32 have a penalty of about 13 cycles; loops 33-40 take 1 or 2 cycles.
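To make the pattern easier to compare across the four phases, here is a minimal host-side Python sketch that groups the per-loop cycle counts by phase and lists the loops that paid the penalty. It is only an illustration: `PENALTY_THRESHOLD`, `PHASES`, `summarize` and the `sample` list are names and values I made up, and the sample counts are placeholders built from the approximate figures above (about 13, then 1 or 2), not an actual capture from the target.

```python
# Assumed cut-off between a predicted-taken loop (1 or 2 cycles)
# and a penalized one (about 13 cycles). Hypothetical value.
PENALTY_THRESHOLD = 5

# (first loop, last loop, what was done just before this phase)
PHASES = [
    (1, 10, "initial run"),
    (11, 20, "after extra nops"),
    (21, 30, "after BTB invalidation"),
    (31, 40, "after rewriting all 512 BTB entries"),
]


def summarize(cycles):
    """Map each phase label to the loop numbers that exceeded the threshold."""
    if len(cycles) != 40:
        raise ValueError("expected one cycle count per loop (40 in total)")
    penalized = {}
    for first, last, label in PHASES:
        penalized[label] = [
            loop
            for loop in range(first, last + 1)
            if cycles[loop - 1] > PENALTY_THRESHOLD
        ]
    return penalized


# Placeholder data: two slow loops followed by eight fast ones, four times.
# Replace with the cycle counts actually read on the target.
sample = ([13, 13] + [2] * 8) * 4

for label, loops in summarize(sample).items():
    print(f"{label}: penalized loops {loops}")
```

With the placeholder data, every phase reports its first two loops as penalized, which is the behavior in question.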
In the first case (adding nops), I expected all of the following 10 loops to be predicted taken as well, without paying the 13-cycle penalty on the first two loops.
My questions are:
- Is the behavior described above the expected behavior of the core? If yes, why?
- Can I do something to avoid the 13-cycle penalty when the branch instruction is already in the BTB but some extra code with other branches has run in between?