In the Cortex-A series, most cores have a BTIC (Branch Target Instruction Cache). It is contained in the prefetch unit (PFU).
Branch Target Instruction Cache
The PFU also contains a four-entry deep Branch Target Instruction Cache
(BTIC). Each entry stores up to two instruction cache fetches and enables
the branch shadow of predicted taken B and BL instructions to be
eliminated. The BTIC implementation is architecturally transparent, so it
does not have to be flushed on a context switch.
How does the BTIC (Branch Target Instruction Cache) work?
I did a quick search on the BTIC in Cortex-A processors, and it seems that its size varies from 1 entry in older Cortex-A processors to 4 entries in the Cortex-A7. The BTIC's function is much less complicated than the branch prediction mechanism, so here is my attempt to explain it as clearly as I can.
1. When a conditional branch is predicted taken, the PC is updated with the branch target (the new address). The instruction(s) following the branch in program order will not be executed, but they have probably been fetched, depending on the cache line size, etc. A good example is the branch that closes a loop, which is most likely to be predicted taken.
2. To offset the penalty of recovering from a mispredicted branch, some processors contain a "branch shadow" or branch recovery buffer that holds the instruction(s) following the branch. These are the instruction(s) that were not executed but were probably fetched, either implicitly with the branch instruction in the same cache line, or explicitly, which requires an elaborate dual-path fetch unit (it gets a bit complicated, so just assume that the processor is somehow able to fetch these instructions and store them in the "shadow" buffer). When a branch is mispredicted, the processor issues instructions directly from the "shadow" buffer without stalling the pipeline.
3. The BTIC is basically a "shadow" buffer that is filled directly from the cache with the instruction(s) that follow a conditional branch in normal program order.
4. The four entries accommodate multi-level branching (e.g. nested loops).
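The steps above can be pictured with a toy Python model of a four-entry buffer. This is only a sketch of the explanation given here, not of ARM's actual hardware: the class and method names (`ShadowBuffer`, `fill`, `lookup`) are made up, indexing by branch address is an assumption, and so is the replacement policy (oldest entry evicted first), which the ARM text quoted above does not specify.

```python
from collections import OrderedDict


class ShadowBuffer:
    """Toy model of the small buffer described above (hypothetical, not ARM's design)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.slots = OrderedDict()  # branch address -> saved instructions

    def fill(self, branch_addr, instructions):
        # Re-filling an existing entry refreshes it; otherwise, when the
        # buffer is full, the oldest entry is evicted (assumed policy).
        if branch_addr in self.slots:
            del self.slots[branch_addr]
        elif len(self.slots) >= self.entries:
            self.slots.popitem(last=False)
        self.slots[branch_addr] = list(instructions)

    def lookup(self, branch_addr):
        # Returns the saved instructions on a hit, or None on a miss.
        return self.slots.get(branch_addr)
```

With four entries, filling the buffer for a fifth branch pushes out the first one, which is why a small buffer like this only helps for a handful of active branches such as a few levels of nested loops.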
Hope this answers your question.
Thanks very much!
1. I now understand the BTIC. I also see a BTAC in the Cortex-A7. My understanding is that the BTIC is mainly for the shadow instructions, while the BTAC is for the branch instructions themselves. Is my understanding right?
2. Is the BTIC used only for B and BL?
Best regards!!
You're very welcome. I am glad that my explanation helped.
1. Yes. The BTIC stores the instructions that immediately follow a conditional branch instruction in the program. When the branch is predicted taken, these instructions will not be executed, but they are fetched and stored in the BTIC anyway for later use in case the branch was mispredicted. If the branch is mispredicted, the instructions are decoded and issued directly from the BTIC, which speeds up execution.
- The BTAC (Branch Target Address Cache) is for predicting branch target addresses, not the branch outcome (i.e. taken or not taken). The branch outcome prediction is done in the BTB. The BTAC is used by instructions that hold the target address (partially or completely) in a register, e.g. BX <reg>. An interesting question is what happens with a POP or load instruction that overwrites the PC: do the BTAC and/or BTB keep track of these instructions in addition to the standard branches? I don't know the answer, as I haven't gone that far into the ARM architecture yet. By the way, branches that encode the target address or offset as an immediate value don't need the BTAC, because the fetch unit can calculate the target address directly from the immediate value and the PC.
2. I am not sure whether the BTIC is used exclusively by the B and BL instructions or by any conditional branch (as I would expect). For some reason, the ARM documentation only mentions B and BL.
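To make the point about immediate branches concrete, here is a minimal Python sketch of how the target of a B or BL can be computed from the PC and the instruction encoding alone. It assumes the A32 (ARM state) encoding, in which a 24-bit signed word offset is added to the instruction address plus 8. The function name `a32_branch_target` is made up for illustration, and the BLX (immediate) form is deliberately rejected to keep the sketch short.

```python
def a32_branch_target(instr_addr: int, encoding: int) -> int:
    """Target address of an A32 B/BL (immediate) instruction at instr_addr."""
    # Bits [27:25] are 0b101 for B/BL with an immediate offset.
    if (encoding >> 25) & 0b111 != 0b101:
        raise ValueError("not an A32 B/BL encoding")
    # Condition field 0b1111 selects BLX (immediate), which this sketch skips.
    if (encoding >> 28) == 0xF:
        raise ValueError("BLX (immediate) is not handled here")
    imm24 = encoding & 0x00FFFFFF
    # Sign-extend the 24-bit field.
    if imm24 & 0x00800000:
        imm24 -= 1 << 24
    # The field counts words, so scale it to a byte offset.
    offset = imm24 << 2
    # In A32 state the PC reads as the instruction address plus 8.
    return (instr_addr + 8 + offset) & 0xFFFFFFFF
```

For example, the encoding 0xEAFFFFFE placed at address 0x8000 branches to 0x8000 itself (the classic "branch to self" loop). No lookup table is involved, which is why a BTAC is not needed for this kind of branch.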