How does the BTIC (Branch Target Instruction Cache) work?

In the Cortex-A series, most cores have a BTIC (Branch Target Instruction Cache). It is contained in the prefetch unit (PFU). The documentation describes it like this:

Branch Target Instruction Cache

The PFU also contains a four-entry deep Branch Target Instruction Cache (BTIC). Each entry stores up to two instruction cache fetches and enables the branch shadow of predicted taken B and BL instructions to be eliminated. The BTIC implementation is architecturally transparent, so it does not have to be flushed on a context switch.

  • I did a quick search on the BTIC in Cortex-A processors, and it seems that its size varies from 1 entry in older Cortex-A processors to 4 entries in the Cortex-A7. The BTIC's function is much simpler than the branch prediction mechanism, so here is my attempt to explain it as clearly as I can.

    1. When a conditional branch is predicted taken, the PC is updated with the branch target (the new address). The instruction(s) that follow the branch in program order are not executed, although they have probably been fetched already, depending on the cache line size and so on. A good example is the backward branch of a loop, which is most likely to be predicted taken.

    2. Redirecting the fetch to the target is not free. After a predicted-taken branch, the instructions at the target still have to be read from the instruction cache, so for one or more cycles the pipeline has nothing useful to issue. This gap is the "branch shadow" that the TRM text refers to. The sequential instruction(s) that were fetched after the branch, for example because they sit in the same cache line, are simply discarded.

    3. The BTIC is basically a small buffer that removes this shadow. As I read the TRM text, each entry holds the first instruction(s) at the target of a branch (up to two instruction cache fetches), presumably filled from the instruction cache when the branch is taken. When the same B or BL is predicted taken again and hits in the BTIC, the target instructions are supplied from the BTIC straight away, without stalling the pipeline, while the normal fetch continues further along the target path.

    4. The four entries are there to accommodate multi-level branching (e.g. nested loops).
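
To make this more concrete, here is a toy Python model based on the TRM text quoted in the question (four entries, up to two instruction cache fetches per entry, shadow of predicted-taken branches eliminated on a hit). It is only a sketch, not how the hardware is actually built: the names `ToyBTIC` and `count_shadow_cycles`, the LRU replacement policy, indexing by branch address, and the one-cycle shadow penalty are all my own assumptions.

```python
from collections import OrderedDict


class ToyBTIC:
    """Toy model: maps a branch address to the first fetches at its target."""

    def __init__(self, entries=4, fetches_per_entry=2):
        self.entries = entries                      # four entries, as in the TRM text
        self.fetches_per_entry = fetches_per_entry  # up to two fetches per entry
        self.table = OrderedDict()                  # branch_pc -> list of target fetches

    def lookup(self, branch_pc):
        """Return the cached target fetches on a hit, or None on a miss."""
        fetches = self.table.get(branch_pc)
        if fetches is not None:
            self.table.move_to_end(branch_pc)  # assumed LRU ordering
        return fetches

    def fill(self, branch_pc, target_fetches):
        """Store the first fetches from the target, evicting the oldest entry if full."""
        if branch_pc not in self.table and len(self.table) >= self.entries:
            self.table.popitem(last=False)
        self.table[branch_pc] = list(target_fetches[:self.fetches_per_entry])
        self.table.move_to_end(branch_pc)


def count_shadow_cycles(taken_branches, btic=None, shadow_penalty=1):
    """Count bubble cycles for a trace of predicted-taken branches.

    taken_branches is a sequence of (branch_pc, target_fetches) pairs.
    shadow_penalty is an assumed cost in cycles, not a documented figure.
    """
    bubbles = 0
    for branch_pc, target_fetches in taken_branches:
        if btic is not None and btic.lookup(branch_pc) is not None:
            continue  # hit: target instructions come from the BTIC, no bubble
        bubbles += shadow_penalty  # miss: wait for the instruction cache
        if btic is not None:
            btic.fill(branch_pc, target_fetches)
    return bubbles


# A loop whose backward branch at 0x1010 is predicted taken 10 times.
loop_trace = [(0x1010, ["fetch@0x1000", "fetch@0x1008"])] * 10
print(count_shadow_cycles(loop_trace))             # 10
print(count_shadow_cycles(loop_trace, ToyBTIC()))  # 1
```

In this model only the first pass through the loop pays the shadow penalty. Every later iteration hits in the BTIC, which is the effect the TRM describes as eliminating the branch shadow.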

    Hope this answers your question.
