cortex-m3 pipeline stages, branch prediction

Hello,

While doing some research for my master's thesis, I've read several documents about ARMv7-M / Cortex-M3, including reference manuals and books such as Joseph Yiu's "Definitive Guide to Cortex-M3" and Trevor Martin's "Designer's Guide to Cortex-M Processor Family".

In all that literature I often came across the terms 3-stage pipeline and branch prediction / branch target forwarding / speculative branch target fetch, but the documents don't give further information.

I'm interested in the functional principle of the branch related unit(s) and the structure of the particular pipeline stages. Is there any information accessible regarding those architectural components?

Thanks in advance.

  • Hi Alex,

    Sorry, in many cases we can't disclose much detail about how the processor works. Here are some basic descriptions that will hopefully help you.

    Branch target forwarding - instead of handling the branch operations in the execution stage, some of the branch operations can be started earlier (e.g. in the decode stage). For example, a simple unconditional branch like:

       B   offset

    can have the target program counter calculated in the decode stage and start the program fetch early.

    More information is documented in the Cortex™-M3 Technical Reference Manual: 1.6. Branch target forwarding
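As an illustration of why an unconditional branch target can be computed as early as the decode stage, here is a minimal sketch that derives the target purely from the instruction's address and its 16-bit encoding, with no register or flag values needed. It assumes the 16-bit Thumb unconditional `B` encoding (top five bits `11100`, followed by an 11-bit immediate); the function name `thumb_b_target` is made up for this example and the code says nothing about how the Cortex-M3 hardware actually implements the calculation.

```python
def thumb_b_target(instr_addr, halfword):
    """Compute the target of a 16-bit Thumb unconditional branch.

    Only the instruction address and the encoding are needed, which is
    why this work does not have to wait for the execute stage.
    """
    # Assumed encoding: 11100 imm11 (16-bit unconditional B).
    if (halfword >> 11) != 0b11100:
        raise ValueError("not a 16-bit unconditional Thumb branch")

    imm11 = halfword & 0x7FF
    offset = imm11 << 1          # halfword-aligned byte offset, 12 bits wide
    if offset & 0x800:           # sign-extend from bit 11
        offset -= 0x1000

    # In Thumb state the PC reads as the instruction address + 4.
    return (instr_addr + 4 + offset) & 0xFFFFFFFF


# 0xE7FE is the common "branch to self" idiom.
print(hex(thumb_b_target(0x08000100, 0xE7FE)))  # 0x8000100
```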

    Speculative branch target fetch - this is used in conditional branches, e.g.

       BEQ   label1

    label2:

       ...

    Either the instructions at label1 or those at label2 could be executed next. The instruction buffer in the processor typically already holds the instruction at label2, because subsequent program code is fetched in advance, so the processor speculatively fetches the instruction at label1, even though it might not be used. If the branch is not taken, the speculatively fetched instruction from label1 is discarded. The speculatively fetched instruction does not enter further stages of the processor pipeline. This speculative fetch is useful because in many cases flash memory is slower than the processor (wait states are needed for non-sequential fetches).
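The benefit of the speculative fetch can be sketched with a toy timing model. Everything here is an assumption made for illustration: the branch is decoded in one cycle and resolved in the next, a non-sequential fetch takes one cycle plus the flash wait states, and the fall-through instruction is already buffered. It is not a cycle-accurate description of the Cortex-M3, and the function names are hypothetical.

```python
def fetch_done(start_cycle, wait_states):
    """Cycle at which a non-sequential fetch started at start_cycle completes."""
    return start_cycle + 1 + wait_states


def next_instruction_ready(taken, wait_states, speculative, decode_cycle=0):
    """Cycle at which the instruction following a conditional branch is available.

    Toy model assumption: the branch is decoded at decode_cycle and its
    condition is resolved one cycle later, in the execute stage.
    """
    resolve_cycle = decode_cycle + 1
    if not taken:
        # The fall-through instruction is already in the buffer; a
        # speculatively fetched target word is simply discarded.
        return resolve_cycle
    # With speculation the target fetch starts at decode time,
    # otherwise it can only start once the branch is resolved.
    start = decode_cycle if speculative else resolve_cycle
    return max(resolve_cycle, fetch_done(start, wait_states))


for ws in (0, 2):
    with_spec = next_instruction_ready(True, ws, speculative=True)
    without = next_instruction_ready(True, ws, speculative=False)
    print(f"wait states={ws}: target ready at cycle {with_spec} "
          f"with speculation, {without} without")
```

In this model the speculative fetch hides one cycle of the target fetch latency for a taken branch and costs nothing when the branch is not taken, since the discarded word never entered the pipeline.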

    Branch prediction - this is not available in Cortex-M3, but can be found in more advanced processors such as Cortex-R4/R5.

    Cortex-R5 Technical Reference Manual: 5.2. Branch prediction

    The processor makes a guess based on certain information, and the instructions at the branch target can move further down the pipeline, get decoded and potentially get partially executed before the conditional branch is resolved. If a misprediction happens, the speculatively fetched, decoded and partially executed instructions need to be flushed from the pipeline.

    A processor with branch prediction can have a simple scheme that assumes the branch is taken / not taken based on the branch instruction type and optionally the current ALU flags, or a complex scheme based on some execution history. For example, Cortex-R4 and R5 have a dynamic branch predictor, which is documented in the Cortex-R5 Technical Reference Manual: 5.2.1. Branch predictor
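To make "prediction based on execution history" concrete, here is a minimal sketch of a textbook 2-bit saturating-counter predictor. This is a generic scheme chosen for illustration, not a description of the Cortex-R4/R5 predictor; the class name, table size and indexing are all assumptions.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter table indexed by branch address.

    Counter values 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start at "weakly not taken"

    def _index(self, pc):
        # Thumb instructions are halfword aligned, so drop bit 0.
        return (pc >> 1) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run a sequence of actual branch outcomes and count wrong guesses."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop-closing branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
print(count_mispredictions(TwoBitPredictor(), 0x08000200, loop_branch))  # 2
```

The two mispredictions are the first iteration (the counter has no history yet) and the loop exit. Each one corresponds to a pipeline flush in a processor that lets predicted instructions proceed down the pipeline.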

    There are many other hardware solutions that help reduce the branch penalty. For example:

    - branch target address cache (stores the addresses of branch targets)

    - branch target buffer (stores the instructions at branch targets)

    But these are not used in the Cortex-M3 processor.

    Hope this helps.

    regards,

    Joseph

  • Hello Joseph,

    thank you for your quick reply, I really appreciate your help.

    I understand that such information is protected, no problem.

    Either way, your explanations regarding branching clarified the things I obviously didn't get or got wrong.

    Furthermore, the links you posted provide quite useful information. I think I'll read more than just the mentioned chapters about Cortex-R5, too.

    Thanks again.

    Regards,

    Alex