Operation of ARMv7 pipeline for simple instructions

I am new to the ARM architecture and am trying to understand ARMv7 pipelining. I am comfortable with the ARMv7 instruction set.

Can anyone provide me with a simple example of how the ARMv7 pipeline operates on a simple instruction?

Thanks

Amit

  • Hi,

    For the ARM7TDMI three-stage pipeline, things are pretty simple. The three stages are Fetch, Decode and Execute. The Fetch stage simply fetches the next instruction from memory at the address pointed to by the PC; the Decode stage then determines what that instruction does and what registers it needs; the Execute stage does everything else!

    For an ADD instruction, for instance:

    Fetch - get instruction word from memory

    Decode - determine this is an ADD instruction and determine which registers it needs and also whether it needs an immediate value from the instruction word

    Execute - use the ALU to add the two values together (either two registers which are read from the register bank, or one register together with an immediate value extracted from the instruction word) and then write the result back to the destination register in the register bank.

    For an ADD instruction, all three stages take one cycle each.
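The overlap between stages can be sketched with a few lines of Python. This is only an idealised model of the three-stage pipeline described above, assuming every instruction spends exactly one cycle in each stage; the function name `schedule` and the register names in the sample program are made up for illustration.

```python
STAGES = ("Fetch", "Decode", "Execute")

def schedule(instructions):
    """Map each cycle number to the instruction occupying each stage.

    Assumes every instruction takes one cycle per stage (true for ADD
    in the description above, not for LDR or branches).
    """
    timeline = {}
    for i, name in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i is in stage s during cycle i + s + 1
            timeline.setdefault(cycle, {})[stage] = name
    return timeline

# Hypothetical three-instruction program, labelled by destination register.
program = ["ADD r0", "ADD r1", "ADD r2"]
timeline = schedule(program)

for cycle in sorted(timeline):
    row = timeline[cycle]
    print(cycle, [row.get(stage, "-") for stage in STAGES])
```

In cycle 3 all three stages are busy at once: the first ADD is executing, the second is being decoded and the third is being fetched. Once the pipeline is full, one ADD completes per cycle.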

    Things get more interesting for an LDR instruction:

    Fetch and Decode are essentially the same

    Execute - There are three distinct operations here

    - Cycle 1 - use the ALU to calculate the address (this will be a register with an optional offset from a register and/or an immediate constant)

    - Cycle 2 - issue the address on the address bus

    - Cycle 3 - receive the data on the data bus and write it back to the destination register in the register bank

    So, you can see that the Execute stage takes three cycles for an LDR instruction.
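To see what a three-cycle Execute stage does to the instructions behind it, here is a rough sketch. It assumes a simplified in-order model in which the next instruction simply waits until Execute is free, and it ignores memory wait states; the cycle counts are the ones given above, and `execute_windows` is a hypothetical helper name.

```python
def execute_windows(program):
    """Return (name, first_execute_cycle, last_execute_cycle) per instruction.

    `program` is a list of (name, execute_cycles) pairs. Cycles are numbered
    from 1, so the first instruction is fetched in cycle 1, decoded in
    cycle 2 and reaches Execute in cycle 3.
    """
    windows = []
    next_free = 3  # first cycle in which the Execute stage is available
    for name, cycles in program:
        windows.append((name, next_free, next_free + cycles - 1))
        next_free += cycles  # the following instruction waits until Execute is free
    return windows

# Execute cycle counts taken from the description above: ADD = 1, LDR = 3.
program = [("ADD", 1), ("LDR", 3), ("ADD", 1)]
windows = execute_windows(program)

for name, first, last in windows:
    print(f"{name}: Execute in cycles {first}-{last}")
```

Under these assumptions the second ADD cannot start executing until cycle 7, two cycles later than it would have if the LDR had been another single-cycle instruction.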

    The other common interesting case is a branch instruction. In the first cycle of the Execute stage, the ALU calculates the address of the next instruction by adding the offset in the instruction to the PC. It then takes two cycles to work the target instruction through the pipeline to the Execute stage. For a branch with link (BL), the processor uses these two cycles to calculate the return address: first it copies the value of the PC to LR, and then it subtracts 4 from that value to give the correct return address. Again, the ALU is used to do this.
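The address arithmetic for a branch with link can be worked through in a short sketch. It assumes ARM state (32-bit instructions), where the PC seen by an executing instruction is the instruction's own address plus 8 because two further instructions have already been fetched, and where the 24-bit offset field is sign-extended and shifted left by 2. The function name `bl_addresses` and the example addresses are made up for illustration.

```python
def bl_addresses(instr_addr, imm24):
    """Return (branch_target, link_register) for an ARM-state BL.

    instr_addr: address of the BL instruction itself
    imm24:      raw 24-bit offset field from the instruction word
    """
    # Sign-extend the 24-bit field to a Python integer.
    offset = imm24 - (1 << 24) if imm24 & (1 << 23) else imm24
    pc = instr_addr + 8                         # PC runs two instructions ahead
    target = (pc + (offset << 2)) & 0xFFFFFFFF  # offset is counted in words
    lr = pc - 4                                 # copy PC to LR, then subtract 4
    return target, lr

target, lr = bl_addresses(0x8000, 0x000002)
print(hex(target), hex(lr))
```

With the BL at 0x8000, LR ends up holding 0x8004, the address of the instruction immediately after the BL, which is where execution should resume on return.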

    For the very early processors, like the ARM7TDMI, there is a reasonable level of detail about instruction execution timing in the TRM. For later processors the TRMs tend to avoid giving much detail about this.

    Hope this helps.

    Chris
