I am new to the ARM architecture and am trying to understand ARMv7 pipelining. I am comfortable with the ARMv7 instruction set.
Can anyone provide me with a simple example of how the ARMv7 pipeline operates on a simple instruction?
Thanks
Amit
Hi Tamar,
Welcome to the ARM family!
There is an important distinction which I think you are missing: you need to be careful to distinguish the architecture (ARMv7-A, ARMv5TE etc) from the implementation (Cortex-A15, ARM926EJ-S etc). The architecture of a device specifies its behaviour and functionality i.e. the instruction set, register set, exception model, programmer's model etc. Specifically, this says nothing about pipeline structure or any number of other internal microarchitectural details. These are part of a specific implementation of that architecture.
So, for instance, Cortex-A15 and Cortex-A7 are both implementations of the ARMv7-A architecture (with some extensions), so they are binary compatible. But internally they are very different. Cortex-A15 has a much longer and more complex pipeline, whereas Cortex-A7 has a more straightforward pipeline. Cortex-A15 executes instructions out of order, Cortex-A7 executes them in order.
So, it doesn't really make sense to ask the question the way you phrased it, as there are many different implementations of the ARMv7 architecture. They can all have different pipeline structures and will differ internally in many other ways too. You can find some information (not all is disclosed publicly) about individual implementations in the Technical Reference Manuals for individual processors - you can find these on ARM's website at infocenter.arm.com.
I hope that makes sense and gives you some pointers for further research.
Chris
Thanks, Chris, for your detailed explanation.
I have just gone through another post answered by you, where you mentioned that the earlier ARM processors (going back to ARM1 and ARM2 in 1985 and up to ARM7TDMI about ten years after that) had a three stage pipeline.
It would be nice if you could provide a simple example of the three stage pipeline for one simple instruction.
Hi,
For the ARM7TDMI three stage pipeline, things are pretty simple. The three stages are Fetch, Decode and Execute. The Fetch stage simply fetches the next instruction from memory at the address pointed to by the PC; the Decode stage then determines what that instruction does and what registers it needs; the Execute stage does everything else!
For an ADD instruction, for instance:
Fetch - get instruction word from memory
Decode - determine this is an ADD instruction and determine which registers it needs and also whether it needs an immediate value from the instruction word
Execute - use the ALU to add the two values together (either two registers which are read from the register bank, or one register together with an immediate value extracted from the instruction word) and then write the result back to the destination register in the register bank.
For an ADD instruction, all three stages take one cycle each.
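To make the overlap between the stages visible, here is a minimal Python sketch that prints which instruction occupies each stage on each clock cycle. It assumes every instruction spends exactly one cycle in each stage (true for the ADD case described above) and ignores memory wait states; the function name pipeline_table and the sample program are made up for illustration.

```python
def pipeline_table(instructions):
    """Return one row per clock cycle: (cycle, fetch, decode, execute)."""
    stages = 3  # Fetch, Decode, Execute
    total_cycles = len(instructions) + stages - 1
    rows = []
    for cycle in range(total_cycles):
        def at(stage_index):
            # The instruction in a given stage entered Fetch stage_index cycles ago.
            i = cycle - stage_index
            return instructions[i] if 0 <= i < len(instructions) else "-"
        rows.append((cycle + 1, at(0), at(1), at(2)))
    return rows

# Hypothetical three-instruction program, all single-cycle ADDs.
program = ["ADD r0, r1, r2", "ADD r3, r4, #1", "ADD r5, r6, r7"]
for cycle, fetch, decode, execute in pipeline_table(program):
    print(f"cycle {cycle}: F={fetch:<16} D={decode:<16} E={execute}")
```

Once the pipeline is full (cycle 3 here), one instruction completes its Execute stage every cycle, even though each individual instruction takes three cycles from Fetch to completion.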
Things get more interesting for an LDR instruction:
Fetch and Decode are essentially the same
Execute - There are three distinct operations here
- Cycle 1 - use the ALU to calculate the address (this will be a register with an optional offset from a register and/or an immediate constant)
- Cycle 2 - issue the address on the address bus
- Cycle 3 - receive the data on the data bus and write it back to the destination register in the register bank
So, you can see that the Execute stage takes three cycles for an LDR instruction.
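The effect of a multi-cycle Execute stage on the instructions behind it can be sketched in Python as follows. The cycle counts (1 for ADD, 3 for LDR) come from the description above; the sketch assumes the pipeline is otherwise kept full and that memory responds without wait states, and the names EXECUTE_CYCLES and execute_schedule are made up for illustration.

```python
# Execute-stage cycle counts from the explanation above; the table is illustrative only.
EXECUTE_CYCLES = {"ADD": 1, "LDR": 3}

def execute_schedule(program):
    """Return (mnemonic, first_execute_cycle, last_execute_cycle) for each instruction."""
    schedule = []
    cycle = 3  # the first instruction reaches Execute in cycle 3, after Fetch and Decode
    for mnemonic in program:
        n = EXECUTE_CYCLES[mnemonic]
        schedule.append((mnemonic, cycle, cycle + n - 1))
        cycle += n  # the next instruction waits until Execute is free
    return schedule

for mnemonic, start, end in execute_schedule(["ADD", "LDR", "ADD"]):
    print(f"{mnemonic}: Execute in cycles {start}..{end}")
```

In this model the ADD that follows the LDR cannot enter Execute until the LDR has finished all three of its Execute cycles, so the sequence takes 7 cycles in total rather than 5.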
The other common interesting case would be a branch instruction. In the first cycle of the Execute stage, the ALU is used to calculate the address of the next instruction (by adding the offset in the instruction to the PC). It then takes two cycles to work the target instruction through the pipeline to the Execute stage. For a branch with link (BL), the processor makes use of these two cycles to calculate the return address - first it copies the value of PC to LR and then it subtracts 4 from that value to give the correct return address. Again, the ALU is used to do this.
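The address arithmetic for a branch that saves a return address can be sketched in Python as below. This is an illustration only, and it relies on details not spelled out above: it assumes ARM (32-bit) state, where the PC seen during Execute is the address of the branch plus 8, and the standard encoding in which the offset is a signed 24-bit word count. The function names are made up for the example.

```python
def sign_extend_24(value):
    """Interpret a 24-bit field as a signed two's-complement integer."""
    value &= 0xFFFFFF
    return value - (1 << 24) if value & 0x800000 else value

def branch_with_link(instruction_address, imm24):
    """Return (target, link_register) for an ARM-state BL at instruction_address."""
    # Assumption: in ARM state the PC is two instructions (8 bytes) ahead during Execute.
    pc = (instruction_address + 8) & 0xFFFFFFFF
    # Execute cycle 1: add the word offset (scaled to bytes) to the PC.
    target = (pc + (sign_extend_24(imm24) << 2)) & 0xFFFFFFFF
    # Execute cycle 2: copy the PC to LR.
    lr = pc
    # Execute cycle 3: subtract 4 so LR points at the instruction after the BL.
    lr = (lr - 4) & 0xFFFFFFFF
    return target, lr

target, lr = branch_with_link(0x8000, 0x000002)
print(hex(target), hex(lr))
```

With the branch at 0x8000, the value copied into LR is 0x8008, and subtracting 4 gives 0x8004, which is the instruction immediately following the branch.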
For the very early processors, like the ARM7TDMI, there is a reasonable level of detail about instruction execution timing in the TRM. For later processors the TRMs tend to avoid giving much detail about this.
Hope this helps.
Thanks Chris, it definitely helped me.