Hello experts,
ARM recently updated its Cortex-M7 information.
I think the biggest news is that the pipeline details have been disclosed.
The new information says that the integer pipeline has 4 stages and the floating-point pipeline has 5 stages.
However, the earlier information said that the pipeline had 6 stages.
Where does this difference come from?
I would like to know the concrete explanation for each stage.
What is the first stage, what is the second stage, what is the third stage, what is the fourth stage, and so on?
Best regards,
Yasuhiko Koumoto.
"I would like to know the concrete explanation for each stage." sounded like you wanted more than just stage names.
I don't see any fundamental difference between these two diagrams. The new one is polished and more detailed, whereas the old one uses confusing terminology. Here is a stage-by-stage comparison of the two diagrams; I hope it helps.
(OLD)          (NEW)
1. Fetch       1. Instruction Buffer
2. Decode      2. Instruction Decoder
3. Issue       3. Integer Register File (RF) access
4. Execute     4. Execute
   ALU #1         ALU0 (sequential operations, e.g. shift and/or ALU)
   ALU #2         ALU1 (parallel operations, shift + ALU)
Note on stage 3: the RF access can be considered part of the decoder. It consumes one cycle because the ports are interleaved between the two ALU pipelines (I do have some questions on this, see below).
The FP pipeline has an additional stage to access the FP register file.
What confuses me is the following:
1. The MAC seems to be in a pipeline stage by itself. Does this mean that a Cortex-M7 is ultimately capable of issuing 2 ALU + 1 MAC/MUL operations per cycle? That would explain the 6 read ports in the RF.
2. The integer RF has 4 write ports: two for the two integer ALU pipelines and one, I assume, for the MAC. Am I right? What about the 4th write port?
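The port counting in the two questions above can be written out as a quick tally. This is only a sketch of the reasoning in this post, not ARM data: it assumes, hypothetically, that each of the three units (ALU0, ALU1, MAC) reads two source registers and writes one result per cycle. The totals of 6 read ports and 4 write ports are the ones read off the diagram.

```python
# Hypothetical per-unit port usage (an assumption, not from ARM documentation):
# each unit reads two source registers and writes one result per cycle.
UNITS = {
    "ALU0": {"reads": 2, "writes": 1},
    "ALU1": {"reads": 2, "writes": 1},
    "MAC":  {"reads": 2, "writes": 1},
}

# Port counts as read off the new pipeline diagram discussed in this thread.
RF_READ_PORTS = 6
RF_WRITE_PORTS = 4


def tally(units):
    """Sum the read and write ports needed if all units issue in one cycle."""
    reads = sum(u["reads"] for u in units.values())
    writes = sum(u["writes"] for u in units.values())
    return reads, writes


reads_needed, writes_needed = tally(UNITS)
unexplained_write_ports = RF_WRITE_PORTS - writes_needed

print(f"read ports needed:  {reads_needed} of {RF_READ_PORTS}")
print(f"write ports needed: {writes_needed} of {RF_WRITE_PORTS}")
print(f"write ports left unexplained: {unexplained_write_ports}")
```

Under these assumptions the read ports are fully accounted for, while one write port is left over, which is exactly the open question in item 2.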
If there is an ARM Cortex-M7 FAE (or a Cortex-M7 expert) patrolling this community, perhaps he or she can answer these questions.
HBL
Hello all,
I don't need the details of each pipeline stage. I just want to know the name of each stage.
In the old slide, I read the stages as follows.
1st = Fetch
2nd = Decode
3rd = Issue
4th = Execute #1
5th = Execute #2
6th = Write/Store
In the new slide, I think they are as follows.
1st = Instruction Decoders
2nd = Integer Register File
3rd = Shift
4th = ALU
Is this correct?
I would like to know why the pipeline figure has changed so drastically.
I would also like to know the relationship between the old and the new figures.
Thank you and best regards,
This is the first time I have looked at the Cortex-M7, so thanks for sharing this info. My first observation is that this pipeline diagram looks more like a CISC pipeline (instructions with different latencies) than a pure RISC pipeline, hence the confusion. The shortest ALU operation takes 4 cycles, which explains the 4-stage pipe. The FPU takes an additional cycle to access the FP-RF and therefore uses 5 stages.
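To make the stage counts concrete, here is a minimal sketch of the textbook cycle count for an ideal pipeline. It assumes a single-issue pipeline with no stalls, which is a simplification and not a model of the real Cortex-M7; the function name `ideal_cycles` is made up for illustration. The stage counts of 4 and 5 are the ones quoted in this thread.

```python
# Stage counts quoted in this thread for the Cortex-M7.
INTEGER_STAGES = 4
FP_STAGES = 5


def ideal_cycles(stages, instructions):
    """Cycles to complete back-to-back instructions in an ideal pipeline.

    Assumes single issue and no stalls: the first instruction needs
    `stages` cycles, and each further one completes one cycle later.
    """
    if instructions <= 0:
        return 0
    return stages + instructions - 1


for name, stages in (("integer", INTEGER_STAGES), ("floating point", FP_STAGES)):
    print(f"{name}: 1 instruction = {ideal_cycles(stages, 1)} cycles, "
          f"10 instructions = {ideal_cycles(stages, 10)} cycles")
```

The extra FP stage adds one cycle of latency to a single instruction, but in this idealized model it does not change the steady-state rate of one instruction per cycle.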
I would just ignore the old diagram, as it has incorrect information (it might have been created by the marketing department without consulting the engineering team). For example, the write/store in the ALU pipe is not a separate stage, because writes are executed at the end of the execute stage. The same goes for the prefetch: it is only activated when predicting branches, so it is not really a separate pipeline stage. In normal program execution, instructions are fetched in order.
PS: I doubt that ARM will share with you the architectural details for each pipeline stage.