Hello experts,
ARM recently updated its Cortex-M7 information.
I think the biggest news is that the pipeline details have been disclosed.
The new information says that the integer pipeline has 4 stages and the floating-point pipeline has 5 stages.
However, the earlier information said that it had 6 stages.
Where does this difference come from?
I would like a concrete explanation of each stage.
What is the first stage, what is the second stage, what is the third stage, what is the fourth stage, and so on?
Best regards,
Yasuhiko Koumoto.
Hello all,
I don't need the details of each pipeline stage. I just want to know the name of each stage.
In the old slide, I read the stages as follows.
1st = Fetch
2nd = Decode
3rd = Issue
4th = Execute #1
5th = Execute #2
6th = Write/Store
In the new slide, I think they are as follows.
1st = Instruction Decoders
2nd = Integer Register File
3rd = Shift
4th = ALU
Is this correct?
I would like to know whether the pipeline has been drastically changed.
Also, I would like to know the relationship between the old and the new figures.
Thank you and best regards,
"I would like to know the concrete explanation for each stage." sounded like you wanted more than just stage names.
I don't see any difference between these two diagrams. The new one is polished and more detailed, whereas the old one uses confusing terminology. Here is a stage-by-stage comparison of the two diagrams; I hope it helps.
(OLD)          (NEW)
1. Fetch       1. Instruction Buffer
2. Decode      2. Instruction Decoder
3. Issue       3. Integer Register File (RF) access
4. Execute     4. Execute
   ALU #1         ALU0 (sequential operations, e.g. shift and/or ALU)
   ALU #2         ALU1 (parallel operations, shift + ALU)
Note on stage 3: the RF access can be considered part of the decoder. It consumes one cycle because the ports are interleaved between the two ALU pipelines (I do have some questions on this, see below).
The FP pipeline has an additional stage to access the FP-RegisterFile.
What confuses me is the following:
1. The MAC seems to be in a pipeline stage by itself. Does this mean that a Cortex-M7 is ultimately capable of issuing 2 ALU + 1 MAC/MUL operations per cycle? That would explain the 6 read ports in the RF.
2. The integer RF has 4 write ports: two for the two integer ALU pipelines and one, I assume, for the MAC. Am I right? What about the 4th write port?
If there is an ARM Cortex-M7 FAE (or a Cortex-M7 expert) patrolling this community, he/she could answer these questions.
HBL
Hello Hanni Lozano,
I cannot follow your explanation.
Regarding the older pipeline, the number of stages is six.
Your explanation seems to say that both have four.
Good morning Yasuhiko,
The six stages in the old diagram refer to the longest integer pipeline (the load pipeline).
The shortest integer ALU pipeline (ALU #2) has only 4 stages (Fetch, Decode, Issue, Execute #2). The Write/Store step included in the ALU #2 pipeline is for writing/storing ALU results into the Register File and is not a separate pipeline stage.
The ALU #1 and MAC pipelines have five stages each.
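To make the counting convention concrete, here is a small Python sketch. The stage names are the ones read from the old slide earlier in this thread; the names `OLD_SLIDE_STAGES`, `PIPELINE_DEPTH` and `depth_gap` are hypothetical, and the depths are only my reading of the diagram, not figures confirmed by ARM.

```python
# Stage names as read from the old slide earlier in this thread.
OLD_SLIDE_STAGES = [
    "Fetch", "Decode", "Issue", "Execute #1", "Execute #2", "Write/Store",
]

# Depths as interpreted from the old diagram (an assumption, not ARM data).
PIPELINE_DEPTH = {
    "load": len(OLD_SLIDE_STAGES),  # longest integer pipeline
    "ALU #1": 5,
    "MAC": 5,
    "ALU #2": 4,  # Fetch, Decode, Issue, Execute #2
}


def depth_gap(longer, shorter):
    """Return how many more stages one pipeline has than another."""
    return PIPELINE_DEPTH[longer] - PIPELINE_DEPTH[shorter]


print(PIPELINE_DEPTH["load"])        # 6
print(depth_gap("load", "ALU #2"))   # 2
```

Under this reading, the "6 stages" and "4 stages" figures describe different pipelines (longest versus shortest), which would account for the gap of two.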
I would really ignore the old diagram and focus on the new one which is more detailed.
BTW, is there a technical product brief for Cortex-M7 that explains the basic architecture? It might have answers to my previous questions, which I couldn't find on infocenter.arm.com.
I wonder why ARM mentioned ALU#2 for the 4-stage pipeline.
I think that the six-stage pipeline in the old slide indicated the ALU#1 pipe.
Now the ALU#1 pipe seems to have four stages, and I would like to know why the two-stage difference has vanished.
Regarding ALU#2, it performs either the shift or the ALU operation for simple instructions (not parallel execution).
Also, regarding the MAC pipeline, it would have 4 stages.
That is,
the 1st is Decode, the 2nd is Register File access, the 3rd is Multiply, and the 4th is Accumulate.
By the way, are you an ARM employee?
I would like to get an answer from someone at ARM.
Sorry, I am not an ARM employee, just a partner. We are not using Cortex-M7 currently, but we are definitely interested in it because of the potentially big performance improvement over Cortex-M4. There are not that many commercially available M7 devices anyway.
The MAC is actually a 5-stage pipeline. You forgot the Instruction Fetch stage.
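As a quick check of that count, here is a tiny Python sketch that takes the four stages you listed and prepends the fetch stage. The variable names are only illustrative, and the stage list is the one proposed in this thread, not an official ARM breakdown.

```python
# The four MAC stages listed in the previous post.
mac_stages_without_fetch = [
    "Decode", "Register File access", "Multiply", "Accumulate",
]

# Counting the instruction fetch as a stage gives the full pipeline.
mac_stages = ["Instruction Fetch"] + mac_stages_without_fetch

print(len(mac_stages_without_fetch))  # 4
print(len(mac_stages))                # 5
```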
Getting an answer from ARM would be really nice. Is there a way to poke an ARM FAE with these specific questions? We've only recently joined the ARM community, so we are not very familiar with the protocol.
Best,