Hi ARM specialists,
I have a question about Cortex-M series pipeline behavior.
According to page 15 of "ARM Cortex-M Programming Guide to Memory Barrier Instructions Application Note 321", "Instruction fetch can happen several cycles before decode and execution". If the fetch, decode, and execution stages were synchronized, the decode and execution stages would take the same number of cycles as the fetch stage. If that were true, a long prefetch (or fetch) stage would lower performance. That seems strange to me, so I think the prefetch and decode stages are decoupled.
Is that correct? I would like to know the relationship between the prefetch and decode stages of the Cortex-M0/M0+/M3/M4. That is, does the latency of the prefetch stage affect the following stages or not?
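To make the two assumptions concrete, here is a minimal sketch of a toy timing model. It is purely illustrative and is not ARM's actual microarchitecture: the function names, the per-instruction latencies, and the prefetch buffer depth are all hypothetical, and decode is folded into the execute stage to keep it short.

```python
def lockstep_cycles(fetch, execute):
    """Synchronized model: fetch of instruction i and execute of
    instruction i-1 advance together, so each step takes as long
    as the slower of the two."""
    total = fetch[0]  # pipeline fill: only the first fetch is active
    for i in range(1, len(fetch)):
        total += max(fetch[i], execute[i - 1])
    return total + execute[-1]  # pipeline drain: only the last execute


def decoupled_cycles(fetch, execute, depth):
    """Decoupled model: a (hypothetical) prefetch buffer with `depth`
    entries lets fetch run ahead of execute until the buffer is full."""
    n = len(fetch)
    fetch_done = [0] * n
    exec_start = [0] * n
    exec_done = [0] * n
    for i in range(n):
        prev_fetch_done = fetch_done[i - 1] if i > 0 else 0
        # A buffer slot frees up when instruction i-depth starts executing.
        slot_free = exec_start[i - depth] if i >= depth else 0
        fetch_done[i] = max(prev_fetch_done, slot_free) + fetch[i]
        prev_exec_done = exec_done[i - 1] if i > 0 else 0
        exec_start[i] = max(fetch_done[i], prev_exec_done)
        exec_done[i] = exec_start[i] + execute[i]
    return exec_done[-1]


# Assumed latencies in cycles: one slow execute early, one slow fetch later.
fetch = [1, 1, 1, 4, 1, 1]
execute = [4, 1, 1, 1, 1, 1]

print("lockstep:          ", lockstep_cycles(fetch, execute))
print("decoupled, depth 4:", decoupled_cycles(fetch, execute, 4))
print("decoupled, depth 1:", decoupled_cycles(fetch, execute, 1))
```

In this model a buffer depth of 1 behaves the same as the lockstep case, while a deeper buffer lets the slow fetch of the fourth instruction overlap with the slow execution of the first one. That overlap is the effect I am asking about.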
Yasuhiko Koumoto.
Hello all,
My intention with this question is to find out whether the prefetch is performed independently of the corresponding decode and execution stages. I think this is the case for the Cortex-M0/M3/M4, but not for the Cortex-M0+. Is that correct?
I will attach a figure that shows my guess at the Cortex-M3/M4 pipeline scheme. I am looking forward to your feedback.
Best regards,
Yasuhiko Koumoto.