Hi, I am building a cycle-accurate simulator for the Cortex-A9 core, and so far I have implemented most of the pipeline stages.
However, I am having trouble placing one component whose role is not clear in any source I have found.
Most diagrams show the prefetch stage sending instructions to a prediction queue and to a separate instruction queue. Both queues feed a superscalar decoder, which seems to imply that it decodes one entry from each queue. However, every discussion of speculative execution, issue and renaming I have read seems to imply that we speculatively execute only the predicted path
as if it were correct, and fix up the pipeline afterwards on a misprediction by rolling back changes and flushing some stages.
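A minimal, generic sketch of that single-path model (speculate down the predicted path, then flush and roll back on a misprediction) might look like the following. It is not taken from any ARM documentation and is not a claim about how the A9 actually does it: `Uop`, `Pipeline`, `checkpoint_branch` and `resolve_branch` are hypothetical names, and it assumes 4-byte ARM instructions (no Thumb), a full rename-map snapshot per branch, and, for simplicity, compares only the predicted direction rather than the predicted target.

```python
from dataclasses import dataclass, field


@dataclass
class Uop:
    seq: int                       # program-order sequence number (older = smaller)
    pc: int
    predicted_taken: bool = False  # only meaningful for branches


@dataclass
class Pipeline:
    # Hypothetical in-flight structures, one list per stage.
    stages: dict = field(default_factory=lambda: {"decode": [], "rename": [], "issue": []})
    rename_map: dict = field(default_factory=dict)   # architectural reg -> physical reg
    checkpoints: dict = field(default_factory=dict)  # branch seq -> saved rename_map
    fetch_pc: int = 0

    def checkpoint_branch(self, branch: Uop) -> None:
        # Snapshot the rename state as the branch is renamed, before any
        # younger (speculative) instruction on the predicted path is renamed.
        self.checkpoints[branch.seq] = dict(self.rename_map)

    def resolve_branch(self, branch: Uop, actual_taken: bool, actual_target: int) -> bool:
        """Return True if the branch was mispredicted and younger work was flushed."""
        saved = self.checkpoints.pop(branch.seq)
        if actual_taken == branch.predicted_taken:
            return False  # correct prediction: the speculative work simply stays
        # Squash every instruction younger than the branch in every stage.
        for queue in self.stages.values():
            queue[:] = [u for u in queue if u.seq <= branch.seq]
        # Younger branches were squashed too, so drop their checkpoints.
        self.checkpoints = {s: m for s, m in self.checkpoints.items() if s < branch.seq}
        # Roll back the speculative register renames.
        self.rename_map = saved
        # Redirect fetch; assumes 4-byte ARM instructions (no Thumb).
        self.fetch_pc = actual_target if actual_taken else branch.pc + 4
        return True


# Usage: a branch predicted taken that turns out to be not taken.
p = Pipeline()
p.rename_map = {"r0": 10}
br = Uop(seq=5, pc=0x100, predicted_taken=True)
p.checkpoint_branch(br)
p.rename_map["r0"] = 11  # a younger instruction on the predicted path renames r0
p.stages["issue"] = [Uop(seq=4, pc=0xFC), br, Uop(seq=6, pc=0x200)]
flushed = p.resolve_branch(br, actual_taken=False, actual_target=0)
print(flushed, [u.seq for u in p.stages["issue"]], p.rename_map, hex(p.fetch_pc))
```

In this model only one path is ever in flight, so there is nothing for a second queue to carry beyond side information about the instructions already fetched, which is exactly why the two-queue diagrams confuse me.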
So the question is: what is the reason for these two queues? Do they go as far as decode, or even rename? Do we pick only one and discard the other at the exit of the prediction stage, so that the decoder reads from a single queue? Or is there a design that executes both paths of a branch at the same time and keeps only the changes we want? The last option seems too complicated for such a core, but I can't find any clear indication that it is impossible.
Thanks a lot for your time.