Architectures and Processors forum: Cortex-A8: instruction fetch for dual-issue


Gang-Ryung Uh over 11 years ago

Hi,

We ran the following loop (4096 iterations) and measured CPI = 0.66; in other words, the loop initiation interval (II) is about 6 machine cycles (9 instructions × 0.66 ≈ 6). We are trying hard to explain why II is ~6 rather than ~5. Could you tell us whether our manual simulation is correct?
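As a quick check of the arithmetic behind these figures: with nine instructions per iteration (I1 to I9), CPI = II / 9, so a CPI of about 0.66 corresponds to II ≈ 6, while II = 5 would give a CPI of about 0.56. The sketch below assumes the steady-state loop dominates the measurement and ignores loop entry and exit; the helper names are illustrative only.

```python
# Relationship between the loop initiation interval (II) and CPI.
# Figures from the post: 9 instructions per iteration (I1-I9), 4096 iterations.
INSTRUCTIONS_PER_ITER = 9
ITERATIONS = 4096


def cpi_from_ii(ii_cycles: int, n_instr: int = INSTRUCTIONS_PER_ITER) -> float:
    """Cycles per instruction when every iteration takes ii_cycles cycles."""
    return ii_cycles / n_instr


def total_cycles(ii_cycles: int, iterations: int = ITERATIONS) -> int:
    """Approximate loop cycle count in steady state (ignores entry/exit)."""
    return ii_cycles * iterations


for ii in (5, 6):
    print(f"II={ii}: CPI={cpi_from_ii(ii):.3f}, about {total_cycles(ii)} cycles")
# II=5: CPI=0.556, about 20480 cycles
# II=6: CPI=0.667, about 24576 cycles
```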

.LBB1_1:                                @ %for.body
                                        @ =>This Inner Loop Header: Depth=1
        ldr     r6, [r5]                @ I1
        mov     r3, r2                  @ I2
        ldr     r4, [r2]                @ I3
        ldr     r7, [r3,                @ I4
        adds    r1, r1,                 @ I5
        str     r7, [r2]                @ I6
        mla     r0, r4, r6, r0          @ I7
        mov     r2, r3                  @ I8
        bne     .LBB1_1                 @ I9
                                        @ I10-I12: the instructions sequentially after the loop
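Before applying any pipeline rules, it helps to see which instructions in the loop body consume results produced earlier in the same iteration. The following is a hypothetical helper, not part of the original post: it models only the operands visible in the listing, treats the truncated offset of I4 and the truncated third operand of I5 as unknown (assumed not to be registers written inside the loop), uses the name `flags` for the condition flags set by ADDS and read by BNE, and lists read-after-write (RAW) dependencies.

```python
# Hypothetical model of the loop body: (label, text, registers written, registers read).
# Only operands visible in the post are modelled; the truncated operands of I4
# and I5 are ASSUMED not to be registers written inside the loop.
LOOP = [
    ("I1", "ldr r6, [r5]",       {"r6"},          {"r5"}),
    ("I2", "mov r3, r2",         {"r3"},          {"r2"}),
    ("I3", "ldr r4, [r2]",       {"r4"},          {"r2"}),
    ("I4", "ldr r7, [r3, ...]",  {"r7"},          {"r3"}),
    ("I5", "adds r1, r1, ...",   {"r1", "flags"}, {"r1"}),
    ("I6", "str r7, [r2]",       set(),           {"r7", "r2"}),
    ("I7", "mla r0, r4, r6, r0", {"r0"},          {"r4", "r6", "r0"}),
    ("I8", "mov r2, r3",         {"r2"},          {"r3"}),
    ("I9", "bne .LBB1_1",        set(),           {"flags"}),
]


def raw_dependencies(body):
    """Map each label to the earlier instructions (same iteration) whose
    results it reads, i.e. its read-after-write dependencies."""
    deps = {}
    last_writer = {}  # register -> label of its most recent producer so far
    for label, _text, writes, reads in body:
        deps[label] = sorted({last_writer[r] for r in reads if r in last_writer})
        for r in writes:
            last_writer[r] = label
    return deps


for label, producers in raw_dependencies(LOOP).items():
    if producers:
        print(f"{label} depends on {', '.join(producers)}")
```

Under these assumptions it reports that I4 and I8 depend on I2, I6 on the load I4, I7 on the loads I1 and I3, and I9 on the flags set by I5. Whether any of these dependencies actually causes a stall depends on Cortex-A8 result latencies, which this sketch does not model.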

To simplify the question, please assume that the loop is executing in a steady state (the BTB, BPB, and caches are warmed up). We have also ignored the decode stages to make the question clearer.

Since the Cortex-A8 is a dual-issue processor that executes and commits in order, we simulate it by fetching two instructions per cycle, as shown below. Please let us know whether this is correct. Thanks,

time 1: fetch I1 and I2

time 2: fetch I3 and I4 - issue I1 in pipe0 and I2 in pipe1

time 3: fetch I5 and I6 - issue I3 in pipe0 (structural hazard)

time 4: fetch I7 and I8 - issue I4 in pipe0 and I5 in pipe1

time 5: fetch I9 and I10 (next sequential address) - issue I6 in pipe0

time 6: fetch I11 and I12 - issue I7 in pipe0 and I8 in pipe1

Since I9 (the branch) is predicted taken, I10, I11, and I12 are discarded.

time 7: fetch I1 and I2 - issue I9 in pipe0

time 8: fetch I3 and I4 - issue I1 in pipe0 and I2 in pipe1

...
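The hand simulation above can be replayed mechanically. This Python sketch encodes the post's simplified model: two instructions fetched per cycle, in-order issue of at most two instructions per cycle from instructions fetched in earlier cycles, and a one-cycle fetch bubble (I11/I12 discarded) after the predicted-taken branch. The pairing rules in the hypothetical `can_dual_issue` (one load/store per cycle, a multiply only in the older slot, i.e. pipe0) are assumptions inferred from the timeline, not the Cortex-A8 rules from the Technical Reference Manual, and result latencies are ignored.

```python
from collections import deque

# Hypothetical classification of the loop body I1-I9.
CLASSES = {
    "I1": "ldst", "I2": "alu", "I3": "ldst", "I4": "ldst", "I5": "alu",
    "I6": "ldst", "I7": "mul", "I8": "alu", "I9": "branch",
}
BODY = [f"I{i}" for i in range(1, 10)]  # program order I1..I9


def fetch_groups(iterations):
    """Yield one 2-wide fetch group per cycle, following the post's timeline.

    The group holding the predicted-taken branch yields only I9 (I10 is
    discarded); the next cycle yields an empty group because the
    sequentially fetched I11 and I12 are discarded as well.
    """
    for _ in range(iterations):
        for i in range(0, len(BODY), 2):
            yield BODY[i:i + 2]  # the last group is just ["I9"]
        yield []  # wasted fetch of I11/I12 before the redirect to I1


def can_dual_issue(older, younger):
    """Pairing rules ASSUMED from the post's timeline, not taken from the TRM:
    at most one load/store per cycle, and a multiply only in pipe0 (older slot)."""
    a, b = CLASSES[older], CLASSES[younger]
    if a == "ldst" and b == "ldst":
        return False
    if b == "mul":
        return False
    return True


def simulate(iterations):
    """Return [(cycle, issued_labels), ...] for the simplified model.

    Instructions fetched in cycle t may issue from cycle t + 1, in program
    order, at most two per cycle. Result latencies are not modelled.
    """
    groups = fetch_groups(iterations)
    queue = deque()  # fetched but not yet issued, in program order
    schedule = []
    cycle = 0
    fetching = True
    while fetching or queue:
        cycle += 1
        # Issue stage: only sees instructions fetched in earlier cycles.
        issued = []
        if queue:
            issued.append(queue.popleft())
            if queue and can_dual_issue(issued[0], queue[0]):
                issued.append(queue.popleft())
        schedule.append((cycle, issued))
        # Fetch stage: this cycle's group becomes issuable next cycle.
        group = next(groups, None)
        if group is None:
            fetching = False
        else:
            queue.extend(group)
    return schedule


def initiation_interval(schedule):
    """Distinct gaps between successive cycles in which I1 issues."""
    starts = [cycle for cycle, issued in schedule if "I1" in issued]
    return sorted({b - a for a, b in zip(starts, starts[1:])})


for cycle, issued in simulate(2):
    print(f"time {cycle}: issue {' and '.join(issued) or '-'}")
print("II =", initiation_interval(simulate(4)))  # II = [6]
```

Under these assumptions the sketch reproduces the timeline (I1 issues at times 2 and 8) and an II of 6 cycles, so the hand simulation is internally consistent with the measured CPI. Whether its assumptions match the real core is a separate question that this model cannot answer.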

Top replies

Chris Shore over 11 years ago

Hi, Thanks for your question. I suspect that your model is a little too simplistic. If you look at Chapter 16 of the Cortex-A8 Technical Reference Manual, you will see that there are restrictions on the...