Dual Issue and Pipeline stalls

Note: This was originally posted on 27th August 2010 at http://forums.arm.com

Hi,

I have the following situation in my current work. Could you explain whether there are any pipeline stalls here, or anything that prevents dual issue?

    LDRB     R2,[R0,#2]
    LDRB     R1,[R0,#1]
    VLD1.32  {D4,D5},[R7]
    VLD1.32  {D22,D23},[R8]
    VADD.I32 Q2,Q2,Q11
    STR      R2,[R5]
    STR      R1,[R5]

In another instance, the following sequence occurs:

    VADD.I32 Q2,Q11,Q2
    VADD.I32 Q2,Q2,Q9
    VSHR.S32 Q11,Q2,#1
    VCGT.S32 Q2,Q11,Q0
    VLD1.32 {D22,D23},[R4]
    VBSL     Q2,Q12,Q13
    VADD.I32 Q2,Q11,Q2
    VADD.I32 Q2,Q2,Q10
    VST1.32 {D4,D5},[R1]

Is there any chance of pipeline stalls of any kind in this set of instructions on a Cortex-A8 processor?

Thanks
Dave.

Peter Harris over 12 years ago

Note: This was originally posted on 27th August 2010 at http://forums.arm.com

I'm not too sure about the second block of code, but in the first case the critical thing to note is that the ARM and NEON units only have a single load/store pipe, so back-to-back loads and stores cannot be dual-issued in the same cycle. For Cortex-A8, the following should be faster:

    LDRB     R2,[R0,#2]
    VLD1.32  {D4,D5},[R7]
    LDRB     R1,[R0,#1]
    VLD1.32  {D22,D23},[R8]
    STR      R2,[R5]
    VADD.I32 Q2,Q2,Q11
    STR      R1,[R5]

NEON instructions are forwarded to the NEON unit through the integer pipelines of the ARM unit, so in the cycle in which a NEON instruction is forwarded (it counts as an ARM integer instruction) you can also issue a load to the ARM load/store unit. That said, I have not tried the code, but I am pretty sure that is right.
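
To make the pairing rule concrete, here is a rough toy model in Python. It is only a sketch under assumptions that are mine, not taken from ARM documentation: issue is in order, at most two instructions issue per cycle, two ARM loads/stores can never share a cycle, and every NEON instruction simply occupies an ARM integer slot. Data dependencies, result latencies and issue limits inside the NEON unit are ignored, so the cycle counts are not real Cortex-A8 timings. The names `issue_cycles` and `is_arm_load_store` are made up for illustration.

```python
# Toy model of the ARM-side issue rule described above (assumptions, not ARM data):
#  - in-order issue, at most two instructions per cycle
#  - two ARM loads/stores (LDR*/STR*) cannot issue in the same cycle
#  - NEON instructions (V...) are treated as ARM integer instructions
#  - dependencies, latencies and NEON-side issue limits are ignored

ARM_LOAD_STORE_PREFIXES = ("LDR", "STR")  # also matches LDRB, STRB, ...


def is_arm_load_store(instr):
    """True if the mnemonic is an ARM (non-NEON) load or store."""
    mnemonic = instr.split()[0].upper()
    return mnemonic.startswith(ARM_LOAD_STORE_PREFIXES)


def issue_cycles(program):
    """Greedily pair adjacent instructions; return one tuple per issue cycle."""
    cycles = []
    i = 0
    while i < len(program):
        first = program[i]
        second = program[i + 1] if i + 1 < len(program) else None
        both_load_store = (
            second is not None
            and is_arm_load_store(first)
            and is_arm_load_store(second)
        )
        if second is not None and not both_load_store:
            cycles.append((first, second))  # the pair shares one cycle
            i += 2
        else:
            cycles.append((first,))  # issues alone
            i += 1
    return cycles


original = [
    "LDRB R2,[R0,#2]",
    "LDRB R1,[R0,#1]",
    "VLD1.32 {D4,D5},[R7]",
    "VLD1.32 {D22,D23},[R8]",
    "VADD.I32 Q2,Q2,Q11",
    "STR R2,[R5]",
    "STR R1,[R5]",
]

reordered = [
    "LDRB R2,[R0,#2]",
    "VLD1.32 {D4,D5},[R7]",
    "LDRB R1,[R0,#1]",
    "VLD1.32 {D22,D23},[R8]",
    "STR R2,[R5]",
    "VADD.I32 Q2,Q2,Q11",
    "STR R1,[R5]",
]

for name, program in (("original", original), ("reordered", reordered)):
    print(name, len(issue_cycles(program)), "issue cycles in this toy model")
```

In this model the original order needs 5 issue cycles and the reordered one needs 4, because interleaving the NEON instructions stops the two LDRBs and the two STRs from sitting next to each other. Real behaviour also depends on the things the model leaves out, so it is worth measuring on hardware.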
