Cortex M4 Conditional Branch - Pipeline

Hello all!

I'm developing on a Cortex-M4 and there is something I don't understand. I was hoping someone could help clarify it:

This is the code I'm using:

(Assume R3 contains 1, R6 and R8 hold the value and address needed to set PIN1, and R11 and R9 hold the value and address needed to set PIN2.)

asm ( "CMP R3,#0 \n\t");
asm ( "BNE NCycles_CapDelay2 \n\t");
/* asm ( "NOP \n\t"
         "NOP \n\t"
         "NOP \n\t"); */           // <--- NOPs in question
asm ( "STR R6,[R8] \n\t");         // PIN1 SET
asm ( "STR R11, [R9] \n\t");       // PIN2 SET
asm ( "B SOMEWHERE_ELSE\n\t");

asm ( "NCycles_CapDelay2: \n\t");
/* asm ( "NOP \n\t"
         "NOP \n\t"
         "NOP \n\t"); */           // <--- NOPs in question
asm ( "STR R6,[R8] \n\t");         // PIN1 SET
asm ( "LOOP_NCycles_CapDelay2: \n\t");
asm ( "SUBS R3, #1 \n\t");
asm ( "bne LOOP_NCycles_CapDelay2 \n\t");
asm ( "STR R11, [R9] \n\t");       // PIN2 SET

The thing is: if I leave the NOPs commented out, the time between PIN1 set and PIN2 set is 7 cycles, and if I uncomment them, the time is 1 cycle (measured externally with an oscilloscope).

And when R3=0, the time difference is 0 cycles (NOPs uncommented) versus 1 cycle (NOPs commented out).

Any ideas about what is happening with the pipeline and conditional branches here?

Thanks for any ideas.

BR

  • As you are running the asm through the C compiler rather than the assembler, it's possible that the compiler is "optimizing" your code and you are not executing the exact code sequence you think you are.

    It might be worth running both builds through fromelf or objdump to check what code is /actually/ being run.
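
One way to give the compiler less room to interfere is to put the whole timing-critical sequence into a single `asm volatile` statement with explicit operands, instead of a series of separate `asm(...)` statements that rely on R3, R6 and the other registers keeping their values in between. The following is only a minimal sketch, assuming a GCC-compatible toolchain and its extended asm syntax. The function name, the parameter names and the host fallback branch are hypothetical and not from the original post, and the sequence is restructured (PIN1 is set before the branch), so its timing will not match the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set PIN1, spin for `count` loop iterations,
 * then set PIN2. All names here are assumptions for illustration. */
static void set_pin1_delay_set_pin2(volatile uint32_t *pin1_reg, uint32_t pin1_val,
                                    volatile uint32_t *pin2_reg, uint32_t pin2_val,
                                    uint32_t count)
{
    assert(pin1_reg != 0 && pin2_reg != 0);   /* debug-only sanity check */

#if defined(__thumb2__)
    /* One asm statement: the compiler cannot insert or reorder
     * instructions inside it. Numeric local labels (1:, 2:) avoid
     * duplicate-symbol errors if the function gets inlined. */
    __asm__ volatile (
        "cmp   %[n], #0        \n\t"
        "str   %[v1], [%[p1]]  \n\t"   /* PIN1 set (STR leaves the flags alone) */
        "beq   2f              \n\t"   /* count == 0: skip the delay loop */
        "1:                    \n\t"
        "subs  %[n], %[n], #1  \n\t"
        "bne   1b              \n\t"
        "2:                    \n\t"
        "str   %[v2], [%[p2]]  \n\t"   /* PIN2 set */
        : [n] "+&r" (count)            /* read, modified, kept in its own register */
        : [v1] "r" (pin1_val), [p1] "r" (pin1_reg),
          [v2] "r" (pin2_val), [p2] "r" (pin2_reg)
        : "cc", "memory");
#else
    /* Host fallback so the sketch compiles off-target.
     * The timing is meaningless here. */
    *pin1_reg = pin1_val;
    while (count != 0u) {
        count--;
    }
    *pin2_reg = pin2_val;
#endif
}
```

Even with a single statement, checking the disassembly remains the only way to know what actually runs, since the compiler still chooses the registers and the code around the block.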

  • Hello Peter, thanks for the quick response. You're right about the compiler. In the code that actually runs, STR is replaced with str.w and BNE is replaced with bne.n (checked in the disassembly view in LPCXpresso). Those are the only differences. Either way, I still don't understand why I get different timing with and without the NOP instructions. Isn't it related to the pipeline?

    Thank you!!

  • Can you provide the disassembler's view of your code?

  • """ASSUME R3=1"""
    1a000334: cpsid i
    1a000336: cmp r3, #0
    1a000338: bne.n 0x1a00035c <NCycles_CapDelay2>
    1a00033a: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=0>
    1a00033c: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=0>
    1a00033e: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=0>
    1a000340: str.w r6, [r8] <PIN1 SET>
    1a000344: str.w r11, [r9] <PIN2 SET>
    1a000348: nop
    1a00034a: nop
    1a00034c: nop
    1a00034e: nop
    1a000350: nop
    1a000352: str.w r11, [r9] <PIN2 CLR>
    1a000356: nop
    1a000358: nop
    1a00035a: b.n 0x1a000380 <ADC_CAPT2>
    1a00035c: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=1>
    1a00035e: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=1>
    1a000360: nop <with these NOPs I've achieved 0 cycles between PIN1 set & PIN2 set - R3=1>
    1a000362: str.w r6, [r8] <PIN1 SET>
    1a000366: subs r3, #1
    1a000368: bne.n 0x1a000366 <LOOP_NCycles_CapDelay2>
    1a00036a: str.w r11, [r9] <PIN2 SET>
    1a00036e: nop
    1a000370: nop
    1a000372: nop
    1a000374: nop
    1a000376: nop
    1a000378: str.w r11, [r9] <PIN2 CLR>
    1a00037c: nop
    1a00037e: nop
    1a000380: stmdb sp!, {r0, r2}
    1a000384: ldr r3, [pc, #64] ; (0x1a0003c8 <LOOP_NCycles_Period2+36>)
    1a000386: movw r2, #65535 ; 0xffff
    1a00038a: str r2, [r3, #8]
    1a00038c: nop
    1a00038e: ldr r3, [pc, #56] ; (0x1a0003c8 <LOOP_NCycles_Period2+36>)
    1a000390: ldr r3, [r3, #12]
    1a000392: cmp r3, #0
    1a000394: beq.n 0x1a00038e <ADC_CAPT2+14>
    1a000396: ldr r0, [pc, #48] ; (0x1a0003c8 <LOOP_NCycles_Period2+36>)
    1a000398: bl 0x1a000244 <Chip_SSP_ReceiveFrame>
    1a00039c: str.w r0, [r4], #2
    1a0003a0: pop {r0, r2}
    .....

    I believe that at address 1a00035c a pipeline break happens, and the instruction at 1a000362 does not execute until the pipeline is full again (a 3-cycle stall), and in the fourth cycle the instruction at 1a00036a gets executed. Is that correct? But what if I comment out the NOPs? Why do I get a 7-cycle delay then?
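
For reference, the documented instruction timings give a rough estimate of the SUBS/BNE delay loop on its own. The Cortex-M4 Technical Reference Manual lists SUBS as 1 cycle and a conditional branch as 1 cycle when not taken or 1 + P when taken, where P is a pipeline refill of 1 to 3 cycles depending on the alignment and width of the branch target. The sketch below simply adds those numbers up in C. The name `delay_loop_cycles` and its parameters are hypothetical, and the model assumes zero-wait-state instruction fetches, so it ignores flash wait states, prefetch behaviour and the time the STR needs to reach the pin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *     loop: SUBS R3, #1
 *           BNE  loop
 * n      = initial value of R3 (at least 1, as on the R3 != 0 path)
 * refill = assumed pipeline refill P for a taken branch (1 to 3 cycles)
 * Returns the cycles spent from the first SUBS up to the instruction
 * that follows the final, not-taken BNE. */
static uint32_t delay_loop_cycles(uint32_t n, uint32_t refill)
{
    assert(n >= 1u);
    assert(refill >= 1u && refill <= 3u);

    const uint32_t subs_cycles      = 1u;           /* SUBS: 1 cycle */
    const uint32_t taken_cycles     = 1u + refill;  /* BNE taken: 1 + P */
    const uint32_t not_taken_cycles = 1u;           /* BNE not taken: 1 */

    /* The first n - 1 iterations branch back; the last one falls through. */
    return (n - 1u) * (subs_cycles + taken_cycles)
         + (subs_cycles + not_taken_cycles);
}
```

With R3 = 1 the only BNE executed in the loop is not taken, so `delay_loop_cycles(1, P)` is 2 for any P. Under these assumptions the loop itself would not account for a 7-cycle gap, so any extra cycles would have to come from effects outside this model, such as instruction fetch.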

    Thanks for your help

  • Any ideas on this?

    Thanks