Cortex A9 WFE instruction not working as expected

Note: This was originally posted on 10th October 2012 at http://forums.arm.com

I'm trying to run a simple test with the WFE instruction and the EVENTI/EVENTO signals. One A9 enters a wait state with WFE while a second A9 does some dummy operations, then executes SEV to signal from its EVENTO to the first A9's EVENTI. I can verify in my simulation environment that this signal fires. However, the first A9 does not wait; it breezes right past the WFE instruction. The assembler gives no warning that these instructions are unsupported.

My best clue is this line in the assembler guide: "If the Event Register is set, WFE clears it and returns immediately." It sounds like this might be what is happening, but I can't find any reference to an "Event Register" in the Cortex-A9 TRM, so I'm not sure whether it is set by default in my system.
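
As a rough illustration of the sentence quoted from the assembler guide, the toy model below tracks a single event flag per core. It is only a sketch: `core_model`, `model_event_arrives` and `model_wfe_would_sleep` are hypothetical names, and nothing here touches real hardware. It shows why a WFE that finds the flag already set falls straight through, while the next WFE waits.

```c
#include <stdbool.h>

/* Toy model of one core's event register (hypothetical, not hardware access). */
typedef struct {
    bool event_register;
} core_model;

/* Models an event reaching this core, e.g. an SEV from another core. */
static void model_event_arrives(core_model *c)
{
    c->event_register = true;
}

/* Models WFE. Returns true if the core would enter the wait state,
 * false if WFE returns immediately because an event was already latched. */
static bool model_wfe_would_sleep(core_model *c)
{
    if (c->event_register) {
        c->event_register = false;  /* WFE clears the register and returns */
        return false;
    }
    return true;
}
```

If the register happens to be set when the test starts, the first call reports no sleep and only the second call would wait.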

Any ideas? Thanks!
  • Note: This was originally posted on 11th October 2012 at http://forums.arm.com

    Thanks for both replies. It looks like section 7.3 of that barrier cookbook is what I want, and it describes just what ttfn suggested.

    So just out of interest, is a purely asynchronous kind of IPC possible? It seems like all the implementations in that guide involve polling a lock, flag, shared mem, etc. I am fairly new to ASM level programming so maybe I am used to things at the application level that the OS is doing for me.
  • Note: This was originally posted on 12th October 2012 at http://forums.arm.com

    More on this...I am trying to implement a lock based on section 7.3 of the barrier cookbook linked to above. Here is my function, called from a C program.

    ; void acquireLock(int* lockAddr)
    acquireLock
    LDREX r1, [r0]
    CMP r1, #0         ; 0 = unlocked
    WFENE
    MOVEQ r2, #1       ; 1 = locked
    STREXEQ r3, r2, [r0]  ; r3 = 1, but the store DOES occur!
    CMPEQ r3, #0
    BNE acquireLock
    DMB
    MOV pc, lr


    On first execution, it reads the value from *lockAddr and it is 0 (i.e. r1=0). So it reaches the STREXEQ instruction, which DOES store the value 1 in memory, but the return value is r3=1 (failure). So it branches back to the top, reads the value again, and now it is 1 (r1=1). It then executes WFENE and enters wait mode. (It actually does this twice: the first time it does not go into wait mode but returns, only to reach the same WFENE again because the logic has not changed, and then it hangs.)

    My other process eventually enters the same WFENE loop, as the value is still 1. According to the assembler guide there are a couple of reasons why STREX could return 1 in r3, but both involve the store failing as well. I can see from the code logic and from my environment's memory tools that the store did occur. The memory location r0 refers to is cacheable, bufferable and shared.

    Any ideas?
  • Note: This was originally posted on 16th October 2012 at http://forums.arm.com


    Just wanted to bump my last question...I have been continuing to look into it and can't make sense of it. Very strange that STREX returns 1 yet the store occurs in memory. Does anyone have an idea?
  • Note: This was originally posted on 16th October 2012 at http://forums.arm.com


    Never mind guys, I figured this out. It was a problem with my simulation environment: the memory model did not have an exclusive monitor enabled. It still seems like wrong behavior to me, as I feel the memory write shouldn't have occurred, but maybe this is a problem with the RAM model implementation.
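
    For comparison, here is a minimal C11 sketch of the same lock, assuming a compiler that provides `<stdatomic.h>` and a memory system with a working exclusive monitor. On ARMv7-A targets the compare-exchange is typically compiled to an LDREX/STREX loop. All names (`demo_lock_t`, `demo_acquire`, `demo_try_acquire`, `demo_release`, `CPU_WFE`, `CPU_SEV`) are hypothetical, and the release side is an assumption, since the thread only shows the acquire function.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Assumed ARMv7-A or later; on other hosts the macros are no-ops,
     * so the acquire loop degrades to a plain spin. */
    #if defined(__arm__) || defined(__aarch64__)
    #define CPU_WFE() __asm__ volatile("wfe" ::: "memory")
    #define CPU_SEV() __asm__ volatile("dsb sy\n\tsev" ::: "memory")
    #else
    #define CPU_WFE() ((void)0)
    #define CPU_SEV() ((void)0)
    #endif

    typedef atomic_int demo_lock_t;   /* 0 = unlocked, 1 = locked */

    /* Single attempt: returns true if the lock was taken. */
    static bool demo_try_acquire(demo_lock_t *lock)
    {
        int expected = 0;
        return atomic_compare_exchange_strong_explicit(
            lock, &expected, 1,
            memory_order_acquire, memory_order_relaxed);
    }

    /* Retry until the lock is taken, waiting for an event between attempts. */
    static void demo_acquire(demo_lock_t *lock)
    {
        while (!demo_try_acquire(lock)) {
            if (atomic_load_explicit(lock, memory_order_relaxed) != 0) {
                CPU_WFE();   /* lock is held: wait for the holder's SEV */
            }
        }
    }

    /* Release the lock, then wake any cores waiting in WFE. */
    static void demo_release(demo_lock_t *lock)
    {
        atomic_store_explicit(lock, 0, memory_order_release);
        CPU_SEV();           /* barrier first so the store is visible */
    }
    ```

    Because the waiter re-checks the lock value after every wake-up, an event that arrives early or for an unrelated reason only costs one extra trip around the loop.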
  • Note: This was originally posted on 11th October 2012 at http://forums.arm.com

    I think you've already spotted the problem: the event register. You don't know its initial state. If it is set, the core will appear to "wake" immediately.

    WFE/SEV are not themselves suitable for implementing synchronization primitives.  That is what LDREX/STREX are for.

    What I'd suggest in your case is to use a separate flag to indicate that the second core has finished. The first core loops waiting for the flag to be cleared, with a WFE in the loop. That way, if you wake unexpectedly, you can see that the second core hasn't finished yet and go back to sleep.
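
    A minimal sketch of that flag-plus-WFE loop in C11 might look like the following. The names (`second_core_busy`, `shared_result`, `finish_work`, `wait_for_second_core`, `CPU_WFE`, `CPU_SEV`) are hypothetical, the inline assembly assumes an ARMv7-A or later target with a GCC-style compiler, and on any other host the macros fall back to no-ops so the loop simply spins.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    #if defined(__arm__) || defined(__aarch64__)
    #define CPU_WFE() __asm__ volatile("wfe" ::: "memory")
    #define CPU_SEV() __asm__ volatile("dsb sy\n\tsev" ::: "memory")
    #else
    #define CPU_WFE() ((void)0)   /* host build: plain spin */
    #define CPU_SEV() ((void)0)
    #endif

    /* True while the second core is still working; cleared when it is done. */
    static atomic_bool second_core_busy = true;
    static int shared_result;     /* example payload written by the second core */

    /* Runs on the second core once its work is complete. */
    static void finish_work(int value)
    {
        shared_result = value;
        /* Release ordering: the payload is visible before the flag clears. */
        atomic_store_explicit(&second_core_busy, false, memory_order_release);
        CPU_SEV();                /* wake cores waiting in WFE */
    }

    /* Runs on the first core. A stale event register or an unrelated
     * wake-up just causes another pass through the loop. */
    static int wait_for_second_core(void)
    {
        while (atomic_load_explicit(&second_core_busy, memory_order_acquire)) {
            CPU_WFE();
        }
        return shared_result;
    }
    ```

    The flag, not the event, carries the meaning here; WFE and SEV only save power while the first core waits.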
  • Note: This was originally posted on 11th October 2012 at http://forums.arm.com

    Also consider the ordering of memory accesses and barriers in your locking code. The A9, the A15, and many ARM cores from other vendors execute out of order, so when rolling your own signalling system you need to take care that it plays nicely with the ARM weak memory model. This might help:

    http://infocenter.arm.com/help/topic/com.arm.doc.genc007826/Barrier_Litmus_Tests_and_Cookbook_A08.pdf


    Iso