semaphore_take:
    mov   w2, #1              // LOCK value
    dmb   sy                  // ensure all observers observe data before acquire is attempted
    ldaxr w1, [x0]            // attempt to read and acquire lock
    cbnz  w1, semaphore_take  // lock is not zero, so loop and try to acquire again
    stxr  w3, w2, [x0]        // attempt to store LOCK value
    cbnz  w3, semaphore_take  // retry if store failed
    dmb   sy                  // ensure all subsequent accesses are observed after the gaining of the lock is observed
The above semaphore code worked flawlessly until the moment I added FIQ code.
Now the semaphore leaks through critical sections, even though the FIQ does not come anywhere near the semaphore.
Is there something I need to know about FIQ and the LDREX/STREX instructions?
The FIQ has the effect of clearing the local monitor on exception return.
(See the architecture manual https://developer.arm.com/docs/ddi0487/latest, B2.9.4 Context switch support)
... but then the store should fail.
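To make the expectation above concrete, here is a toy C model of a single PE's local monitor. It is only a sketch under simplifying assumptions: the monitor is reduced to one state bit with no address tagging, and every name in it (`local_monitor`, `model_ldxr`, `model_stxr`, `model_clrex`, `model_exception_return`, `monitor_demo`) is hypothetical and invented for illustration. It shows that, in this model, a store-exclusive issued after an exception return reports failure and leaves memory untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one PE's local monitor (hypothetical, simplified:
 * a single state bit, no address tagging). */
typedef struct {
    bool exclusive; /* true = Exclusive Access, false = Open Access */
} local_monitor;

/* Load-exclusive: read the word and put the monitor into Exclusive Access. */
static uint32_t model_ldxr(local_monitor *m, const uint32_t *addr) {
    m->exclusive = true;
    return *addr;
}

/* Store-exclusive: returns 0 on success and 1 on failure, mirroring the
 * status value STXR writes. The monitor is left in Open Access either way. */
static int model_stxr(local_monitor *m, uint32_t *addr, uint32_t value) {
    bool ok = m->exclusive;
    m->exclusive = false;
    if (ok) {
        *addr = value;
    }
    return ok ? 0 : 1;
}

/* CLREX: force the local monitor back to Open Access. */
static void model_clrex(local_monitor *m) {
    m->exclusive = false;
}

/* Exception return (e.g. ERET at the end of an FIQ handler):
 * modelled as clearing the local monitor as well. */
static void model_exception_return(local_monitor *m) {
    m->exclusive = false;
}

/* Walk through an uninterrupted pair, then a pair split by an FIQ return. */
static void monitor_demo(void) {
    local_monitor m = { false };
    uint32_t lock = 0;

    /* Uninterrupted load/store pair: the store succeeds. */
    uint32_t seen = model_ldxr(&m, &lock);
    int status = model_stxr(&m, &lock, 1);
    assert(seen == 0 && status == 0 && lock == 1);

    /* FIQ taken and returned from between the pair: the store fails. */
    lock = 0;
    seen = model_ldxr(&m, &lock);
    model_exception_return(&m);
    status = model_stxr(&m, &lock, 1);
    assert(seen == 0 && status == 1 && lock == 0);
}
```

In this model an explicit `model_clrex` before the exception return changes nothing, since the return already leaves the monitor in Open Access; real hardware may differ in ways the model does not capture.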
If the FIQ upsets the local monitors, as vstehle says, then another core can acquire the lock if an FIQ occurs between one core loading the lock and it storing the lock value.
It only cropped up when I was testing for race conditions; you really have to belt the system to get it to occur. In normal operation I had not even noticed the problem, as the chance of an FIQ in that exact section was low.
Sorry, I should say: if it returns from an FIQ at that exact opcode, not if it gets an FIQ there.
Thank you both, problem solved ... yep, I just needed a CLREX before the eret in the FIQ handler, and that fixes the problem.
Going by vstehle's answer, a CLREX shouldn't be needed. Please try without CLREX but with "LDXR" instead of "LDAXR" (see B2.9).
Nope, it fails. I tried mixing around various combos; they don't seem to be paired like you suggest.
"ldaxr" and "stxr" work the same as "ldaxr" and "stlxr" and every other combo, probably because I have the dmb sy in place.
A forum comment says that ldaxr is essentially ldxr + dmb sy ... or is that not correct?
This is actually on a BCM2837, and they have ripped the GIC out and put their own in ... given it is the FIQ behaviour, I wonder if that is what makes the difference.
I doubt that it is related to the GIC or their interrupt controller. In the end, it pulls the "FIQ" line and the core should handle the FIQ. It seems the CLREX is sent to all PEs, but the implicit clearing due to an exception return isn't.
Maybe something to investigate, or maybe just something to "accept" ;-)
LdB said: A forum comment says that ldaxr is essentially ldxr + dmb sy ... or is that not correct?
Also my understanding. Though it is not listed in the TRM like this.
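For comparison, here is a minimal C11 sketch of a take/give pair that relies on acquire and release ordering rather than explicit full barriers. The names `sem_sketch`, `sem_take`, `sem_try_take`, `sem_give` and `sem_demo` are hypothetical and not taken from the code in this thread. An acquire operation only has to stop later accesses from being observed before it, which is a weaker guarantee than a full dmb sy; how a compiler lowers these calls on AArch64 depends on the toolchain and target flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word for illustration: 0 = free, 1 = taken. */
typedef struct {
    atomic_uint value;
} sem_sketch;

/* Try once to take the lock; returns true on success.
 * Acquire ordering on success keeps the critical section after the take. */
static bool sem_try_take(sem_sketch *s) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &s->value, &expected, 1u,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is taken. The weak compare-exchange may fail
 * spuriously, so it is always used inside a retry loop. */
static void sem_take(sem_sketch *s) {
    unsigned expected;
    do {
        expected = 0; /* reset: a failed exchange overwrites it */
    } while (!atomic_compare_exchange_weak_explicit(
        &s->value, &expected, 1u,
        memory_order_acquire, memory_order_relaxed));
}

/* Release the lock; release ordering keeps the critical section before it. */
static void sem_give(sem_sketch *s) {
    atomic_store_explicit(&s->value, 0u, memory_order_release);
}

/* Single-threaded walk-through: take, fail to re-take, give, take again. */
static void sem_demo(void) {
    sem_sketch s;
    atomic_init(&s.value, 0u);

    sem_take(&s);
    bool second = sem_try_take(&s);
    assert(!second);

    sem_give(&s);
    bool third = sem_try_take(&s);
    assert(third);
}
```

The sketch only exercises the lock from one thread; it says nothing about what an interrupt handler does to the exclusive monitors underneath.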
I think the key point is "clearing the local monitor". But B2.9.2 also has this?!
• When the global monitor is in the Exclusive Access state, it is IMPLEMENTATION DEFINED whether a CLREX instruction causes the global monitor to transition from Exclusive Access to Open Access state.
Hi 42Bastian Schick,
The global monitor is typically not in the CPU cluster in a "big" system but further away from the cores.
For a multi-cluster system with a CCN for example, the global monitors are in the CCN.
I think that not requiring the CLREX instruction to affect the global monitor is there to help implementation.
And implementations are still free to propagate the CLREX to the global monitor if they want.
Ok, from reading the docs I understand it like this:
- local monitor:
* local to the PE
* cleared by ERET or CLREX
* shared attribute not set
- global monitor:
* for all PEs (any bus master connected to it, but at least the CPU cores)
* shared attribute is set
* not cleared by ERET but cleared by CLREX.
I mostly agree, except for the last sentence, which depends on which system it runs on.
It is implementation defined if CLREX has an effect on the global monitor:
"When the global monitor is in the Exclusive Access state, it is IMPLEMENTATION DEFINED whether a CLREX
instruction causes the global monitor to transition from Exclusive Access to Open Access state."
Ok, thanks. I am thinking about the implications of this ... and can't find any. In fact, the problem the OP saw should not be solved by using a CLREX in the FIQ code. So maybe the problem LdB has lies somewhere else.
When you talk about "shared attribute is set", is that just the TLB entry or is there another attribute for me to look at?