From some research into LDREX and STREX, it appears that the exclusivity address range for these instructions on the M3, M4, and M7 is the entire memory space. Hence you can only use LDREX/STREX with one address at a time. Does this not limit you to one mutex (or at most 32, if you can bit-map them)?
Thus it does not seem to be a very practical solution for an RTOS, or am I missing something?
Each exclusive access sequence (typically a read-modify-write of a semaphore variable) is very short. So you can have as many mutexes as you like; it is just that they cannot all be updated at the same time.
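To illustrate the point, here is a minimal sketch in portable C11 rather than with the CMSIS exclusive-access intrinsics. The names `mutex_word_t`, `mutex_try_lock`, and `mutex_unlock` are made up for this example. On Armv7-M, compilers typically lower a compare-exchange like this to a short LDREX/STREX retry loop, but that is an assumption about the toolchain, not something the code itself guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mutex type: one word per mutex, 0 = free, 1 = taken. */
typedef struct {
    atomic_uint locked;
} mutex_word_t;

/* One short read-modify-write: succeeds only if the word was 0. */
static bool mutex_try_lock(mutex_word_t *m)
{
    assert(m != NULL);
    unsigned expected = 0u;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1u);
}

/* Release the mutex by writing 0 back to its word. */
static void mutex_unlock(mutex_word_t *m)
{
    assert(m != NULL);
    atomic_store(&m->locked, 0u);
}
```

Each mutex is just its own word, so any number of them can exist; only the brief update of one word at a time needs the exclusive sequence.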
Keeping the instruction count between the LDREX and STREX short reduces the likelihood that an exception is taken between the two instructions, but does not remove the possibility.
If mutexes or semaphore variables are acquired in exception handlers, or you are using a preemptive RTOS, an interrupt can occur between the LDREX and the STREX; that interrupt could itself use LDREX/STREX and potentially cause errors.
Therefore, to be 100% safe, interrupts that also use LDREX/STREX would need to be disabled around the LDREX/STREX sequence, which makes the instructions essentially useless: you could just as well use LDR and STR with interrupts/exceptions disabled to make the read-modify-write atomic.
The likelihood that an interrupt/exception happens between the LDREX and STREX increases dramatically when you are using them for a mutex with a spin wait on a lock. Specifically, the code could be spinning on a loop of ~10 instructions, and if there is one CMP between the LDREX and STREX, you have a ~10% chance of an interrupt/exception landing on that CMP.
Again, this only happens when you are using more than one address with LDREX/STREX, because the exclusivity range is the entire memory. If the exclusivity range were smaller it would be an awesome feature, but as implemented I cannot see a valid use case if you need exclusivity on more than one address.
The sequence {ldrex, {ldrex, strex}, strex} should work correctly, IMO.
The ldrex of the inner pair "overwrites" the effect of the ldrex of the outer pair.
After the inner pair completes and execution reaches the strex of the outer pair, that strex does not succeed. The outer pair then runs again, hoping that the inner pair does not interfere this time.
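A rough host-side sketch of that nesting, using a toy model of a local monitor that keeps no address tag. Everything here (`sim_ldrex`, `sim_strex`, `sim_nested_increment`) is a hypothetical stand-in written for this post, not real hardware behaviour or a CMSIS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy local monitor: it only remembers "exclusive" or "open", no address. */
static bool monitor_exclusive = false;

static uint32_t sim_ldrex(const uint32_t *addr)
{
    assert(addr != NULL);
    monitor_exclusive = true;       /* any LDREX marks the monitor exclusive */
    return *addr;
}

/* Returns 0 on success and 1 on failure, mirroring the STREX status value. */
static uint32_t sim_strex(uint32_t value, uint32_t *addr)
{
    assert(addr != NULL);
    if (!monitor_exclusive) {
        return 1u;                  /* monitor open: the store is refused */
    }
    *addr = value;
    monitor_exclusive = false;      /* a STREX leaves the monitor open */
    return 0u;
}

/* Inner pair: a complete increment, as an interrupting routine would run it. */
static void sim_inner_increment(uint32_t *addr)
{
    uint32_t v;
    do {
        v = sim_ldrex(addr);
    } while (sim_strex(v + 1u, addr) != 0u);
}

/* Outer pair, interrupted once by the inner pair between LDREX and STREX.
   Returns the number of attempts the outer loop needed. */
static unsigned sim_nested_increment(uint32_t *outer, uint32_t *inner)
{
    unsigned attempts = 0u;
    bool interrupt_pending = true;
    uint32_t v;
    do {
        attempts++;
        v = sim_ldrex(outer);
        if (interrupt_pending) {
            interrupt_pending = false;  /* the "interrupt" fires only once */
            sim_inner_increment(inner);
        }
    } while (sim_strex(v + 1u, outer) != 0u);
    return attempts;
}
```

In this model the inner strex leaves the monitor open, so the first outer strex is refused and the outer loop simply goes around a second time; both variables end up incremented exactly once.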
So imagine that I have an RTOS with fixed time-sliced scheduling.
Task 1, with medium priority, goes to get MutexAddress1. The LDREX shows the mutex is not in use, but the task gets removed from the processor before the STREX.
Task 2 gets the processor and takes MutexAddress1, but is bumped from the processor before it is done with the mutex.
Task 3 goes to get MutexAddress2 and does the LDREX, which shows the mutex is free, but it gets bumped from the processor before the STREX.
Task 1 gets put back on the processor and does the STREX, which should fail. However, since the address resolution of LDREX/STREX is the entire memory, the monitor treats the LDREX from Task 3 (MutexAddress2) as matching the STREX for MutexAddress1, so the STREX succeeds. Now both Task 1 and Task 2 think they hold the mutex.
Yes, it takes a complex example to show a case where it fails, and the likelihood of this happening in the real world is very low, but it is not zero.
To make a simple test case you could do this:
LDREX r0, [r1]      ; r1 holds mutexAddress1 (registers here are illustrative)
LDREX r2, [r3]      ; r3 holds mutexAddress2
STREX r4, r5, [r1]  ; if this passes (r4 == 0), the system is broken
According to what I have read in the following document from ARM, the above simple case will break on the M3, M4, and M7 cores.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16180.html
The only way I can see to get around this is for every interrupt/exception (i.e. task switch) to issue a CLREX, so that if any exception happens between the LDREX and STREX, the STREX will fail.
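Here is a toy replay of the three-task sequence, as a sketch only. It assumes a local monitor with no address tag, and every name in it (`monitor_t`, `mon_ldrex`, `mon_strex`, `mon_context_switch`, `replay_three_tasks`) is hypothetical. The `clear_on_switch` flag stands in for a CLREX (or an equivalent clear) performed at every task switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy local monitor without an address tag (hypothetical model). */
typedef struct {
    bool exclusive;        /* true between an LDREX and the next STREX or clear */
    bool clear_on_switch;  /* true models a CLREX at every task switch */
} monitor_t;

static uint32_t mon_ldrex(monitor_t *m, const uint32_t *addr)
{
    assert(m != NULL && addr != NULL);
    m->exclusive = true;
    return *addr;
}

/* Returns 0 if the store happened, 1 if it was refused. */
static uint32_t mon_strex(monitor_t *m, uint32_t value, uint32_t *addr)
{
    assert(m != NULL && addr != NULL);
    if (!m->exclusive) {
        return 1u;
    }
    *addr = value;
    m->exclusive = false;
    return 0u;
}

/* Called wherever the scheduler swaps tasks. */
static void mon_context_switch(monitor_t *m)
{
    assert(m != NULL);
    if (m->clear_on_switch) {
        m->exclusive = false;
    }
}

/* Replays the sequence; returns true if Task 1's stale STREX succeeds,
   i.e. the mutex ends up "owned" by both Task 1 and Task 2. */
static bool replay_three_tasks(bool clear_on_switch)
{
    monitor_t mon = { false, clear_on_switch };
    uint32_t mutex1 = 0u, mutex2 = 0u;

    /* Task 1: sees mutex1 free, then is preempted before its STREX. */
    uint32_t t1_seen = mon_ldrex(&mon, &mutex1);
    mon_context_switch(&mon);

    /* Task 2: takes mutex1, then is preempted while holding it. */
    uint32_t t2_seen = mon_ldrex(&mon, &mutex1);
    uint32_t t2_status = mon_strex(&mon, 1u, &mutex1);
    mon_context_switch(&mon);

    /* Task 3: sees mutex2 free, then is preempted before its STREX. */
    uint32_t t3_seen = mon_ldrex(&mon, &mutex2);
    mon_context_switch(&mon);

    /* Task 1 resumes and issues the STREX for the value it read earlier. */
    uint32_t t1_status = mon_strex(&mon, 1u, &mutex1);

    assert(t1_seen == 0u && t2_seen == 0u && t3_seen == 0u);
    assert(t2_status == 0u);
    return t1_status == 0u;
}
```

In this model, `replay_three_tasks(false)` reports the double ownership described above, while `replay_three_tasks(true)` makes Task 1's STREX fail so that it has to retry and then sees the mutex as taken.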
Hi there,
In Cortex-M, the exclusive state is automatically cleared on exception events (this is different from Cortex-A in Armv7-A, but I think Armv8-A clears the exclusive state automatically as well).
https://developer.arm.com/docs/ddi0403/e/armv7-m-architecture-reference-manual
Section A3.4.4 Context switch support
"It is necessary to ensure that the local monitor is in the Open Access state after a context switch. In ARMv7-M, the local monitor is changed to Open Access automatically as part of an exception entry or exit sequence. The local monitor can also be forced to the Open Access state by a CLREX instruction."
Hope this clears up your concern.
regards,
Joseph
Wonderful! Then it makes a lot more sense to use.
I figured that Arm engineers would have thought this through, but I could not find enough details online to figure it out.
Thanks
Trampas
You're welcome :-)
Instead of continuing to run in the spin-lock loop, a semaphore API can often send a request to the OS to context switch to other tasks if the spin lock cannot proceed. The OS can put this task in a wait queue and return to the spin lock later, when another task releases the semaphore (this requires the semaphore release API to inform the OS kernel of the change).
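A minimal sketch of that idea in portable C11, under some loud assumptions: `sem_sketch_t`, `sem_try_take`, `sem_take`, `sem_give`, and the `yield_fn` callback are all invented for this example, and the callback merely stands in for whatever "give up the CPU" call a real RTOS provides. A real kernel would also park the task in a wait queue and wake it on release, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counting semaphore: count > 0 means tokens are available. */
typedef struct {
    atomic_int count;
} sem_sketch_t;

/* Hypothetical hook that asks the OS to run another task. */
typedef void (*yield_fn)(void *ctx);

/* One non-blocking attempt to take a token. */
static bool sem_try_take(sem_sketch_t *s)
{
    assert(s != NULL);
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure, c is reloaded with the current count and re-checked. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1)) {
            return true;
        }
    }
    return false;
}

/* Release a token. */
static void sem_give(sem_sketch_t *s)
{
    assert(s != NULL);
    atomic_fetch_add(&s->count, 1);
}

/* Take a token, yielding to the OS instead of spinning when none is free.
   Returns how many times the caller had to yield. */
static unsigned sem_take(sem_sketch_t *s, yield_fn yield, void *ctx)
{
    assert(yield != NULL);
    unsigned yields = 0u;
    while (!sem_try_take(s)) {
        yields++;
        yield(ctx);
    }
    return yields;
}

/* Host-side stand-in for a yield: pretends another task ran and released. */
static void demo_yield_other_task_gives(void *ctx)
{
    sem_give((sem_sketch_t *)ctx);
}
```

With the stand-in callback, a `sem_take` on an empty semaphore yields once, "another task" gives the token, and the retry then succeeds.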