Hi,
I am writing a simple spinlock based on the spinlock code in the ARM Trusted Firmware for Juno.
But for me the stxr instruction always fails, leaving w1 set to 1.
When I read the ARMv8 spec, section B2.10.5 has a subsection "Unpredictable behavior when load-ex/store-ex access a different number of registers", which says the stxr instruction always fails and returns 1.
I am not sure how the code below works on Juno, since the ldaxr and stxr instructions seem to access a different number of registers. When I use the same code it doesn't work: stxr always fails.
I suspect I may be setting the memory attributes wrongly (for the address in x0). Can someone also explain the difference between ldxr and ldaxr, and between stxr and stlxr?
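    mov   w2, #1            // value that marks the lock as taken
    sevl                    // set the local event so the first wfe falls through
l1: wfe                     // wait for an event before polling the lock
l2: ldaxr w1, [x0]          // load-acquire exclusive: read the lock word
    cbnz  w1, l1            // lock is held: go back and wait
    stxr  w1, w2, [x0]      // try to store 1; w1 = 0 on success, 1 on failure
    cbnz  w1, l2            // exclusive store failed: retry
    ret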
Thanks,
I'm no great expert but to go through your points:
The code's purpose is to take a lock on some data. The data is unlocked when the semaphore is 0 and locked when it is 1. The load is a load-acquire, so anything the previous holder wrote before it released the lock is visible to you once you hold the lock.
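As a rough illustration of that acquire/release pairing, here is a minimal C11 sketch of the same kind of lock. The names demo_spinlock, demo_spin_lock and demo_spin_unlock are made up for this example and are not taken from the Trusted Firmware source. The weak compare-exchange is allowed to fail spuriously, much as a stxr can, which is why it sits in a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type: 0 = unlocked, 1 = locked, as in the thread. */
typedef struct {
    atomic_uint locked;
} demo_spinlock;

static void demo_spin_lock(demo_spinlock *l)
{
    unsigned expected = 0u;

    /* Acquire on success: later accesses cannot be reordered before
     * the point where the lock is taken. The weak form may fail even
     * when the lock is free, so keep retrying. */
    while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1u,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0u;  /* a failed CAS overwrites it with the value it saw */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    /* Debug check: only a held lock should be released. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1u);

    /* Release: writes made while holding the lock are visible to the
     * next thread that acquires it. */
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
}
```

What instructions a compiler actually emits for this depends on the toolchain and target flags, so treat it as a sketch of the ordering rather than a drop-in replacement for the assembly.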
You are loading and storing the same number and type of values: a single word register. The Rs register (w1 in the stxr) only receives the status result.
If you are using normal memory it should work. You say you always get 1 in w1 after the stxr. That shouldn't happen, so the question is why it is happening.
Racking my brain, the only feeble thought that comes to mind is that perhaps you are somehow single stepping through the code. A debugger should run the code from the ldaxr to the stxr without single stepping, otherwise the local monitor will almost certainly be cleared. Hopefully someone else has a better idea.
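To show why a cleared monitor produces exactly this symptom, here is a toy single-core model in C. It is a deliberate simplification and not how the hardware is specified: local_monitor, model_ldxr, model_stxr and model_clear are hypothetical names, and model_clear stands in for anything that clears the local monitor between the two instructions, such as CLREX or an exception return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the local exclusive monitor (hypothetical, simplified). */
typedef struct {
    bool      armed;  /* true after a load-exclusive */
    uint32_t *addr;   /* address tagged by that load */
} local_monitor;

static uint32_t model_ldxr(local_monitor *m, uint32_t *p)
{
    m->armed = true;
    m->addr  = p;
    return *p;
}

/* Returns 0 on success and 1 on failure, like the stxr status register. */
static uint32_t model_stxr(local_monitor *m, uint32_t *p, uint32_t v)
{
    bool ok = m->armed && m->addr == p;

    m->armed = false;  /* in this model the monitor is open afterwards */
    if (ok)
        *p = v;        /* memory is only written when the store succeeds */
    return ok ? 0u : 1u;
}

/* Stands in for CLREX or an exception return between the two accesses. */
static void model_clear(local_monitor *m)
{
    m->armed = false;
}

static void monitor_demo(void)
{
    uint32_t lock = 0;
    local_monitor m = { false, 0 };

    /* Uninterrupted pair: the store succeeds and the lock is taken. */
    (void)model_ldxr(&m, &lock);
    assert(model_stxr(&m, &lock, 1u) == 0u && lock == 1u);

    /* Monitor cleared in between: status is 1 and memory is unchanged. */
    lock = 0;
    (void)model_ldxr(&m, &lock);
    model_clear(&m);
    assert(model_stxr(&m, &lock, 1u) == 1u && lock == 0u);
}
```

If something clears the monitor on every pass through the loop, as stepping instruction by instruction would in this model, the second case repeats forever: the status is always 1 and the lock word stays 0.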
Yes, stxr fails every time even though ldaxr succeeds.
I am not sure what goes wrong between these two instructions, or how stxr can fail every time when I am running on only a single CPU.
Could it be due to some cache issue?
I'd check that by putting a breakpoint on the cbnz after the stxr and checking that w1 is 1 and that [x0] is 0. That avoids stopping anywhere between l2 and that point. Why do you think there might be a cache issue? Are you accessing an area you've set up with some special properties rather than normal memory?
I am setting up the lock memory area as normal memory, inner shareable, nothing special.
I was just wondering what other problems there could be and thought about caches.
Even if I don't single step, my code hangs in the state where stxr returns w1 = 1.
I am not entirely clear from your last reply, but it seems you are able to reproduce the failure where w1 is 1 and [x0] is 0.
So even without a breakpoint I think you would hit the same issue?
If I put a breakpoint at the RET instruction then the memory seems to be updated.
I think it is only an issue with single stepping. I may have been doing something wrong before.
I will mark your answer as correct!
Glad that worked. Yes, you have to be careful not to put breakpoints in the middle of code like that. It can be a bit of a pain. Thankfully there are going to be some nice atomic instructions in the future in ARMv8.1, but I guess these load/store exclusive instructions will have to be used for a while yet.
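For comparison, a lock attempt can also be written as a single atomic exchange in C11, which has no separate load and store for a debugger to stop between at the source level. This is only a sketch: try_lock_swap and unlock_swap are made-up names, and whether the compiler emits one atomic instruction or a load/store-exclusive loop depends on the toolchain and on whether it is told to target ARMv8.1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if this call took the lock. */
static bool try_lock_swap(atomic_uint *lock)
{
    /* The exchange returns the previous value; 0 means it was free. */
    return atomic_exchange_explicit(lock, 1u, memory_order_acquire) == 0u;
}

static void unlock_swap(atomic_uint *lock)
{
    /* Debug check: only a held lock should be released. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1u);
    atomic_store_explicit(lock, 0u, memory_order_release);
}
```

A caller would loop on try_lock_swap until it returns true, in the same way the assembly loops back to l1 while the lock is held.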
The ARMv8-A architecture and its ongoing development