Section K11.6 of the ARMv8 spec discusses the issues that arise from normal memory being weakly ordered. It's clear that without a DMB in both parts of the code, P2 can see either the old value at [R1] or the new value. Leaving that aside, I have a question about store reordering.
P1:
    STR R5, [R1]
    STR R0, [R2]
P2:
    WAIT([R2]==1)
    LDR R5, [R1]
Say that an IRQ/FIQ occurs on P1. Could the interrupt arrive after the value has been committed to [R2] but before the value has been committed to [R1]?
If so, would the core have to ensure that the first store completes before taking the interrupt so that there is no consistency violation, or is there no such guarantee and the ELR will return us to the first store in program order, causing us to store to [R2] twice?
If an interrupt handler modifies the same memory location as the foreground process, you need to use exclusive accesses. That is what they are for.
From the perspective of the processor that utilizes ROB (reorder buffer) or similar queues, all instructions retire in an in-order fashion. For stores, retiring means that they move from the LSQ into the store buffer, to wait for an opportunity to update their values into the cache. It is here, in the store buffer, that the re-ordering of the stores may take place, due to various reasons like store-merging/coalescing, other non-FIFO behaviours of the buffer, or delays in accessing cache.
If the two stores in question, to [R1] and to [R2], are already in the store buffer, the processor is well past those two instructions. (They must be: you mentioned that [R2] has been committed to the cache, and that cannot happen unless the write to [R1] is also in the store buffer, because the [R1] write is po-before the [R2] write.) Any interrupt at this stage will cause the store buffer to drain, which in turn causes [R1] to be written to the cache. The ELR will return to some instruction after those two since, from the perspective of the processor, the two stores were 'done' as soon as they retired.
Note that moving a store from the LSQ to the store buffer may depend on factors such as whether all earlier instructions in program order that are still in the pipeline are known to be exception-free.
Edit: Correction: Replace LSQ with ROB above.
LSQ-to-ROB is 'execution'; it can be OOO.
ROB-to-STB is 'retiring'; it is in-order.
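To make the sequence concrete, here is a minimal toy model in Python of the behaviour described above. It is only a sketch, not a description of any real ARM core: the `ToyCore` class, its method names, and the stored value 42 are all hypothetical, and the "interrupt drains the store buffer" step simply encodes the claim made in this answer.

```python
class ToyCore:
    """Toy model: stores retire in order from the ROB into a store
    buffer (STB), and the STB may write to the cache in any order."""

    def __init__(self, program):
        self.program = list(program)  # list of (address, value) stores
        self.retire_pc = 0            # index of the next instruction to retire
        self.stb = []                 # retired stores not yet in the cache
        self.cache = {}               # what other observers can see

    def retire_next(self):
        # ROB-to-STB: strictly in program order.
        self.stb.append(self.program[self.retire_pc])
        self.retire_pc += 1

    def drain_one(self, index):
        # STB-to-cache: any entry may go first (non-FIFO behaviour).
        addr, value = self.stb.pop(index)
        self.cache[addr] = value

    def take_interrupt(self):
        # Assumption taken from the answer: the STB drains on an interrupt.
        while self.stb:
            self.drain_one(0)
        # The return address is the first instruction that has NOT retired.
        return self.retire_pc


def run_scenario():
    # P1's two stores; 42 is a made-up data value, 1 is the flag.
    core = ToyCore([("[R1]", 42), ("[R2]", 1)])
    core.retire_next()            # STR R5, [R1] retires
    core.retire_next()            # STR R0, [R2] retires
    core.drain_one(1)             # [R2] reaches the cache before [R1]
    before = dict(core.cache)     # snapshot at the moment the IRQ arrives
    elr = core.take_interrupt()
    return before, elr, dict(core.cache)
```

In this model `run_scenario()` shows the window the question asks about ([R2] visible, [R1] not yet), but the returned index is 2, i.e. past both stores, so neither store is re-executed after the handler returns.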
Cool. Thank you for your response.