Ordering of Memory-mapped device control with payloads

Hello ARM experts:

In section K14.5.4, "Ordering of Memory-mapped device control with payloads", of the ARMv8 Architecture Reference Manual (version K.a), one of the examples is:

• When a DMA peripheral has written to a buffer of data in memory, and the processing element is reading a status register to determine that the DMA transfer has completed, and then is reading the data.

It says "A DMB, or load-acquire, is not sufficient as this problem is not solely concerned with observation order, since the polling read is actually a read of a status register at a Completer, not the polling a data value that has been written by an observer."

So, for this case, the manual gives the following code:

      P1
           WAIT ([X4] == 1) ; X4 contains the address of the status register,
                            ; and the value '1' indicates completion of the DMA transfer
           DSB <domain>
           LDR W5, [X2]     ; reads data from the data buffer
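For comparison, here is a minimal C sketch of the same sequence the manual prescribes (poll the status register, DSB, then read the buffer). It assumes GCC or Clang inline assembly on an AArch64 target. The names `device_barrier` and `read_after_dma` are hypothetical, `dsb sy` is an assumed conservative choice for `<domain>`, and ordinary pointers stand in for the mapped status register and DMA buffer. On non-AArch64 hosts it falls back to a C11-style fence only so that it compiles; that fallback does not give DSB semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Barrier between the status-register poll and the buffer read.
   Assumption: "dsb sy" is used as a conservative <domain>. */
static inline void device_barrier(void) {
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    /* Fallback so the sketch builds elsewhere; NOT equivalent to a DSB. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical helper: wait for the DMA status to read 1, then read the
   first word of the DMA buffer (mirrors WAIT / DSB / LDR above). */
static uint32_t read_after_dma(volatile const uint32_t *status,
                               volatile const uint32_t *buf) {
    assert(status != 0 && buf != 0);
    while (*status != 1) {
        /* poll the status register until the transfer is reported complete */
    }
    device_barrier();
    return *buf;
}
```

On a real system `status` would point at a Device-memory mapping of the peripheral's status register and `buf` at the DMA target buffer; setting up those mappings is platform-specific and not shown.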

But I think a DMB is enough, since a DMB prevents this reordering. When P1 observes [X4] == 1, it exits the loop and then reads the data from the data buffer.

So what am I misunderstanding? Thanks.