In Section C1.3, Channel Overview, of the AMBA AXI and ACE Protocol Specification, it is stated under "Store operations where the cache line is already cached":
The initiating master component requests a unique copy of the cache line by issuing a CleanUnique transaction on the read address channel. This removes all other copies of the cache line and writes any dirty copy to main memory.
Consider that our initiating master has a clean, shared copy, and another master has a dirty, shared copy.
Now the initiating master issues a CleanUnique transaction on the read address channel. Because the snooped master has a dirty copy, the interconnect constructs a transaction to write the cache line to main memory, and then provides a response to the initiating master.
At this point, the initiating master's copy is no longer clean, since the copy it holds is modified relative to main memory, and the previously dirty cache line was not provided to the initiating master.
The next step mentioned is that the master performs a store and uses the RACK signal to indicate that the transaction has been completed.
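The scenario above can be traced with a toy Python model. It is a sketch under stated assumptions, not the specification's exact behavior: the names `State`, `Line`, `clean_unique`, and `store` are hypothetical, all snoops are collapsed into one loop, the RACK handshake is not modeled, and, most importantly, shared copies of a line are ASSUMED to hold identical data (as coherence implies), so "dirty" only means that main memory is stale. Under that assumption, the data written back to memory equals the initiator's own copy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """The five ACE cache-line states."""
    INVALID = "I"
    UNIQUE_CLEAN = "UC"
    UNIQUE_DIRTY = "UD"
    SHARED_CLEAN = "SC"
    SHARED_DIRTY = "SD"


DIRTY_STATES = {State.UNIQUE_DIRTY, State.SHARED_DIRTY}


@dataclass
class Line:
    state: State
    data: Optional[int]  # None once the line is invalid


def clean_unique(initiator: Line, others: list[Line], memory: dict[int, int], addr: int) -> None:
    """Simplified CleanUnique: snoop the other caches, write back a dirty copy, invalidate them."""
    for line in others:
        if line.state in DIRTY_STATES:
            # The interconnect writes the dirty copy to main memory.
            memory[addr] = line.data
        line.state, line.data = State.INVALID, None
    # The initiator keeps its own data; it is now the only copy and (by assumption) matches memory.
    initiator.state = State.UNIQUE_CLEAN


def store(line: Line, value: int) -> None:
    """A store is only permitted to a unique line, and it leaves the line dirty."""
    if line.state not in (State.UNIQUE_CLEAN, State.UNIQUE_DIRTY):
        raise RuntimeError("store to a non-unique line")
    line.data = value
    line.state = State.UNIQUE_DIRTY


# Scenario from the question; the address and values are made up for illustration.
ADDR = 0x40
memory = {ADDR: 0}                       # stale: the dirty copy has not been written back yet
initiator = Line(State.SHARED_CLEAN, 7)  # assumption: shared copies hold identical data
other = Line(State.SHARED_DIRTY, 7)

clean_unique(initiator, [other], memory, ADDR)
assert initiator.state is State.UNIQUE_CLEAN and initiator.data == memory[ADDR]

store(initiator, 9)                      # the store itself; RACK is not modeled
```

In this model the initiator's line is UniqueClean after CleanUnique completes, and only the subsequent store makes it UniqueDirty.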
This seems ambiguous, since the initiating master performs a store even though its copy of the cache line, while unique, is not clean.
Am I missing something?
(1) The load waits by virtue of being in the CPU pipeline and of waiting for responses from the memory system. The load-store unit (LSU) of a typical pipelined CPU communicates with, commands, and waits for the memory system on behalf of the load operation.
The functional relationship between the LSU, the store buffer (STB), and the cache can be seen, for example, here.
The LSU commands the store buffer to drain, while the load (and other operations too) waits in the queues that the LSU maintains.
If the CPU implements store-to-load forwarding, the load can be fully satisfied by the value of a buffered store, and the memory-ordering rules do not prohibit such forwarding, then the load does not need to wait.
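The waiting and forwarding behavior described in (1) can be sketched as a toy store buffer. This is a simplified model, not any particular CPU: `StoreBuffer`, `execute_load`, and the `forwarding` flag are hypothetical names, accesses are assumed to be whole words at exactly matching addresses, and `forwarding=True` stands for "the CPU forwards and the memory-ordering rules allow it".

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    addr: int
    value: int


class StoreBuffer:
    """Toy store buffer: stores wait here in program order until drained to the cache."""

    def __init__(self) -> None:
        self.entries: deque[Store] = deque()

    def push(self, addr: int, value: int) -> None:
        self.entries.append(Store(addr, value))

    def forward(self, addr: int) -> Optional[int]:
        # The youngest matching store wins, so search from the newest entry backwards.
        for st in reversed(self.entries):
            if st.addr == addr:
                return st.value
        return None

    def drain(self, cache: dict[int, int]) -> None:
        # Write buffered stores into the cache, oldest first.
        while self.entries:
            st = self.entries.popleft()
            cache[st.addr] = st.value


def execute_load(addr: int, stb: StoreBuffer, cache: dict[int, int], forwarding: bool) -> tuple[int, bool]:
    """Hypothetical LSU policy. Returns (value, waited_for_drain)."""
    if forwarding:  # the CPU forwards and the ordering rules permit it
        value = stb.forward(addr)
        if value is not None:
            return value, False  # satisfied straight from the store buffer
    # Otherwise the LSU drains the buffer while the load waits in the LSU's queue.
    waited = bool(stb.entries)
    stb.drain(cache)
    return cache.get(addr, 0), waited


stb = StoreBuffer()
cache: dict[int, int] = {}
stb.push(0x100, 5)
fast = execute_load(0x100, stb, cache, forwarding=True)   # (5, False): forwarded, no wait
slow = execute_load(0x100, stb, cache, forwarding=False)  # (5, True): waited for the drain
```

For simplicity the sketch drains unconditionally when forwarding does not apply; a real LSU would typically only make the load wait when an older store conflicts with it or the ordering rules require it.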
(2) The section "Overlapping MakeUnique" in the AXI4/ACE specification describes this situation: the cache must invalidate the line (if applicable) and wait for the MakeUnique to complete before performing the store to the cache line.
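The rule in (2) can be illustrated with a toy per-line controller. It is a sketch of the behavior as summarized in this answer, not a transcription of the specification: `LineController` and its methods are hypothetical, a line's contents are reduced to a single integer, and MakeUnique is assumed to be used only for a store that overwrites the whole cache line, which is why no data has to survive the invalidation.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional


class State(Enum):
    INVALID = "I"
    SHARED_CLEAN = "SC"
    UNIQUE_DIRTY = "UD"


class LineController:
    """Hypothetical controller for one cache line receiving a full-line store via MakeUnique."""

    def __init__(self, state: State, data: Optional[int]) -> None:
        self.state = state
        self.data = data
        self.pending_store: Optional[int] = None  # full-line value held until MakeUnique completes

    def issue_make_unique(self, full_line_value: int) -> None:
        # The store is queued, not written: the line must not be updated yet.
        self.pending_store = full_line_value

    def snoop_invalidate(self) -> None:
        # A snoop overlapping the outstanding MakeUnique invalidates the line (if present).
        self.state = State.INVALID
        self.data = None

    def make_unique_response(self) -> None:
        # Only after MakeUnique completes is the store performed. No old data is
        # needed, because the whole line is overwritten.
        if self.pending_store is None:
            raise RuntimeError("no MakeUnique outstanding")
        self.data, self.pending_store = self.pending_store, None
        self.state = State.UNIQUE_DIRTY

    def store_pending(self) -> bool:
        return self.pending_store is not None


line = LineController(State.SHARED_CLEAN, data=3)
line.issue_make_unique(42)
waiting = line.store_pending()  # True while MakeUnique is outstanding
line.snoop_invalidate()         # overlapping snoop: the line is dropped
line.make_unique_response()     # the store lands now; the line ends UniqueDirty
```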
Thank you for replying. I understand now. I will surely contact you if I find some other problem in the future.