AMBA CHI bus atomics + ARM core implementation for big-endian

I'm trying to get clarification on the ARM core implementation of AMBA CHI bus atomics when big-endian mode is enabled.

The ARM-ARM contains an implementation for CPU-local atomics that endian-swaps the data as it is read from and written back to the memory sub-system. The operands for the atomic are not endian-swapped, which makes sense because the CPU is already working in the target endian mode.

So the question is: for remote bus atomics, does the core send the raw operand in the bus atomic operation, or does it send the operand in bus native order?

As far as I can tell, this statement from the CHI spec:

" • The position of data bytes in the NonCopyBackWriteData or any CompData packet matches the endianness of the operation, as specified in the Endian field of the request."

... implies that the operands to the Atomic* are sent in big-endian byte order and the results of the atomic are also sent back in big-endian byte order.
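To make that reading concrete, here is a minimal sketch in C of what "big-endian byte order" would mean for a 32-bit operand. The helper `pack_operand` and the 4-byte `lanes` array are hypothetical names for illustration only. They do not model the real CHI data flit layout, and whether the core really behaves this way is exactly what I am asking.

```c
#include <stdint.h>

/* Hypothetical helper, not from the CHI spec or the ARM-ARM:
 * place a 32-bit operand into 4 data bytes, lanes[0] being the
 * lowest byte address. */
static void pack_operand(uint8_t lanes[4], uint32_t operand, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        /* Big-endian: most significant byte at the lowest address.
         * Little-endian: least significant byte at the lowest address. */
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        lanes[i] = (uint8_t)(operand >> shift);
    }
}
```

Under this assumption an operand of 0x11223344 would occupy ascending byte addresses as 11 22 33 44 when the Endian field says big-endian, and as 44 33 22 11 otherwise.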

I'm using the ARM-ARM implementation of local atomics as guidance, but it isn't possible to tell from it what the core expects for remote atomics. Would the remote implementation be exactly like the local implementation, except that the "old" values stored in memory would be converted to big-endian when returned as a result of the atomic operation?

In other words, is this the expected implementation for a remote atomic?

CPU request - AtomicStoreAdd - big-endian

CPU data - send operand to AtomicStoreAdd as big_endian in NonCopyBackWriteData

Remote side - read atomic address from memory as olddata

Remote side - send big_endian_swap(olddata) to CPU as CompData

Remote side - perform atomic add using un-swapped operand from NonCopyBackWriteData + big_endian_swap(olddata)

Remote side - perform memory write of big_endian_swap( NonCopyBackWriteData + big_endian_swap(olddata) ), which effectively stores the data back in bus order, as the core expects when it performs local atomics and later loads that memory.
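The steps above can be sketched in C as follows. This is only my proposed flow written out, not confirmed behaviour from the CHI spec or the ARM-ARM. The names `remote_atomic_add_be`, `load_le32`, `store_le32` and `bswap32` are hypothetical, the access is assumed to be 32 bits wide, "bus order" is assumed to be little-endian, and the operand is assumed to reach the remote side already as a numeric value.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (big_endian_swap in the steps). */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Read and write memory in bus (assumed little-endian) order,
 * independent of the host running this sketch. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Hypothetical remote-side handling of a big-endian atomic add.
 * Returns the value that the steps above send back as CompData. */
static uint32_t remote_atomic_add_be(uint8_t *mem, uint32_t operand)
{
    uint32_t olddata = load_le32(mem);     /* read memory as olddata     */
    uint32_t old_be  = bswap32(olddata);   /* big_endian_swap(olddata)   */
    uint32_t sum     = operand + old_be;   /* add with un-swapped operand */
    store_le32(mem, bswap32(sum));         /* swap back and write        */
    return old_be;
}
```

For example, if a big-endian core had previously stored the value 1, memory holds the bytes 00 00 00 01. Calling `remote_atomic_add_be(mem, 5)` returns 1 and leaves 00 00 00 06 in memory, which a big-endian load would read back as 6.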