IO coherency also allows a device to access the coherent memory space. The only difference I noticed is that cache stashing connects the device directly to the cluster, whereas IO-coherent transactions need to go through the system interconnect.
They're different ways to achieve a similar end result. Depending on the actual use case and data flow, one or the other might be the better solution.
Arm processors have had an Accelerator Coherency Port (ACP) for a while, which routes traffic through the processor's coherency logic (but that traffic then shares the processor's bandwidth). Cache stashing extends that idea, so masters that have some knowledge of the processor cache topology can 'push' data into a particular cache. But you need the right master and data flow for this to make sense.
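To make the 'push' idea concrete, here is a toy Python model, not a description of any real Arm implementation. It assumes a plain coherent device write only updates memory and invalidates stale cached copies, while a stashing write additionally places the line in one chosen core's cache. The names `Core`, `device_write_io_coherent`, `device_write_stash` and `run` are made up for illustration, and the model ignores details of real hardware, such as a stash being a hint that the target may decline.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical CPU core with a tiny cache modelled as a set of addresses."""
    name: str
    cache: set = field(default_factory=set)
    hits: int = 0
    misses: int = 0

    def read(self, addr, memory):
        if addr in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache.add(addr)  # line is fetched on demand
        return memory[addr]


def device_write_io_coherent(cores, memory, addr, value):
    """Coherent device write: update memory, invalidate stale copies in all caches."""
    memory[addr] = value
    for core in cores:
        core.cache.discard(addr)


def device_write_stash(cores, memory, addr, value, target):
    """Stashing write: as above, then place the line in the target core's cache."""
    device_write_io_coherent(cores, memory, addr, value)
    target.cache.add(addr)


def run(stash):
    """Device produces 8 lines, then cpu0 consumes them. Returns (hits, misses, values)."""
    cores = [Core("cpu0"), Core("cpu1")]
    memory = {}
    consumer = cores[0]
    for addr in range(8):
        if stash:
            device_write_stash(cores, memory, addr, addr * 10, consumer)
        else:
            device_write_io_coherent(cores, memory, addr, addr * 10)
    values = [consumer.read(addr, memory) for addr in range(8)]
    return consumer.hits, consumer.misses, values


if __name__ == "__main__":
    print("io-coherent:", run(stash=False)[:2])
    print("stashing:   ", run(stash=True)[:2])
```

In both cases the consumer reads the same, correct data. The only difference in this model is whether its first access to each line hits or misses, and that only helps if the device knows which core will consume the data.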
"Regular" IO coherence using CCI/CCN/SMMU etc is more flexible.