Hello,
In the Armv8 spec, CTR_EL0.DIC/IDC is described as follows:
I really don't get the point of the two bits.
Can someone give me a scenario that shows what effect the two bits have?
Much appreciated.
The scenario is explained in the section "E2.2.5 Concurrent modification and execution of instructions" of the Armv8 manual, under the comment "Coherency example for self-modifying code". The instructions that IDC and DIC target are, respectively, the DCache Clean and the ICache Invalidate in the code sequence associated with that example.
These bits, when set to 1, allow the software to skip/remove instructions which otherwise would be necessary to establish coherency between the DCache and the ICache.
These bits are how an armv8 implementation informs the software whether it needs to Clean the DCache and Invalidate the ICache when the CPU wants to execute the instruction stream which it wrote (under a normal, cache-enabled scenario).
In the Cortex-A72 TRM, these bits are RES0, which means that, on A72, the software must perform both the DCache Clean (unless the other conditions about LoC or about LoUIS && LoUIU apply) and the ICache Invalidate to the PoU before the CPU can execute the instruction stream it wrote. Performing DCache Clean and ICache Invalidate targets the weakest of the implementations; that code sequence should work without change even on implementations which provide stronger guarantees (i.e. set either or both of IDC/DIC to 1).
In Cortex-A76, the IDC bit can be 1. If it is 1, the software does not need to run DCache Clean (but presumably must still run ICache Invalidate since A76 has DIC bit as RES0).
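To make the two cases above concrete, here is a minimal sketch in C of how software might interpret the two bits. The bit positions (IDC at bit 28, DIC at bit 29 of CTR_EL0) come from the architecture; the struct and function names are made up for this illustration, and the CTR_EL0 values passed in are synthetic bit patterns, not values read from a real A72 or A76. The sketch also ignores the CLIDR_EL1 conditions (LoC, LoUIS, LoUU) mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions in CTR_EL0 as defined by the Armv8-A architecture. */
#define CTR_EL0_IDC_BIT 28
#define CTR_EL0_DIC_BIT 29

/* Hypothetical type for this sketch: what software still has to do. */
struct sync_requirements {
    bool need_dcache_clean_to_pou;  /* DC CVAU over the range is required */
    bool need_icache_inval_to_pou;  /* IC IVAU over the range is required */
};

/* Decode a raw CTR_EL0 value into the maintenance steps still needed. */
static struct sync_requirements decode_ctr_el0(uint64_t ctr)
{
    struct sync_requirements req;
    /* IDC == 1: DCache clean to PoU is not required for I/D coherency. */
    req.need_dcache_clean_to_pou = ((ctr >> CTR_EL0_IDC_BIT) & 1) == 0;
    /* DIC == 1: ICache invalidate to PoU is not required for I/D coherency. */
    req.need_icache_inval_to_pou = ((ctr >> CTR_EL0_DIC_BIT) & 1) == 0;
    return req;
}

/* Usage: both bits clear (the RES0 case) versus IDC=1 with DIC=0. */
static void decode_ctr_el0_examples(void)
{
    struct sync_requirements weakest = decode_ctr_el0(0);
    assert(weakest.need_dcache_clean_to_pou);
    assert(weakest.need_icache_inval_to_pou);

    struct sync_requirements idc_only =
        decode_ctr_el0(UINT64_C(1) << CTR_EL0_IDC_BIT);
    assert(!idc_only.need_dcache_clean_to_pou);
    assert(idc_only.need_icache_inval_to_pou);
}
```

Note that even when a step is reported as not needed, the barriers (DSB, ISB) in the sequence are still required; only the cache maintenance loops can be dropped.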
On Linux, one can see how IDC bit being 1 can help reduce code:
SYM_FUNC_START(dcache_clean_pou)
alternative_if ARM64_HAS_CACHE_IDC
    dsb ishst
    ret
alternative_else_nop_endif
    dcache_by_line_op cvau, ish, x0, x1, x2, x3
    ret
SYM_FUNC_END(dcache_clean_pou)
With the IDC facility available, all that dcache_clean_pou needs to do is to ensure that the writes are complete, and that any automatic DCache clean maintenance (presumed to be provided by the IDC facility) is also complete. Without the IDC facility, the OS must perform the DCache Clean to PoU.
Similarly, when the DIC facility is available (not shown above), the OS just needs to flush its pipeline and refetch from the ICache by executing an ISB; the ICache invalidation has, for instance, already been handled automatically in the background by the hardware.
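Putting both bits together, the whole sequence could be sketched in C roughly as below. This is an illustration, not the Linux implementation: the function names are hypothetical, and it assumes the OS lets EL0 read CTR_EL0 and execute DC CVAU / IC IVAU (Linux normally does). On non-AArch64 targets it simply falls back to the GCC/Clang builtin `__builtin___clear_cache` so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of its cache line (line is a power of 2). */
static uintptr_t line_align_down(uintptr_t addr, uintptr_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    return addr & ~(line - 1);
}

/* Hypothetical helper: make code written to [start, end) fetchable. */
static void sync_icache_range(uintptr_t start, uintptr_t end)
{
#if defined(__aarch64__)
    uint64_t ctr;
    /* Assumption: EL0 reads of CTR_EL0 are permitted or emulated. */
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));

    const int idc = (int)((ctr >> 28) & 1);
    const int dic = (int)((ctr >> 29) & 1);
    /* DminLine/IminLine encode log2 of the line size in 4-byte words. */
    const uintptr_t dline = (uintptr_t)4 << ((ctr >> 16) & 0xf);
    const uintptr_t iline = (uintptr_t)4 << (ctr & 0xf);

    if (!idc) {
        /* IDC == 0: clean each DCache line in the range to the PoU. */
        for (uintptr_t a = line_align_down(start, dline); a < end; a += dline)
            __asm__ volatile("dc cvau, %0" : : "r"(a) : "memory");
    }
    /* Always needed: wait for the writes (and any cleans) to complete. */
    __asm__ volatile("dsb ish" : : : "memory");

    if (!dic) {
        /* DIC == 0: invalidate each ICache line in the range to the PoU. */
        for (uintptr_t a = line_align_down(start, iline); a < end; a += iline)
            __asm__ volatile("ic ivau, %0" : : "r"(a) : "memory");
        __asm__ volatile("dsb ish" : : : "memory");
    }
    /* Always needed: discard anything already fetched into the pipeline. */
    __asm__ volatile("isb" : : : "memory");
#else
    /* Other targets: let the compiler runtime do whatever is required. */
    __builtin___clear_cache((char *)start, (char *)end);
#endif
}
```

With IDC = 1 and DIC = 1 the function degenerates to DSB followed by ISB, which is the saving the two bits are meant to advertise.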
I have read the linux code.
Can I say that DCache clean will not be needed for any cache coherency situation since the hardware will do it when IDC = 1?
digital_kevin said: Can I say that DCache clean will not be needed for any cache coherency situation since the hardware will do it when IDC = 1?
One cannot say that.
The IDC/DIC facility is limited to establishing coherency between the DCache and the ICache. The facility does not dictate rules for other situations dealing with coherency.
For example, when cacheable data is to be read by a device through DMA, it is typically required to clean the data cache up to the PoC, so that the device reads the updated data and not the stale data. The IDC bit has no influence over this coherency situation (unless, for instance, the PoC is the same as the PoU, etc.).
Thank you for the detailed reply.
So the key point is DCache and ICache coherency. It seems that this kind of coherency situation is not common.
Apart from self-modifying code, is there any other DCache-ICache coherency situation we should pay attention to?
I would say moving code from disk to RAM is such a situation.
42Bastian Schick is correct.
The situation also arises in case of dynamic compilation (C#, or emulation/virtualization like QEMU), or in case of debugging (setting a breakpoint usually requires modifying the instruction stream).
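For user-space code such as a loader or a JIT, one usually does not write the maintenance instructions by hand. A minimal sketch in C, assuming a POSIX system and GCC or Clang: copy the code bytes into a fresh mapping, call the compiler builtin `__builtin___clear_cache` on the range, then make the mapping executable. The name `load_code` is made up for this example, and the sketch deliberately stops short of jumping into the buffer. On AArch64 the builtin is expected to perform the clean/invalidate sequence discussed above; whether a given runtime version checks IDC/DIC to skip steps is an assumption, not something guaranteed here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical loader: returns an executable copy of image, or NULL. */
static void *load_code(const void *image, size_t len)
{
    assert(image != NULL && len > 0);

    /* Writable (not yet executable) anonymous mapping for the code. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* These stores go through the data side (DCache). */
    memcpy(buf, image, len);

    /* Make the new bytes visible to instruction fetch. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);

    /* Switch the mapping from writable to executable. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```

The same pattern covers the breakpoint case: after patching an instruction in place, the patched range has to go through the same synchronization before it is executed again.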