Is it normal that DC CVAC also carries out a cache line invalidate on Cortex-A53?

Hello,

I'm attempting to make use of the processor's cache for memory-mapped IO.

In theory, this is possible by explicitly maintaining the cache. That is, before reading from memory-mapped IO the corresponding cache lines have to be invalidated (e.g. by means of DC CIVAC), and after writing, the corresponding cache lines have to be flushed, i.e. cleaned (e.g. by means of DC CVAC).
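
For illustration, here is a minimal sketch in C with inline assembly of what I mean (the helper names inval_range/clean_range are mine, and the 64-byte line size is an assumption rather than queried from the hardware):

    #include <stdint.h>

    #define CACHE_LINE 64u  /* assumed D-cache line size on this Cortex-A53 */

    /* Invalidate the lines covering [buf, buf+len) before reading from MMIO. */
    static inline void inval_range(void *buf, uint64_t len)
    {
        uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
        for (; p < (uintptr_t)buf + len; p += CACHE_LINE)
            __asm__ volatile("dc civac, %0" :: "r"(p) : "memory");
        __asm__ volatile("dsb sy" ::: "memory");
    }

    /* Clean the lines covering [buf, buf+len) after writing, so the data
       becomes visible to the device behind the mapping. */
    static inline void clean_range(void *buf, uint64_t len)
    {
        uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE - 1);
        for (; p < (uintptr_t)buf + len; p += CACHE_LINE)
            __asm__ volatile("dc cvac, %0" :: "r"(p) : "memory");
        __asm__ volatile("dsb sy" ::: "memory");
    }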

From the behavior I observe in practice, I conclude that the Cortex-A53 seems to also invalidate the cache line as a result of DC CVAC. From a memory-semantics point of view this would be just fine. However, it hurts performance in some scenarios.
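
The way this shows up can be reproduced roughly as follows (a hedged sketch; reload_cost_after_clean is a hypothetical test helper, and it assumes Linux's default settings that permit DC and counter access from EL0):

    #include <stdint.h>

    static inline uint64_t rd_cntvct(void)
    {
        uint64_t v;
        __asm__ volatile("isb; mrs %0, cntvct_el0" : "=r"(v));
        return v;
    }

    /* buf points into a cacheable mapping */
    uint64_t reload_cost_after_clean(volatile uint32_t *buf)
    {
        uint64_t t0, t1;
        buf[0] = 42;                                    /* dirty the line */
        __asm__ volatile("dc cvac, %0" :: "r"(buf) : "memory");
        __asm__ volatile("dsb sy" ::: "memory");
        t0 = rd_cntvct();
        (void)buf[0];   /* should hit if CVAC only cleaned the line */
        t1 = rd_cntvct();
        return t1 - t0; /* a large delta suggests the clean also invalidated */
    }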

Within the ARMv8 documentation I can't find any indication that DC CVAC effectively falls back to DC CIVAC. But maybe I have misunderstood something...

Does anyone have any ideas on this matter?

Thanks,

Mario

  • Hello Martin,

    thanks for clarifying this! So there is a good chance that the operating system in question (which is Linux, by the way) is indeed setting things up this way for a reason. After scanning the source code, I came across the following in the boot loader (U-Boot, arch/arm/cpu/armv8/start.S):

    apply_a53_core_errata:

    #ifdef CONFIG_ARM_ERRATA_855873
            mrs     x0, midr_el1
            tst     x0, #(0xf << 20)        /* MIDR variant field, bits [23:20] */
            b.ne    0b                      /* bail out unless variant == 0 */

            mrs     x0, midr_el1
            and     x0, x0, #0xf            /* MIDR revision field, bits [3:0] */
            cmp     x0, #3
            b.lt    0b                      /* bail out unless revision >= 3 (r0p3+) */

            mrs     x0, S3_1_c15_c2_0       /* cpuactlr_el1 */
            /* Enable data cache clean as data cache clean/invalidate */
            orr     x0, x0, #1 << 44
            msr     S3_1_c15_c2_0, x0       /* cpuactlr_el1 */
            isb
    #endif

    So I think things are clear here: on affected silicon, U-Boot deliberately promotes data cache clean to clean/invalidate. The actual erratum 855873 is described here: https://developer.arm.com/documentation/epm048406/latest/
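
    As a side note, the two MIDR checks above just decode the variant and revision fields; for anyone wanting to verify their own stepping, here is a small sketch (the MIDR value is an example; on Linux it can typically be read from /sys/devices/system/cpu/cpu0/regs/identification/midr_el1):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t midr = 0x410fd034;              /* example: Cortex-A53 r0p4 */
            unsigned variant  = (midr >> 20) & 0xf;  /* the "rN" part */
            unsigned revision = midr & 0xf;          /* the "pN" part */
            printf("r%up%u -> workaround applied by U-Boot: %s\n",
                   variant, revision,
                   (variant == 0 && revision >= 3) ? "yes" : "no");
            return 0;
        }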

    As for your concerns regarding caching of MMIO: essentially I totally agree with you. However, in this particular case the targets are memory regions in the IO space, so optimizations like reordering, prefetching, combining, etc. are not an issue here. Combined with appropriate explicit cache management, a significant performance gain can be achieved when accessing these external memories. Of course, doing that with a bunch of CSRs would certainly not be a good idea.
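
    One detail worth noting for this kind of explicit maintenance: rather than hard-coding the line size, the stride can be taken from CTR_EL0. A small helper sketch (assuming EL0 access to CTR_EL0, which Linux enables by default via SCTLR_EL1.UCT):

        #include <stdint.h>

        /* Smallest D-cache line size in bytes: CTR_EL0.DminLine holds
           log2 of the line size in 4-byte words. */
        static inline uint64_t dcache_line_size(void)
        {
            uint64_t ctr;
            __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
            return 4u << ((ctr >> 16) & 0xf);
        }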

    This is actually a bit out of scope for this question, but in my experiments I also encountered what might be a bug in the Cortex-A53 that ultimately rules out CPU writes to such cached MMIO regions anyway. I'm not yet able to pin this down further, and it might also be a bug in the microcontroller (Texas Instruments AM6442) that integrates the Cortex-A53. But there are situations where the first few words of a block written through the external interface are falsely written as all-ones (i.e. 0xffffffff). This does not seem to be strictly limited to actual use of the cache; it also appears (albeit more rarely) when the system is generally set up to allow a cacheable mapping but the cache has not been enabled for the particular mapping. If this is of interest, I discussed it publicly on the TI forum in this thread: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1552547/am6442-burst-size-limitations-of-the-gpmc-interface (only the last four posts are relevant here).