Disable data prefetching in a Cortex-A53 running Android

Dear Experts,

I would like to disable the data prefetching engines of the L1 and L2 caches on a MediaTek-X20 board which includes a quad Cortex-A53 cluster and runs Android.

I have tried to include in the Linux kernel code (at kernel/init/main.c) a call to the following function:

static void __init disable_prefetch(void)
{
    u64 value = 0;

    printk("Manipulating data prefetching register\n");

    asm volatile("mrs %0, S3_1_C15_C2_0" : "=r" (value)); // read register
    printk("Reading old S3_1_C15_C2_0 = %llx\n", value);

    // The value is written back unmodified here; the write alone triggers the crash.
    asm volatile("msr S3_1_C15_C2_0, %0" :: "r" (value)); // write register

    printk("Done manipulating data prefetching register\n");
}

However, calling this function crashes the kernel at boot. If I comment out the "write register" line, the kernel boots and I can read the value of S3_1_C15_C2_0.


Why am I not able to modify the content of S3_1_C15_C2_0?


Best,

d.

  • Hi,

    S3_1_C15_C2_0 is the encoding of CPUACTLR_EL1. Write access to CPUACTLR_EL1 from lower exception levels is controlled by bit 0 of ACTLR_EL2 and bit 0 of ACTLR_EL3.

    By default, write access is disabled.

    See:

  • Hi vstehle,

    Thanks for your answer. 

    How can I then enable write access to S3_1_C15_C2_0 if the kernel boots in EL1 and the registers you mentioned are only accessible from EL2 and EL3?

    Best,
    d.

  • Hi,

    If write access to CPUACTLR_EL1 is blocked by ACTLR_EL2, you need to modify the hypervisor code to allow it. If it is blocked by ACTLR_EL3, you need to modify the secure monitor code. Typically this is ARM Trusted Firmware (ATF): https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/cpus/aarch64/cortex_a53.S
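
    As a rough illustration of the access-control step, here is a minimal sketch of the value manipulation. It assumes only what is stated in this thread, namely that bit 0 of ACTLR_EL2 and ACTLR_EL3 gates writes to CPUACTLR_EL1. The macro and function names are hypothetical, and the actual register write has to happen in code running at EL3 (or EL2), not in the kernel.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumption taken from the reply above: bit 0 of ACTLR_EL2 / ACTLR_EL3
     * controls write access to CPUACTLR_EL1 from lower exception levels. */
    #define ACTLR_CPUACTLR_ACCESS (UINT64_C(1) << 0)

    /* Hypothetical helper: return the ACTLR value with CPUACTLR_EL1 write
     * access granted, leaving every other bit unchanged. */
    static inline uint64_t actlr_allow_cpuactlr_writes(uint64_t actlr)
    {
        uint64_t updated = actlr | ACTLR_CPUACTLR_ACCESS;

        assert((updated & ~ACTLR_CPUACTLR_ACCESS) == (actlr & ~ACTLR_CPUACTLR_ACCESS));
        return updated;
    }

    /*
     * In the secure monitor (AArch64, EL3) the update would look roughly like:
     *
     *     mrs x0, actlr_el3
     *     orr x0, x0, #1
     *     msr actlr_el3, x0
     *     isb
     *
     * With a hypervisor present, the same bit in actlr_el2 needs setting too.
     */
    ```

    The helper only computes the new value, so it can be checked on any host; where exactly to place the register update in a given firmware is platform specific.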

  • Thanks a lot vstehle! I can now control the L1 prefetch.
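
    For readers following along, a minimal sketch of the read-modify-write value for the L1 prefetcher. It assumes the L1PCTL field occupies bits [15:13] of CPUACTLR_EL1 and that the value 0b000 disables L1 data prefetching; both are assumptions to verify against the Cortex-A53 TRM for your core revision. The helper names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed field layout (verify against the Cortex-A53 TRM):
     * CPUACTLR_EL1[15:13] = L1PCTL, the L1 data prefetch control field. */
    #define CPUACTLR_L1PCTL_SHIFT 13
    #define CPUACTLR_L1PCTL_MASK  (UINT64_C(0x7) << CPUACTLR_L1PCTL_SHIFT)

    /* Replace the L1PCTL field with `setting` (0..7), keeping all other bits. */
    static inline uint64_t cpuactlr_set_l1pctl(uint64_t cpuactlr, unsigned setting)
    {
        assert(setting <= 7);
        return (cpuactlr & ~CPUACTLR_L1PCTL_MASK) |
               ((uint64_t)setting << CPUACTLR_L1PCTL_SHIFT);
    }

    /* Assumed meaning: L1PCTL == 0b000 turns the L1 data prefetcher off. */
    static inline uint64_t cpuactlr_disable_l1_prefetch(uint64_t cpuactlr)
    {
        return cpuactlr_set_l1pctl(cpuactlr, 0);
    }
    ```

    In the kernel function from the original post, the value read with mrs would be passed through cpuactlr_disable_l1_prefetch() before being written back with msr.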

    However, I still see what looks like uncontrolled prefetching at the L2 cache. This is what I get with a simple test program that reads consecutive int64 values from memory:

    # L1 prefetch enabled
    The total L1D misses are 2429 out of 1048713 (0.23162%)
    The total L2D misses are 131405 out of 262967 (49.97015%)
    L2D cache accesses / L1D cache misses: 108.26142 
    Total reads = 1048576

    # L1 prefetch disabled
    The total L1D misses are 131614 out of 1048713 L1D accesses (12.55005%)
    The total L2D misses are 131439 out of 263208 L2 accesses (49.93731%)
    L2D cache accesses / L1D cache misses: 1.99985
    Total reads = 1048576

    As you can see, the 12.5% L1 miss rate corresponds to one miss per cache line (one 8-byte read out of every 64-byte line). But I cannot understand why the L2 still sees two accesses per L1D miss. Any idea?
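
    The arithmetic behind these figures can be checked with a short sketch. It assumes 8-byte reads, 64-byte cache lines and a buffer that starts on a line boundary; the helper names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Number of distinct cache lines covered by `reads` consecutive elements,
     * assuming the buffer starts on a cache-line boundary. */
    static inline uint64_t lines_touched(uint64_t reads, uint64_t elem_bytes,
                                         uint64_t line_bytes)
    {
        assert(line_bytes != 0);
        return (reads * elem_bytes + line_bytes - 1) / line_bytes;
    }

    /* Plain ratio of two counter values. */
    static inline double counter_ratio(uint64_t num, uint64_t den)
    {
        assert(den != 0);
        return (double)num / (double)den;
    }
    ```

    With the numbers above, lines_touched(1048576, 8, 64) gives 131072, which is close to both the 131614 L1D misses and the 131439 L2D misses reported with L1 prefetch disabled. counter_ratio(263208, 131614) is about 2.0, so the counters show roughly one L2 miss but two L2 accesses per cache line.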

    Best,
    d. 

  • Hi,

    Cortex-A53 has PMU event 0xC2, "Linefill because of prefetch", which might help diagnose this (see Events).

  • Dear vstehle,

    I tried to access that PMU event as follows:

    PAPI_add_event(EventSet, PAPI_NATIVE_MASK | 0xC2)

    but PAPI cannot find any event above 0x0D. Do you know of any other way to access PMU events from Linux/Android user space?
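
    One possible alternative, sketched here under assumptions, is to bypass PAPI and request the raw event number through the Linux perf_event_open system call with PERF_TYPE_RAW. Whether this works depends on the kernel exposing the PMU and on the perf_event_paranoid setting of the device. The event constant and helper names are hypothetical; the event number 0xC2 is the one quoted in the reply above.

    ```c
    #define _GNU_SOURCE
    #include <assert.h>
    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Event number taken from the reply above ("Linefill because of prefetch"). */
    #define A53_EVT_PREFETCH_LINEFILL 0xC2

    /* Fill `attr` so that it requests a raw PMU event, counting user space only.
     * The counter starts disabled and must be enabled with an ioctl. */
    static inline void raw_event_attr(struct perf_event_attr *attr, uint64_t event)
    {
        assert(attr != NULL);
        memset(attr, 0, sizeof(*attr));
        attr->type = PERF_TYPE_RAW;
        attr->size = sizeof(*attr);
        attr->config = event;
        attr->disabled = 1;
        attr->exclude_kernel = 1;
        attr->exclude_hv = 1;
    }

    /* Open a counter for the calling thread on any CPU.
     * Returns a file descriptor, or -1 with errno set on failure. */
    static inline int open_raw_counter(uint64_t event)
    {
        struct perf_event_attr attr;

        raw_event_attr(&attr, event);
        return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }
    ```

    A typical use would be to call open_raw_counter(A53_EVT_PREFETCH_LINEFILL), then ioctl(fd, PERF_EVENT_IOC_RESET, 0) and ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) before the measured loop, ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) after it, and finally read() a 64-bit count from the descriptor. On a system with several PMUs, as on this board, the thread may need to be pinned to a core of the cluster of interest.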

    To give you more context, I had to add the following lines to the default MediaTek-X20 device tree file to make the PAPI events I currently use available (i.e., PAPI_L1_ICM PAPI_L1_DCM PAPI_L1_DCA PAPI_L2_DCM PAPI_L2_DCA PAPI_LD_INS PAPI_SR_INS PAPI_BR_INS PAPI_TOT_INS PAPI_BR_MSP PAPI_TOT_CYC PAPI_TLB_DM PAPI_TLB_IM PAPI_HW_INT):

    +    pmu_a53_0 {
    +        compatible = "arm,armv8-pmuv3";
    +        interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>;
    +        interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
    +    };
    +
    +    pmu_a53_1 {
    +        compatible = "arm,armv8-pmuv3";
    +        interrupts = <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
    +        interrupt-affinity = <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
    +    };
    +
    +    pmu_a72 {
    +        compatible = "arm,armv8-pmuv3";
    +        interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>,
    +                     <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
    +        interrupt-affinity = <&cpu8>, <&cpu9>;
    +    };
    +

    Best,
    d.