Dear Experts,
I would like to disable the data prefetching engines of the L1 and L2 caches on a MediaTek-X20 board which includes a quad Cortex-A53 cluster and runs Android.
I have tried to include in the Linux kernel code (at kernel/init/main.c) a call to the following function:
static void __init disable_prefetch(void)
{
    u64 value = 0;

    printk("Manipulating data prefetching register\n");
    asm volatile("mrs %0, S3_1_C15_C2_0" : "=r" (value)); // read register
    printk("Reading old S3_1_C15_C2_0 = %llx)\n", value);
    asm volatile("msr S3_1_C15_C2_0, %0" :: "r" (value)); // write register
    printk("Done manipulating data prefetching register\n");
}
However, calling this function crashes the kernel at boot. If I comment out the "write register" line instead, I am able to read the value of S3_1_C15_C2_0 during boot. Why am I not able to modify the content of S3_1_C15_C2_0?
Best,
d.
Hi vstehle,
Thanks for your answer. How can I then enable write access to S3_1_C15_C2_0 if the kernel boots in EL1 and the registers you mentioned are only accessible from EL2 and EL3?
Best,
d.
Hi DNovo,
If the write access to CPUACTLR_EL1 is prevented by ACTLR_EL2, you need to modify the hypervisor code to allow access. If this is prevented by ACTLR_EL3, you need to modify the code of the secure monitor. Typically this is the ATF: https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/cpus/aarch64/cortex_a53.S
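To make the two access-control bits concrete, here is a minimal sketch in plain C. It assumes the bit layout I remember from the Cortex-A53 TRM for ACTLR_EL3 and ACTLR_EL2 (bit 0 gates CPUACTLR_EL1, which is S3_1_C15_C2_0), so please check it against the TRM revision for your core. The macro and function names are hypothetical, and the actual write to ACTLR_EL3 or ACTLR_EL2 has to happen in code running at that exception level, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Access-control bits of ACTLR_EL3 / ACTLR_EL2 on Cortex-A53.
 * Assumption: layout taken from the Cortex-A53 TRM; verify for your core. */
#define ACTLR_CPUACTLR_EN (UINT64_C(1) << 0) /* CPUACTLR_EL1 */
#define ACTLR_CPUECTLR_EN (UINT64_C(1) << 1) /* CPUECTLR_EL1 */
#define ACTLR_L2CTLR_EN   (UINT64_C(1) << 4) /* L2CTLR_EL1 */
#define ACTLR_L2ECTLR_EN  (UINT64_C(1) << 5) /* L2ECTLR_EL1 */
#define ACTLR_L2ACTLR_EN  (UINT64_C(1) << 6) /* L2ACTLR_EL1 */

#define ACTLR_KNOWN_BITS (ACTLR_CPUACTLR_EN | ACTLR_CPUECTLR_EN | \
                          ACTLR_L2CTLR_EN | ACTLR_L2ECTLR_EN |    \
                          ACTLR_L2ACTLR_EN)

/* Hypothetical helper: value to write back to ACTLR_ELx so that the
 * requested registers become writable from lower exception levels. */
static uint64_t actlr_grant(uint64_t actlr, uint64_t bits)
{
    assert((bits & ~ACTLR_KNOWN_BITS) == 0); /* reject unknown bits */
    return actlr | bits;
}

/* Hypothetical helper: non-secure EL1 can write CPUACTLR_EL1 only if
 * both EL3 and EL2 have set their enable bit. */
static int el1_ns_can_write_cpuactlr(uint64_t actlr_el3, uint64_t actlr_el2)
{
    return (actlr_el3 & ACTLR_CPUACTLR_EN) && (actlr_el2 & ACTLR_CPUACTLR_EN);
}
```

If either enable bit is clear, the msr from EL1 traps, which would match the boot crash described above, while the mrs read succeeds.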
Thanks a lot vstehle! I can now control the L1 prefetch.
However, I'm still seeing uncontrolled prefetching from the L2 cache. This is what I get with a simple test program that reads consecutive int64 values from memory:
# L1 prefetch enabled
The total L1D misses are 2429 out of 1048713 (0.23162%)
The total L2D misses are 131405 out of 262967 (49.97015%)
L2D cache accesses / L1D cache misses: 108.26142
Total reads = 1048576

# L1 prefetch disabled
The total L1D misses are 131614 out of 1048713 L1D accesses (12.55005%)
The total L2D misses are 131439 out of 263208 L2 accesses (49.93731%)
L2D cache accesses / L1D cache misses: 1.99985
Total reads = 1048576
As you can see, the 12.5% L1D miss rate corresponds to one miss per cache line (one 8-byte read out of every 64 bytes). But I cannot understand why the L2 still sees two accesses per L1D miss. Any idea?
Best,
d.
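The arithmetic behind that reading can be sketched as follows. This is only a back-of-the-envelope check using the counter values posted above; the 64-byte line size is the one assumed in the post, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache lines touched by a sequential stream of `reads`
 * elements of `elem_bytes` each, with `line_bytes` per cache line. */
static uint64_t expected_line_fills(uint64_t reads, uint64_t elem_bytes,
                                    uint64_t line_bytes)
{
    assert(elem_bytes > 0 && line_bytes > 0);
    return (reads * elem_bytes + line_bytes - 1) / line_bytes;
}

/* Plain ratio of two counter values. */
static double counter_ratio(uint64_t num, uint64_t den)
{
    assert(den > 0);
    return (double)num / (double)den;
}
```

With the posted numbers, expected_line_fills(1048576, 8, 64) gives 131072 lines. The measured 131614 L1D misses and 131439 L2D misses are both within about 0.5% of that, so each line misses once in L1 and once in L2, while counter_ratio(263208, 131614) is about 2.0, i.e. two L2 accesses are counted per line. The sketch only confirms the ratio; it does not say where the second access comes from.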
Cortex-A53 has PMU event 0xC2 "Linefill because of prefetch" which might help diagnose (see Events).
Dear vstehle,
I've tried to gain access to the PMU event you referred to as follows:
PAPI_add_event(EventSet, PAPI_NATIVE_MASK | 0xC2)
but PAPI is not able to find any event higher than 0x0D. Do you know any other way to access PMU events from Linux/Android user space?
To give you more context, I had to add the following lines to the default MediaTek-X20 device tree file in order to gain access to the PAPI events that are currently accessible (i.e., PAPI_L1_ICM PAPI_L1_DCM PAPI_L1_DCA PAPI_L2_DCM PAPI_L2_DCA PAPI_LD_INS PAPI_SR_INS PAPI_BR_INS PAPI_TOT_INS PAPI_BR_MSP PAPI_TOT_CYC PAPI_TLB_DM PAPI_TLB_IM PAPI_HW_INT):
+ pmu_a53_0 {
+     compatible = "arm,armv8-pmuv3";
+     interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>;
+     interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
+ };
+
+ pmu_a53_1 {
+     compatible = "arm,armv8-pmuv3";
+     interrupts = <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
+     interrupt-affinity = <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
+ };
+
+ pmu_a72 {
+     compatible = "arm,armv8-pmuv3";
+     interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>,
+                  <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
+     interrupt-affinity = <&cpu8>, <&cpu9>;
+ };
+
Best,
d.
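One possible way to reach event 0xC2 from user space without PAPI is the Linux perf_event_open(2) system call with a raw event. The following is a minimal sketch, not a tested recipe for this board: it assumes the kernel's ARMv8 PMU driver accepts the PMU event number directly as the PERF_TYPE_RAW config value, that the calling thread is pinned to a Cortex-A53 core, and that perf access is permitted on the device. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe a raw hardware event, e.g. 0xC2 on Cortex-A53.
 * The counter starts disabled and counts user space only. */
static struct perf_event_attr make_raw_attr(uint64_t event)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_RAW;
    attr.size = sizeof(attr);
    attr.config = event;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return attr;
}

/* Open the counter for the calling thread on whichever CPU it runs on.
 * Returns a file descriptor, or -1 with errno set. */
static int open_raw_counter(uint64_t event)
{
    struct perf_event_attr attr = make_raw_attr(event);

    /* pid = 0 (this thread), cpu = -1 (any), group_fd = -1, flags = 0 */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Read the current 64-bit count; returns 0 on success, -1 on failure. */
static int read_counter(int fd, uint64_t *count)
{
    assert(fd >= 0 && count != NULL);
    return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A typical use would be to call open_raw_counter(0xC2), enable the counter with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) around the test loop, disable it with PERF_EVENT_IOC_DISABLE, and then call read_counter(). If the open fails, errno should indicate whether the event is unsupported or access is denied.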