Dear Experts,
I would like to disable the data prefetching engines of the L1 and L2 caches on a MediaTek-X20 board which includes a quad Cortex-A53 cluster and runs Android.
I have tried to include in the Linux kernel code (at kernel/init/main.c) a call to the following function:
static void __init disable_prefetch(void)
{
    u64 value = 0;

    printk("Manipulating data prefetching register\n");
    asm volatile("mrs %0, S3_1_C15_C2_0" : "=r" (value)); // read register
    printk("Reading old S3_1_C15_C2_0 = %llx\n", value);
    asm volatile("msr S3_1_C15_C2_0, %0" :: "r" (value)); // write register
    printk("Done manipulating data prefetching register\n");
}
However, the call to my function causes a kernel crash at boot. If I instead comment out the "write register" line, I am able to read the value of S3_1_C15_C2_0 during boot. Why am I not able to modify the content of S3_1_C15_C2_0?
Best,
d.
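Note that the function in the question writes back the value it has just read, so even a successful write would leave the prefetcher configuration unchanged. A minimal sketch of the missing step follows. It assumes that the L1 data prefetch control field (L1PCTL) sits in CPUACTLR_EL1[15:13] and that 0b000 means "prefetch disabled", as I recall from the Cortex-A53 TRM; please check the field position against the TRM revision for your core. The macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: L1PCTL occupies CPUACTLR_EL1[15:13] on Cortex-A53. */
#define CPUACTLR_L1PCTL_SHIFT 13
#define CPUACTLR_L1PCTL_MASK  (UINT64_C(0x7) << CPUACTLR_L1PCTL_SHIFT)

/* Extract the L1PCTL field (maximum number of outstanding L1 data prefetches). */
static inline unsigned cpuactlr_get_l1pctl(uint64_t cpuactlr)
{
    return (unsigned)((cpuactlr & CPUACTLR_L1PCTL_MASK) >> CPUACTLR_L1PCTL_SHIFT);
}

/* Return a copy of the register value with L1PCTL cleared (assumed: 0b000 = disabled). */
static inline uint64_t cpuactlr_disable_l1_prefetch(uint64_t cpuactlr)
{
    return cpuactlr & ~CPUACTLR_L1PCTL_MASK;
}
```

In the kernel function, the value returned by the mrs would be passed through cpuactlr_disable_l1_prefetch() before the msr (with u64 in place of uint64_t).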
Hi DNovo,
Write access to CPUACTLR_EL1 can be controlled with ACTLR_EL2 (bit 0) and ACTLR_EL3 (bit 0).
The default is to disable write accesses.
See:
Hi vstehle,
Thanks for your answer. How can I then enable write access to S3_1_C15_C2_0 if the kernel boots in EL1 and the registers you mentioned are only accessible from EL2 and EL3?
Best,
d.
If the write access to CPUACTLR_EL1 is prevented by ACTLR_EL2, you need to modify the hypervisor code to allow access. If this is prevented by ACTLR_EL3, you need to modify the code of the secure monitor. Typically this is the ATF: https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/cpus/aarch64/cortex_a53.S
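The bit manipulation involved is small. The sketch below only models the register values in plain C. It takes bit 0 of ACTLR_EL2 and ACTLR_EL3 as the CPUACTLR_EL1 write-enable, as stated above, and it assumes that both gates must be open before a non-secure EL1 write succeeds. All names are hypothetical, and the actual register write has to be done with an msr in the EL3 or EL2 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of ACTLR_EL2 / ACTLR_EL3: CPUACTLR_EL1 write access control. */
#define ACTLR_CPUACTLR_WRITE_EN (UINT64_C(1) << 0)

/* True if this ACTLR_ELx value lets the next lower level write CPUACTLR_EL1. */
static inline bool actlr_allows_cpuactlr_write(uint64_t actlr)
{
    return (actlr & ACTLR_CPUACTLR_WRITE_EN) != 0;
}

/* Value to write back to ACTLR_ELx so that CPUACTLR_EL1 writes are allowed. */
static inline uint64_t actlr_enable_cpuactlr_write(uint64_t actlr)
{
    return actlr | ACTLR_CPUACTLR_WRITE_EN;
}

/* Assumption: a non-secure EL1 write needs the EL3 gate and the EL2 gate open. */
static inline bool el1_can_write_cpuactlr(uint64_t actlr_el3, uint64_t actlr_el2)
{
    return actlr_allows_cpuactlr_write(actlr_el3) &&
           actlr_allows_cpuactlr_write(actlr_el2);
}
```

With both registers at the default of 0 described above, el1_can_write_cpuactlr() returns false, which matches the crash seen on the msr.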
Thanks a lot vstehle! I can now control the L1 prefetch.
However, I'm still seeing uncontrolled prefetching from the L2 cache. This is what I get with a simple test program that reads consecutive int64 values from memory:
# L1 prefetch enabled
The total L1D misses are 2429 out of 1048713 (0.23162%)
The total L2D misses are 131405 out of 262967 (49.97015%)
L2D cache accesses / L1D cache misses: 108.26142
Total reads = 1048576
# L1 prefetch disabled
The total L1D misses are 131614 out of 1048713 L1D accesses (12.55005%)
The total L2D misses are 131439 out of 263208 L2 accesses (49.93731%)
L2D cache accesses / L1D cache misses: 1.99985
Total reads = 1048576
As you can see, the 12.5% L1 miss rate corresponds to one miss per cache line (one 8-byte read out of every 64 bytes). But I cannot understand why the L2 still sees two accesses per L1D miss. Any idea?
Best,
d.
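The expected miss count can be checked with a little arithmetic: 1048576 reads of 8 bytes cover 8 MiB, which is 131072 lines of 64 bytes, close to the 131614 L1D misses measured with the L1 prefetcher disabled. A minimal sketch of that calculation, assuming a sequential scan that starts on a line boundary (the helper names are hypothetical):

```c
#include <stdint.h>

/* Number of cache lines covered by a sequential scan that starts line-aligned. */
static inline uint64_t lines_touched(uint64_t reads, uint64_t bytes_per_read,
                                     uint64_t line_bytes)
{
    return (reads * bytes_per_read + line_bytes - 1) / line_bytes;
}

/* Ratio of two counters as a double; returns 0.0 when the denominator is 0. */
static inline double counter_ratio(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}
```

For example, lines_touched(1048576, 8, 64) gives 131072, and counter_ratio(263208, 131614) reproduces the roughly two L2 accesses per L1D miss reported above.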
Cortex-A53 has PMU event 0xC2 "Linefill because of prefetch" which might help diagnose (see Events).
Dear vstehle,
I've tried to gain access to the referred PMU event as follows:
PAPI_add_event(EventSet, PAPI_NATIVE_MASK | 0xC2)
but PAPI is not able to find any event higher than 0x0D. Do you know any other way to access PMU events from Linux/Android user space?
To give you more context, I had to add the following lines to the default MediaTek-X20 device tree file in order to gain access to the currently accessible PAPI events (i.e., PAPI_L1_ICM, PAPI_L1_DCM, PAPI_L1_DCA, PAPI_L2_DCM, PAPI_L2_DCA, PAPI_LD_INS, PAPI_SR_INS, PAPI_BR_INS, PAPI_TOT_INS, PAPI_BR_MSP, PAPI_TOT_CYC, PAPI_TLB_DM, PAPI_TLB_IM, PAPI_HW_INT):
+    pmu_a53_0 {
+        compatible = "arm,armv8-pmuv3";
+        interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>;
+        interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
+    };
+
+    pmu_a53_1 {
+        compatible = "arm,armv8-pmuv3";
+        interrupts = <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
+        interrupt-affinity = <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
+    };
+
+    pmu_a72 {
+        compatible = "arm,armv8-pmuv3";
+        interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>,
+                     <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
+        interrupt-affinity = <&cpu8>, <&cpu9>;
+    };
+
Best,
d.
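One possible route that does not depend on PAPI's event tables is the Linux perf_event_open system call with a raw event number. The sketch below is a minimal, untested-on-this-board illustration. It assumes the kernel exposes the PMU through perf (which the device tree nodes above are meant to provide) and that the process is allowed to open counters; on Android this may be restricted, for example through /proc/sys/kernel/perf_event_paranoid. The helper names are hypothetical, while the struct fields and constants come from <linux/perf_event.h>.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill attr so that it requests a single raw PMU event, e.g. 0xC2. */
static void init_raw_event_attr(struct perf_event_attr *attr, uint64_t raw_event)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_RAW;   /* config holds a CPU-specific event number */
    attr->size = sizeof(*attr);
    attr->config = raw_event;
    attr->disabled = 1;           /* start stopped; enable explicitly later */
    attr->exclude_kernel = 1;     /* count user-space activity only */
    attr->exclude_hv = 1;
}

/* Open the counter for the calling thread on any CPU; returns an fd or -1. */
static int open_raw_event(uint64_t raw_event)
{
    struct perf_event_attr attr;

    init_raw_event_attr(&attr, raw_event);
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}
```

A caller would then enable the counter with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) around the code under test and read the 64-bit count with read(fd, &count, sizeof(count)). On a big.LITTLE part, the thread should be pinned to a Cortex-A53 core so that 0xC2 is interpreted by the intended PMU.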