Disable data prefetching in a Cortex-A53 running Android

DNovo over 6 years ago

Dear Experts,

I would like to disable the data prefetching engines of the L1 and L2 caches on a MediaTek-X20 board, which includes a quad Cortex-A53 cluster and runs Android.

I tried adding a call to the following function to the Linux kernel code (in kernel/init/main.c):

static void __init disable_prefetch(void)
{
	u64 value = 0;

	printk(KERN_INFO "Manipulating data prefetching register\n");

	/* Read CPUACTLR_EL1 (S3_1_C15_C2_0); this works from EL1. */
	asm volatile("mrs %0, S3_1_C15_C2_0" : "=r" (value));
	printk(KERN_INFO "Reading old S3_1_C15_C2_0 = %llx\n", value);

	/* Write the (currently unmodified) value back. */
	asm volatile("msr S3_1_C15_C2_0, %0" :: "r" (value));
	asm volatile("isb");	/* synchronize the system-register write */

	printk(KERN_INFO "Done manipulating data prefetching register\n");
}
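The operand S3_1_C15_C2_0 uses the assembler's generic system-register syntax S<op0>_<op1>_C<CRn>_C<CRm>_<op2>, which lets you name an IMPLEMENTATION DEFINED register the assembler has no symbolic name for; on Cortex-A53 this encoding is CPUACTLR_EL1. The sketch below (helper names are hypothetical) decodes such a name and builds the 32-bit "MRS Xt, <sysreg>" instruction word from its fields, to show that the name is nothing more than an encoding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of a generic AArch64 system-register name S<op0>_<op1>_C<n>_C<m>_<op2>. */
struct sysreg_enc {
    unsigned op0, op1, crn, crm, op2;
};

/* Parse e.g. "S3_1_C15_C2_0"; returns false on malformed or out-of-range input.
 * MRS/MSR (register) can only encode op0 = 2 or 3. */
static bool parse_sysreg(const char *name, struct sysreg_enc *e)
{
    char tail; /* catches trailing garbage after op2 */
    if (sscanf(name, "S%u_%u_C%u_C%u_%u%c",
               &e->op0, &e->op1, &e->crn, &e->crm, &e->op2, &tail) != 5)
        return false;
    return e->op0 >= 2 && e->op0 <= 3 && e->op1 <= 7 &&
           e->crn <= 15 && e->crm <= 15 && e->op2 <= 7;
}

/* 32-bit encoding of "MRS X<rt>, <sysreg>"; bit 19 holds (op0 - 2). */
static uint32_t mrs_encoding(const struct sysreg_enc *e, unsigned rt)
{
    return 0xD5300000u | ((e->op0 - 2u) << 19) | (e->op1 << 16) |
           (e->crn << 12) | (e->crm << 8) | (e->op2 << 5) | (rt & 31u);
}
```

For S3_1_C15_C2_0 this yields op0=3, op1=1, CRn=15, CRm=2, op2=0, and "mrs x0, S3_1_C15_C2_0" encodes as 0xD539F200.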

However, calling my function crashes the kernel at boot. If I comment out the "write register" line, I can read the value of S3_1_C15_C2_0 during boot.
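As written, disable_prefetch() writes back exactly the value it read, so even once the write is permitted it does not change the prefetcher configuration. The value has to be modified between the mrs and the msr. A minimal sketch of that bit manipulation follows, assuming (our reading of the Cortex-A53 TRM, not stated in this thread; verify for your core revision) that CPUACTLR_EL1 bits [15:13] are the L1 data prefetch control field L1PCTL, where 0b000 disables the L1 data prefetcher. The helper names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: Cortex-A53 CPUACTLR_EL1[15:13] = L1PCTL, the maximum number of
 * outstanding L1 data prefetches; 0 disables L1 data prefetching. */
#define L1PCTL_SHIFT 13
#define L1PCTL_MASK  (UINT64_C(0x7) << L1PCTL_SHIFT)

/* Return a copy of a CPUACTLR_EL1 value with L1PCTL set to `level`,
 * leaving every other bit untouched. */
static uint64_t cpuactlr_set_l1pctl(uint64_t cpuactlr, unsigned level)
{
    cpuactlr &= ~L1PCTL_MASK;
    cpuactlr |= ((uint64_t)level << L1PCTL_SHIFT) & L1PCTL_MASK;
    return cpuactlr;
}

/* Extract the current L1PCTL field from a CPUACTLR_EL1 value. */
static unsigned cpuactlr_get_l1pctl(uint64_t cpuactlr)
{
    return (unsigned)((cpuactlr & L1PCTL_MASK) >> L1PCTL_SHIFT);
}
```

In disable_prefetch() this would be `value = cpuactlr_set_l1pctl(value, 0);` just before the msr. Keep in mind that code called from start_kernel() runs only on the boot CPU, before the secondary cores are brought up, so the other cores would need the same write (for example in per-CPU firmware reset code).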

Why am I not able to modify the content of S3_1_C15_C2_0?

Best,

vstehle over 6 years ago
Hi DNovo,

Write access to CPUACTLR_EL1 can be controlled with ACTLR_EL2 (bit 0) and ACTLR_EL3 (bit 0).

The default is to disable write accesses.

See:

Auxiliary Control Register, EL2

Auxiliary Control Register, EL3
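The rule above can be written as a small predicate over the two ACTLR values. This is a sketch for illustration only: the function name is hypothetical, and it assumes (our reading of the TRM, not something stated in this reply) that ACTLR_EL2 only gates Non-secure EL1, since ARMv8.0 has no EL2 in the Secure state. In practice ACTLR_EL2 and ACTLR_EL3 can only be read and written at EL2/EL3, so this decision is made by firmware, not by the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of ACTLR_EL3 and ACTLR_EL2: write access to CPUACTLR_EL1 from the
 * next lower exception level. Default (0) = write access disabled. */
#define ACTLR_CPUACTLR_ACCESS (UINT64_C(1) << 0)

/* Hypothetical helper: may EL1 write CPUACTLR_EL1 given these ACTLR values?
 * nonsecure: true for a Non-secure EL1 such as the Android/Linux kernel,
 * which is additionally gated by ACTLR_EL2 (assumption, see text). */
static bool el1_may_write_cpuactlr(uint64_t actlr_el3, uint64_t actlr_el2,
                                   bool nonsecure)
{
    if (!(actlr_el3 & ACTLR_CPUACTLR_ACCESS))
        return false; /* blocked by the secure monitor (EL3) */
    if (nonsecure && !(actlr_el2 & ACTLR_CPUACTLR_ACCESS))
        return false; /* blocked by the hypervisor (EL2) */
    return true;
}
```

With both registers at their default value, el1_may_write_cpuactlr(0, 0, true) is false, which is consistent with the boot-time crash described in the question.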

DNovo over 6 years ago in reply to vstehle

Hi vstehle,

Thanks for your answer.

How can I then enable write access to S3_1_C15_C2_0 if the kernel boots in EL1 and the registers you mentioned are only accessible from EL2 and EL3?
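Before choosing between the hypervisor and the secure monitor, it may help to confirm which exception level the kernel code actually runs at. CurrentEL reports it in bits [3:2] and is readable from EL1 and above, but not from EL0. A minimal sketch with hypothetical helper names: the decode function is plain C, and the register read is compiled only for AArch64 and must be called from kernel code. Note that CurrentEL only shows the current level; on an ARMv8.0 core such as Cortex-A53, an arm64 kernel entered at EL2 normally drops to EL1 after installing a stub, so whether EL2 was available at boot is a separate question (arm64 kernels track it with is_hyp_mode_available() in asm/virt.h).

```c
#include <stdint.h>

/* CurrentEL holds the current exception level (0-3) in bits [3:2]. */
static unsigned current_el_from_reg(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3);
}

#if defined(__aarch64__)
/* Only valid at EL1 or higher (kernel code): CurrentEL is not accessible
 * from EL0, so a user-space program must not call this. */
static inline unsigned read_current_el(void)
{
    uint64_t v;
    __asm__ __volatile__("mrs %0, CurrentEL" : "=r"(v));
    return current_el_from_reg(v);
}
#endif
```

A printk of read_current_el() next to the existing printk calls in disable_prefetch() would show 1 on a kernel running at EL1.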

Best,
d.

vstehle over 6 years ago in reply to DNovo

Hi DNovo,

If write access to CPUACTLR_EL1 is prevented by ACTLR_EL2, you need to modify the hypervisor code to allow it. If it is prevented by ACTLR_EL3, you need to modify the code of the secure monitor. Typically this is ARM Trusted Firmware (ATF): https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/cpus/aarch64/cortex_a53.S

DNovo over 6 years ago in reply to vstehle

Thanks a lot vstehle! I can now control the L1 prefetch.

However, I'm still seeing uncontrolled prefetching from the L2 cache. This is what I get with a simple test program that reads consecutive int64 values from memory:

# L1 prefetch enabled
The total L1D misses are 2429 out of 1048713 (0.23162%)
The total L2D misses are 131405 out of 262967 (49.97015%)
L2D cache accesses / L1D cache misses: 108.26142
Total reads = 1048576

# L1 prefetch disabled
The total L1D misses are 131614 out of 1048713 L1D accesses (12.55005%)
The total L2D misses are 131439 out of 263208 L2 accesses (49.93731%)
L2D cache accesses / L1D cache misses: 1.99985
Total reads = 1048576

As you can see, the 12.5% L1 miss rate corresponds to one miss per cache line (one 8-byte read out of every eight fills a 64-byte line). But I cannot understand why the L2 still sees two accesses per L1D miss. Any idea?
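The arithmetic behind these figures can be made explicit. Assuming a line-aligned buffer that is not already cached, 1048576 sequential 8-byte reads touch 1048576 × 8 / 64 = 131072 distinct 64-byte lines. In the prefetch-disabled run that matches both the L1D miss count (131614) and the L2D miss count (131439) to within about 0.5%, while the L2D access count (263208) is about twice the number of lines. The helpers below are a sketch with hypothetical names that reproduce the rates printed by the test program.

```c
#include <stdint.h>

/* num / den, or 0 when den is 0. */
static double ratio(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* Misses as a percentage of accesses, as printed by the test program. */
static double miss_rate_pct(uint64_t misses, uint64_t accesses)
{
    return 100.0 * ratio(misses, accesses);
}

/* Distinct cache lines touched by `reads` sequential loads of `elem_bytes`
 * each from a line-aligned buffer: the expected miss count with no
 * prefetching and a cold cache (assumption). */
static uint64_t lines_touched(uint64_t reads, uint64_t elem_bytes,
                              uint64_t line_bytes)
{
    return (reads * elem_bytes + line_bytes - 1) / line_bytes;
}
```

For example, miss_rate_pct(131614, 1048713) gives the 12.55005% above, and ratio(263208, 131614) gives the 1.99985 L2D accesses per L1D miss.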

Best,
d.

vstehle over 6 years ago in reply to DNovo

Hi DNovo,

Cortex-A53 has PMU event 0xC2 "Linefill because of prefetch", which might help diagnose this (see Events).
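One way to count a raw event such as 0xC2 from Linux user space without going through PAPI's event tables is the perf_event_open(2) system call with PERF_TYPE_RAW, where attr.config carries the event number. This is a minimal sketch with hypothetical helper names. It assumes the kernel's ARMv8 PMU driver is bound to the cores (as the device-tree nodes later in this thread set up) and passes the implementation-defined event number through unchanged; the call may also be refused because of /proc/sys/kernel/perf_event_paranoid or Android's security policy.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Describe a raw PMU event (e.g. 0xC2) for the calling thread, user space only. */
static void init_raw_event_attr(struct perf_event_attr *attr, uint64_t event)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW; /* config is interpreted by the CPU's PMU driver */
    attr->config = event;       /* raw event number */
    attr->disabled = 1;         /* created stopped; enabled explicitly below */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
}

/* Count `event` while fn() runs; returns -1 if the counter cannot be opened or read. */
static long long count_raw_event(uint64_t event, void (*fn)(void))
{
    struct perf_event_attr attr;
    long long count = -1;

    init_raw_event_attr(&attr, event);
    /* glibc has no wrapper: pid 0 = this thread, cpu -1 = any CPU, no group. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```

For example, count_raw_event(0xC2, run_sequential_reads) would return the number of prefetch linefills during a hypothetical run_sequential_reads() benchmark, or -1 if the kernel refuses the event.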

DNovo over 6 years ago in reply to vstehle

Dear vstehle,

I tried to access that PMU event as follows:

PAPI_add_event(EventSet, PAPI_NATIVE_MASK | 0xC2)

but PAPI cannot find any event above 0x0D. Do you know any other way to access PMU events from Linux/Android user space?

To give you more context, I had to add the following lines to the default MediaTek-X20 device tree file to make the currently available PAPI events work (i.e., PAPI_L1_ICM PAPI_L1_DCM PAPI_L1_DCA PAPI_L2_DCM PAPI_L2_DCA PAPI_LD_INS PAPI_SR_INS PAPI_BR_INS PAPI_TOT_INS PAPI_BR_MSP PAPI_TOT_CYC PAPI_TLB_DM PAPI_TLB_IM PAPI_HW_INT):

+ pmu_a53_0 {
+ compatible = "arm,armv8-pmuv3";
+ interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 52 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 53 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-affinity = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>;
+ };
+
+ pmu_a53_1 {
+ compatible = "arm,armv8-pmuv3";
+ interrupts = <GIC_SPI 54 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 55 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 56 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 57 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-affinity = <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
+ };
+
+ pmu_a72 {
+ compatible = "arm,armv8-pmuv3";
+ interrupts = <GIC_SPI 58 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 59 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-affinity = <&cpu8>, <&cpu9>;
+ };
+

Best,
d.