TTBR0_EL1 translation fault level 3 on 4KiB blocks where 2MiB blocks work.

I'm building a kernel that runs on ARMv8, and I'm currently facing an issue with mapping user processes.

TTBR1_EL1 flat-maps (identity mapping) 8GiB of memory via "PGD → PUD → 8x PMD → 512x 2MiB each" tables.
TTBR0_EL1 identity mapping works, but switching it to a user process mapping fails with "translation fault, level 3". I managed to narrow it down to the size of the blocks I'm using: the user mapping works if I use block descriptors in the PMD, i.e. do a 2MiB mapping. Unfortunately, using it this way would require turning the whole allocator upside down, so I would really like to be able to use 4KiB pages.
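For readers following along, here is a minimal sketch of how a 48-bit virtual address splits into per-level table indices with a 4KiB granule. A "level 3" fault refers to the last of these lookups. The `VaIndices` type, the `split_va` function, and the `pte` field name are hypothetical names chosen for illustration. Only PGD, PUD and PMD appear in the post.

```rust
/// Table indices for a 48-bit VA with a 4KiB granule:
/// 9 index bits per level, 12 bits of page offset.
/// (Illustrative type, not taken from the original kernel.)
#[derive(Debug, PartialEq, Eq)]
pub struct VaIndices {
    pub pgd: usize,    // level 0, VA bits [47:39]
    pub pud: usize,    // level 1, VA bits [38:30]
    pub pmd: usize,    // level 2, VA bits [29:21]
    pub pte: usize,    // level 3, VA bits [20:12] (name is an assumption)
    pub offset: usize, // VA bits [11:0]
}

/// Split a virtual address into its translation table indices.
pub fn split_va(va: u64) -> VaIndices {
    // Each level consumes 9 bits, i.e. 512 entries per table.
    let idx = |shift: u32| ((va >> shift) & 0x1ff) as usize;
    VaIndices {
        pgd: idx(39),
        pud: idx(30),
        pmd: idx(21),
        pte: idx(12),
        offset: (va & 0xfff) as usize,
    }
}
```

With a 2MiB block mapping the walk stops at the PMD entry, so the level-3 index is never used. With 4KiB pages the PMD entry must point to a further table, and that table is where the level-3 descriptor is read.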

The descriptors in question have the following form (modulo the PA which changes):

0x000000010000e701

My current SCTLR, TCR, and MAIR setup looks as follows:

/// MAIR:
pub static MAIR_EL1: i64 = 0x0000_0000_0000_44_00ff;

/// - [34:32] Set physical address range to 44 bits - this is the same as the
///           value reported by the ID_AA64MMFR0_EL1 register.
/// - [29:28] Set memory shareability to inner-shareable for TTBR1_EL1
/// - [31:30] Use 4KB granule size for high memory.
/// - [27:26] Outer cacheability set to W-B, R-A, W-A for TTBR1_EL1.
/// - [25:24] Inner cacheability set to W-B, R-A, W-A for TTBR1_EL1.
/// - [21:16] Size offset for high memory in bits - use 48 bits.
/// - [13:12] Set memory shareability to inner-shareable for TTBR0_EL1
/// - [15:14] Use 4KB granule size for low memory.
/// - [11:10] Outer cacheability set to W-B, R-A, W-A for TTBR0_EL1.
/// - [9:8]   Inner cacheability set to W-B, R-A, W-A for TTBR0_EL1.
/// - [5:0]   Size offset for low memory in bits - use 48 bits.
///
/// Used by boot.S
#[no_mangle]
pub static TCR_EL1_CONF: u64 = (0b100 << 32)
    | (0b10 << 30)
    | (0b11 << 28)
    | (0b01 << 26)
    | (0b01 << 24)
    | ((64 - 48) << 16)
    | (0b00 << 14)  // XXX: TG0 uses different values than TG1.
    | (0b11 << 12)
    | (0b01 << 10)
    | (0b01 << 8)
    | (64 - 48);

/// - [25]  Set exception endianness at EL1 to little-endian (0)
/// - [24]  Set explicit data access at EL0 to little-endian (0)
/// - [19]  Do not mark writable regions as eXecute Never (0)
/// - [12]  Enable instruction cache (1)
/// - [2]   Enable data cache (1)
/// - [0]   Enable MMU
/// - [...] RES1 - reserved, set to 1
#[no_mangle]
pub static SCTLR_EL1_CONF: i64 = (0 << 25)
    | (0 << 24)
    | (0 << 19)
    | (1 << 12)
    | (1 << 2)
    | (1 << 0)
    // Reserved-1 values
    | (1 << 23)
    | (1 << 22)
    | (1 << 20)
    | (1 << 11)
    | (1 << 8)
    | (1 << 7);

Why is the MMU not permitting me to use 4KiB pages here? How can I fix it? Could it be related to the TTBR1 mapping still using 2MiB blocks?

Top replies

+1 a.surati over 2 years ago

MrMino said:
0x000000010000e701

Is this a level-2 block descriptor? The physical address isn't aligned at a 2MB boundary. If it is a level-3 (page) descriptor, both of the 2 LSBs must be 1, i.e. 0b11.
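To make the encoding rule concrete, here is a small sketch that classifies a descriptor by its two low bits and the lookup level, assuming a 4KiB granule and levels 0 to 3. `DescriptorKind` and `classify` are hypothetical names for illustration, not part of any kernel or library mentioned in this thread.

```rust
/// How the MMU interprets a descriptor, given its low two bits and
/// the lookup level. (Illustrative type; assumes a 4KiB granule.)
#[derive(Debug, PartialEq, Eq)]
pub enum DescriptorKind {
    Invalid,
    Block,
    Table,
    Page,
    /// Encoding not usable at this level; treated as invalid,
    /// so an access through it raises a translation fault.
    Reserved,
}

/// Classify `desc` as seen at translation `level` (assumed 0..=3).
pub fn classify(desc: u64, level: u8) -> DescriptorKind {
    match (desc & 0b11, level) {
        // Bit 0 clear: always invalid.
        (0b00, _) | (0b10, _) => DescriptorKind::Invalid,
        // 0b01 is a block at levels 1 (1GiB) and 2 (2MiB).
        (0b01, 1) | (0b01, 2) => DescriptorKind::Block,
        // 0b01 at level 3 (or level 0) is not a usable mapping.
        (0b01, _) => DescriptorKind::Reserved,
        // 0b11 is a page at level 3, a table pointer elsewhere.
        (0b11, 3) => DescriptorKind::Page,
        (0b11, _) => DescriptorKind::Table,
        _ => unreachable!("desc & 0b11 is always in 0..=3"),
    }
}
```

Under these assumptions, `classify(0x0000_0001_0000_e701, 3)` yields `Reserved`, which would be consistent with the reported level 3 translation fault, while the same value at level 2 is a `Block`.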

I tested a small setup on QEMU's rpi3b. TTBR1 maps 1GB area in 2MB blocks starting at 0xffff000000000000ul <=> 0. After switching to TTBR1 address space, the setup loads a TTBR0 that maps a 2MB area in 4KB blocks starting at 0x0 <=> 0x200000. Then, EL1 is made to write into the VA 0x1000, and the write succeeds. TTBR1 utilizing 2MB mappings should not prevent TTBR0 from using 4KB maps.

# On GDB

(gdb) info r TTBR1_EL1
TTBR1_EL1      0x3000              12288
(gdb) info r TTBR0_EL1
TTBR0_EL1      0xb000              45056
(gdb) info r TCR_EL1
TCR_EL1        0x1b5103510         7332705552
(gdb) info r SCTLR
SCTLR          0xc5183d            12916797
(gdb) info r MAIR_EL1
MAIR_EL1       0xff                255
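As a reading aid for the TCR_EL1 value in the GDB output, here is a hedged sketch that pulls out the size and granule fields. `TcrFields` and `decode_tcr` are hypothetical helper names, and only a handful of fields are decoded.

```rust
/// A subset of TCR_EL1 fields. (Illustrative type.)
#[derive(Debug, PartialEq, Eq)]
pub struct TcrFields {
    pub t0sz: u64, // bits [5:0]
    pub tg0: u64,  // bits [15:14], 0b00 = 4KiB
    pub t1sz: u64, // bits [21:16]
    pub tg1: u64,  // bits [31:30], 0b10 = 4KiB
    pub ips: u64,  // bits [34:32]
}

/// Extract the fields above from a raw TCR_EL1 value.
pub fn decode_tcr(tcr: u64) -> TcrFields {
    // Take `width` bits starting at bit `lo`.
    let field = |lo: u32, width: u32| (tcr >> lo) & ((1u64 << width) - 1);
    TcrFields {
        t0sz: field(0, 6),
        tg0: field(14, 2),
        t1sz: field(16, 6),
        tg1: field(30, 2),
        ips: field(32, 3),
    }
}
```

Decoding `0x1b5103510` this way gives T0SZ = T1SZ = 16 (48-bit address spaces), 4KiB granules on both sides, and IPS = 0b001. Evaluating the `TCR_EL1_CONF` expression from the question gives `0x4b5103510`, which differs only in the IPS field.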


# On QEMU monitor
(qemu) xp/1xg 0x3000
0000000000003000: 0x0000000000005003
(qemu) xp/1xg 0x5000
0000000000005000: 0x0000000000007003
(qemu) xp/16xg 0x7000
0000000000007000: 0x0000000000000701 0x0000000000200701
0000000000007010: 0x0000000000400701 0x0000000000600701
0000000000007020: 0x0000000000800701 0x0000000000a00701
0000000000007030: 0x0000000000c00701 0x0000000000e00701
0000000000007040: 0x0000000001000701 0x0000000001200701
0000000000007050: 0x0000000001400701 0x0000000001600701
0000000000007060: 0x0000000001800701 0x0000000001a00701
0000000000007070: 0x0000000001c00701 0x0000000001e00701

(qemu) xp/1xg 0xb000
000000000000b000: 0x000000000000c003
(qemu) xp/1xg 0xc000
000000000000c000: 0x000000000000d003
(qemu) xp/1xg 0xd000
000000000000d000: 0x000000000000e003
(qemu) xp/16xg 0xe000
000000000000e000: 0x0000000000200703 0x0000000000201703
000000000000e010: 0x0000000000202703 0x0000000000203703
000000000000e020: 0x0000000000204703 0x0000000000205703
000000000000e030: 0x0000000000206703 0x0000000000207703
000000000000e040: 0x0000000000208703 0x0000000000209703
000000000000e050: 0x000000000020a703 0x000000000020b703
000000000000e060: 0x000000000020c703 0x000000000020d703
000000000000e070: 0x000000000020e703 0x000000000020f703
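The level-3 entries at 0xe000 in this dump follow a regular pattern, which can be reproduced with a sketch like the one below. The constant names and `fill_level3` are hypothetical. The attribute bits are read off the dump itself (0x703 = AF, inner shareable, AttrIndx 0, valid page), not taken from the actual test source.

```rust
const PAGE_SIZE: u64 = 4096;
const ENTRIES: usize = 512;

// Lower attribute bits, as inferred from the 0x703 suffix in the dump.
const DESC_PAGE: u64 = 0b11;          // bits [1:0]: valid level-3 page
const ATTR_SH_INNER: u64 = 0b11 << 8; // bits [9:8]: inner shareable
const ATTR_AF: u64 = 1 << 10;         // bit 10: access flag

/// Build a level-3 table mapping 512 consecutive 4KiB pages
/// starting at physical address `base_pa` (assumed 4KiB-aligned).
pub fn fill_level3(base_pa: u64) -> [u64; ENTRIES] {
    let mut table = [0u64; ENTRIES];
    for (i, entry) in table.iter_mut().enumerate() {
        let pa = base_pa + i as u64 * PAGE_SIZE;
        *entry = pa | ATTR_AF | ATTR_SH_INNER | DESC_PAGE;
    }
    table
}
```

Calling `fill_level3(0x200000)` produces entries beginning 0x200703, 0x201703, and so on. These entries leave AP[1] (bit 6) clear, which fits the EL1 write used in the test. A mapping meant for EL0 access would presumably also need that bit set.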

Edit: Clarify the block descriptor question.

0 MrMino over 2 years ago in reply to a.surati

You're spot on. I confused the encoding: I thought that the 2 LSBs are the same for both level-2 descriptors mapping 2MiB blocks and level-3 descriptors mapping 4KiB pages (granules), when in reality the latter have the same 2 LSBs as table descriptors. That's one nasty sharp edge if I've ever seen one.

Regarding the 2MiB alignment, the MMU seems to work with that on BCM2711. Not sure if it's UB or implementation defined, but it just ignores the rest of the unaligned address.

0 a.surati over 2 years ago in reply to MrMino

MrMino said:
Regarding the 2MiB alignment, the MMU seems to work with that on BCM2711. Not sure if it's UB or implementation defined, but it just ignores the rest of the unaligned address

It likely works because the MMU ignores the sub-2MB address bits on a level-2 block descriptor. The manual declares bit 12 (among a few other bits) of a level-2 block descriptor as RES0.
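A rough sketch of what "ignoring the sub-2MB bits" means for the descriptor from the question, assuming a 4KiB granule and a 48-bit output address. The mask and function names are hypothetical. Treating all of bits [20:12] as reserved reflects the base ARMv8.0 layout, and later architecture extensions may assign meaning to some of them.

```rust
/// Output address field of a level-2 block descriptor: bits [47:21].
const L2_BLOCK_OA_MASK: u64 = 0x0000_ffff_ffe0_0000;
/// Bits [20:12], assumed reserved (RES0) in a level-2 block descriptor.
const L2_BLOCK_RES0_MASK: u64 = 0x0000_0000_001f_f000;

/// Physical base address the block would map to if the
/// reserved bits are ignored.
pub fn l2_block_output_address(desc: u64) -> u64 {
    desc & L2_BLOCK_OA_MASK
}

/// Reserved bits that are set in `desc`; non-zero means the
/// descriptor carries an address that is not 2MiB-aligned.
pub fn l2_block_stray_bits(desc: u64) -> u64 {
    desc & L2_BLOCK_RES0_MASK
}
```

For `0x0000_0001_0000_e701` the output address comes out as `0x1_0000_0000` and the stray bits as `0xe000`. A check like `l2_block_stray_bits(desc) == 0` could serve as a debug assertion when building block descriptors, rather than relying on the hardware to drop those bits.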