Now, I’m aware that this is a complex question and might not be resolved here. I am new to embedded/processor programming and I would like to know if there are any major differences between those two boards (CPU-wise). I thought that since it is the same CPU (which I think has the same MMU; at least I couldn’t find any information about different versions), my code for the page table config would be universal.
A bit more context: both versions run exactly the same code, and I am configuring one gigabyte as a section for the boot loader's memory (the boot loader is the current code, the one writing the page tables).
When I turn on the MMU (after writing the page table to memory and configuring TCR), the next read of the PC causes a “memory, not accessible” exception, but only on the virt board; on the raspi3b board it works just fine.
Best regards Niklas
Translation tables describe the memory system to the processor. So even if the processor is the same, a differing memory system is going to need different tables. A few things you might want to check.
Perhaps obvious, but is memory actually in the same place on both boards?
Are you in the same Exception level and Security state on both systems? There are differences in the table formats based on EL/Security state. For example, in Secure state the descriptors include an NS (or NSTable) bit, which is ignored in Non-secure state.
Are you relying on any implicit configuration? Most system registers in Armv8-A don't have defined reset values; it's up to software to initialise them. It's easy to write code that "works" but is relying on an uninitialised bit (or a bit set up by previous firmware on one platform but not another).
Thank you very much for your reply! It's difficult for me to properly debug the problem because of my lack of knowledge about SoCs.
I've done the following:
- checked the security state and exception level, which are identical on both (Non-secure, EL1)
- checked whether the configuration registers hold any implicit config that I'm overwriting (they don't)
- checked the memory layout and experimented with different sizes/memory ranges.
But none of the above produced any kind of change. The only obvious difference between the two boards is that the Raspberry Pi boots into EL3 and I'm ERETing to EL1 before the boot loader, but that shouldn't make a difference, right?
thx again!
Niklas43 said: The only obvious difference between the two boards is that the Raspberry Pi boots into EL3 and I'm ERETing to EL1 before the boot loader, but that shouldn't make a difference, right?
Depends what the firmware is doing before dropping you to EL1. Something should have set up the EL2 registers, for example. When you see the exception, can you get the ESR_EL1/FAR_EL1/ELR_EL1 register values generated by the fault? (assuming the exception is taken to EL1)
Oh, I didn't check the registers because no exception was raised (I have a GIC handler implemented), but apparently they still contain information (I read them with gdb just now).
ESR_EL1 0x86000004 2248146948
(Since I have a "parser" for that register, I know that the exception class is "Instruction Abort Taken Without Exception Level Change" (ISS (dec): 4), which is an MMU fault, meaning an illegal instruction access (to the best of my knowledge), so nothing new.)
FAR_EL1 0x260 608
ELR_EL1 0x260 608
FAR and ELR are both the same as in the exception QEMU printed (which said: "Cannot access memory at address 0x260").
What do you mean by "setting up the EL2 regs"? Is that a must? I thought that when using EL1/0, the others (2/3) don't need to be touched.
Btw, I'm invalidating the TLB (with TLBI VMALLE1IS) and the instruction cache (with IC IALLUIS).
Decoding the ESR value (https://developer.arm.com/documentation/ddi0601/2022-06/AArch64-Registers/ESR-EL1--Exception-Syndrome-Register--EL1-?lang=en#fieldset_0-24_0_12):
EC = 0b100001 = Instruction Abort taken without a change in Exception level.
IFSC = 0b000100 = Translation fault, level 0.
EA = 0 = Not an external fault
So it looks like the MMU table walk is failing early in the process. Now this could be because you have an L0 table entry marked as "Fault", or because something about the translation configuration is invalid, as these cases are also reported as L0 faults.
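For anyone who wants to repeat the decode above in code, here is a minimal sketch in C. The helper names are made up for illustration; the bit positions (EC in bits [31:26], IL in bit 25, IFSC in ISS[5:0], EA in ISS[9]) follow the ESR_EL1 description linked above, and the IFSC/EA interpretation only applies when EC indicates an Instruction Abort.

```c
#include <stdint.h>

/* Hypothetical helpers (names are illustrative, not from any library).
 * ESR_ELx layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
 * For an Instruction Abort: IFSC = ISS[5:0], EA = ISS[9]. */
static unsigned esr_ec(uint64_t esr)   { return (unsigned)((esr >> 26) & 0x3F); }
static unsigned esr_il(uint64_t esr)   { return (unsigned)((esr >> 25) & 0x1); }
static unsigned esr_ifsc(uint64_t esr) { return (unsigned)(esr & 0x3F); }
static unsigned esr_ea(uint64_t esr)   { return (unsigned)((esr >> 9) & 0x1); }

/* Fault status codes 0b0001LL are translation faults at lookup level LL.
 * Returns the level (0..3), or -1 if the code is not one of those four. */
static int translation_fault_level(unsigned fsc)
{
    return ((fsc & 0x3C) == 0x04) ? (int)(fsc & 0x3) : -1;
}
```

Feeding in the value from this thread, `esr_ec(0x86000004)` gives 0x21 (0b100001) and `translation_fault_level(esr_ifsc(0x86000004))` gives 0, which matches the manual decode.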
It's a while since I had to write MMU set up code, but here is the list of things the Arm ARM lists as causing L0 translation faults:
R_VZZSZ When one or more of the following apply, a level 0 Translation fault is generated on the relevant translation stage:
- The IA does not map onto a TTBR_ELx address range.
- If the IA maps onto the TTBR0_ELx address range and the IA contains any one bits above the configured IA size as determined by TCR_ELx.T0SZ.
- If the IA maps onto the TTBR1_ELx address range and the IA contains any zero bits above the configured IA size as determined by TCR_ELx.T1SZ.
- When a TLB miss occurs, the corresponding TCR_ELx.EPDn field prevents a translation table walk using TTBRn_ELx.
- When FEAT_E0PD is implemented, the corresponding TCR_ELx.E0PDn field prevents unprivileged access to an address translated by TTBRn_ELx.
- When FEAT_SVE is implemented, the corresponding TCR_ELx.NFDy field prevents non-faulting unprivileged accesses to an address translated by TTBRy_ELx.
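The first three conditions in that list are address-range checks, and they can be sketched in a few lines of C. This is only an illustration under some assumptions: the EL1&0 regime with both TTBR0_EL1 and TTBR1_EL1 in use, top-byte-ignore (TBI) disabled so every upper bit counts, and T0SZ/T1SZ values between 1 and 63. The enum and function names are invented for the example.

```c
#include <stdint.h>

/* Illustrative names, not from any real header. */
enum va_range { VA_TTBR0, VA_TTBR1, VA_NO_RANGE };

/* t0sz/t1sz are the TCR_EL1.T0SZ/T1SZ field values; each region covers
 * 2^(64 - TnSZ) bytes. Assumes 1 <= TnSZ <= 63 and TBI disabled. */
static enum va_range classify_va(uint64_t va, unsigned t0sz, unsigned t1sz)
{
    /* Masks selecting the bits above the configured input address size. */
    uint64_t upper0 = ~(UINT64_MAX >> t0sz);
    uint64_t upper1 = ~(UINT64_MAX >> t1sz);

    if ((va & upper0) == 0)
        return VA_TTBR0;        /* all upper bits zero: TTBR0 range */
    if ((va & upper1) == upper1)
        return VA_TTBR1;        /* all upper bits one: TTBR1 range */
    return VA_NO_RANGE;         /* would raise a level 0 Translation fault */
}
```

With this check, the faulting address 0x260 lands in the TTBR0 range for any T0SZ in the assumed range, so the range conditions alone would not explain the fault reported here.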
ok, found my issue. And actually it was the first thing you said!
The virt board, unlike the Raspberry Pi, has ROM there (up to 1 GiB), so I couldn't write the page tables... Kind of obvious, but thank you very much for your thorough help; it really helped me understand the MMU config process better!
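In case it helps someone who hits the same thing: a write-then-read-back check on the table location would have caught this before the MMU was enabled. Below is a rough sketch; the function name is made up, and it assumes the data cache is off (or the region is uncached) so the read really comes from the memory device. Note that some flash devices interpret writes as commands, so this is a debugging aid to use with care, not something to leave in production code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical debugging helper: returns true if a 64-bit slot keeps a
 * value written to it. Assumes the data cache is disabled or the region
 * is uncached, so the read-back reflects the actual memory device. */
static bool slot_is_writable(volatile uint64_t *slot)
{
    uint64_t saved = *slot;
    uint64_t probe = ~saved;      /* guaranteed to differ from the old value */

    *slot = probe;
    bool ok = (*slot == probe);   /* ROM would still return the old value */

    *slot = saved;                /* restore; harmless if the write was dropped */
    return ok;
}
```

Calling it on the address where the first table entry is going to live, before writing any descriptors, gives an early hint that the chosen region is not RAM.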