Hello Community,
in our current ASIC project we have to replace an ARM926EJ-S with a Cortex-A5.
At the moment we are facing the following problem in our bootloader:
We intend to use the high exception vectors after reset (input vinithi is tied to '1'), as our external DDR memory is mapped at address 0x00000000 and is normally not available during the early boot phase.
After configuring and enabling the MMU (according to "Migrating a software application from ARMv5 to ARMv7-A/R", Application Note 425, topic 4.1), we see the value 0x00000008 in register instr_pc several times, and some time later the core performs a read access to address 0x00000000 and the system stalls because the DDR2 controller isn't configured yet.
If we configure the DDR2 controller so that the memory at address 0x00000000 is available, the application is running as expected.
Also, if we configure the MMU table entry for 0x00000000-0x000FFFFF to NOACCESS, the application runs.
But we want/need to understand the behavior.
Any suggestions as to what we are missing and why the core is accessing 0x00000000?
Thank you!
Daniel
Hello Daniel,
please let us know more details.
Where is your program located which enabled MMU?
What is the VA to PA mapping of MMU?
What is the PA of the address 0xffff0000?
What is the program sequence which causes the instr_pc 0x00000008?
Best regards,
Yasuhiko Koumoto.
Hello Yasuhiko Koumoto,
thank you for your answer.
The program is our bootloader and it is located in the boot ROM at address 0xFFFF0000.
The mapping is VA=PA, no remapping at the moment, therefore the physical address of 0xFFFF0000 is 0xFFFF0000.
Yesterday I realized that the program is correctly executed up to the Thumb-2 instruction TBB [pc,r1] (some operations after the MMU is enabled). If I try to single step this operation with our debugger, the stall occurs.
Here's a screenshot of the debugger around the TBB operation at address 0xFFFF273E and the corresponding register contents:
The last executed instruction was BHI 0xFFFF2828 at address 0xFFFF273C, the branch was not taken. Next instruction would be the TBB operation.
Hi illdie4u,
Did you invalidate your caches, TLB and branch predictor before enabling them? Cortex-A5 systems would need that on boot as they do not come out of reset with the caches already invalidated. It is possible you have a 'valid' branch predictor entry which is being matched when your core executes the TBB instruction. One would expect that this would end up being relatively random (the content of the caches et al. should be 'unknown') but sometimes it isn't.
How are you configuring the MMU entries as "no access" - does that mean strict permissions like AP[2:0] = 000? Or does that mean you have marked the entries as "fault" (i.e. bottom two bits are 0 in the descriptor)?
You should really be seeing an exception, not a branch to 0x0000000x - and your current CPSR.M reflects 0x13 (which is the supervisor mode). Your VBAR is 0xFFFF0000 and exceptions run in ARM mode. So, the question is.. what's the content of your vector table, is it possible that the vector table or abort handler is not correct, i.e. coded in Thumb or has the wrong offset?
The encoding of the TBB looks okay - there're no alignment restrictions on TBB (although you do have alignment faults turned on) and the offset in r1 is within the table - it's going to fetch the third byte after the PC (quite what that is, I am not sure.. my endian-foo isn't what it should be, but I think it's that 0x11) and branch to it (as in newPC = currentPC+(entry*2) where the current PC is the address of the table in this case, 4 bytes ahead of the TBB instruction address) - in theory this all looks good, and you should be branching to 0xFFFF2764 (unfortunately your disassembly doesn't go that far down..).
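For anyone who wants to double-check the arithmetic in the paragraph above, here is a minimal C sketch of how the target of a PC-relative TBB can be computed. The function name `tbb_pc_target` and the table contents in the usage note are hypothetical; only the instruction address 0xFFFF273E, the entry value 0x11 and the expected target 0xFFFF2764 come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a Thumb-2 "TBB [pc, Rm]" (hypothetical helper, not a real API).
 * The table base is the PC value the instruction sees, i.e. the address of
 * the TBB instruction + 4. The byte at table[index] is a halfword offset,
 * so it is doubled before being added to the table base. */
static uint32_t tbb_pc_target(uint32_t insn_addr,
                              const uint8_t *table,
                              uint32_t index)
{
    assert(table != 0);

    uint32_t table_base = insn_addr + 4u;   /* table follows the 32-bit TBB */
    uint32_t entry      = table[index];     /* unsigned byte from the table */

    return table_base + 2u * entry;
}
```

With a table whose selected byte is 0x11, calling `tbb_pc_target(0xFFFF273E, table, index)` gives 0xFFFF2742 + 0x22 = 0xFFFF2764, which matches the branch target expected above.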
Ta,
Matt
Hello Matt,
thank you for your valuable and detailed answer. We already invalidate the caches, TLB and Branch Predictor before we enable them.
Yes, we configure the MMU entry (only for the first megabyte starting at 0x00000000) to "No access" by setting APX = '0' and AP = "00".
Here's another screenshot showing the exception vector table, CPU registers and code at 0xFFFF2764 and the byte table used by TBB:
Unfortunately as you already mentioned, everything looks good - in theory. But thanks again for your support!
Hello,
some new findings from my side.
I think the root cause of this problem is the branch predictor, which predicts branches to 0x0. But as mentioned above at this state of the bootphase our DDR Ram isn't available yet and an access to it causes the system to stall.
If we set the "Branch prediction policy" field in ACTLR to "10" = "Branch always not taken", everything seems to work as expected.
If this observation is correct and plausible, the workaround of configuring the MMU table entry for 0x00000000-0x000FFFFF as NOACCESS, as long as there is no memory behind this address, is OK for us.
Thank you.
Have you inserted the proper barriers (most likely DSB followed by ISB) after enabling the MMU?
Hi Chris,
yes, DSB and ISB are inserted.
Does the behaviour change if you mark the DDR pages as faulting rather than restricting permissions, or set the pages/sections as PXN/XN as well as no-access?
There are some complicated architectural definitions here, essentially down to the fact that the instruction side and data side have different and independent behaviours. Generally when you say a section of memory has certain read or write permission, this applies only to the data side. The instruction side can and does ignore some of this - this is why the PXN/XN bits exist, and bits like SCTLR.WXN, to prevent the instruction side from fetching from memory that you otherwise just have restricted data access permissions.
The instruction side can speculate to any region of memory which is accessible at any privilege level, at any time, even if you are not executing at that privilege level at the time - which is very important to know. The XN bits also prevent speculation on the part of the core to these regions. Note that the XN bit is only applied for memory in the Client domain!
From the ARMv7-A/R ARM (section B3.1):
Memory access permission control
This controls whether a program is permitted to access a memory region. For instruction and data
access, the possible settings are:
• no access
• read-only
• write-only
• read/write.
For instruction accesses, additional controls determine whether instructions can be fetched and
executed from the memory region.
If a processor attempts an access that is not permitted, a memory fault is signaled to the processor.
Generally the safest way to prevent anything from accessing a block of memory is mark it as faulting rather than setting access permissions. You have to do the same maintenance to the caches and TLBs and branch predictor when modifying fault->valid as you would from no-access->read/write so there's no extra cost in time or code.
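To make the three options discussed in this thread concrete (fault entry, "no access" permissions, and "no access" plus XN), here is a minimal C sketch of first-level section descriptors in the ARMv7-A short-descriptor format. The helper names are hypothetical, and leaving TEX/C/B at 0 (Strongly-ordered) is an assumption made only to keep the example short; a real table would pick the memory type per region.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions of a short-descriptor first-level section entry. */
#define SECT_TYPE       0x2u                 /* bits[1:0] = 0b10: section */
#define SECT_XN         (1u << 4)            /* execute never             */
#define SECT_DOMAIN(d)  (((uint32_t)(d) & 0xFu) << 5)
#define SECT_AP(ap)     (((uint32_t)(ap) & 0x3u) << 10)  /* AP[1:0]       */
#define SECT_APX        (1u << 15)           /* AP[2], left clear here    */
#define SECT_BASE_MASK  0xFFF00000u          /* 1 MB section base         */

/* Option 1: fault entry. Bits[1:0] = 0b00, every access faults. */
static uint32_t section_fault(void)
{
    return 0u;
}

/* Option 2: valid section with APX = 0, AP = 0b00 ("no access").
 * This restricts data accesses, but is not what stops speculative
 * instruction fetches according to the discussion above. */
static uint32_t section_no_access(uint32_t pa, unsigned domain)
{
    assert(domain < 16u);
    return (pa & SECT_BASE_MASK) | SECT_DOMAIN(domain) | SECT_AP(0u) | SECT_TYPE;
}

/* Option 3: as option 2, plus XN so the instruction side must not
 * fetch from the region (XN is only checked for Client domains). */
static uint32_t section_no_access_xn(uint32_t pa, unsigned domain)
{
    return section_no_access(pa, domain) | SECT_XN;
}
```

For the first megabyte at 0x00000000 in domain 0 these evaluate to 0x00000000, 0x00000002 and 0x00000012 respectively. Whichever one is written into the table, the same TLB and branch predictor maintenance is needed when the entry is later changed to a normal mapping.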
I'll reiterate Chris' question: even after checking the above, are you SURE you have proper barriers after all the cache maintenance and MMU enable?
I think, strictly, you should also flush the branch predictor cache after enabling the MMU as it will contain incorrect addresses...
Performing a BPIALL{IS} before MMU enable should cover that (and illdie4you says they're invalidating the branch predictor already), but I guess it can't hurt to do it again. I definitely think this sounds a lot like a speculative instruction fetch caused by the branch predictor not being denied by the access permissions, because speculative instruction fetches don't play by the rules of the access permissions (but they do respect XN).
Hi all,
all three solutions, setting the permissions to "no access", setting the "execute never" bit and setting the section to "fault" are working.
Thank you all for the suggestions and your effort!
Hello Daniel.
has your problem been solved?
I think the essence of the issue would be the initialisation of the branch prediction or the caches.
Hi Yasuhiko Koumoto,
yes, we already had a "workaround" for this problem with setting the MMU entry to "no access" before opening this discussion.
But now we understand what is happening and why the accesses to 0x0 are occurring, and we know how best to prevent them by using the XN bit.
That is good news.
The XN bit inhibits the predicted instruction fetches, and I think that is why it had a good effect on your system.
This shows that the CPU had made a predicted fetch to 0x00000008.
May I ask whether you had taken the following steps?
Had you already tried the instruction cache invalidation and branch predictor invalidation before enabling the MMU?
Or had you only set the XN bits in the MMU settings?
For your information, below is a summary of the instruction-related operations.

MCR p15, 0, <Rd>, c7, c5, 0
    ARMv6 explanation: Invalidate Entire Instruction Cache Register
    ARMv7 explanation: ICIALLU, Invalidate all instruction caches to PoU
MCR p15, 0, <Rd>, c7, c5, 4
    ARMv6 explanation: Flush Prefetch Buffer Register
    ARMv7 explanation: CP15ISB, Instruction Synchronization Barrier operation
MCR p15, 0, <Rd>, c7, c5, 6
    ARMv6 explanation: Flush Entire Branch Target Cache Register
    ARMv7 explanation: BPIALL, Invalidate all branch predictors
MCR p15, 0, <Rd>, c7, c5, 7
    ARMv6 explanation: Flush Branch Target Cache Entry Register
    ARMv7 explanation: BPIMVA, Invalidate MVA from branch predictors
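As a rough illustration of how the operations listed above might be strung together before the MMU is enabled, here is a small C sketch. It is an assumption-laden example, not the sequence from Application Note 425: the names `issue`, `op_log`, `invalidate_before_mmu_enable` and the macro `BARE_METAL_TARGET` are all hypothetical, the TLBIALL encoding (c8, c7, 0) is added from the ARMv7-A architecture rather than from this thread, and data cache invalidation by set/way is left out. On a host build the function only records the order of operations so it can be checked; the CP15 instructions are emitted only when `BARE_METAL_TARGET` is defined for an ARM build running in a privileged mode.

```c
#include <assert.h>
#include <stddef.h>

enum maint_op { OP_ICIALLU, OP_BPIALL, OP_TLBIALL, OP_DSB, OP_ISB };

#define MAX_OPS 8
static enum maint_op op_log[MAX_OPS];   /* order of issued operations */
static size_t op_count;

/* Record one maintenance operation and, on a bare-metal ARM build,
 * actually execute it. */
static void issue(enum maint_op op)
{
    assert(op_count < MAX_OPS);
    op_log[op_count++] = op;

#if defined(BARE_METAL_TARGET) && defined(__arm__)
    switch (op) {
    case OP_ICIALLU:  /* invalidate all instruction caches to PoU */
        __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" :: "r"(0) : "memory");
        break;
    case OP_BPIALL:   /* invalidate all branch predictors */
        __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" :: "r"(0) : "memory");
        break;
    case OP_TLBIALL:  /* invalidate entire unified TLB */
        __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" :: "r"(0) : "memory");
        break;
    case OP_DSB:
        __asm__ volatile("dsb" ::: "memory");
        break;
    case OP_ISB:
        __asm__ volatile("isb" ::: "memory");
        break;
    }
#endif
}

/* Invalidate I-cache, branch predictor and TLB, then make sure the
 * maintenance has completed (DSB) and the pipeline is refetched (ISB). */
static void invalidate_before_mmu_enable(void)
{
    issue(OP_ICIALLU);
    issue(OP_BPIALL);
    issue(OP_TLBIALL);
    issue(OP_DSB);
    issue(OP_ISB);
}
```

The point of the sketch is only the ordering: the invalidations come first and the DSB/ISB pair closes the sequence, which is what the barrier questions earlier in the thread were about.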