Need help: TF-A code executes from a page marked Execute-Never at EL3

Environment:
  • Exception Level: EL3 (AArch64)
  • Component: ARM Trusted Firmware (TF-A) - BL2/BL31
  • Platform: Arm FVP / Base_Revc_2xAEMvA / Bare Metal Debug / ARMAEM-A_MP_0 [Arm Development Studio]
Observation:
I am observing behavior that seems to contradict the Armv8-A memory translation rules. Although the translation table entry explicitly marks the memory region as Execute-Never, the CPU continues to fetch and execute instructions from this region without taking an exception. I am a student with limited hardware background, learning TF-A and doing porting work during my internship, and I would really like to understand the reason behind this.

Code I use: ARM Trusted Firmware v2.13

Github: github.com/.../arm-trusted-firmware

As an example, take BL31 running in Arm Development Studio, with a breakpoint at the beginning of bl31_setup.

Technical Evidence (Verified via Debugger):
SCTLR_EL3: M=1 (MMU enabled), WXN=1 (Write implies execute-never), I=1 (Instruction Cache enabled).

BL31’s code is loaded at: 0x04003000

Translation Table Entry (L3 Descriptor): 0x00400000_04003743

Physical Address: Verified via TTBR0_EL3 walk. (0x04034600 -> 0x04035003 -> 0x04037003 -> 0x00400000_04003743)
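
The walk above can be cross-checked by computing which entry each level should index for the BL31 address. This is only a sketch in Python, assuming a 4KB translation granule and a walk that starts at level 1 (neither is stated above, so check TCR_EL3 before relying on it). The helper names table_index and next_table are hypothetical.

```python
# Assumption: 4KB granule, so each level resolves 9 VA bits and
# every descriptor is 8 bytes. Starting the walk at level 1 is also assumed.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
DESC_SIZE = 8


def table_index(va: int, level: int) -> int:
    """Return the index into the table at `level` (0..3) for virtual address `va`."""
    shift = PAGE_SHIFT + BITS_PER_LEVEL * (3 - level)
    return (va >> shift) & ((1 << BITS_PER_LEVEL) - 1)


def next_table(desc: int) -> int:
    """Extract the next-level table address, bits [47:12], from a table descriptor."""
    return desc & 0x0000_FFFF_FFFF_F000


va = 0x04003000                    # BL31 load address from the post
l1_base = 0x04034600               # table base read from TTBR0_EL3
l2_base = next_table(0x04035003)   # table descriptor found at level 1
l3_base = next_table(0x04037003)   # table descriptor found at level 2

l1_entry = l1_base + table_index(va, 1) * DESC_SIZE
l2_entry = l2_base + table_index(va, 2) * DESC_SIZE
l3_entry = l3_base + table_index(va, 3) * DESC_SIZE

for name, addr in (("L1", l1_entry), ("L2", l2_entry), ("L3", l3_entry)):
    print(f"{name} descriptor address: {addr:#010x}")
```

If the 64-bit value read in the debugger at the computed L3 address does not match the descriptor listed above, then the manual walk followed a different path than the one these assumptions describe.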

Attributes: AP[2:1]=0x1 (Read/Write), XN=1 (Execute-Never), AF=1, SH=0x3 (Inner Shareable), NS=0, AttrIndx = 0x0 (See the MAIR_EL3)

MAIR_EL3: 0x4404FF (Attr0 = 0xFF, Normal Memory).
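
To make the attribute decoding above reproducible, here is a small Python sketch that extracts the same fields from the descriptor and looks up the MAIR_EL3 attribute. It assumes the VMSAv8-64 stage 1 page descriptor layout with a 4KB granule; the function names are hypothetical and not part of TF-A.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Return value[hi:lo] as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_page_descriptor(desc: int) -> dict:
    """Decode the fields of a stage 1 level 3 page descriptor."""
    return {
        "valid_page": bits(desc, 1, 0) == 0b11,       # 0b11 = page at level 3
        "attr_indx": bits(desc, 4, 2),                # index into MAIR_EL3
        "ns": bits(desc, 5, 5),
        "ap_2_1": bits(desc, 7, 6),                   # AP[2:1]
        "sh": bits(desc, 9, 8),
        "af": bits(desc, 10, 10),
        "output_addr": desc & 0x0000_FFFF_FFFF_F000,  # assumes 4KB granule
        "xn": bits(desc, 54, 54),
    }


def mair_attr(mair: int, index: int) -> int:
    """Return the 8-bit field Attr<index> from a MAIR value."""
    return bits(mair, 8 * index + 7, 8 * index)


fields = decode_page_descriptor(0x0040_0000_0400_3743)
attr = mair_attr(0x4404FF, fields["attr_indx"])

for name, value in fields.items():
    print(f"{name:12} = {value:#x}")
print(f"MAIR Attr{fields['attr_indx']}   = {attr:#04x}")
```

Running this on the values from the debugger gives the same result as the manual decode: XN=1, AP[2:1]=0b01, SH=0b11, AF=1, NS=0, AttrIndx=0, output address 0x04003000, and Attr0=0xFF.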

Synchronization Performed: DSB SY + ISB

The PC (Program Counter) is confirmed to be executing from the first instruction of the BL31 code at address 0x04003000.

The Problem:
Taken together, this evidence points to one conclusion: the CPU should not be able to execute the BL31 code and should take an exception instead. However, execution continues uninterrupted.
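
The reasoning behind that expectation can be written down as a simplified permission check. This is my own sketch in Python of how I understand the stage 1 rules for the EL3 translation regime, not code from TF-A or the Arm ARM; it deliberately ignores other controls such as SCR_EL3.SIF, and the function name is hypothetical.

```python
def el3_fetch_permitted(xn: int, ap_2_1: int, wxn: int, mmu_on: int) -> bool:
    """Simplified stage 1 instruction-fetch check for the EL3 regime.

    xn      : XN bit from the page descriptor
    ap_2_1  : AP[2:1] field from the page descriptor
    wxn     : SCTLR_EL3.WXN
    mmu_on  : SCTLR_EL3.M
    """
    if not mmu_on:
        # With M=0 the translation table permissions are not applied.
        return True
    writable = (ap_2_1 & 0b10) == 0  # AP[2]=0 means read/write
    if wxn and writable:
        # WXN makes every writable region execute-never.
        return False
    return xn == 0


# Values observed in the debugger at the BL31 breakpoint.
observed = el3_fetch_permitted(xn=1, ap_2_1=0b01, wxn=1, mmu_on=1)
print("fetch permitted with observed settings:", observed)

# Same page attributes, but with the MMU disabled.
mmu_off = el3_fetch_permitted(xn=1, ap_2_1=0b01, wxn=1, mmu_on=0)
print("fetch permitted with MMU disabled:", mmu_off)
```

Under this model the fetch is blocked twice over with the observed settings (XN=1, and WXN=1 on a writable page), which is why I expect a Permission Fault rather than normal execution.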

I would expect an Instruction Abort with "ESR_EL3 = 0x8600000F", which decodes as:
"Instruction Abort taken without a change in Exception level.
Used for MMU faults generated by instruction accesses and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug-related exceptions."
+
"Permission Fault, level 3".
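
For reference, this is how I arrive at that decoding of 0x8600000F. It is a minimal Python sketch that covers only the EC and IFSC encodings relevant here (it is not a full ESR decoder), and the names are hypothetical.

```python
# Only the exception classes relevant to instruction aborts are listed.
EC_NAMES = {
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
}

# IFSC[5:2] selects the fault kind, IFSC[1:0] the lookup level.
FAULT_KINDS = {
    0b0000: "Address size fault",
    0b0001: "Translation fault",
    0b0010: "Access flag fault",
    0b0011: "Permission fault",
}


def decode_instruction_abort(esr: int) -> dict:
    """Split an ESR_ELx value into EC, IL and IFSC and describe them."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    ifsc = esr & 0x3F
    kind = FAULT_KINDS.get(ifsc >> 2, "other")
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not an instruction abort"),
        "il": il,
        "ifsc": ifsc,
        "fault": f"{kind}, level {ifsc & 0x3}" if kind != "other" else "other",
    }


result = decode_instruction_abort(0x8600000F)
print(f"EC   = {result['ec']:#x} ({result['ec_name']})")
print(f"IL   = {result['il']}")
print(f"IFSC = {result['ifsc']:#x} ({result['fault']})")
```

This yields EC=0x21, IL=1 and IFSC=0x0F, matching the two quoted descriptions.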

I also tested on a real FPGA using similar code, with some changes at the end of BL1 so that BL2 executes at EL3 (but through bl2_entrypoint.S instead of bl2_el3_entrypoint.S). In this case, an exception is raised on the jump to the first instruction of BL2, and ESR_EL3 reports "Permission Fault, level 3".

If I add an instruction at the end of BL1 to disable the MMU (clearing the SCTLR_EL3.M bit) and change the MMU-enable call in the upstream function "arm_bl2_plat_arch_setup" to "enable_mmu_el3(0)", the code runs normally on the FPGA and boots UEFI. (In this hardware test I used DDR instead of SRAM, so BL2 and BL31 were also placed in DDR after being parsed.)

Request for Help:
I cannot explain why the FVP keeps executing while the FPGA faults, and my internship supervisor cannot explain it either. I would appreciate help from the experts on this forum.

Reference:
DDI0487M_a_a-profile_architecture_reference_manual.pdf
ARM Development Studio@Docs (such as Docs/ARM_A/xhtml/AArch64-esr_el3.html)
armv8_a_address_translation version1.1