Hello,
Background: I am trying to trigger and mitigate L2 cache parity errors on a dual-core Cortex-A9 CPU integrated with an ARM PL310 L2 cache controller.
L2 cache parity errors can arise in several cases: data read access, instruction fetch, level-1 MMU table fetch, and level-2 MMU table fetch.
I trigger these scenarios by injecting bit errors, with parity generation/checking disabled, into the data region, the instruction region, the level-1 MMU table region, or the level-2 MMU table region.
The first three injection scenarios trigger synchronous aborts with a valid fault address.
Specific Issue: level-2 MMU table fetch error
To specifically trigger a level-2 MMU table fetch error, I had to implement the following strategy:
1) all DDR regions are configured as non-cacheable by default;
2) the level-2 MMU table associated with one page table entry (PTE) of the level-1 MMU table has a dedicated DDR section called "mmu_tbl_l2";
3) the level-2 MMU table is built within this section by the translation_table.S assembly code, similarly to what was done here: forums.xilinx.com/.../956716;
4) activate outer-cacheable behavior for the page tables in the TTBR0 register;
5) initialize the level-1 MMU PTE and the entire corresponding level-2 MMU table for a section associated with one specific data address;
6) set the attributes of the level-2 MMU table's DDR section to outer cacheable;
7) access the specific data address;
8) modify the attributes of the level-2 MMU PTEs within the L2 cache with parity checking disabled (the cache line data will then mismatch its stored parity);
9) re-enable parity checking and access the specific data address again (a sketch of the parity toggling follows this list).
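For reference, the parity enable/disable used in steps 8) and 9) comes down to bit [21] of the PL310 Auxiliary Control Register (offset 0x104). A minimal sketch, assuming a Zynq-7000-style base address for PL310_BASE (adjust for your platform), and keeping in mind that the TRM expects the Auxiliary Control Register to be written only while the L2 cache is disabled:

#include <stdint.h>

#define PL310_BASE      0xF8F02000u  /* assumption: Zynq-7000 L2C base  */
#define PL310_CTRL      (*(volatile uint32_t *)(PL310_BASE + 0x100))
#define PL310_AUX_CTRL  (*(volatile uint32_t *)(PL310_BASE + 0x104))
#define AUX_PARITY_EN   (1u << 21)   /* parity generate/check enable    */

static void pl310_set_parity(int enable)
{
    uint32_t ctrl = PL310_CTRL;

    PL310_CTRL = ctrl & ~1u;         /* L2 off before touching aux ctrl */
    if (enable)
        PL310_AUX_CTRL |= AUX_PARITY_EN;
    else
        PL310_AUX_CTRL &= ~AUX_PARITY_EN;
    PL310_CTRL = ctrl;               /* restore previous enable state   */
}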
The result of all this is a data abort, with the Data Fault Status Register specifying an "L2 PAGE TABLE WALK SYNCHRONOUS" error, and the L2 cache controller reporting an L1 tag RAM parity error!
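In case it helps with reproducing this: the PL310 distinguishes tag RAM from data RAM parity errors through its interrupt status registers, so the abort handler can check which array flagged the error. A minimal sketch, with the same PL310_BASE assumption as above (0x21C is the Raw Interrupt Status Register, 0x220 the Interrupt Clear Register, PARRT/PARRD the tag/data RAM parity bits):

#include <stdint.h>

#define PL310_BASE         0xF8F02000u /* same assumption as above      */
#define PL310_RAW_INT_STAT (*(volatile uint32_t *)(PL310_BASE + 0x21C))
#define PL310_INT_CLEAR    (*(volatile uint32_t *)(PL310_BASE + 0x220))
#define INT_PARRT          (1u << 1)   /* parity error on tag RAM read  */
#define INT_PARRD          (1u << 2)   /* parity error on data RAM read */

static uint32_t pl310_parity_source(void)
{
    uint32_t stat = PL310_RAW_INT_STAT & (INT_PARRT | INT_PARRD);

    PL310_INT_CLEAR = stat;            /* acknowledge what was observed */
    return stat;                       /* caller decides tag vs data    */
}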
To mitigate this error, I have tried to:
a) invalidate the level-2 MMU PTE corresponding to the specific data address;
b) invalidate the entire level-2 MMU table;
c) invalidate the entire level-2 MMU table plus the entire level-1 MMU table;
d) invalidate the entire L2 cache.
After exiting the data abort handler, only mitigation (d) let the test proceed. The other mitigations just resulted in re-entering the data abort handler indefinitely, for the same reason...
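For completeness, here is roughly what mitigations (a) and (d) look like at the register level, as a hedged sketch (same PL310_BASE assumption; 0x730/0x770/0x77C are Cache Sync, Invalidate Line by PA and Invalidate by Way in the L2C-310 TRM, and a fully populated 16-way cache is assumed). Since PTEs are being discarded, the TLBs also have to be invalidated, otherwise stale translations can hide the effect of the cache maintenance:

#include <stdint.h>

#define PL310_BASE        0xF8F02000u  /* same assumption as above       */
#define PL310_CACHE_SYNC  (*(volatile uint32_t *)(PL310_BASE + 0x730))
#define PL310_INV_PA      (*(volatile uint32_t *)(PL310_BASE + 0x770))
#define PL310_INV_WAY     (*(volatile uint32_t *)(PL310_BASE + 0x77C))
#define ALL_WAYS          0xFFFFu      /* assumption: 16 ways populated  */

static void pl310_inv_line(uint32_t pa)           /* mitigation (a)      */
{
    PL310_INV_PA = pa & ~0x1Fu;        /* PL310 lines are 32 bytes       */
    while (PL310_INV_PA & 1u)          /* C bit clears when op is done   */
        ;
    PL310_CACHE_SYNC = 0;
}

static void pl310_inv_all(void)                   /* mitigation (d)      */
{
    PL310_INV_WAY = ALL_WAYS;
    while (PL310_INV_WAY & ALL_WAYS)   /* way bits clear on completion   */
        ;
    PL310_CACHE_SYNC = 0;
}

static void tlb_inv_all(void)
{
    __asm__ volatile("mcr p15, 0, %0, c8, c7, 0" :: "r"(0)); /* TLBIALL  */
    __asm__ volatile("dsb\n\tisb");
}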
So I have two questions:
- what does the hardware page table walk actually store in the L2 cache? As the parity error is on the L2 cache tag RAMs, the data causing the error must have been filled while the parity check was disabled...
- is there any way to know which addresses are dirty when I proceed with the full L2 cache invalidation?
I know this is a complicated issue, but I would greatly appreciate any input on this.
Florian
Regarding the second question: I wanted to ask how to know which cache lines are actually invalidated (dirty or not dirty)... in other words, which data are actually in the L2 cache at the time of the exception.
Alright, my understanding is that the two-level page walk only needs to read a single level-1 MMU PTE into the L2 cache, and then a single level-2 MMU PTE into the L2 cache. If other MMU PTEs are loaded into the L2 cache, this might have two root causes:
- L2 cache prefetching
- PTEs fetched for code execution (only if a TLB invalidation has been performed)
I will investigate these two directions. Please let me know if I missed something about the page walk (sketched below)...
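To make the "one level-1 PTE, then one level-2 PTE" expectation concrete, here is a minimal sketch of the ARMv7 short-descriptor walk for a 4 KB small page (assuming TTBCR.N = 0, so TTBR0 covers the whole address space); for a given virtual address, the walker reads exactly one 4-byte descriptor at each of these two physical addresses:

#include <stdint.h>

/* Physical address of the level-1 descriptor for a virtual address. */
static uint32_t l1_desc_pa(uint32_t ttbr0, uint32_t va)
{
    uint32_t l1_base  = ttbr0 & 0xFFFFC000u;   /* 16 KB-aligned L1 table */
    uint32_t l1_index = va >> 20;              /* one entry per 1 MB     */
    return l1_base + (l1_index << 2);
}

/* Physical address of the level-2 descriptor, given a page-table-type
   level-1 descriptor. */
static uint32_t l2_desc_pa(uint32_t l1_desc, uint32_t va)
{
    uint32_t l2_base  = l1_desc & 0xFFFFFC00u; /* 1 KB-aligned L2 table  */
    uint32_t l2_index = (va >> 12) & 0xFFu;    /* one entry per 4 KB     */
    return l2_base + (l2_index << 2);
}

Note that a 32-byte L2 cache line covers 8 adjacent 4-byte descriptors, so even a single walk brings neighbouring PTEs into the cache; anything beyond that should come from the prefetching or code-execution effects listed above.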