Cache clean of translation tables stops execution?

Hi,

I am currently working on an integrity enforcer running at EL3 in a modified version of ARM Trusted Firmware. To gain access to the memory, I added four 1 GiB block entries to the translation tables referenced by TTBR0_EL3.
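
For readers following along, a 1 GiB level 1 block entry of the kind described above could be built roughly as follows. This is only a sketch under assumptions: a 4 KiB granule, an identity mapping, and hypothetical helper names (`l1_block_desc`, `map_low_gibs`) that are not taken from the firmware sources. Access permission, shareability and execute-never fields are left at zero for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_TYPE_BLOCK   0x1ULL                 /* bits[1:0] = 0b01: block entry */
#define DESC_ATTRINDX(i)  ((uint64_t)(i) << 2)   /* bits[4:2]: slot in MAIR_ELx */
#define DESC_AF           (1ULL << 10)           /* access flag */
#define L1_BLOCK_SHIFT    30                     /* 1 GiB per entry, 4 KiB granule (assumed) */

/* Build a level 1 block descriptor for a 1 GiB-aligned physical address. */
uint64_t l1_block_desc(uint64_t pa, unsigned attr_index)
{
    assert((pa & ((1ULL << L1_BLOCK_SHIFT) - 1)) == 0);
    assert(attr_index < 8);
    return pa | DESC_ATTRINDX(attr_index) | DESC_AF | DESC_TYPE_BLOCK;
}

/* Identity-map the first `count` GiB into a level 1 table (hypothetical layout). */
void map_low_gibs(uint64_t *l1_table, size_t count, unsigned attr_index)
{
    for (size_t i = 0; i < count; i++)
        l1_table[i] = l1_block_desc((uint64_t)i << L1_BLOCK_SHIFT, attr_index);
}
```

Calling `map_low_gibs(table, 4, 1)` would fill four entries that all use MAIR slot 1; which slot holds which attribute depends on how MAIR_EL3 is programmed.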

Now I am trying to hook the kernel's page fault handler, and I soon realized that the values read from the pt_regs struct in EL1 do not match what I read in EL3. I figured this may well be a caching issue, as I was using device attributes to cause the least amount of trouble (my GiB mapping also spans device memory). Interestingly, if I use the `dc cvac` instruction with my GiB-mapping virtual address to clean the cache before accessing the struct, the values match.

For my enforcer to work, I need to parse and change the initial page tables at a certain point. For the sake of consistency, I tried extending the cache clean to every access to the EL1 and EL2 translation tables by placing an `asm volatile("dsb sy; dc cvac, %[address]; dsb sy")` before each access to translation table memory. The scan runs without any noticeable issue and the changes are made, but after returning to EL1 nothing happens anymore. To verify this, I placed a printk right after the smc that triggers the initial parsing, and it never prints anything. Execution does continue when I remove the `dc cvac` instruction before the translation table read.

I don't understand what part of the cache cleaning could interfere with code execution after the eret.

I also tried different attributes in MAIR_EL3 for my GiB mapping (uncacheable Normal memory, Write-{Back,Through} non-transient Normal memory), but none of them behaved any differently.

Is there any documentation on how the data cache behaves across context changes and between different ELs? I couldn't find any and am a little confused: I could clean the data cache with my EL3 virtual address, but when I set the attributes to write-through Normal memory I still didn't read the same values without cleaning the cache first. If I can clean that cache line using the address, why don't I get the values from it when reading?

Any help/hint/intuition is appreciated.

I am developing and testing on the LeMaker version of the HiKey board (Kirin 620 SoC, ARM Cortex-A53 CPU) and building on the sources listed here.

  • I don't understand the details of what you are trying to do, but it seems that there are two kinds of caches you need to worry about here: the data cache and the translation lookaside buffer (TLB).

    First, for the EL3 and EL1 contexts (on the same core) to have a coherent view of a physical memory location, the virtual-to-physical memory mappings they use to access that location must agree on the cacheability attributes.

    Second, after you have modified translation tables, you should invalidate the TLB for the affected virtual addresses.

    By the way, statements like asm volatile("dsb sy; dc cvac, %[address]; dsb sy"), that implicitly claim to have no side effect on memory, can be reordered with other statements by a C compiler.
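
    One way to address the reordering concern is to add a "memory" clobber to each asm statement, which tells GCC and Clang not to move memory accesses across it. The following is a minimal sketch, not the code from the question: the helper name `clean_dcache_range` is hypothetical, the 64-byte line size is an assumption (real code could derive it from CTR_EL0), and on non-AArch64 hosts the instructions are replaced by a plain compiler barrier so the sketch still builds.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64u  /* assumed line size in bytes */

    /* Data synchronization barrier; the "memory" clobber stops compiler reordering. */
    static inline void barrier_sy(void)
    {
    #if defined(__aarch64__)
        __asm__ volatile("dsb sy" ::: "memory");
    #else
        __asm__ volatile("" ::: "memory");  /* compiler barrier only */
    #endif
    }

    /* Clean one data cache line by virtual address. */
    static inline void clean_line(uintptr_t addr)
    {
    #if defined(__aarch64__)
        __asm__ volatile("dc cvac, %0" : : "r"(addr) : "memory");
    #else
        (void)addr;
        __asm__ volatile("" ::: "memory");
    #endif
    }

    /* Clean every line overlapping [start, start + len); returns the line count. */
    size_t clean_dcache_range(const volatile void *start, size_t len)
    {
        size_t lines = 0;

        if (len == 0)
            return 0;
        assert(start != NULL);

        uintptr_t p = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
        uintptr_t end = (uintptr_t)start + len;

        barrier_sy();
        for (; p < end; p += CACHE_LINE) {
            clean_line(p);
            lines++;
        }
        barrier_sy();
        return lines;
    }
    ```

    With the clobber in place, a read such as `value = *entry;` placed after `clean_dcache_range((const volatile void *)entry, sizeof *entry);` can no longer be hoisted above the clean by the compiler.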

  • I don't understand the details of what you are trying to do

    Technically I am just trying to read and write contents of the EL1/EL2 translation tables from EL3.

    I issue my initial smc in the run_init_process function of my Linux kernel running in EL1.

    This smc is then caught by the firmware and causes the following things to happen in EL3:

    • Reading all values in the translation tables referenced by TTBR{0,1}_EL1 and TTBR0_EL2 (a zeroed page)
    • Reading all values again and conditionally changing the permissions of an entry (in this case only removing write access)
    • Returning to EL1
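
    The "remove write access" step could look roughly like the sketch below. It assumes stage 1 descriptors with a 4 KiB granule, where AP[2] (bit 7) set to 1 makes a block or page read-only. The function names are hypothetical, and the sketch changes every leaf entry instead of applying the enforcer's actual condition, which is not shown in this thread.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define DESC_TYPE_MASK  0x3ULL
    #define DESC_TYPE_BLOCK 0x1ULL        /* block entry at levels 1 and 2 */
    #define DESC_TYPE_PAGE  0x3ULL        /* page entry at level 3 (table at levels 0-2) */
    #define DESC_AP2_RO     (1ULL << 7)   /* AP[2]: 1 = read-only */

    /* True if the descriptor maps memory directly instead of pointing to a next-level table. */
    int desc_is_leaf(uint64_t desc, unsigned level)
    {
        uint64_t type = desc & DESC_TYPE_MASK;

        if (level == 3)
            return type == DESC_TYPE_PAGE;
        return (level == 1 || level == 2) && type == DESC_TYPE_BLOCK;
    }

    /* Return the descriptor with write access removed; non-leaf entries are unchanged. */
    uint64_t desc_make_read_only(uint64_t desc, unsigned level)
    {
        return desc_is_leaf(desc, level) ? (desc | DESC_AP2_RO) : desc;
    }

    /* Apply the change to one table; returns how many entries were modified. */
    size_t table_make_read_only(uint64_t *table, size_t entries, unsigned level)
    {
        size_t changed = 0;

        assert(table != NULL);
        for (size_t i = 0; i < entries; i++) {
            uint64_t updated = desc_make_read_only(table[i], level);
            if (updated != table[i]) {
                table[i] = updated;
                changed++;
            }
        }
        return changed;
    }
    ```

    On real hardware the modified entries would additionally need the appropriate barriers and TLB invalidation before the new permissions are guaranteed to take effect.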

    Immediately after the smc I placed another printk to see if the execution continues.

    The only difference between my two builds is the `asm volatile("dsb sy; dc cvac, %[address]; dsb sy");` line right before each read of an entry from the translation table. Without it, EL1 execution continues after the ERET. With that line, all code in EL3 executes, but the printk right after the smc, where execution should return to, never runs.

    First, for the EL3 and EL1 contexts (on the same core) to have a coherent view of a physical memory location, the virtual-to-physical memory mappings they use to access that location must agree on the cacheability attributes.

    Does that include the Read/Write allocate value? Because if not, then I've already tried most of the possible attributes. I will check the attributes of the kernel mapping for the translation tables and replicate those in EL3.
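
    To compare the two mappings, it can help to decode the 8-bit MAIR attribute each one selects. The sketch below follows the MAIR_ELx attribute encoding, looking only at the inner (low) nibble for Normal memory; the helper names are hypothetical and it does not settle whether differing allocation hints count as a mismatch.

    ```c
    #include <assert.h>
    #include <stdint.h>

    enum mem_kind { MEM_DEVICE, MEM_NON_CACHEABLE, MEM_WRITE_THROUGH, MEM_WRITE_BACK };

    /* Extract the attribute byte selected by AttrIndx from a MAIR_ELx value. */
    uint8_t mair_attr(uint64_t mair, unsigned index)
    {
        assert(index < 8);
        return (uint8_t)(mair >> (8 * index));
    }

    /* Classify an attribute byte by its inner cacheability. */
    enum mem_kind attr_kind(uint8_t attr)
    {
        unsigned inner = attr & 0xF;

        if ((attr >> 4) == 0)
            return MEM_DEVICE;             /* 0b0000dd00: Device memory */
        if (inner == 0x4)
            return MEM_NON_CACHEABLE;      /* 0b0100: Normal, Non-cacheable */
        return (inner & 0x4) ? MEM_WRITE_BACK : MEM_WRITE_THROUGH;
    }

    /* Inner allocation hints: bit 1 = Read-Allocate, bit 0 = Write-Allocate.
     * Only meaningful when attr_kind() reports a cacheable type. */
    unsigned attr_alloc_hints(uint8_t attr)
    {
        return attr & 0x3;
    }
    ```

    For example, 0xFF decodes as Write-Back with both allocation hints set, 0xBB as Write-Through, 0x44 as Non-cacheable, and 0x00 as Device memory.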

    The type of mapping should not make a difference, right? I am using level 1 blocks to map all memory, while the kernel uses level 2 blocks and level 3 pages to map more specifically. But as far as I know, that should not bother the cache, as it just acts on the translated physical address.

    If the attributes are the same and the issue persists, do you have any idea why execution does not continue where it left off in EL1? The modifications only remove write access, so it should not matter whether they are seen by the system or not.

    Second, after you have modified translation tables, you should invalidate the TLB for the affected virtual addresses.

    As of now, after editing the translation tables, I invalidate all entries in the TLB with a `tlbi alle1`. The EL3 mappings for the physical memory should not be in the cache, as they replace entries that were 0 before and are written only once at boot and never changed.

    By the way, statements like asm volatile("dsb sy; dc cvac, %[address]; dsb sy"), that implicitly claim to have no side effect on memory, can be reordered with other statements by a C compiler.

    Sorry, I don't understand what you are trying to say. Do you mean the memory access could have been made before the cleaning operation? Even if so, how would that have an impact on execution in EL1?

  • So the execution stop appears to have been caused by me removing write access to certain data areas of the kernel, probably including the printk buffer. As I was using printk in the page fault handler itself, I created an infinite loop.

    I didn't make any progress on the data reading, however. I set my attributes to match the kernel's (0xFF) and issued a TLBI ALLE3, but I still read different values than those printed by the kernel.

  • I had a similar but somewhat different issue. I was trying to use some app development tools for ARM development because it is an open-source platform, but I couldn't integrate them with the cache script, so my app loaded slowly and performance dropped. I then found a solution in a Lynda tutorial and implemented it.
