
ARMv8 mmu problem

Hi ARM experts,

I have a problem in using armv8 mmu in bare-metal system:

When using the 4KB translation granule, a level 1 block descriptor (D_Block) maps the VA to a 1GB region of PA.

In the Armv8 ARM (page D4-1744), the table lookup starts at level 0.

Is the level 0 table an essential step in mapping the PA?

May I bypass the level 0 table and perform the MMU translation via the level 1 translation table? In other words, fill TTBR0 directly with the L1 table address.

Thanks!!

  • Hello,

    So if the virtual address space is smaller than 512GB, I can directly use a level 1 table.

    Yes, provided you have correctly set TCR_ELx.T0SZ to limit the size of the virtual address space to between 31 and 39 bits inclusive - this corresponds to a T0SZ value of between 25 and 33 inclusive, as the size of the virtual address space is 2^(64 - T0SZ). Also, keep in mind that translations at EL1/EL0 will use TTBR0_EL1 if the top 16 bits [63:48] of the virtual address are all 0, or TTBR1_EL1 if the top 16 bits [63:48] are all 1. When reducing the size of the TTBR0_EL1 virtual address space, the base stays at 0x0 and the limit moves down, whereas when reducing the size of the TTBR1_EL1 virtual address space, the base moves up and the limit stays at 0xFFFFFFFF_FFFFFFFF. With this in mind, you need to be careful with the virtual addresses you are using, and take note of which TTBR register is being used to translate them.

    I think maybe the descriptor value or the TCR_EL1 value that I use is not correct.

    Are you taking a Synchronous Data Abort when attempting to access a virtual address with the MMU turned on? Please can you show me the value of the ESR_EL1 (Exception Syndrome Register EL1) at the point that the exception is taken? There are a number of things that could be going wrong, the DFSC field (bits [5:0] of ESR_EL1 for a Synchronous Data Abort) will help to narrow down the issue.

    Ash

  • Hi Ash,

        If the virtual address space is smaller than 512GB, are there any limitations on using the level 0 table?

        For example: the virtual address space is 512GB, the first 8MB of the space is "Normal Memory", and the rest is "Device", using the 4KB granule. I tried an MMU table like the following:

       1 The first entry (512GB of space) of the level 0 translation table is a table entry pointing to the level 1 table; the other level 0 entries are invalid;

       2 The first entry (1GB of space) of the level 1 table is a table entry pointing to the level 2 table; the other entries are all block entries with the "Device" attribute;

       3 The first 4 entries (4 * 2MB of space) of the level 2 table are block entries with the "Normal Memory" attribute, while the other entries are "Device".

       The system hangs after enabling the MMU if I write the base address of the level 0 table into TTBR0, while it works well if I write the level 1 table base address. I am tracking this down but have not yet found the root cause.

       Do any limitations exist? Thanks!

  • Hi,

    If you've set TCR_EL1.T0SZ in such a way that your virtual address space is configured to be 512GB, then what you are describing is most likely the expected behaviour. As outlined in my earlier reply, doing this at 4KB granularity will cause translations to start at L1; in other words, the translation table pointed to by TTBR0_EL1 will be interpreted as an L1 table, rather than an L0 table. This means that the first entry in your L0 table is actually mapping only 1GB, not 512GB like you think it is, because the L0 table is being interpreted as an L1 table. If all other entries in the table are fault descriptors, then you'll be getting a level 1 Translation fault.

    We can confirm this by looking at the value of the ESR_EL1 register. Please can you single step the write to SCTLR_EL1 that is enabling the MMU, and then provide the value of ESR_EL1?

    Ash.

  • Ash,

         Thanks for your reply, it is very helpful!!

          I think I misunderstood the translation table walk before: I missed the effect of T0SZ on it. As I understand it now, T0SZ defines the size of the virtual address space, and that size in turn determines the level at which the initial lookup starts. In the case that T0SZ >= 25 (virtual address space <= 512GB), as you said above, "doing this at 4KB granularity will cause translations to start at L1".

  • Correct

    And to go even further, at 4KB granularity, setting T0SZ to be >= 34 (so that the virtual address space is <= 1 GB) will cause translations to start at L2, because a single L2 table at 4KB granularity maps 1GB (512 entries * 2MB per entry).

  • Hi Ash,

    The value of ESR_EL1 is 0x96000046. It's a Data Abort exception.

    So I think maybe the descriptor value in one of the translation tables is not correct.

  • An ESR_EL1 value of 0x96000046 corresponds to a 2nd level translation fault that occurred on a write instruction.

    If you're using DS-5, I suggest you use the MMU view to ensure your translation tables have been correctly configured. Once the translation tables have been programmed and TCR_EL1 has been configured, in DS-5's debug view navigate to: Window -> Show View -> MMU.

    When you take the data abort, you can check FAR_EL1 (Fault Address Register) to get the virtual address that couldn't be translated, and use the MMU view to narrow down the issue. In particular, the Memory Map tab of the MMU view will help to quickly spot any issues.

    I hope that helps,

    Ash.


  • Why is a Block descriptor not permitted at level 0 with the 4KB granule?

  • It's a pragmatic choice limiting the number of TLB entry sizes and relating to
    the fact that such a coarse mapping isn't very useful in practice.

    Consider the block/page descriptor Contiguous bit, which allows for an adjacent
    number of entries to be treated as a single "block" for the purposes of caching
    those entries in the TLB.

    The number of entries considered adjacent depends on the granularity and level
    of translation:

    * 4KB granule: 16x entries at all levels
    * 16KB granule: 128x entries at level 3 and 32x entries at level 2
    * 64KB granule: 32x entries at all levels

    This effectively gives the following table of permitted TLB "blocks":

                            Level of translation
                     +----------+----------+----------+
                     |    L1    |    L2    |    L3    |
             +-------+----------+----------+----------+
          G  |    4  |    1 GB  |    2 MB  |    4 KB  |
          r  |   KB  |   16 GB  |   32 MB  |   64 KB  |
          a  +-------+----------+----------+----------+    
          n  |   16  |          |   32 MB  |   16 KB  |
          u  |   KB  |          |    1 GB  |    2 MB  |
          l  +-------+----------+----------+----------+
          e  |   64  |          |  512 MB  |   64 KB  |
             |   KB  |          |   16 GB  |    2 MB  |
             +-------+----------+----------+----------+
    

    Note how the number of entries considered adjacent has been chosen in such a
    way that it maximises the "reuse" of different TLB "block" sizes between the
    different granule sizes, which reduces the complexity of the TLB hardware. It
    also gives a good range of available TLB block sizes for each granule size,
    from small to medium to large.

    If we added an L0 block descriptor to the 4KB granule, that would add two new
    TLB block sizes, 512GB and 8TB, which would make the TLB more complex (more
    transistors equals increased die area and increased power consumption), and
    those block sizes would be very unlikely to be used anyway.

    Hope that helps.