ARMv8 MMU problem

Hi ARM experts,

I have a problem using the ARMv8 MMU in a bare-metal system:

With the 4KB translation granule, a level 1 table can use block descriptors (D_Block) to map a VA to a 1GB region of PA.

According to the Armv8 ARM, page D4-1744, the table lookup starts at level 0.

Is the level 0 table an essential step in mapping the PA?

May I bypass the level 0 table and do the MMU translation starting from the level 1 translation table? In other words, put the L1 table address directly in the TTBR0 register.

Thanks!!

  • The lack of a level 0 block descriptor is a pragmatic choice: it limits the
    number of TLB entry sizes, and such a coarse mapping isn't very useful in
    practice.

    Consider the Contiguous bit in block/page descriptors, which allows a number
    of adjacent entries to be treated as a single "block" for the purposes of
    caching those entries in the TLB.

    The number of entries considered adjacent depends on the granule size and the
    level of translation:

    * 4KB granule: 16x entries at all levels
    * 16KB granule: 128x entries at level 3 and 32x entries at level 2
    * 64KB granule: 32x entries at all levels

    This effectively gives the following table of permitted TLB "blocks":

                            Level of translation
                     +----------+----------+----------+
                     |    L1    |    L2    |    L3    |
             +-------+----------+----------+----------+
          G  |    4  |    1 GB  |    2 MB  |    4 KB  |
          r  |   KB  |   16 GB  |   32 MB  |   64 KB  |
          a  +-------+----------+----------+----------+    
          n  |   16  |          |   32 MB  |   16 KB  |
          u  |   KB  |          |    1 GB  |    2 MB  |
          l  +-------+----------+----------+----------+
          e  |   64  |          |  512 MB  |   64 KB  |
             |   KB  |          |   16 GB  |    2 MB  |
             +-------+----------+----------+----------+
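
    As a rough sketch, the sizes in the table can be derived from the granule
    size and the Contiguous-bit entry counts listed above. The Python below is
    illustrative only: the names (entry_size, tlb_block_sizes,
    CONTIGUOUS_ENTRIES) are made up for this example, and it assumes 8-byte
    descriptors, level 3 as the page level, and only the levels that have an
    entry in the table.

```python
# Contiguous-bit run lengths: {granule in KB: {level: adjacent entries}}.
# Only levels that appear in the table are listed (an assumption of this sketch).
CONTIGUOUS_ENTRIES = {
    4: {1: 16, 2: 16, 3: 16},
    16: {2: 32, 3: 128},
    64: {2: 32, 3: 32},
}


def entry_size(granule_kb, level):
    """Bytes mapped by a single descriptor at `level` (3 is the page level)."""
    page_shift = (granule_kb * 1024).bit_length() - 1  # 12, 14 or 16
    bits_per_level = page_shift - 3  # a table holds granule/8 descriptors
    return 1 << (page_shift + bits_per_level * (3 - level))


def tlb_block_sizes():
    """Map (granule_kb, level) -> (single entry size, contiguous run size)."""
    sizes = {}
    for granule_kb, levels in CONTIGUOUS_ENTRIES.items():
        for level, run in levels.items():
            single = entry_size(granule_kb, level)
            sizes[(granule_kb, level)] = (single, single * run)
    return sizes


def human(size):
    """Format a power-of-two byte count, e.g. 2097152 -> '2 MB'."""
    for unit, shift in (("TB", 40), ("GB", 30), ("MB", 20), ("KB", 10)):
        if size >= 1 << shift:
            return f"{size >> shift} {unit}"
    return f"{size} B"


for (granule_kb, level), (single, run) in sorted(tlb_block_sizes().items()):
    print(f"{granule_kb:>2} KB granule, L{level}: {human(single)} / {human(run)}")
```

    Collecting every value into a set gives 8 distinct sizes for the 14 cells
    of the table.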
    

    Note how the number of entries considered adjacent has been chosen in such a
    way that it maximises the "reuse" of different TLB "block" sizes between the
    different granule sizes, which reduces the complexity of the TLB hardware. It
    also gives a good range of available TLB block sizes for each granule size,
    from small to medium to large.

    If we added an L0 block descriptor to the 4KB granule, that would add two new
    TLB block sizes, 512GB and 8TB, which would make the TLB more complex (more
    transistors equals increased die area and increased power consumption), and
    those block sizes would be very unlikely to be used anyway.
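
    The 512GB and 8TB figures follow from the same arithmetic. A minimal
    sketch, assuming the 4KB granule (12-bit page offset, 9 bits resolved per
    level) and the same 16-entry Contiguous run as the other levels; the
    function name is hypothetical, since no such L0 block exists:

```python
PAGE_SHIFT = 12      # 4KB granule: 12-bit page offset
BITS_PER_LEVEL = 9   # 512 eight-byte descriptors per 4KB table
CONTIGUOUS_RUN = 16  # Contiguous-bit run length for the 4KB granule


def hypothetical_l0_block_sizes():
    """Return (single L0 block size, contiguous run size) in bytes."""
    # An L0 entry spans everything resolved by the three levels below it.
    single = 1 << (PAGE_SHIFT + 3 * BITS_PER_LEVEL)
    return single, single * CONTIGUOUS_RUN


single, contiguous = hypothetical_l0_block_sizes()
print(single >> 30, "GB")      # 512 GB
print(contiguous >> 40, "TB")  # 8 TB
```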

    Hope that helps.