hi,
On a Cortex-A53, I would like to get the best of both worlds: having the D-cache enabled, but no page walk in case of a miss.
I want no memory protection, because we manage the whole system (kind of bare-metal processes).
Is there a way to tell the MMU that we are using an identity mapping for the whole available memory, so that a page walk is not triggered?
thanks
AFAIK, a single bit which tells the MMU to identity-map the physical memory may not exist on any prevalent CPU architecture.
But the MMU framework has elements (page tables, configuration registers) which allow one to set up a mapping of one's choice, within the limits of the features actually implemented.
Those features include the number of translation stages, the VA range a translation stage supports, the granule size, and other Armv8.x features, as described in the reference manual.
Solving this would require mapping the physical memory in the largest block size supported, but the way the MMU needs to be configured is then heavily constrained by the way the PA range is arranged.
The memory map of an SoC contains physical regions which are not RAM. When mapping the physical memory, one still (typically) needs to set up at least certain regions as Device memory and a few others as Normal, to allow correct and efficient memory operations.
It may also not always be possible to identity-map an entire PA range, if none of the VA ranges supported by the MMU happens to cover that PA range in its entirety.
As an alternative to an identity map, one can select a VA range which is large enough to map the entire PA range, and which is a linear function of the PA range.
This is not an identity map, but the va_to_pa and pa_to_va functions remain as simple as they are under an identity map.
Assume that an SoC has a PA range of 4GB, [0, 0xffffffff] (or a range such as [0x80000000, 0x180000000)). One can map the VA range [0xffffffff00000000, 0xffffffffffffffff] (or any other supported range) to it, in the largest block units supported, while taking care to keep the properties of the individual regions consistent with the SoC's requirements.
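A rough sketch, in C, of what such large block entries could look like (assuming Armv8-A stage-1 at EL1 with a 4KB granule, where one level-1 block entry covers 1GB); the MAIR_EL1 attribute indices and field choices here are assumptions made for the example, not a complete descriptor definition:

    #include <stdint.h>

    /* Assumed MAIR_EL1 setup: attribute index 0 = Normal write-back,
     * attribute index 1 = Device memory. */
    #define DESC_BLOCK      (1ULL << 0)           /* valid, block (bit 1 clear) */
    #define DESC_ATTRIDX(i) ((uint64_t)(i) << 2)  /* MAIR_EL1 attribute index   */
    #define DESC_AP_RW_EL1  (0ULL << 6)           /* read/write at EL1          */
    #define DESC_SH_INNER   (3ULL << 8)           /* inner shareable            */
    #define DESC_AF         (1ULL << 10)          /* access flag                */

    /* Level-1 block entry (1GB) mapping RAM as Normal memory. */
    static inline uint64_t block_normal(uint64_t pa_1gb_aligned)
    {
        return pa_1gb_aligned | DESC_BLOCK | DESC_ATTRIDX(0) |
               DESC_AP_RW_EL1 | DESC_SH_INNER | DESC_AF;
    }

    /* Level-1 block entry (1GB) mapping peripheral space as Device memory. */
    static inline uint64_t block_device(uint64_t pa_1gb_aligned)
    {
        return pa_1gb_aligned | DESC_BLOCK | DESC_ATTRIDX(1) |
               DESC_AP_RW_EL1 | DESC_AF;
    }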
The va_to_pa and pa_to_va functions are:
va_to_pa(va) = va - VA_BASE + PA_BASE
pa_to_va(pa) = pa - PA_BASE + VA_BASE
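In C, with the bases of the example above (both are assumptions, to be adjusted to the ranges actually programmed into the tables), these helpers are just an offset; with VA_BASE equal to PA_BASE they degenerate into an identity map:

    #include <stdint.h>

    #define PA_BASE 0x0000000000000000ULL   /* example PA range starts at 0     */
    #define VA_BASE 0xffffffff00000000ULL   /* example VA range: top 4GB of VAs */

    static inline uint64_t va_to_pa(uint64_t va) { return va - VA_BASE + PA_BASE; }
    static inline uint64_t pa_to_va(uint64_t pa) { return pa - PA_BASE + VA_BASE; }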
One may also have to check whether there are holes in the PA range; a straightforward map as described above may expose them to the processes, and may result in exceptions being triggered when they are accessed.
Thanks, but isn't the page walk on a TLB miss an automatic process?
I can't do anything at this stage to take control and implement my own function?
Edit: we can disable the table walk and generate a translation fault instead. Perfect.
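For reference, the bits in question appear to be TCR_EL1.EPD0/EPD1 on Armv8-A, which disable the hardware table walk for TTBR0_EL1/TTBR1_EL1 so that a TLB miss raises a Translation fault instead. A minimal sketch, assuming execution at EL1:

    #include <stdint.h>

    #define TCR_EPD0 (1ULL << 7)   /* no walk for TTBR0_EL1; miss -> Translation fault */
    #define TCR_EPD1 (1ULL << 23)  /* no walk for TTBR1_EL1; miss -> Translation fault */

    static inline void disable_table_walks(void)
    {
        uint64_t tcr;
        __asm__ volatile("mrs %0, tcr_el1" : "=r"(tcr));
        tcr |= TCR_EPD0 | TCR_EPD1;
        __asm__ volatile("msr tcr_el1, %0" :: "r"(tcr));
        __asm__ volatile("isb");
    }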
What I wrote won't prevent TLB misses - the page table walk will still be performed on a TLB miss, but since everything has been mapped, there is no question of taking a fault (unless the process touches an unmapped/invalid area).
I mistook your statement about the D-cache as 'no page table walk on a D-cache miss', which did confuse me a bit.
The use case that you are trying to address is not quite clear.
Your original statement could be read as meaning that you do not want any TLB misses at all.
The edit in your last comment says otherwise: that TLB misses are acceptable, but you want some control over what happens on one.
The main problem is the table walk and the performance hit involved. I am trying to find a way around that, since everything that will be running will be under our control.
Sooner or later we will access an address whose translation isn't in the TLB, because memory is spread over many pages. The 512 entries won't be enough.
So when the address isn't in the TLB: boom, table walk. That is why programming the TLB directly might be preferable, but it is apparently impossible. I need a way to tell the system that a table walk is unnecessary.
Apologies if my point was not clear enough; I am really trying to work around that problem.
Thank you for the clarification.
It seems TLB lockdown could provide some relief, but the A53 TRM has no mention of it (i.e. it is likely to be unsupported in that processor).
The requirements sound very much like an attempt at 'emulating' an MPU through an MMU. The Armv7-R manual speaks about the PMSA and the MPU, and also describes exactly the conditions that you presented here: an identity map without hardware translation table walks.
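For reference, a rough sketch of how a data-side MPU region is programmed on an Armv7-R core through CP15 (register encodings taken from the generic Armv7-R description; the number of regions and the exact attribute encodings are core-specific, so the TRM of the actual Cortex-R part should be checked):

    #include <stdint.h>

    /* Program one data-side MPU region: select it in RGNR, then set its
     * base (DRBAR), access attributes (DRACR) and size/enable (DRSR). */
    static inline void mpu_set_data_region(uint32_t region, uint32_t base,
                                           uint32_t access, uint32_t size_enable)
    {
        __asm__ volatile("mcr p15, 0, %0, c6, c2, 0" :: "r"(region));       /* RGNR  */
        __asm__ volatile("mcr p15, 0, %0, c6, c1, 0" :: "r"(base));         /* DRBAR */
        __asm__ volatile("mcr p15, 0, %0, c6, c1, 4" :: "r"(access));       /* DRACR */
        __asm__ volatile("mcr p15, 0, %0, c6, c1, 2" :: "r"(size_enable));  /* DRSR  */
        __asm__ volatile("isb");
    }

    /* Example (values are assumptions for illustration): region 0 covering
     * the whole 4GB as full-access Normal write-back memory.
     * DRSR: RSize = 31 -> 2^(31+1) bytes, En = 1.
     * DRACR: AP = 0b011 (full access), TEX = 0b001, C = B = 1 (Normal WB). */
    static inline void mpu_map_everything(void)
    {
        mpu_set_data_region(0, 0x00000000u,
                            (0x3u << 8) | (0x1u << 3) | 0x3u,
                            (31u << 1) | 1u);
    }

No TLB and no table walk are involved here; the region checks are done purely from these registers.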
Thanks a lot, so I am out of luck with the Armv8 arch...
Oh, I did not mean to say that when I pointed out the armv7-r trm.
The Armv8 architecture too provides R and M profiles, which implement an MPU (optional on the M profile).
I will look into it right now :)
You were right, I was looking at the wrong CPU family. Cortex-R seems like what I was looking for!
Powerful Cortex-R parts (R7 or R8) are a very narrow market though. Could not find vendors implementing them... sad.
That could be due to the nature of the applications that the R-profile targets.
For example, the R8 has been targeted at "Highest performance 5G modem and storage". I think that one could find that processor in hard disks, storage arrays, cellphones, or routers, but a board might be hard to come by.
There's the "R-Car H3" SoC, which contains an R7.
Boards containing the R4 and R5 are listed by Arm here, while those containing the R5 and R7 are listed here.
There are also software models which can be programmed.
Edit: It is also possible that even the devices which a particular CPU targets have not been implemented yet.
I am actually willing to design my own board.
Now, I should ask a Chinese PCBA house and check if I am lucky :)
Nice. :)
And for the CPU chip, Arm might know if there are any licensees. Wikipedia has a list of them.
Edit: If there are no vendors, does an FPGA solution fare well? I do not know if it is even possible with Arm, but I know Google and certain other companies run (or used to run) their AI processing on custom FPGA solutions.
An FPGA would be really cool. My problem with FPGAs is the price. I need a prototype and then a first batch of 2000 boards, so I guess I am better off with something less expensive, for a retail price of around 35€...
I was dead wrong. Lattice FPGAs are dirt cheap.
There is my take!!! Thank you so much for putting me on the right track. You rock.