
Cortex-A8 preload engine (PLE) error

Note: This was originally posted on 24th November 2011 at http://forums.arm.com

I have a user-mode Linux application running on a Cortex-A8 (a TI 8148 DaVinci chip). I use a shared memory region to pass data back and forth between the ARM core and the TI C674x DSP. The shared region is a ring buffer made of 32 KiB segments (the size of one of the 8148's L2 cache ways). I've locked down 3 of the L2 cache ways, and I'm trying to use the L2 PLE (preload engine), the L2 feature accessed through CP15 c11, to asynchronously preload and write back the ring buffer segments.

The ring buffer itself is physically and virtually contiguous; we're using TI's cmem module to allocate it out of a memory hole. I've also checked the Linux struct page flags for the ring buffer pages, and they all look uniform and normal. Plain-vanilla loads and stores to the ring buffer work fine, as do CP15-based cache writeback operations (performed in privileged mode, of course).
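
As a rough illustration of the layout described above, the sketch computes segment addresses in a contiguous ring buffer and counts how many page boundaries a segment-sized transfer crosses. It assumes 4 KiB pages and 32 KiB segments; `segment_addr` and `page_boundaries_crossed` are hypothetical helper names, not part of any TI or ARM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 4 KiB small pages, 32 KiB segments (one L2 way). */
#define PAGE_SIZE_BYTES    4096u
#define SEGMENT_SIZE_BYTES (32u * 1024u)

/* Hypothetical helper: address of segment `index` in a ring of `nsegs`
 * contiguous segments starting at `base`. The index wraps around. */
static uintptr_t segment_addr(uintptr_t base, size_t nsegs, size_t index)
{
    assert(nsegs > 0);
    return base + (uintptr_t)(index % nsegs) * SEGMENT_SIZE_BYTES;
}

/* Hypothetical helper: number of page boundaries crossed by the byte
 * range [va, va + len). With 4 KiB mappings, each crossing moves the
 * transfer onto a new virtual page that needs its own translation. */
static size_t page_boundaries_crossed(uintptr_t va, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first_page = va / PAGE_SIZE_BYTES;
    uintptr_t last_page = (va + len - 1) / PAGE_SIZE_BYTES;
    return (size_t)(last_page - first_page);
}
```

For a page-aligned segment, `page_boundaries_crossed(seg, SEGMENT_SIZE_BYTES)` is 7, so every segment-sized transfer walks through eight distinct 4 KiB pages.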

Anyway, everything works nicely for a while (anywhere from 3 to 10 PLE transfers complete successfully), until a PLE transfer errors out at a page boundary. The failing page boundary is different each time (in both its virtual and physical address), and the failure occurs after a different number of ring buffer segments, and at a different page within the segment, each time. The error itself, according to table 3-132 in the ARM Cortex-A8 Technical Reference Manual, is "b1000101", or "translation fault, section".
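
For reference, the quoted value can be decoded with a small table. This sketch classifies a status value by its low five bits using the ARMv7 short-descriptor fault status encoding. Treating the PLE error code as following that encoding is an assumption, though it is consistent with b1000101 being listed as "translation fault, section". `enum fault_kind` and `classify_fault` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Fault kinds from the ARMv7 short-descriptor fault status encoding.
 * "Section" faults come from the first-level descriptor, "page" faults
 * from the second-level descriptor. */
enum fault_kind {
    FAULT_ALIGNMENT,
    FAULT_ACCESS_FLAG_SECTION,
    FAULT_ACCESS_FLAG_PAGE,
    FAULT_TRANSLATION_SECTION,
    FAULT_TRANSLATION_PAGE,
    FAULT_DOMAIN_SECTION,
    FAULT_DOMAIN_PAGE,
    FAULT_PERMISSION_SECTION,
    FAULT_PERMISSION_PAGE,
    FAULT_OTHER
};

/* Classify a status value by its low five bits (FS[4:0]).
 * ASSUMPTION: the PLE error code shares this encoding in its low bits;
 * the upper bits (bit 6 in b1000101) are ignored here. */
static enum fault_kind classify_fault(uint32_t status)
{
    switch (status & 0x1Fu) {
    case 0x01: return FAULT_ALIGNMENT;
    case 0x03: return FAULT_ACCESS_FLAG_SECTION;
    case 0x06: return FAULT_ACCESS_FLAG_PAGE;
    case 0x05: return FAULT_TRANSLATION_SECTION;
    case 0x07: return FAULT_TRANSLATION_PAGE;
    case 0x09: return FAULT_DOMAIN_SECTION;
    case 0x0B: return FAULT_DOMAIN_PAGE;
    case 0x0D: return FAULT_PERMISSION_SECTION;
    case 0x0F: return FAULT_PERMISSION_PAGE;
    default:   return FAULT_OTHER;
    }
}
```

With this encoding, `classify_fault(0x45)` (that is, b1000101) gives `FAULT_TRANSLATION_SECTION`. In the short-descriptor format a section translation fault is reported when the first-level descriptor for the address is invalid, so the walk failed before reaching any second-level page table.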

Does anyone know what this error means in this context? At first I thought it might be because the page was marked as uncached, but the page properties (checked with /proc/kpageflags) suggest that isn't the case.

Edit: One more detail: this failure only happens with preload operations, not writebacks (or at least I haven't seen it happen with a writeback yet).
  • Note: This was originally posted on 29th November 2011 at http://forums.arm.com

    > With ARMv7 apparently the ASID is part of the TLB lookup - if the current contents of the global ContextID register (c13, c0) don't match the TLB ASID, then the TLB entry won't be a match.

    Yes, the aim of the ASID is to avoid having to flush the TLB on every context switch. What I am unclear about is what happens when the PLE takes a TLB miss. I assume it performs a table walk using whichever page tables are current at that moment, and tags the resulting TLB entry with the ASID currently in the ContextID register. That is probably not what you want (I guess for your use case you would want it to stop on an ASID mismatch).
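
    The ASID matching rule discussed here can be sketched as a simplified model. The `struct tlb_entry` type and the `tlb_hit` function are hypothetical; a real TLB also compares page size and other attributes. One refinement to the quoted rule: entries marked global (nG bit clear) match regardless of the current ASID.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified TLB entry. Real entries also record page
     * size, memory attributes, permissions and so on. */
    struct tlb_entry {
        uint32_t vpn;     /* virtual page number */
        uint8_t  asid;    /* ASID that was current when the entry was filled */
        bool     global;  /* nG bit clear: matches under any ASID */
        bool     valid;
    };

    /* An entry hits only if it is valid, the virtual page matches, and it is
     * either global or tagged with the ASID currently in CONTEXTIDR. */
    static bool tlb_hit(const struct tlb_entry *e, uint32_t vpn, uint8_t current_asid)
    {
        assert(e != NULL);
        return e->valid && e->vpn == vpn && (e->global || e->asid == current_asid);
    }
    ```

    Under this rule a process-private entry stops hitting as soon as a different ASID is loaded, whereas a global entry, such as one for the kernel's own mappings, keeps hitting across context switches.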


    > Unfortunately, Linux seems to change the ASID whenever it rolls over to 0.


    Yes, that's the other issue. If you have more than 255 processes active at the same time, the ASIDs roll over, so the ASID a given process runs with varies over time.
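
    To see why the ASID is time-variant, here is a simplified, hypothetical model of generation-based ASID allocation with 8-bit ASIDs. ASID 0 is treated as reserved in this model, which leaves the 255 usable values per generation mentioned above. It only loosely resembles what the ARM Linux kernel does and is not its actual code; `asid_allocator`, `mm_ctx` and `asid_for` are made-up names.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define ASID_BITS 8u
    #define ASID_MASK ((1u << ASID_BITS) - 1u)

    /* Hypothetical allocator state: the low 8 bits of `last` are the most
     * recently issued ASID, the upper bits are the current generation. */
    struct asid_allocator {
        uint32_t last;
        unsigned tlb_flushes;   /* rollovers that would force a TLB flush */
    };

    /* Hypothetical per-process context: generation | ASID (0 = unassigned). */
    struct mm_ctx {
        uint32_t id;
    };

    static void asid_init(struct asid_allocator *a)
    {
        a->last = 1u << ASID_BITS;   /* generation 1, no ASID issued yet */
        a->tlb_flushes = 0;
    }

    /* Return the ASID to use when switching to `mm`. A process keeps its
     * ASID while its generation is current; otherwise it gets a new one. */
    static uint8_t asid_for(struct asid_allocator *a, struct mm_ctx *mm)
    {
        if (mm->id != 0 && ((mm->id ^ a->last) & ~ASID_MASK) == 0)
            return (uint8_t)(mm->id & ASID_MASK);

        a->last++;
        if ((a->last & ASID_MASK) == 0) {
            /* Rollover: a new generation starts, every older ASID becomes
             * stale, the TLB would be flushed, and ASID 0 is skipped. */
            a->tlb_flushes++;
            a->last++;
        }
        mm->id = a->last;
        assert((mm->id & ASID_MASK) != 0);   /* ASID 0 is never handed out */
        return (uint8_t)(mm->id & ASID_MASK);
    }
    ```

    In this model the first process to be scheduled gets ASID 1; after 255 further processes have been given ASIDs, the allocator rolls over, and on its next switch the first process is handed ASID 2 instead.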

    I think Jerry is on the right lines here. The usual approach to exposing this type of hardware is to provide a device driver: user space allocates the memory via a call into the driver, and performs special operations (starting a PLE transfer, for example) via further calls into the driver. This allows the kernel to keep the memory mapped in its own address space, which solves the changing page-table problem. You will need the kernel calls at the start and end of each PLE operation anyway, because you have to issue the appropriate L1 cache maintenance operations to ensure visibility of the data you've just shovelled into, or want to shovel out of, the L2.
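
    The L1 maintenance mentioned above works on whole cache lines, so a driver would typically round each buffer out to line boundaries before issuing per-line clean or invalidate operations. A minimal sketch of that rounding, assuming the Cortex-A8's 64-byte L1 line size; `cache_line_range` and `lines_in_range` are hypothetical names, and the CP15 operations themselves are not shown.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Assumed L1 data cache line size on Cortex-A8. */
    #define L1_LINE_BYTES 64u

    struct line_range {
        uintptr_t start;  /* first byte of the first line touched */
        uintptr_t end;    /* one past the last byte of the last line touched */
    };

    /* Round [addr, addr + len) out to whole cache lines, as a driver would
     * before looping over per-line L1 clean/invalidate operations. */
    static struct line_range cache_line_range(uintptr_t addr, size_t len)
    {
        const uintptr_t mask = (uintptr_t)L1_LINE_BYTES - 1;
        struct line_range r;

        assert(len > 0);
        r.start = addr & ~mask;
        r.end = (addr + len + mask) & ~mask;
        return r;
    }

    /* Number of cache lines covered by a rounded range. */
    static size_t lines_in_range(struct line_range r)
    {
        return (size_t)((r.end - r.start) / L1_LINE_BYTES);
    }
    ```

    For example, a line-aligned 32 KiB segment covers 512 lines, while the same segment starting 16 bytes past a line boundary covers 513.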

    Cheers,
    Iso