Cortex A8 preload engine (PLE) error

Note: This was originally posted on 24th November 2011 at http://forums.arm.com

I have a user-mode Linux application running on a Cortex-A8 (a TI 8148 DaVinci chip). I have a shared memory region that I'm using to pass data back and forth between the ARM core and the TI c674x DSP. The shared memory region is a ring buffer made of 32 KB segments (the size of one of the 8148's L2 cache ways). I've locked down 3 of the L2 cache ways and I'm trying to use the L2 PLE (preload engine) - the L2 feature accessed through coprocessor 15 c11 - to asynchronously preload and write back the ring buffer segments. The ring buffer itself is located in physically and virtually contiguous memory - we're using TI's cmem module to allocate out of a memory hole. Moreover, I've checked the Linux struct page flags for the ring buffer pages and they all seem to be uniform and fairly kosher. Plain-vanilla loads and stores from the ring buffer work just fine, as do coprocessor 15 based cache writeback operations (performed in privileged mode, of course).
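For reference, the segment layout described above can be sketched as plain address arithmetic. This is only an illustration: the 32 KB segment size comes from the post, while the 4 KB page size, the `ring_buf` struct, and the `rb_*` helper names are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* 32 KB segments are from the post; 4 KB pages are an assumption. */
#define RB_SEG_SIZE      (32u * 1024u)
#define RB_PAGE_SIZE     4096u
#define RB_PAGES_PER_SEG (RB_SEG_SIZE / RB_PAGE_SIZE)

/* Hypothetical descriptor for the contiguous ring buffer. */
struct ring_buf {
    uintptr_t base;   /* virtual base address of the ring */
    size_t    nsegs;  /* number of 32 KB segments in the ring */
};

/* Start address of segment idx, wrapping around the ring. */
static uintptr_t rb_seg_addr(const struct ring_buf *rb, size_t idx)
{
    return rb->base + (uintptr_t)(idx % rb->nsegs) * RB_SEG_SIZE;
}

/* Address of a given page inside segment idx. */
static uintptr_t rb_page_addr(const struct ring_buf *rb, size_t idx, size_t page)
{
    return rb_seg_addr(rb, idx)
         + (uintptr_t)(page % RB_PAGES_PER_SEG) * RB_PAGE_SIZE;
}
```

With these assumptions each segment spans 8 pages, so a single PLE transfer of one segment crosses 7 page boundaries.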

Anyway, everything goes quite nicely for a while (anywhere from 3 to 10 PLE transfers complete successfully), until a PLE transfer errors out at a page boundary. It's a different page boundary (both virtual and physical address) each time, and the failure happens a different number of ring buffer segments in, and a different number of pages into the segment, each time. The error itself, from table 3-132 in the ARM Cortex-A8 Technical Reference Manual, is "b1000101", or "translation fault, section".
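When comparing failures across runs, it can help to reduce each failing address to a segment index, a page index, and a page-boundary flag. A minimal sketch follows, assuming 4 KB pages and a page-aligned ring base; `locate_fault` and `struct fault_loc` are hypothetical names, not part of any TI or ARM API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEG_SIZE (32u * 1024u)  /* segment size from the post */
#define PAGE_SZ  4096u          /* assumed page size */

/* Hypothetical record describing where a failing address sits in the ring. */
struct fault_loc {
    size_t seg;               /* segment index within the ring */
    size_t page;              /* page index within that segment */
    bool   on_page_boundary;  /* true if the address is page aligned */
};

/* Translate a failing virtual address into ring-relative coordinates. */
static struct fault_loc locate_fault(uintptr_t ring_base, uintptr_t fault_addr)
{
    uintptr_t off = fault_addr - ring_base;
    struct fault_loc loc;

    loc.seg = off / SEG_SIZE;
    loc.page = (off % SEG_SIZE) / PAGE_SZ;
    loc.on_page_boundary = (fault_addr % PAGE_SZ) == 0;
    return loc;
}
```

Logging these three values for every failed transfer makes it easier to confirm that the failures really have no fixed position in the ring.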

Does anyone know what this error means? At first I thought that maybe it was because the page was marked as uncached, but looking at the page properties (with /proc/kpageflags), that doesn't seem to be the case.

Edit: One more detail - this failure only happens with preload operations - not writebacks. Or at least I haven't seen it happen with a writeback yet.
  • Note: This was originally posted on 28th November 2011 at http://forums.arm.com



    Yes, corruption would certainly result if the VA->PA translation changed to something else and the PLE was still running.



    I suspect the answer is "software" =)

    I'm not a PLE expert, but AFAICT the PLE uses the same page tables as currently mapped on the core, so if the OS context switches from one process to another you have to either (1) stall the context switch waiting for the pending PLE requests to complete, (2) cancel pending PLE requests, or (3) "pause" the transfer, switch the process out, and "resume" when it gets switched back in again.

    Cheers,
    Iso
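The three options listed above can be modelled in software as a small state machine. This is purely a model of the policies, not of the hardware: the `ple_chan` struct, the enum names, and both hook functions are hypothetical, and nothing here touches CP15.

```c
#include <stdbool.h>

/* The three context-switch policies from the reply above. */
enum ple_policy { PLE_STALL, PLE_CANCEL, PLE_PAUSE_RESUME };
enum ple_state  { PLE_IDLE, PLE_RUNNING, PLE_PAUSED };

/* Hypothetical software view of one PLE channel. */
struct ple_chan {
    enum ple_state state;
    unsigned       remaining;  /* cache lines still to transfer */
    int            owner;      /* id of the process that started it */
};

/* Hook run when the owner is switched out.
 * Returns how many lines the switch had to wait for. */
static unsigned ple_switch_out(struct ple_chan *ch, enum ple_policy p)
{
    unsigned waited = 0;

    if (ch->state != PLE_RUNNING)
        return 0;

    switch (p) {
    case PLE_STALL:          /* (1) let the transfer finish first */
        waited = ch->remaining;
        ch->remaining = 0;
        ch->state = PLE_IDLE;
        break;
    case PLE_CANCEL:         /* (2) drop whatever is still pending */
        ch->remaining = 0;
        ch->state = PLE_IDLE;
        break;
    case PLE_PAUSE_RESUME:   /* (3) keep the remainder for later */
        ch->state = PLE_PAUSED;
        break;
    }
    return waited;
}

/* Hook run when a process is switched in: resume only for the owner. */
static void ple_switch_in(struct ple_chan *ch, int pid)
{
    if (ch->state == PLE_PAUSED && ch->owner == pid)
        ch->state = PLE_RUNNING;
}
```

The trade-off is visible in the return value: stalling costs switch latency, cancelling costs the user a restarted transfer, and pausing needs per-process bookkeeping.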


    I wonder - I've played around with this a bit and it seems that the PLE ContextID register might be the key here. I suspect that the ASID field in that register needs to match the ASID field in any TLB entries used by the PLE to do its address translation. With ARMv7 apparently the ASID is part of the TLB lookup - if the current contents of the global ContextID register (c13, c0) don't match the TLB ASID, then the TLB entry won't be a match. It seems like maybe the PLE ContextID register (c11, c15) might serve a similar purpose for these asynchronous PLE transfers.
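The ASID matching rule mentioned above can be sketched as follows. The rule itself (global entries match any ASID, non-global entries must match the current one) is standard ARMv7 behaviour; whether the PLE applies it using its own ContextID register is still only a guess. The `tlb_entry` struct and `tlb_hit` function are simplified, hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical view of a TLB entry. */
struct tlb_entry {
    uint32_t vpn;     /* virtual page number */
    uint8_t  asid;    /* ASID the entry was created under */
    bool     global;  /* true when the mapping is global (nG bit clear) */
    bool     valid;
};

/* An entry hits if the page matches and it is either global or
 * tagged with the ASID currently in use for the lookup. */
static bool tlb_hit(const struct tlb_entry *e, uint32_t vpn, uint8_t cur_asid)
{
    if (!e->valid || e->vpn != vpn)
        return false;
    return e->global || e->asid == cur_asid;
}
```

If the lookup is done with a stale ASID, a perfectly valid non-global entry misses, and the access falls back to a table walk with whatever translation tables are current at that moment.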

    Unfortunately, Linux seems to change the ASID whenever it rolls over to 0 (it's an 8-bit counter) - so I'm not sure that I could guarantee that my process's ASID is always going to be the same? If not, I'd have to set the PLE ContextID ASID often enough to do reliable transfers - and the PLE ContextID register is only accessible in kernel-mode. One of the big reasons I'm trying to use the PLE in the first place is to avoid an expensive syscall when writing back memory - it's fairly expensive on this platform (about 8000 cycles for a binary sysfs attribute access, and more for an ioctl or a character sysfs attribute access).
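To illustrate why a process's ASID is not stable, here is a simplified model of an 8-bit ASID allocator with a generation counter in the upper bits. It is written in the spirit of what the kernel appears to do, but it is not the kernel's code: `asid_alloc`, `proc_ctx`, `asid_new`, and `asid_check` are hypothetical names, and the TLB flush that would accompany a rollover is omitted.

```c
#include <stdint.h>

#define ASID_BITS 8
#define ASID_MASK ((1u << ASID_BITS) - 1u)

/* Last id handed out: generation in the upper bits, ASID in the low 8. */
struct asid_alloc { uint32_t last; };

/* Per-process context id; 0 means "never assigned". */
struct proc_ctx { uint32_t id; };

/* Hand out the next id, skipping ASID 0 when the low byte wraps. */
static uint32_t asid_new(struct asid_alloc *a)
{
    uint32_t id = ++a->last;

    if ((id & ASID_MASK) == 0)  /* rollover: a new generation begins */
        id = ++a->last;
    return id;
}

/* Called at switch-in: reassign if the id is from an old generation. */
static uint8_t asid_check(struct asid_alloc *a, struct proc_ctx *p)
{
    if (p->id == 0 || ((p->id ^ a->last) >> ASID_BITS) != 0)
        p->id = asid_new(a);
    return (uint8_t)(p->id & ASID_MASK);
}
```

In this model a process keeps its ASID only until the counter wraps; after that, its next switch-in gives it a fresh one, so any ASID cached elsewhere (for example in a PLE register) would be stale.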


    The real problem that I'm having now seems to be writing the L1 cache back - I've figured out that most (all?) of the corruption I'm seeing now is due to writing the L2 cache back with the PLE but not the L1 cache.
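One way to get dirty L1 lines out before an L2 writeback is a clean-by-MVA loop over the segment. A minimal sketch, assuming a 64-byte cache line: the hardware path is only compiled when the hypothetical macro `PLE_SKETCH_PRIVILEGED` is defined on an ARM build, because the operation is privileged; otherwise the helper is a no-op so the loop logic can be checked on any host.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache line length in bytes */

/* Clean one data cache line by MVA (privileged on ARMv7-A). */
static inline void clean_dcache_line(uintptr_t mva)
{
#if defined(__arm__) && defined(PLE_SKETCH_PRIVILEGED)
    /* DCCMVAC: clean data cache line by MVA to the point of coherency. */
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(mva) : "memory");
#else
    (void)mva;  /* host build: nothing to do */
#endif
}

/* Clean every line overlapping [start, start + len).
 * Returns the number of lines visited. */
static size_t clean_dcache_range(uintptr_t start, size_t len)
{
    uintptr_t line = start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = start + len;
    size_t n = 0;

    if (len == 0)
        return 0;

    for (; line < end; line += CACHE_LINE) {
        clean_dcache_line(line);
        n++;
    }
#if defined(__arm__) && defined(PLE_SKETCH_PRIVILEGED)
    __asm__ volatile("dsb" : : : "memory");  /* wait for the cleans to complete */
#endif
    return n;
}
```

Two caveats: this operation cleans to the point of coherency, so it pushes data past L2 as well, and because it is privileged it doesn't avoid the syscall cost mentioned earlier. For a 32 KB segment it visits 512 lines.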