Cortex-A9 MPCore MMU

I’m doing a bare-metal project on a Cortex-A9 MPCore processor (NXP/Freescale i.MX6Q) and am in the process of setting up the MMU. The project will use 2 of the 4 cores. Core 0 will read data from a common shareable data area in OCRAM and display it on an LCD. Core 1 gathers the data and inserts it into the common area. Reads from and writes to the common area are protected using LDREX/STREX operations. The common data area is mapped as STRONGLY_ORDERED and execute-never (I assume this is correct).

I have a few questions:

  1. On the ARM forum I saw a suggestion that when more than one core is used, the SMP bit should always be set, even if there is no interaction between the two working cores. Is this correct? If the SMP bit is set, should the FW bit also be set, and is there any downside to setting it?
  2. What are the pros and cons of setting the L1 D-cache prefetch bit?
  3. What are the pros and cons of setting the "Alloc in one way" bit? Core 0 will be copying one frame buffer into another, so there will be many large (1.5 MB) copies. I’m not sure yet whether to use memory-to-memory DMA or a NEON copy; any suggestions? Should the bit be enabled only during copy operations, or left enabled all the time?
  4. In one example for this processor, I noticed that a user set up STRONGLY_ORDERED memory and also set the shareable and RW access bits. I thought all STRONGLY_ORDERED memory is shareable with RW access by default. Which is correct?
  5. I plan to use write-back (rather than write-through) caching for DRAM, with code memory RO and data memory RW and nX. Is this the best choice? We found early on that the frame buffer memory works much faster when it is not cached.

Are there any other CP15 bits I should know about?