
Correctly storing Cortex-A8 NEON registers in memory

  • Note: This was originally posted on 23rd September 2011 at http://forums.arm.com

    Passing data between ARM and NEON
    Using MRC instructions to pass data from NEON to ARM takes a minimum of 20 cycles. The data transfers from the NEON register file at the back of the NEON pipeline to the ARM register file at the beginning of the ARM pipeline. You can hide some or all of this latency by doing multiple back-to-back MRC transfers. The processor continues to issue instructions following a MRC until it encounters an instruction that must read or write the ARM register file. At that point, instruction issue stalls until all pending register transfers from NEON to ARM are complete.

    How can I implement this in code?
  • Note: This was originally posted on 23rd September 2011 at http://forums.arm.com

    Just as you load your data with "vld1q_f32", can't you store it with "vst1q_f32"? It might not give a speedup, but it's definitely worth trying.
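
A minimal sketch of that suggestion, assuming the data is an array of floats whose length is a multiple of 4. The helper name scale_f32 and the multiply step are hypothetical, standing in for whatever NEON computation the original code performs. The point is that the result goes from the NEON register file straight to memory via vst1q_f32, so no NEON-to-ARM register transfer is needed. A scalar fallback is included only so the sketch also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: dst[i] = src[i] * scale for i in [0, n).
 * Assumption: n is a multiple of 4, so no tail handling is shown. */
void scale_f32(float *dst, const float *src, float scale, size_t n)
{
    assert(n % 4 == 0);
#if HAVE_NEON
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t v = vld1q_f32(src + i); /* load 4 floats into a Q register */
        v = vmulq_n_f32(v, scale);          /* do the work inside NEON */
        vst1q_f32(dst + i, v);              /* store the Q register directly to memory */
    }
#else
    /* Scalar fallback for non-NEON builds; same result, no SIMD. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * scale;
#endif
}
```

If the ARM side needs the results afterwards, it can read them back from dst with ordinary loads. Whether this is faster than transferring the registers directly depends on the surrounding code, so it is worth measuring.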