
Non-Temporal Writes in the SIMD Instruction Set

Note: This was originally posted on 21st March 2011 at http://forums.arm.com

The x86 platform supports what Intel terms non-temporal writes: stores from registers to memory that do not pollute the cache. They are purported to run faster. Are there similar instructions in NEON that could speed up a simple memory copy by writing directly to memory and bypassing the cache?
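For reference, here is a minimal C sketch of what a non-temporal copy looks like on x86. It assumes an SSE2 target and GCC or Clang, where `__SSE2__` is predefined and `<emmintrin.h>` provides the real SSE2 intrinsics `_mm_stream_si128` (MOVNTDQ) and `_mm_sfence`. The helper name `copy_nontemporal` is hypothetical. On any other target, ARM/NEON included, it simply falls back to `memcpy`, because the sketch makes no claim about an ARM equivalent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes from src to dst. With SSE2, full 16-byte
 * blocks are written with _mm_stream_si128, a non-temporal store that goes
 * through write-combining buffers instead of allocating cache lines.
 * Without SSE2 it is plain memcpy. */
void copy_nontemporal(void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy a short
     * head with ordinary stores until d is aligned. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Stream whole 16-byte blocks; the source may be unaligned. */
    for (; n >= 16; d += 16, s += 16, n -= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
    }

    /* Non-temporal stores are weakly ordered; fence before the data is
     * relied upon elsewhere. */
    _mm_sfence();

    /* Copy the remaining tail (< 16 bytes) with ordinary stores. */
    memcpy(d, s, n);
#else
    memcpy(dst, src, n);
#endif
}
```

Streaming stores only tend to pay off when the destination will not be read again soon; for small or reused buffers an ordinary cached copy is usually the better choice.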
  • Note: This was originally posted on 23rd March 2011 at http://forums.arm.com


    > ...but I have not been told why it is irrelevant.


    The two cores have totally different memory system implementations.

    For sequential reads through the cache, the Cortex-A9 implements an integrated preload engine that is transparent to the programmer. It should always stay "one step ahead" of the memcpy, without the programmer needing to touch the source buffer in advance. See http://infocenter.ar...f/CHDFEFAH.html
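On cores without such a preload engine, the usual alternative is to issue explicit preloads (the PLD instruction) a fixed distance ahead of the copy. Below is a minimal C sketch assuming GCC or Clang, whose `__builtin_prefetch(addr, rw, locality)` typically lowers to PLD on ARM. The helper name `copy_with_prefetch`, the 64-byte chunk and the 256-byte prefetch distance are illustrative assumptions, not tuned values. Per the point above, on a Cortex-A9 this should gain little over a plain loop.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative values; the best distance depends on the core and memory. */
#define CHUNK 64
#define PREFETCH_DISTANCE 256

/* Hypothetical helper: copy n bytes in CHUNK-sized pieces, hinting the
 * source PREFETCH_DISTANCE bytes ahead of the current position. */
void copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;

    for (; i + CHUNK <= n; i += CHUNK) {
        /* Only form addresses that stay inside the source buffer.
         * rw = 0: prefetch for reading; locality = 0: no reuse expected. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(s + i + PREFETCH_DISTANCE, 0, 0);
        memcpy(d + i, s + i, CHUNK);
    }

    /* Copy whatever is left (< CHUNK bytes). */
    memcpy(d + i, s + i, n - i);
}
```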

    I've seen benchmarks showing that copies from cached to uncached-buffered memory (equivalent to your uncached write) are slower than cached-to-cached copies. However, the result probably depends to some degree on memory latency, bandwidth, and so on.
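Since the outcome depends on the platform, measuring on the target is the safest way to decide. Here is a minimal timing harness in C, assuming a hosted C environment. The names `copy_fn`, `plain_memcpy` and `copy_throughput_mbps` are hypothetical. It uses the standard `clock()`, which measures processor time and is coarse, so large buffers and several repetitions are advisable. Obtaining an uncached-buffered destination is platform-specific (typically a driver or OS mapping) and is not shown.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the copy routines being compared. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Adapter: memcpy returns void *, so wrap it to match copy_fn. */
void plain_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Average throughput of `copy` in MB/s over `reps` copies of `n` bytes.
 * Returns 0.0 if the elapsed processor time is too small to measure. */
double copy_throughput_mbps(copy_fn copy, void *dst, const void *src,
                            size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        copy(dst, src, n);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)n * reps / secs / 1e6;
}
```

Running the same harness once with a cached destination and once with an uncached-buffered one gives a like-for-like comparison on the hardware that matters.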