Non-Temporal Writes in SIMD Instruction set
Jeff Kirkham
over 12 years ago
Note: This was originally posted on 21st March 2011 at http://forums.arm.com
The x86 platform supports what it calls non-temporal writes: stores from registers to memory that bypass the cache instead of filling it. They are purported to run faster. Are there similar instructions for NEON, so that we can speed up a simple memory copy by writing directly to memory and bypassing the cache?
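For reference, the x86 technique being asked about can be sketched roughly as below. This is only an illustration, assuming an SSE2-capable compiler; `copy_streaming` is a hypothetical helper name, and on targets without SSE2 it simply falls back to `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: copy n bytes, using non-temporal stores where available. */
static void copy_streaming(void *dst, const void *src, size_t n)
{
    uint8_t *d = (uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

#if defined(__SSE2__)
    /* _mm_stream_si128 needs a 16-byte aligned destination, so copy
       single bytes until d reaches a 16-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 15u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Main loop: unaligned load, non-temporal (cache-bypassing) store. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#endif
    /* Remaining tail bytes, or the whole copy when SSE2 is unavailable. */
    memcpy(d, s, n);
}
```

Whether this is actually faster than an ordinary copy depends on the buffer size and the memory system, so it needs to be measured rather than assumed.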
Peter Harris
over 12 years ago
Note: This was originally posted on 23rd March 2011 at http://forums.arm.com
"but I have not been told why it is irrelevant."
Totally different memory system implementation in the two cores ...
For sequential reads from cache the Cortex-A9 implements an integrated preload engine which is transparent to the programmer. It should always be "one step ahead" of the memcpy without the need for the programmer to tickle the buffer being read from.
http://infocenter.ar...f/CHDFEFAH.html
I've seen benchmarks showing that copies from cached to uncached-buffered memory (equivalent to your uncached write) are slower than cached-to-cached copies. However, it probably depends to some degree on memory latencies, bandwidths, etc.
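To make the cached-to-cached case concrete, here is a minimal sketch of a plain NEON copy loop with no explicit preload hints, relying on the hardware to stay ahead of the reads as described above. `copy_neon` is a hypothetical name, the loop is not tuned for any particular core, and it falls back to a byte loop when NEON is not available at compile time.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>  /* vld1q_u8, vst1q_u8 */
#define HAVE_NEON_COPY 1
#endif

/* Hypothetical helper: copy n bytes through ordinary (cacheable) stores. */
static void copy_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
#if defined(HAVE_NEON_COPY)
    /* 16 bytes per iteration through one Q register; no PLD/prefetch. */
    while (n >= 16) {
        uint8x16_t v = vld1q_u8(src);
        vst1q_u8(dst, v);
        src += 16;
        dst += 16;
        n -= 16;
    }
#endif
    /* Tail bytes, or the whole copy when NEON is unavailable. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

Timing this against the same loop writing to an uncached-buffered mapping on the target board is the only reliable way to see which is faster for a given system.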