Our ARM Cortex-M4 application, written in C++, needs to copy a struct of 8 x 32-bit words to external memory as fast as possible.
I found that a 'for' loop performed better than memcpy, but it's still slow.
Are there intrinsics using LDM/STM instructions, or an optimised version of memcpy, that we could use?
Would a 'placement new' for the destination, with a simple assignment of one struct to another, help?
We are using the armclang 6 compiler.
Calling memcpy produces good code if you ensure the pointers are 4-byte aligned, e.g.: godbolt.org/.../9nKq5Yc3E
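For readers who cannot open the link, here is a minimal sketch of the idea, not the code behind the link. The struct name `Block` and the helpers `copy_block` and `copy_block_raw` are made up for illustration, since the real struct was not posted. `__builtin_assume_aligned` is a GCC/Clang builtin, which armclang 6 should accept as a Clang-based compiler. Whether the result is actually LDM/STM depends on the optimisation level and target flags, so check the disassembly.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 8-word payload; the real struct layout is not shown in the thread.
struct Block {
    std::uint32_t w[8];
};

// Typed pointers carry the 4-byte alignment of Block, so the compiler is free
// to expand this fixed-size memcpy inline instead of calling the library routine.
inline void copy_block(Block* dst, const Block* src) {
    std::memcpy(dst, src, sizeof(Block));
}

// With raw byte pointers the alignment is unknown to the compiler.
// __builtin_assume_aligned promises 4-byte alignment; the caller must make
// sure that promise is true, otherwise the behaviour is undefined.
inline void copy_block_raw(void* dst, const void* src) {
    void* d = __builtin_assume_aligned(dst, 4);
    const void* s = __builtin_assume_aligned(src, 4);
    std::memcpy(d, s, sizeof(Block));
}
```

If the destination is declared through a `Block*` in the first place, the first form is probably enough and the builtin is not needed.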
A simple structure copy seems to do well: https://godbolt.org/z/sjcvhrafa
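In case the link goes stale, a structure copy along these lines might look as follows. This is only a sketch under assumptions: `Packet` and `store_packet` are hypothetical names, and the static_asserts simply document what the copy relies on (a 32-byte, trivially copyable type).

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the 8 x 32-bit word struct from the question.
struct Packet {
    std::uint32_t words[8];
};

// Document the assumptions: exactly 8 words, and copyable as raw bytes.
static_assert(sizeof(Packet) == 32, "Packet should be 8 x 32-bit words");
static_assert(std::is_trivially_copyable<Packet>::value,
              "Packet should be trivially copyable");

// Plain assignment: the compiler knows the size and alignment of both sides
// and can choose the load/store sequence itself.
inline void store_packet(Packet& dst, const Packet& src) {
    dst = src;
}
```

With optimisation enabled this leaves the choice of instructions to the compiler, so it is worth comparing the generated code against the memcpy version for your exact flags.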
Thank you for your answers.