Just for info:
armclang optimizes memcpy() in a weird way (seen for ARMv7E-M), not respecting the order of access.
So a memcpy(RBAR, mputable,...) fails :(
=> use for() loop instead :(
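A minimal sketch of such a loop, assuming the table is an array of 32-bit words laid out as RBAR/RASR pairs. The function name `copy_words_in_order` is made up for illustration. Because the destination is accessed through a volatile pointer, the compiler has to emit one store per word, in source order:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 32-bit words to a register block,
 * lowest address first. Each store goes through a volatile lvalue, so
 * the compiler may not merge, drop, or reorder the stores. */
static void copy_words_in_order(volatile uint32_t *dst,
                                const uint32_t *src,
                                size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

With the CMSIS headers included, the call from the failing case would then look something like `copy_words_in_order(&MPU->RBAR, (const uint32_t *)&boot_pt, 8);`, assuming `boot_pt` really is eight consecutive words.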
Hi Bastian, not sure if your reply was a solution or a workaround. If you would like us to have a look at this, could you provide an example?
Oops, overlooked your reply. A simple memcpy() from memory to RBAR should show it. But I will try the latest 6.16 and check whether it still fails.
ARM clang 6.6.3 _and_ 6.16.1
176| memcpy((void *)&(MPU->RBAR),&boot_pt,sizeof(uint32_t)*8);
ST:08001D22|F24A4048 movw r0,#0xA448
ST:08001D26|F64E55B0 movw r5,#0xEDB0
ST:08001D2A|F6C00000 movt r0,#0x800
ST:08001D2E|F2CE0500 movt r5,#0xE000
ST:08001D32|E9D0CE00 ldrd r12,r14,[r0]
ST:08001D36|E9D03102 ldrd r3,r1,[r0,#0x8]
ST:08001D3A|E9D02404 ldrd r2,r4,[r0,#0x10]
ST:08001D3E|602C str r4,[r5]
ST:08001D40|F64E54AC movw r4,#0xEDAC
ST:08001D44|F2CE0400 movt r4,#0xE000
ST:08001D48|6022 str r2,[r4]
What happens is that it first stores to RASR (EDB0) and then to RBAR (EDAC), which is the wrong order.
Does any memcpy implementation care about the order in which it copies the data? On x64, for example, an implementation may enlist the help of xmm registers for copying in large chunks. The function has to copy non-overlapping src and dst areas correctly (overlap is memmove's territory), but it isn't usually expected to deal directly with memory ordering. It does not know, just by looking at the src and dst addresses, the order in which it must copy. I think the assumption is that the function will be called to work on memory marked as Normal.
I would not be surprised if I had seen this before, but the ARM clang memcpy does the weirdest accesses I have seen so far. In the last 30+ years, I have never seen a memcpy which did not do it with increasing addresses. But "always been so" is of course no rule ;-)
Hi again Bastian,
I discussed this issue with some colleagues.
Rather than memcpy(), you may want to make use of CMSIS functions such as ARM_MPU_Load(): https://arm-software.github.io/CMSIS_5/Core/html/group__mpu__functions.html#gafa27b26d5847fa8e465584e376b6078a
They also suggested the 'Optimizing the MPU programming' section of the below blog: https://blog.feabhas.com/2013/02/setting-up-the-cortex-m34-armv7-m-memory-protection-unit-mpu/
Ronan, the problem is not the MPU programming but the strange order of memory accesses of the memcpy. Any device which wants a linear order of accesses would fail. I am using memcpy on a lot of different platforms with a multitude of compilers, and never saw a behavior like this. But maybe the C standard allows it, I do not know.
Ronan Synnott said: They also suggested the 'Optimizing the MPU programming' section of the below blog:
The blog's author uses memcpy() too. The reason for aliasing RBAR and RASR was to allow memcpy to quickly program up to four regions.
I searched the web and found a single instance (Linux driver tutorial) where it was mentioned that memcpy() may not copy in order.
Following your hint, a similar search found the Linux kernel documentation about its IO access APIs. It mentions "Do not use memset or memcpy on IO addresses; they are not guaranteed to copy data in order."
The doc suggests using memcpy_toio and memcpy_fromio to copy data between IO memory and the RAM. Although these versions copy data in order, their knowledge about specific IO access patterns is limited - they only attempt aligned (alignment of the IO address) transfers, by copying in small chunks around the largest possible (4 bytes max on x86, it seems) alignment.
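To illustrate the idea of an in-order copy that only issues transfers aligned to the IO address, here is a rough user-space sketch in C. It is not the Linux implementation. The name `copy_to_io_in_order` and the fixed 4-byte chunk size are assumptions made for the example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes to an IO region in ascending
 * address order, using 32-bit stores only where the destination is
 * 4-byte aligned and single bytes for the head and tail. */
static void copy_to_io_in_order(volatile void *io_dst,
                                const void *src,
                                size_t len)
{
    volatile uint8_t *d = (volatile uint8_t *)io_dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Head: single bytes until the destination is 4-byte aligned. */
    while (len > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        len--;
    }

    /* Body: aligned 32-bit stores, lowest address first. */
    while (len >= 4) {
        uint32_t word;
        memcpy(&word, s, sizeof word); /* source is ordinary RAM and may be unaligned */
        *(volatile uint32_t *)d = word;
        d += 4;
        s += 4;
        len -= 4;
    }

    /* Tail: remaining bytes. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
}
```

The inner memcpy() only touches ordinary RAM, so its access order does not matter. Every access to the IO side goes through a volatile pointer.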
The Linux source for memset_io comments: "memset can mangle the IO patterns quite a bit."