armclang and memcpy: Weird behavior

42Bastian Schick over 3 years ago

Just for info:

armclang optimizes memcpy() in a weird way (seen for ARMv7E-M): it does not respect the order of accesses.

So a memcpy(RBAR, mputable,...) fails :(

Top replies

42Bastian Schick over 3 years ago +1 verified

=> use for() loop instead :(

Parents

0 42Bastian Schick over 3 years ago in reply to Ronan Synnott

ARM clang 6.6.3 _and_ 6.16.1

 memcpy((void *)&(MPU->RBAR),&boot_pt[0],sizeof(uint32_t)*8);

______addr/line|code_____|label____|mnemonic________________|comment
            176|  memcpy((void *)&(MPU->RBAR),&boot_pt[0],sizeof(uint32_t)*8);
    ST:08001D22|F24A4048            movw    r0,#0xA448
    ST:08001D26|F64E55B0            movw    r5,#0xEDB0
    ST:08001D2A|F6C00000            movt    r0,#0x800
    ST:08001D2E|F2CE0500            movt    r5,#0xE000
    ST:08001D32|E9D0CE00            ldrd    r12,r14,[r0]
    ST:08001D36|E9D03102            ldrd    r3,r1,[r0,#0x8]
    ST:08001D3A|E9D02404            ldrd    r2,r4,[r0,#0x10]
    ST:08001D3E|602C                str     r4,[r5]
    ST:08001D40|F64E54AC            movw    r4,#0xEDAC
    ST:08001D44|F2CE0400            movt    r4,#0xE000
    ST:08001D48|6022                str     r2,[r4]

What happens is that it stores to RASR (EDB0) first and only then to RBAR (EDAC), which is the wrong order.
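
The for() loop workaround suggested elsewhere in this thread can be sketched as follows. This is only an illustration, not the original poster's code: `copy_words_in_order` is a hypothetical helper, the table contents are placeholders, and the `fake_regs` array stands in for the eight words starting at MPU->RBAR so that the example runs on a host. The point is the volatile-qualified destination: the compiler has to emit each of those stores, in source order, whereas the `(void *)` cast in the memcpy() call gives it no such constraint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words, lowest address first. */
static void copy_words_in_order(volatile uint32_t *dst,
                                const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Volatile store: cannot be dropped, merged, or reordered
         * relative to the other volatile stores in this loop. */
        dst[i] = src[i];
    }
}

/* Host-side stand-in for the register window (assumption). */
static volatile uint32_t fake_regs[8];

/* Stand-in for the poster's table; the values are placeholders. */
static const uint32_t boot_pt[8] = {
    0x11111110u, 0x22222220u, 0x33333330u, 0x44444440u,
    0x55555550u, 0x66666660u, 0x77777770u, 0x88888880u
};
```

On the target, the call would presumably be `copy_words_in_order(&MPU->RBAR, boot_pt, 8);`, assuming the usual CMSIS declaration of the MPU registers as volatile.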

Children

0 a.surati over 3 years ago in reply to 42Bastian Schick

Does any memcpy implementation care about the order in which it copies the data? On x64, for example, an implementation may use the xmm registers to copy in large chunks. Beyond producing the right bytes at the destination (and, in the case of memmove, handling overlapping src and dst areas), the function is not usually expected to deal with memory ordering. It cannot tell, just by looking at the src and dst addresses, the order in which it must copy. I think the assumption is that the function will be called to work on memory marked as Normal.

0 42Bastian Schick over 3 years ago in reply to a.surati

I would not be surprised if I had seen this before, but the ARM clang memcpy does the weirdest accesses I have seen so far. In the last 30+ years, I have never seen a memcpy that did not copy with increasing addresses. But "always been so" is of course no rule ;-)

0 Ronan Synnott over 3 years ago in reply to 42Bastian Schick

Hi again Bastian,

I discussed this issue with some colleagues.

Rather than memcpy(), you may want to make use of CMSIS functions such as ARM_MPU_Load()
https://arm-software.github.io/CMSIS_5/Core/html/group__mpu__functions.html#gafa27b26d5847fa8e465584e376b6078a

They also suggested the 'Optimizing the MPU programming' section of the below blog:
https://blog.feabhas.com/2013/02/setting-up-the-cortex-m34-armv7-m-memory-protection-unit-mpu/

Ronan

0 42Bastian Schick over 3 years ago in reply to Ronan Synnott

Ronan, the problem is not the MPU programming but the strange order of memory accesses in the memcpy. Any device that needs a linear order of accesses would fail. I use memcpy on a lot of different platforms with a multitude of compilers and have never seen behavior like this. But maybe the C standard allows it, I do not know.

0 42Bastian Schick over 3 years ago in reply to Ronan Synnott

Ronan Synnott said:
They also suggested the 'Optimizing the MPU programming' section of the below blog:

He uses memcpy(). The reason for aliasing RBAR and RASR was to allow memcpy to quickly program up to four regions.
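
To make the alias idea concrete: the RBAR/RASR pair and its three alias pairs form eight consecutive words, so up to four regions can be programmed with one in-order burst of word writes. Below is a rough sketch of loading a longer table in groups of four. It is loosely modeled on what CMSIS ARM_MPU_Load() is meant for, but it is not that function: `region_t`, `mpu_window`, `writes_done` and `load_regions_sketch` are hypothetical names, and the register window is a plain array so the code runs on a host. It assumes each RBAR value carries its own region number with the VALID bit set, so that no separate write to RNR is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define ALIASES 4u   /* RBAR/RASR plus three alias pairs (ARMv7-M) */

typedef struct {
    uint32_t rbar;   /* base address | VALID | region number */
    uint32_t rasr;   /* attributes, size, enable */
} region_t;

/* Host stand-in for the eight words RBAR .. RASR_A3 (assumption). */
static volatile uint32_t mpu_window[2u * ALIASES];

/* Number of word writes, kept only so the sketch can be checked. */
static size_t writes_done;

static void load_regions_sketch(const region_t *table, size_t count)
{
    while (count > 0) {
        /* At most four regions fit into the alias window per pass. */
        size_t chunk = (count < ALIASES) ? count : ALIASES;

        for (size_t i = 0; i < chunk; ++i) {
            mpu_window[2u * i]      = table[i].rbar;  /* RBAR first */
            mpu_window[2u * i + 1u] = table[i].rasr;  /* then its RASR */
            writes_done += 2u;
        }
        table += chunk;
        count -= chunk;
    }
}
```

Because every store goes through a volatile lvalue, the compiler has to keep the RBAR-before-RASR order within each pair, which is exactly what the optimized memcpy() did not do.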

0 42Bastian Schick over 3 years ago in reply to a.surati

I searched the web and found a single instance (Linux driver tutorial) where it was mentioned that memcpy() may not copy in order.

0 a.surati over 3 years ago in reply to 42Bastian Schick

Following your hint, a similar search found the Linux kernel documentation about its IO access APIs. It mentions "Do not use memset or memcpy on IO addresses; they are not guaranteed to copy data in order."

The doc suggests using memcpy_toio and memcpy_fromio to copy data between IO memory and RAM. Although these versions copy data in order, their knowledge of specific IO access patterns is limited: they only attempt transfers aligned to the IO address, copying in small chunks around the largest possible alignment (4 bytes max on x86, it seems).

The Linux source for memset_io comments: "memset can mangle the IO patterns quite a bit."
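
The aligned, in-order chunking described above can be sketched roughly like this. It is not the kernel's code: `copy_to_io_sketch` is a hypothetical name, a 4-byte maximum access width is assumed, and an ordinary buffer stands in for IO memory so that the example runs on a host. Bytes are written until the destination is 4-byte aligned, then whole words, then any trailing bytes, always at increasing addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void copy_to_io_sketch(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = (volatile uint8_t *)dst;
    const uint8_t *s = (const uint8_t *)src;

    /* Head: single bytes until the destination is word aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        --n;
    }

    /* Body: one aligned 32-bit store per word. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);      /* source is ordinary RAM */
        *(volatile uint32_t *)d = w;  /* destination is the "IO" side */
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: whatever is left, again byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        --n;
    }
}
```

A device that needs a particular access width or sequence would still need its own routine; this only guarantees increasing addresses and aligned word stores on the destination side.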