I'm using Arm DS 2021 with compiler 6.16 and a Cortex-A53 CPU in AArch64 mode.
I wrote a small program that calls memset(), set a high optimization level (-O2; I also tried -O3 and -Omax), and the disassembly shows this:
_memset
    0x00008c14:  b4000261  a...  CBZ   x1,0x8c60 ; _memset + 76
    0x00008c18:  36000060  `..6  TBZ   w0,#0,0x8c24 ; _memset + 16
    0x00008c1c:  38001402  ...8  STRB  w2,[x0],#1
    0x00008c20:  d1000421  !...  SUB   x1,x1,#1
    0x00008c24:  f1000828  (...  SUBS  x8,x1,#2
    0x00008c28:  54000143  C..T  B.CC  0x8c50 ; _memset + 60
    0x00008c2c:  36080060  `..6  TBZ   w0,#1,0x8c38 ; _memset + 36
    0x00008c30:  78002402  .$.x  STRH  w2,[x0],#2
    0x00008c34:  aa0803e1  ....  MOV   x1,x8
    0x00008c38:  f100103f  ?...  CMP   x1,#4
    0x00008c3c:  540000a3  ...T  B.CC  0x8c50 ; _memset + 60
    0x00008c40:  d1001021  !...  SUB   x1,x1,#4
    0x00008c44:  f1000c3f  ?...  CMP   x1,#3
    0x00008c48:  b8004402  .D..  STR   w2,[x0],#4
    0x00008c4c:  54ffffa8  ...T  B.HI  0x8c40 ; _memset + 44
    0x00008c50:  36080041  A..6  TBZ   w1,#1,0x8c58 ; _memset + 68
    0x00008c54:  78002402  .$.x  STRH  w2,[x0],#2
    0x00008c58:  36000041  A..6  TBZ   w1,#0,0x8c60 ; _memset + 76
    0x00008c5c:  39000002  ...9  STRB  w2,[x0,#0]
    0x00008c60:  d65f03c0  .._.  RET
__aeabi_memclr4
__aeabi_memclr8
__rt_memclr_w
    0x00008c64:  f100103f  ?...  CMP   x1,#4
    0x00008c68:  540000a3  ...T  B.CC  0x8c7c ; __aeabi_memclr4 + 24
    0x00008c6c:  d1001021  !...  SUB   x1,x1,#4
    0x00008c70:  f1000c3f  ?...  CMP   x1,#3
    0x00008c74:  b800441f  .D..  STR   wzr,[x0],#4
    0x00008c78:  54ffffa8  ...T  B.HI  0x8c6c ; __aeabi_memclr4 + 8
    0x00008c7c:  37080061  a..7  TBNZ  w1,#1,0x8c88 ; __aeabi_memclr4 + 36
    0x00008c80:  37000081  ...7  TBNZ  w1,#0,0x8c90 ; __aeabi_memclr4 + 44
    0x00008c84:  d65f03c0  .._.  RET
    0x00008c88:  7800241f  .$.x  STRH  wzr,[x0],#2
    0x00008c8c:  3607ffc1  ...6  TBZ   w1,#0,0x8c84 ; __aeabi_memclr4 + 32
    0x00008c90:  3900001f  ...9  STRB  wzr,[x0,#0]
    0x00008c94:  d65f03c0  .._.  RET
_memset_w
    0x00008c98:  f100103f  ?...  CMP   x1,#4
    0x00008c9c:  540000a3  ...T  B.CC  0x8cb0 ; _memset_w + 24
    0x00008ca0:  d1001021  !...  SUB   x1,x1,#4
    0x00008ca4:  f1000c3f  ?...  CMP   x1,#3
    0x00008ca8:  b8004402  .D..  STR   w2,[x0],#4
    0x00008cac:  54ffffa8  ...T  B.HI  0x8ca0 ; _memset_w + 8
    0x00008cb0:  37080061  a..7  TBNZ  w1,#1,0x8cbc ; _memset_w + 36
    0x00008cb4:  37000081  ...7  TBNZ  w1,#0,0x8cc4 ; _memset_w + 44
    0x00008cb8:  d65f03c0  .._.  RET
    0x00008cbc:  78002402  .$.x  STRH  w2,[x0],#2
    0x00008cc0:  3607ffc1  ...6  TBZ   w1,#0,0x8cb8 ; _memset_w + 32
    0x00008cc4:  39000002  ...9  STRB  w2,[x0,#0]
    0x00008cc8:  d65f03c0  .._.  RET
The linker chooses library c_ou.l
As we can see, the function is barely optimized: the widest store it uses is 32 bits (STR w2), on a 64-bit CPU.
No NEON registers are used.
Why is this function so bad?
I thought "highly optimized libraries" would look much better :(
Hello, Stephen.
From my experience with Arm Compiler 5, I considered it best in class, and I expected version 6 to be no worse in code quality.
Sorry, my expectations were evidently a little too high.
Thanks a lot for the link to the routines; they look great and will be very helpful to me!
Regards,
Vlad
Note that the compiler and optimization settings of your project are essentially irrelevant when it comes to library functions like memset(): that is all library code that is pre-built and provided as .a files, so it is not recompiled with your -O2/-O3/-Omax options.
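Since the library routine is pre-built, one way to get wider stores is to link your own routine under a different name. Below is a minimal sketch in portable C, not the Arm library's implementation and not code from the routines mentioned above. The name memset_wide is hypothetical, and whether it is actually faster on a Cortex-A53 is an assumption you would need to measure. It aligns the destination with byte stores, then writes 16 bytes per iteration using two 64-bit words.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any Arm library): fills n bytes at dst
 * with the low 8 bits of value, using 64-bit-wide stores for the bulk. */
void *memset_wide(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    const unsigned char byte = (unsigned char)value;

    /* Replicate the byte into all eight lanes of a 64-bit word. */
    const uint64_t pattern = UINT64_C(0x0101010101010101) * byte;

    /* Head: byte stores until p is 8-byte aligned or n is exhausted. */
    while (n > 0 && ((uintptr_t)p & 7u) != 0) {
        *p++ = byte;
        n--;
    }

    /* Body: 16 bytes per iteration. The fixed-size memcpy avoids
     * strict-aliasing problems; compilers usually turn it into
     * ordinary 64-bit stores rather than a function call. */
    while (n >= 16) {
        memcpy(p, &pattern, 8);
        memcpy(p + 8, &pattern, 8);
        p += 16;
        n -= 16;
    }

    /* Tail: at most one more 8-byte store, then single bytes. */
    if (n >= 8) {
        memcpy(p, &pattern, 8);
        p += 8;
        n -= 8;
    }
    while (n > 0) {
        *p++ = byte;
        n--;
    }
    return dst;
}
```

A call such as memset_wide(buf, 0, sizeof buf) is used exactly like memset(). Unlike the library routine, this function is compiled with your project's own optimization settings, so it is worth checking the disassembly to see which store instructions the compiler actually emits.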