
Why is memset not optimized?

I'm using ARM DS 2021 with compiler 6.16 and a Cortex-A53 CPU in AArch64 mode.

I wrote a small program that calls memset(), set a high optimization level (O2; I also tried O3 and Omax), and the disassembly shows this:

_memset
0x00008c14: b4000261 a... CBZ x1,0x8c60 ; _memset + 76
0x00008c18: 36000060 `..6 TBZ w0,#0,0x8c24 ; _memset + 16
0x00008c1c: 38001402 ...8 STRB w2,[x0],#1
0x00008c20: d1000421 !... SUB x1,x1,#1
0x00008c24: f1000828 (... SUBS x8,x1,#2
0x00008c28: 54000143 C..T B.CC 0x8c50 ; _memset + 60
0x00008c2c: 36080060 `..6 TBZ w0,#1,0x8c38 ; _memset + 36
0x00008c30: 78002402 .$.x STRH w2,[x0],#2
0x00008c34: aa0803e1 .... MOV x1,x8
0x00008c38: f100103f ?... CMP x1,#4
0x00008c3c: 540000a3 ...T B.CC 0x8c50 ; _memset + 60
0x00008c40: d1001021 !... SUB x1,x1,#4
0x00008c44: f1000c3f ?... CMP x1,#3
0x00008c48: b8004402 .D.. STR w2,[x0],#4
0x00008c4c: 54ffffa8 ...T B.HI 0x8c40 ; _memset + 44
0x00008c50: 36080041 A..6 TBZ w1,#1,0x8c58 ; _memset + 68
0x00008c54: 78002402 .$.x STRH w2,[x0],#2
0x00008c58: 36000041 A..6 TBZ w1,#0,0x8c60 ; _memset + 76
0x00008c5c: 39000002 ...9 STRB w2,[x0,#0]
0x00008c60: d65f03c0 .._. RET
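
To make the listing easier to follow, here is my reading of it as a rough C sketch. Assumptions on my part, since the caller is not shown: x0 is the destination, x1 is the byte count, and w2 holds the fill byte already replicated into all four bytes. The name memset_listing_sketch is made up for illustration; this is not ARM's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the _memset listing above.
 * p ~ x0 (destination), n ~ x1 (byte count),
 * v ~ w2 (assumed: fill byte replicated into all 4 bytes). */
static void memset_listing_sketch(unsigned char *p, size_t n, uint32_t v)
{
    if (n == 0)                      /* CBZ x1 */
        return;

    if ((uintptr_t)p & 1) {          /* TBZ w0,#0 ; STRB */
        *p++ = (unsigned char)v;
        n--;
    }

    if (n >= 2) {                    /* SUBS x8,x1,#2 ; B.CC */
        if ((uintptr_t)p & 2) {      /* TBZ w0,#1 ; STRH */
            memcpy(p, &v, 2);
            p += 2;
            n -= 2;
        }
        while (n >= 4) {             /* main loop: one 32-bit STR per pass */
            memcpy(p, &v, 4);
            p += 4;
            n -= 4;
        }
    }

    if (n & 2) {                     /* tail: TBZ w1,#1 ; STRH */
        memcpy(p, &v, 2);
        p += 2;
    }
    if (n & 1)                       /* tail: TBZ w1,#0 ; STRB */
        *p = (unsigned char)v;
}
```

So the main loop writes only 4 bytes per iteration, which is the part I am asking about.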

The linker chooses the library c_ou.l.

As we can see, the function is barely optimized: it uses at most 32-bit accesses on a 64-bit CPU.

No NEON registers are used.
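
For comparison, this is roughly the kind of wider fill I expected to see. It is only a portable C sketch under my own assumptions: fill_wide is a hypothetical name, it is not how any ARM library is implemented, and whether a compiler turns the paired 8-byte copies into a single 16-byte store (or a NEON store) depends on the compiler and options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only: fills n bytes at dst with c,
 * using 8-byte stores in the main loop instead of 4-byte ones. */
static void *fill_wide(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char b = (unsigned char)c;
    uint64_t v = 0x0101010101010101ULL * b;   /* replicate byte into 8 lanes */

    /* Byte stores until the pointer is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        *p++ = b;
        n--;
    }

    /* Main loop: 16 bytes per iteration as two 8-byte stores. */
    while (n >= 16) {
        memcpy(p, &v, 8);
        memcpy(p + 8, &v, 8);
        p += 16;
        n -= 16;
    }

    /* At most one remaining 8-byte store. */
    if (n >= 8) {
        memcpy(p, &v, 8);
        p += 8;
        n -= 8;
    }

    /* Up to 7 trailing bytes. */
    while (n > 0) {
        *p++ = b;
        n--;
    }
    return dst;
}
```

Calling fill_wide(buf, 0, sizeof buf) should give the same result as memset(buf, 0, sizeof buf), just with fewer store instructions in the main loop.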

Why is this function so bad?

I thought "highly optimized libraries" would look a lot better :(
