I wrote the following code to evaluate the expression n = n / 2:
asrs r0, r0, #1
However, I found that GCC translates the expression n = n / 2 into the following instructions:
lsrs r1, r0, #31
adds r0, r1, r0
Why does it need to add the sign bit?
It looks like you're working with Cortex-M0.
On other ARM architectures, including Cortex-M3 and later, I think you can use the following code:
add r0,r0,r0,lsr#31 /* r0 = r0 + (r0 >> 31) */
asrs r0,r0,#1
The size is still 6 bytes, but it should take one clock cycle fewer, i.e. 2 clock cycles in total.
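As for why the sign bit gets added: C's signed division truncates toward zero, while an arithmetic shift right rounds toward negative infinity, so the two disagree for odd negative values. Adding the sign bit (1 for negative n, 0 otherwise) before the shift corrects that. Here is a minimal C sketch of the same idea. The function names are made up for illustration, and it assumes that >> on a negative int32_t is an arithmetic shift, which is what GCC does but which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what a lone "asrs r0, r0, #1" computes.
   Assumes >> on a negative value is an arithmetic shift (true for GCC). */
int32_t shift_only(int32_t n)
{
    return n >> 1;
}

/* Hypothetical helper: the sign-bit-then-shift sequence. */
int32_t half_biased(int32_t n)
{
    /* lsrs r1, r0, #31 : extract the sign bit (1 if n < 0, else 0) */
    int32_t sign = (int32_t)((uint32_t)n >> 31);

    /* adds r0, r1, r0 followed by an arithmetic shift right by 1 */
    return (n + sign) >> 1;
}

/* Small self-check comparing both helpers against C's own division. */
void self_check(void)
{
    assert(-3 / 2 == -1);             /* C truncates toward zero      */
    assert(shift_only(-3) == -2);     /* shift alone rounds downward  */
    assert(half_biased(-3) == -1);    /* bias fixes the negative case */
    assert(half_biased(7) == 3);      /* non-negative n is unaffected */
}
```

For non-negative n the sign bit is 0, so both versions give the same result; only odd negative values differ.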