I need a little digital I/O for my project. Since Cortex-M is not economical for me, I need to know: can I use a Cortex-A to do some embedded jobs?
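Another not-very-obvious difference when converting assembler is the shift operations. On x86, shift amounts are masked to the operand size (5 bits for 32-bit operands, 6 bits for 64-bit ones), so a 32-bit <<32 does nothing. On 32-bit ARM, a register-specified shift uses only the bottom byte of the shift register, so the amount is effectively modulo 256, which is strange: <<32 (or anything up to 255) zeroes the register. The 64-bit ARM architecture (AArch64) does the same thing as x86 and takes variable shift amounts modulo the register size. This can catch you out if you write a routine that extracts bits from a variable bit position or rotates a value, for instance. The C standard says shifting by the width of the type or more is undefined behaviour, but people often assume their code follows the standard when it actually relies on whatever their CPU happens to do, so this becomes a nasty gotcha when the code is ported.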
https://david.wragg.org/blog/2012/11/shift-instructions.html
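This is why you might sometimes see code like ((w0 >> 1) >> (31 - sh)) | (w1 << sh)
where sh must be within 0..31. Splitting the right shift into two steps means no shift amount ever reaches 32, so sh == 0 is well defined, whereas the obvious (w0 >> (32 - sh)) | (w1 << sh) shifts by 32 when sh is 0. For rotates, use a compiler intrinsic (or std::rotl in C++20) or one of the well-known rotate idioms on the web that avoid undefined behaviour.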