ARM instruction set pseudo instructions

Does anyone know if there is a list of ARM instruction set pseudo instructions?

Or better yet, an instruction list like PPC's, where there is a list of 'true instructions' with mnemonics and

another list of "simplified mnemonics" (=pseudo instructions) in terms of the "true instructions" (mnemonics).

There are 499 ARM instructions listed in ARMv7-A/R ARM and going through them one by one is quite a job.

The "true" PPC instructions are explained much like ARM instructions in ARMv7-A/R ARM, but in the

"simplified mnemonics" chapter the pseudo instructions are described like:

Simplified Mnemonic    Mnemonic         Instruction
bctr                   bcctr 20,0       Branch unconditionally (bcctr without LR update)
bctrl                  bcctrl 20,0      Branch unconditionally (bcctrl with LR update)
...
mr rA,rS               or rA,rS,rS      Move register

How many pseudo instructions (roughly) are there for Cortex-A7?

ARMv7-A/R ARM doesn't seem to make a difference.

The basic LDR/STR instructions (bits 27 - 25 = 0 1 0 or 0 1 1) are pseudo instructions, and there are really

only two basic "single data load/store" instructions: the immediate form and the register form.

The special LDR/STR instructions (bits 27 - 25 = 0 0 0) (LDRH, STRD, ...) are different instructions:

the instruction bits have different meanings and/or are in different places.

Also some PC-related instructions are "true", because unlike with other registers, using PC also loads

the CPSR, so it's functionally different even if the encoding bits are exactly the same.
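
To make the grouping above concrete, here is a minimal Python sketch that sorts an A32 instruction word into the groups mentioned (bits 27 - 25 = 0 1 0, 0 1 1, or 0 0 0). The function name and the returned labels are made up for illustration, and the check for the "special" group (bits 7 and 4 set, bits 6 - 5 non-zero) is a simplification that ignores the unconditional (cond = 1111) space.

```python
def classify_load_store(word: int) -> str:
    """Roughly classify a 32-bit A32 instruction word (illustrative only)."""
    op = (word >> 25) & 0b111      # bits 27-25
    bit4 = (word >> 4) & 1
    if op == 0b010:
        return "LDR/STR immediate form"
    if op == 0b011:
        # With bit 4 set, this encoding space belongs to other instructions.
        return "LDR/STR register form" if bit4 == 0 else "other"
    if op == 0b000 and (word & 0x90) == 0x90 and ((word >> 5) & 0b11) != 0:
        # Bits 7 and 4 set, bits 6-5 non-zero: LDRH, STRD and friends.
        return "special load/store"
    return "other"


if __name__ == "__main__":
    for w in (0xE5910004, 0xE7910002, 0xE1D100B0):
        print(f"{w:08X}: {classify_load_store(w)}")
```

The three sample words are LDR R0,[R1,#4], LDR R0,[R1,R2] and LDRH R0,[R1].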

  • Hello,


    as far as I know, the only true pseudo instruction is "LDR  Rt,=immediate".

    It is converted to "LDR  Rt,[pc,#imm]" (i.e. a load from a literal pool) or "MOV  Rt, #imm" according to the immediate size.

    Regarding others, almost all pseudo instructions are listed in the ARM ARM.


    For example:

      ADR   Rt,<label>   ==>  ADD    Rt, PC, #imm

      PUSH  Rt           ==>  STR    Rt, [SP,#-4]!

      POP   Rt           ==>  LDR    Rt, [SP],#4

      PUSH  {Ra-Rb}      ==>  STMDB  SP!, {Ra-Rb}

      POP   {Ra-Rb}      ==>  LDMIA  SP!, {Ra-Rb}

    Best regards,

    Yasuhiko Koumoto.
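
As a rough illustration of how "LDR Rt,=immediate" might be resolved, the Python sketch below checks whether a value fits the A32 "modified immediate" form (an 8-bit value rotated right by an even amount) and picks MOV or a literal-pool load accordingly. The function names and output strings are hypothetical. The MVN fallback is an addition not mentioned in the post above, though assemblers commonly use it; real assemblers may also choose other forms (for example MOVW where available).

```python
MASK32 = 0xFFFFFFFF


def rol32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def is_modified_immediate(value: int) -> bool:
    """True if value == ROR(imm8, 2*rot) for some imm8 < 256 and rot in 0..15."""
    return any(rol32(value, 2 * rot) < 256 for rot in range(16))


def expand_ldr_equals(rt: int, value: int) -> str:
    """Pick an instruction for 'LDR Rt,=value' (illustrative text output)."""
    value &= MASK32
    if is_modified_immediate(value):
        return f"MOV R{rt}, #0x{value:X}"
    inverted = ~value & MASK32
    if is_modified_immediate(inverted):
        # Assumption: the assembler also tries the bitwise inverse via MVN.
        return f"MVN R{rt}, #0x{inverted:X}"
    return f"LDR R{rt}, [PC, #<offset of literal 0x{value:08X}>]"


if __name__ == "__main__":
    for v in (0xFF, 0xFF000000, 0xFFFFFFFF, 0x12345678):
        print(expand_ldr_equals(0, v))
```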

  • In addition to those:

    LSL{S}<c> <Rd>, <Rm>, #0  ==> MOV{S}<c> <Rd>, <Rm>

    LDM/LDMIA/LDMFD/LDMDB/LDMEA/LDMIB/LDMED and the STM equivalents are all pseudo

    instructions on top of one anonymous instruction, similarly to the basic LDR/STR instructions.

    I consider them pseudo instructions, because the different mnemonics only define certain values

    for the instruction modifier bits.

    c c c c 1 0 0 B I M W L n n n n r r r r r r r r r r r r r r r r   ?M<c> <Rn>{!}, <registers>

    where B= before/after (1=before), I = increment/decrement (1=increment)

    M= current mode/user mode (0=current), W=writeback, L=load/store(1=load)

    So LDMIA<c> <Rn>!, <registers>  ==>  "?M" where L=1, I=1, B=0, M=0 and W=1.

    All (or at least nearly all) the other instruction encoding overlaps are actually "true" instructions, not pseudos.

    Like ROR{S}<c> <Rd>, <Rm>, #0 --> RRX{S}<c> <Rd> (almost but not quite similar - 1 shift, sign extend)

    and MSR{<c>}{<q>} <spec_reg>, #<imm> that becomes a hint (defined by the imm) if spec_reg bits are all zeros.

    Yes, I think, you're right. There are not very many pseudos, but "true" overlapping instructions.

    (Same encoding, but specific operand value, and the instruction changes into another instruction.)

    Darn, the instruction set is (made?) complicated.
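
The B/I/M/W/L bit layout described above can be turned into a small decoder. This Python sketch is only an illustration under that layout (bits 27 - 25 = 1 0 0, then B, I, M, W, L in bits 24 - 20); the function name and the text it produces are invented here, and it ignores special cases such as PUSH/POP aliases.

```python
def decode_block_transfer(word: int):
    """Decode an A32 LDM/STM word into a mnemonic string, or None if not LDM/STM."""
    if (word >> 25) & 0b111 != 0b100:
        return None
    before = (word >> 24) & 1      # B: 1 = before, 0 = after
    increment = (word >> 23) & 1   # I: 1 = increment, 0 = decrement
    user = (word >> 22) & 1        # M: 1 = user mode registers
    writeback = (word >> 21) & 1   # W
    load = (word >> 20) & 1        # L: 1 = load, 0 = store
    rn = (word >> 16) & 0xF
    regs = ", ".join(f"R{i}" for i in range(16) if word & (1 << i))

    suffix = {(0, 1): "IA", (1, 1): "IB", (0, 0): "DA", (1, 0): "DB"}[(before, increment)]
    text = ("LDM" if load else "STM") + suffix
    text += f" R{rn}" + ("!" if writeback else "")
    text += ", {" + regs + "}" + ("^" if user else "")
    return text


if __name__ == "__main__":
    print(decode_block_transfer(0xE8BD0003))  # POP {R0, R1} form
    print(decode_block_transfer(0xE92D4010))  # PUSH {R4, LR} form
```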

  • Thanks, I copied that into a text file for later use. It will be handy when I get to the thumb instruction set.

    Well, if I live long enough to get to the thumb instruction set.

    On Cortex-M3 and Cortex-M4, I have a feeling that UXTB, UXTH, SXTB and SXTH are simplified versions of UBFX and SBFX, but I have not checked the disassembly.

    At least in ARM instruction set they look like different instructions:

    c c c c 0 1 1 1 1 1 1 w w w w w d d d d b b b  b b 1 0 1 n n n n UBFX<c> <Rd>, <Rn>, #<lsb>, #<width>

    c c c c 0 1 1 0 1 1 1 0 1 1 1 1 d d d d r r(0)(0)0 1 1 1 m m m m UXTB<c> <Rd>, <Rm>{, <rotation>}

    c c c c 0 1 1 0 1 1 1 1 1 1 1 1 d d d d r r(0)(0)0 1 1 1 m m m m UXTH<c> <Rd>, <Rm>{, <rotation>}

    Here w w w w w is the width minus 1 and b b b b b is the lsb bit position;

    r r is the rotate count.
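
One way to see why these are separate instructions rather than aliases: without a rotation, UXTB/UXTH give the same result as a UBFX at lsb 0, but the rotation can wrap around bit 31, which a bitfield extract cannot express. The Python sketch below models only the data results, not the encodings; the function names are just illustrative.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def ubfx(rn: int, lsb: int, width: int) -> int:
    """Unsigned bitfield extract: `width` bits starting at bit `lsb`."""
    return (rn >> lsb) & ((1 << width) - 1)


def uxtb(rm: int, rotation: int = 0) -> int:
    """Zero-extend the low byte of Rm after an optional rotation (0, 8, 16, 24)."""
    return ror32(rm, rotation) & 0xFF


def uxth(rm: int, rotation: int = 0) -> int:
    """Zero-extend the low halfword of Rm after an optional rotation."""
    return ror32(rm, rotation) & 0xFFFF


if __name__ == "__main__":
    x = 0x12345678
    print(hex(uxtb(x)), hex(ubfx(x, 0, 8)))      # same result
    print(hex(uxtb(x, 8)), hex(ubfx(x, 8, 8)))   # same result
    print(hex(uxth(x, 24)))                      # wraps around bit 31
```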

  • I'm not able to give a list, but I might be able to help out on the 16-bit thumb. Even the 16-bit thumb has pseudo instructions and simplified instructions.

    pseudo-instructions for thumb include the ldr=, which Yasuhiko Koumoto mentioned already.

    I think what you're after is simplified instructions, whereas pseudo-instructions are defined to be more like a "macro". Thus MOV32 is a pseudo-instruction (i.e. a macro) which (always) produces two 32-bit instructions: MOVW + MOVT (resulting in 8 bytes of code). This is (almost) similar to the LIL pseudo-instruction on PPC (LI+ADDIS).

    That said, there is a pseudo-instruction, which you'd be interested in: The UND instruction, which generates an UNDEFINED instruction. It takes a single parameter, which is an expression.

         UND    #0...65535                         /* A32 instruction set */

         UND    #0...4095                          /* T2 (32-bit thumb) instruction set */

         UND    #0...255                            /* T1 (16-bit thumb) instruction set */

    The simplified instructions I know of on the 16-bit thumb are NOP, NOT and NEG. As far as I know, NOP is a simplified MOV R8,R8.

    (any MOV Rn,Rn would act as a NOP, but I believe R8 was chosen, because this register is not used very often, and thus if the code runs on newer architectures, the risk of register-clashing / RAW/WAR would be minimized, in case it would mean anything on such architectures).

    NOT is a MVN Rn,Rn instruction.

    NEG is RSB Rn,#0 (the immediate value on 16-bit thumb is restricted to only #0, nothing else)

    On 16-bit thumb, LSRS, LSLS, ASRS, ASLS, RORS and RRXS are not simplified instructions. This is because there's not room for supporting "operand2" on the thumb.

    But on thumb2, LSRS, LSLS, ASRS, ASLS, RORS and RRXS are actually MOVS instructions with the second operand specifying the shift type and count.

    LSL, LSR, ASL, ASR, ROR and RRX similarly use MOV.

    Also, PUSH and POP on 16-bit thumb are special, since they do not act exactly like STMIA/LDMIA.

    Note: TST is not a simplified ANDS instruction, because it does not modify any register like ANDS would.

    On Cortex-M3 and Cortex-M4, I have a feeling that UXTB, UXTH, SXTB and SXTH are simplified versions of UBFX and SBFX, but I have not checked the disassembly.
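
To illustrate the "macro" nature of MOV32 mentioned above, here is a small Python sketch that splits a 32-bit constant into the MOVW (low halfword) and MOVT (high halfword) pair. The function name and the textual output format are assumptions made for this example only.

```python
def expand_mov32(rd: int, value: int) -> list:
    """Expand 'MOV32 Rd, #value' into its MOVW + MOVT pair (as text)."""
    value &= 0xFFFFFFFF
    low = value & 0xFFFF           # loaded by MOVW, which clears the top half
    high = (value >> 16) & 0xFFFF  # loaded by MOVT, which keeps the low half
    return [
        f"MOVW R{rd}, #0x{low:04X}",
        f"MOVT R{rd}, #0x{high:04X}",
    ]


if __name__ == "__main__":
    for line in expand_mov32(0, 0x12345678):
        print(line)
```

Both instructions are emitted unconditionally, which matches the "always 8 bytes" behaviour described above, even when the high halfword is zero.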

  • Also, the ARMv7-A/R ARM has a funny way of describing the encodings:

    Bits 31 (left) to 0 (right)         Syntax                                     Encoding   Section
    1111001i1D000xxxddddMMMM0Q01xxxx    VORR<c>.<dt> <Qd>, #<imm>                  T1/A1      A8.8.359
    1111001i1D000xxxddddMMMM0Qf1xxxx    VMOV<c>.<dt> <Qd>, #<imm>                  T1/A1      A8.8.339
    1111001U1Dxxxxxxdddd0000LQM1mmmm    VSHR<c>.<type><size> <Qd>, <Qm>, #<imm>    T1/A1      A8.8.398

    Under VORR it says:

    if cmode<0> == ‘0’ || cmode<3:2> == ‘11’ then SEE VMOV (immediate);

    (cmode = M M M M)

    Under VMOV:

    if op == ‘0’ && cmode<0> == ‘1’ && cmode<3:2> != ‘11’ then SEE VORR (immediate);

    if op == ‘1’ && cmode != ‘1110’ then SEE “Related encodings”;

    (op = f, cmode= M M M M)

    Under VSHR:

    if (L:imm6) IN “0000xxx” then SEE “Related encodings”;

    (imm = x x x x x x)

    The table-method can't tell these apart, so these rules are needed.

    (There is no further information on what the referenced "Related encodings" could be.)
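
The VORR/VMOV rules quoted above translate fairly directly into checking code. The Python sketch below applies only those two quoted rules to a word that already matches the shared table pattern, taking cmode from bits 11 - 8 and op from bit 5 as laid out in the table. The function name is hypothetical, and "Related encodings" is returned as an opaque label because the quoted text does not say what it covers.

```python
def classify_neon_immediate(word: int) -> str:
    """Tell VORR (immediate) from VMOV (immediate) using the quoted SEE rules."""
    cmode = (word >> 8) & 0xF   # M M M M in the table
    op = (word >> 5) & 1        # f in the VMOV row, fixed 0 in the VORR row
    if op == 0 and (cmode & 1) == 1 and (cmode >> 2) != 0b11:
        return "VORR (immediate)"
    if op == 1 and cmode != 0b1110:
        return "Related encodings"
    return "VMOV (immediate)"


if __name__ == "__main__":
    # Sample words built by hand from the table layout (Q=1, all-zero immediate).
    for w in (0xF2800050, 0xF2800150, 0xF2800D50, 0xF2800E70, 0xF2800070):
        print(f"{w:08X}: {classify_neon_immediate(w)}")
```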

  • Funny how things look clearer after a good sleep.

    It looks like I'm making progress with the coding (at last).

    The state-related restrictions of MSR/MRS instructions are horrible, though.

    They generate a lot of checking code.