Why the different encodings?

Why are there different encodings of instructions?

What's the idea/background/etc for their co-existence?

Can different encodings be mixed in the code? (Not ARM encodings with Thumb encodings without an ARM/Thumb mode change, but, for example, A1 with A2 or T1 with T2.)

I'm trying to put together a gdb stub, and for single stepping the machine code needs to be partially decoded.

How can one tell apart the encoding of an instruction in a machine code program (binary)?

Oh, and an additional question: what do the bit values in parentheses mean?

Encoding T1 ARMv6T2, ARMv7

LDREX<c> <Rt>, [<Rn>{, #<imm>}]

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

1 1 1 0 1 0 0 0 0 1 0 1 Rn | Rt (1) (1) (1) (1) imm8

Encoding A1 ARMv6*, ARMv7

LDREX<c> <Rt>, [<Rn>]

cond 0 0 0 1 1 0 0 1 Rn Rt (1) (1) (1) (1) 1 0 0 1 (1) (1) (1) (1)

For the case when cond is 0b1111, see Unconditional instructions on page A5-216.

t = UInt(Rt); n = UInt(Rn); imm32 = Zeros(32); // Zero offset

if t == 15 || n == 15 then UNPREDICTABLE;

  • Hello,


    The same instruction is encoded differently in ARM and in Thumb because the two instruction sets follow different encoding policies.
    An ARM instruction is classified by bits 27-25 and bit 4 of the instruction word.
    A Thumb instruction is classified by bits 15-10 of the instruction halfword.
    Basically, an ARM instruction is 32 bits long and a Thumb instruction is 16 bits long (from ARMv6T2 on, Thumb also has 32-bit instructions, such as the LDREX T1 encoding in the question).
    Because Thumb was designed after ARM and its basic instruction length is different, it could not reuse the same encoding.
    In an ELF file, the code can switch between ARM and Thumb encodings from function to function.
    However, I don't know how the ELF file records that attribute for each function.
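For the single-stepping case in the question, here is a minimal sketch of how a stub might work out the length of the instruction at the PC and pull out the classification bits mentioned above. It assumes ARMv6T2/ARMv7, where a first Thumb halfword whose bits 15-11 are 0b11101, 0b11110 or 0b11111 starts a 32-bit Thumb instruction. The function names are hypothetical, and the `is_thumb` flag is assumed to come from the T bit of the saved CPSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if this first Thumb halfword starts a 32-bit Thumb instruction.
 * Assumption: ARMv6T2/ARMv7, where bits [15:11] equal to 0b11101,
 * 0b11110 or 0b11111 mark a 32-bit encoding. */
static bool thumb_is_32bit(uint16_t hw1)
{
    unsigned top5 = hw1 >> 11;
    return top5 == 0x1D || top5 == 0x1E || top5 == 0x1F;
}

/* Length in bytes of the instruction at the PC.
 * is_thumb: assumed to be the T bit of the saved CPSR.
 * hw1: first halfword at the PC, only examined in Thumb state. */
static unsigned insn_length(bool is_thumb, uint16_t hw1)
{
    if (!is_thumb)
        return 4;
    return thumb_is_32bit(hw1) ? 4 : 2;
}

/* Join the two halfwords of a 32-bit Thumb instruction, first halfword
 * in the upper 16 bits, matching the layout of the encoding diagrams. */
static uint32_t thumb_insn32(uint16_t hw1, uint16_t hw2)
{
    assert(thumb_is_32bit(hw1));
    return ((uint32_t)hw1 << 16) | hw2;
}

/* ARM classification fields: bits [27:25] and bit [4]. */
static unsigned arm_op1(uint32_t insn) { return (insn >> 25) & 0x7; }
static unsigned arm_op(uint32_t insn)  { return (insn >> 4) & 0x1; }
```

With this sketch, `insn_length(true, 0x4770)` gives 2 for a 16-bit `BX LR`, and `insn_length(true, 0xE851)` gives 4 because 0xE851 is the first halfword of a Thumb `LDREX`. The encoding labels A1, A2, T1, T2 are never stored anywhere in the binary; the decoder only sees the bit patterns.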

    Regarding the '(n)' notation: a bit whose value is other than 'n' means that (in general) the instruction is UNPREDICTABLE.
    Section A5.1.2 of the ARM Architecture Reference Manual states the following.

    An instruction is UNPREDICTABLE if:
    • it is declared as UNPREDICTABLE in an instruction description or in this chapter
    • the pseudocode for that encoding does not indicate that a different special case applies, and a bit marked (0) or (1) in the encoding diagram of an instruction is not 0 or 1 respectively.

    Because of this, a disassembler MIGHT ignore the bits shown in parentheses.
    The "undefined" result is what the objdump command reports.
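As an illustration of the rule quoted above, the following sketch matches the LDREX A1 encoding from the question in two steps: first the fixed bits, then the bits drawn as (1). The masks are read off the A1 diagram (bits 27-20 = 00011001, bits 7-4 = 1001, bits 11-8 and 3-0 marked (1)). The enum and function names are hypothetical and not taken from any existing tool.

```c
#include <assert.h>
#include <stdint.h>

#define LDREX_A1_FIXED_MASK 0x0FF000F0u /* bits [27:20] and [7:4]       */
#define LDREX_A1_FIXED_BITS 0x01900090u /* 0001 1001 ... 1001           */
#define LDREX_A1_SBO_MASK   0x00000F0Fu /* the eight bits drawn as (1)  */

enum ldrex_result { NOT_LDREX, LDREX_OK, LDREX_UNPREDICTABLE };

/* Classify a 32-bit ARM instruction word against LDREX encoding A1. */
static enum ldrex_result classify_ldrex_a1(uint32_t insn)
{
    /* cond == 0b1111 belongs to the unconditional instruction space. */
    if ((insn >> 28) == 0xF)
        return NOT_LDREX;

    /* Only the fixed bits decide whether this is LDREX at all. */
    if ((insn & LDREX_A1_FIXED_MASK) != LDREX_A1_FIXED_BITS)
        return NOT_LDREX;

    /* A (1) bit that is not 1 makes the instruction UNPREDICTABLE. */
    if ((insn & LDREX_A1_SBO_MASK) != LDREX_A1_SBO_MASK)
        return LDREX_UNPREDICTABLE;

    /* From the pseudocode: if t == 15 || n == 15 then UNPREDICTABLE. */
    unsigned rt = (insn >> 12) & 0xF;
    unsigned rn = (insn >> 16) & 0xF;
    if (rt == 15 || rn == 15)
        return LDREX_UNPREDICTABLE;

    return LDREX_OK;
}

/* Rt field; only meaningful once the word has matched LDREX A1. */
static unsigned ldrex_a1_rt(uint32_t insn)
{
    assert(classify_ldrex_a1(insn) != NOT_LDREX);
    return (insn >> 12) & 0xF;
}
```

Keeping the (1) bits out of the fixed mask is a design choice: a word with a wrong (1) bit still matches the LDREX pattern and is flagged separately, so a stub can choose to step over it as a non-branching instruction or to report it.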

    Best regards,
    Yasuhiko Koumoto.
