Why are there different encodings of instructions?
What's the idea/background/etc for their co-existence?
Can different encodings be mixed in the code? (I don't mean ARM encodings with Thumb encodings without an ARM/Thumb mode change,
but, for example, A1 with A2, or T1 with T2.)
I'm trying to put together a gdb stub, and for single stepping the machine code needs to be partially decoded.
How can one tell apart the encoding of an instruction in a machine code program (binary)?
Oh, and an additional question: what do the bit values in parentheses mean?
Encoding T1 ARMv6T2, ARMv7
LDREX<c> <Rt>, [<Rn>{, #<imm>}]
First halfword  (bits 15..0): 1 1 1 0 1 0 0 0 0 1 0 1 Rn
Second halfword (bits 15..0): Rt (1) (1) (1) (1) imm8

Encoding A1 ARMv6*, ARMv7
LDREX<c> <Rt>, [<Rn>]
Bits 31..0: cond 0 0 0 1 1 0 0 1 Rn Rt (1) (1) (1) (1) 1 0 0 1 (1) (1) (1) (1)
For the case when cond is 0b1111, see Unconditional instructions on page A5-216.
t = UInt(Rt); n = UInt(Rn); imm32 = Zeros(32); // Zero offset
if t == 15 || n == 15 then UNPREDICTABLE;
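
For what it's worth, here is a minimal sketch of how the A1 diagram for LDREX could be turned into a mask/value test in C. It assumes the usual reading of the manual's notation: the plain 0/1 bits are fixed and identify the encoding, while a bit shown as (1) "should be one" and the instruction is UNPREDICTABLE if it is not. A decoder does not need to know the encoding name (A1, A2, T1, T2) in advance; matching the fixed bits is what selects it. The names decode_ldrex_a1, ldrex_a1_fields and the LDREX_A1_* macros are made up for this illustration and do not come from any real code base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed bits of LDREX encoding A1: bits [27:20] = 0b00011001, bits [7:4] = 0b1001. */
#define LDREX_A1_MASK  0x0FF000F0u
#define LDREX_A1_VALUE 0x01900090u
/* Bits drawn as (1) in the diagram: [11:8] and [3:0] (assumed "should be one"). */
#define LDREX_A1_SBO   0x00000F0Fu

/* Hypothetical result structure for this sketch. */
typedef struct {
    unsigned cond;   /* condition field, bits [31:28] */
    unsigned rn;     /* base register, bits [19:16] */
    unsigned rt;     /* destination register, bits [15:12] */
    bool     sbo_ok; /* true if every (1) bit really is 1 */
} ldrex_a1_fields;

/* Returns true and fills *out if insn matches the fixed bits of LDREX A1. */
static bool decode_ldrex_a1(uint32_t insn, ldrex_a1_fields *out)
{
    /* cond == 0b1111 selects the unconditional instruction space instead. */
    if ((insn >> 28) == 0xFu)
        return false;
    if ((insn & LDREX_A1_MASK) != LDREX_A1_VALUE)
        return false;

    out->cond   = insn >> 28;
    out->rn     = (insn >> 16) & 0xFu;
    out->rt     = (insn >> 12) & 0xFu;
    out->sbo_ok = (insn & LDREX_A1_SBO) == LDREX_A1_SBO;
    return true;
}
```

For example, 0xE1910F9F (ldrex r0, [r1]) matches with rn = 1, rt = 0 and sbo_ok set. The (1) bits are deliberately left out of the mask, so a word with those bits cleared is still recognized as LDREX and only flagged; whether a stub should treat that as undefined is a design choice.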
The truth is, whichever way you slice it, you've got a Magnum Opus on your hands here!
I've come to realize that. It's a long, laborious, and frustrating task.
I did get the gdb sources (gdb_7.9.0), but without some kind of cross-referencer it's very hard to find things there.
The same goes with OpenOCD. Function pointers everywhere, and no clue where they are set.
[edit]
I found a pretty good example of how to do the decoding far enough to handle single stepping.
gdbserver sources: arm-tdep.c, arm_get_next_pc_raw() and thumb_get_next_pc_raw().
The decoding is really done with style! The experience with ARM instruction sets shows.
I think I'll go with it and return to the full instruction list later.
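
To give an idea of what "decoding far enough for single stepping" can look like, here is a very reduced sketch in C. It is not the gdb code and covers only two things: telling a 16-bit Thumb instruction from a 32-bit one by the top five bits of the first halfword, and computing the next PC in ARM state when the only PC-changing instruction considered is B/BL. Everything else that can write the PC (BX, LDR/LDM to pc, data processing with pc as destination, and so on) is ignored here and would need its own case. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A Thumb instruction is 32 bits wide if bits [15:11] of its first
 * halfword are 0b11101, 0b11110 or 0b11111; otherwise it is 16 bits. */
static bool thumb_insn_is_32bit(uint16_t hw1)
{
    unsigned top5 = hw1 >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Evaluate an ARM condition field (0x0..0xE) against the CPSR flags. */
static bool arm_cond_passed(unsigned cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u, z = (cpsr >> 30) & 1u;
    bool c = (cpsr >> 29) & 1u, v = (cpsr >> 28) & 1u;
    bool r;

    switch (cond >> 1) {
    case 0: r = z; break;              /* EQ / NE */
    case 1: r = c; break;              /* CS / CC */
    case 2: r = n; break;              /* MI / PL */
    case 3: r = v; break;              /* VS / VC */
    case 4: r = c && !z; break;        /* HI / LS */
    case 5: r = (n == v); break;       /* GE / LT */
    case 6: r = (n == v) && !z; break; /* GT / LE */
    default: return true;              /* AL */
    }
    return (cond & 1u) ? !r : r;       /* odd codes invert the test */
}

/* Next PC in ARM state, handling only B/BL (bits [27:25] = 0b101). */
static uint32_t arm_next_pc_branch_only(uint32_t pc, uint32_t insn, uint32_t cpsr)
{
    unsigned cond = insn >> 28;

    if (cond != 0xFu && (insn & 0x0E000000u) == 0x0A000000u &&
        arm_cond_passed(cond, cpsr)) {
        uint32_t offset = (insn & 0x00FFFFFFu) << 2; /* imm24 * 4 */
        if (offset & 0x02000000u)                    /* sign-extend 26 bits */
            offset |= 0xFC000000u;
        return pc + 8u + offset;                     /* PC reads as insn + 8 */
    }
    return pc + 4u;                                  /* fall through */
}
```

For instance, 0xEAFFFFFE (a branch to itself) at address 0x8000 gives 0x8000 again, while a BEQ with the Z flag clear just gives pc + 4. The stub would then place its temporary breakpoint at the returned address.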