ARM/THUMB instructions that change execution path?

Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?

I've been trying to figure out gdb-stub single stepping using software interrupts. To single step, you need to find the next instruction(s) where the next breakpoint instruction has to be set.

There are three cases:

1) The current instruction doesn't change the execution path. The next instruction is the next word.

2) The current instruction is a jump. The operand defines the next instruction address.

3) The current instruction is a conditional branch. One possible next instruction is the next word; the other possible instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)

To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
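As a rough illustration of the three cases, here is a minimal sketch in C that handles only the ARM-state B/BL immediate branch. The names `step_targets` and `arm_next_pcs` are made up for this example, and everything else that can write the PC (BX, data processing with PC as destination, loads to PC, LDM with PC in the list, the cond = 1111 space, Thumb) is deliberately left out and falls through as "next word".

```c
#include <stdint.h>

/* Hypothetical helper type: the 1 or 2 addresses where a stepping
 * breakpoint would have to be planted. */
struct step_targets {
    int count;
    uint32_t addr[2];
};

/* Sketch: ARM state only, B/BL (bits 27..25 = 101) only. */
static struct step_targets arm_next_pcs(uint32_t pc, uint32_t insn)
{
    struct step_targets t = { 1, { pc + 4u, 0u } };   /* case 1: next word */
    uint32_t cond = insn >> 28;

    if (cond != 0xFu && ((insn >> 25) & 0x7u) == 0x5u) {
        /* imm24 is a signed word offset relative to pc + 8 */
        uint32_t off = (insn & 0x00FFFFFFu) << 2;
        if (off & 0x02000000u)
            off |= 0xFC000000u;                       /* sign-extend */
        uint32_t target = pc + 8u + off;

        if (cond == 0xEu) {
            t.addr[0] = target;                       /* case 2: always taken */
        } else {
            t.addr[1] = target;                       /* case 3: either path */
            t.count = 2;
        }
    }
    return t;
}
```

For a conditional branch the stub could also evaluate the condition against the saved CPSR and plant only one breakpoint; planting both is just the simpler choice for a sketch.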

I could go through the manuals of numerous processors instruction by instruction, and maybe I'd be done within the next couple of years, or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.

Also, there don't seem to be many sources for ARM gdb servers or stubs around that use software breakpoints.

Parents
  • Unfortunately it's not so simple - these kinds of things will mess it up, causing "false positives":

    (columns: encoding bits 31..0, assembler syntax, encoding label, manual section)

    cccc00010B00nnnntttt(0)(0)(0)(0)1001TTTT  SWP{B}<c>_<Rt>,_<Rt2>,_[<Rn>]  A1  A8.8.229
    cccc00010R00(1)(1)(1)(1)dddd(0)(0)0(0)0000(0)(0)(0)(0)  MRS<c>_<Rd>,_<spec_reg>  A1  B9.3.8
    cccc00010R00MMMMdddd(0)(0)1M0000(0)(0)(0)(0)  MRS<c>_<Rd>,_<banked_reg>  A1  B9.3.9
    cccc00010R10MMMM(1)(1)(1)(1)(0)(0)1M0000nnnn  MSR<c>_<banked_reg>,_<Rn>  A1  B9.3.10
    cccc00010R10mmmm(1)(1)(1)(1)(0)(0)0(0)0000nnnn  MSR<c>_<spec_reg>,_<Rn>  A1  B9.3.12
    cccc000PU0W0nnnntttt(0)(0)(0)(0)1011mmmm  STRH<c>_<Rt>,_[<Rn>,+/-<Rm>]{!}_STRH<c>_<Rt>,_[<Rn>],+/-<Rm>  A1  A8.8.218
    cccc000PU0W0nnnntttt(0)(0)(0)(0)1101mmmm  LDRD<c>_<Rt>,_<Rt2>,_[<Rn>,+/-<Rm>]{!}_LDRD<c>_<Rt>,_<Rt2>,_[<Rn>],+/-<Rm>  A1  A8.8.74
    cccc000PU0W0nnnntttt(0)(0)(0)(0)1111mmmm  STRD<c>_<Rt>,_<Rt2>,_[<Rn>,+/-<Rm>]{!}_STRD<c>_<Rt>,_<Rt2>,_[<Rn>],+/-<Rm>  A1  A8.8.211
    cccc000PU0W1nnnntttt(0)(0)(0)(0)1011mmmm  LDRH<c>_<Rt>,_[<Rn>,+/-<Rm>]{!}_LDRH<c>_<Rt>,_[<Rn>],+/-<Rm>  A1  A8.8.82
    cccc000PU0W1nnnntttt(0)(0)(0)(0)1101mmmm  LDRSB<c>_<Rt>,_[<Rn>,+/-<Rm>]{!}_LDRSB<c>_<Rt>,_[<Rn>],+/-<Rm>  A1  A8.8.86
    cccc000PU0W1nnnntttt(0)(0)(0)(0)1111mmmm  LDRSH<c>_<Rt>,_[<Rn>,+/-<Rm>]{!}_LDRSH<c>_<Rt>,_[<Rn>],+/-<Rm>  A1  A8.8.90
    cccc000PU1W0nnnnttttxxxx1011xxxx  STRH<c>_<Rt>,_[<Rn>{,_#+/-<imm8>}]_STRH<c>_<Rt>,_[<Rn>],_#+/-<imm8>_STRH<c>_<Rt>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.217
    cccc000PU1W0nnnnttttxxxx1101xxxx  LDRD<c>_<Rt>,_<Rt2>,_[<Rn>{,_#+/-<imm8>}]_LDRD<c>_<Rt>,_<Rt2>,_[<Rn>],_#+/-<imm8>_LDRD<c>_<Rt>,_<Rt2>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.72
    cccc000PU1W0nnnnttttxxxx1111xxxx  STRD<c>_<Rt>,_<Rt2>,_[<Rn>{,_#+/-<imm8>}]_STRD<c>_<Rt>,_<Rt2>,_[<Rn>],_#+/-<imm8>_STRD<c>_<Rt>,_<Rt2>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.210
    cccc000PU1W11111ttttxxxx1011xxxx  LDRH<c>_<Rt>,_<label>_LDRH<c>_<Rt>,_[PC,_#-0]_Special_case  A1  A8.8.81
    cccc000PU1W11111ttttxxxx1101xxxx  LDRSB<c>_<Rt>,_<label>_LDRSB<c>_<Rt>,_[PC,_#-0]_Special_case  A1  A8.8.85
    cccc000PU1W11111ttttxxxx1111xxxx  LDRSH<c>_<Rt>,_<label>_LDRSH<c>_<Rt>,_[PC,_#-0]_Special_case  A1  A8.8.89
    cccc000PU1W1nnnnttttxxxx1011xxxx  LDRH<c>_<Rt>,_[<Rn>{,_#+/-<imm8>}]_LDRH<c>_<Rt>,_[<Rn>],_#+/-<imm8>_LDRH<c>_<Rt>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.80
    cccc000PU1W1nnnnttttxxxx1101xxxx  LDRSB<c>_<Rt>,_[<Rn>{,_#+/-<imm8>}]_LDRSB<c>_<Rt>,_[<Rn>],_#+/-<imm8>_LDRSB<c>_<Rt>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.84
    cccc000PU1W1nnnnttttxxxx1111xxxx  LDRSH<c>_<Rt>,_[<Rn>{,_#+/-<imm8>}]_LDRSH<c>_<Rt>,_[<Rn>],_#+/-<imm8>_LDRSH<c>_<Rt>,_[<Rn>,_#+/-<imm8>]!  A1  A8.8.88

    When the first three bits after the condition code field (bits 27, 26, 25) are 0 0 0, bit 4 is the only bit that is a fixed '0' or '1' (instruction specific) in every encoding.

    If bit 4 is '0', you can go on to the next bits (24, 23), but if bit 4 is '1', you have to check bit 7 to find out which bits come next.

    If bit 7 is '0', the next bits are 24 and 23; if bit 7 is '1', the next bits are 6 and 5.

    And so on.
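The first levels of that decision tree could be sketched in C roughly like this. The enum and function names are invented for illustration, and the sketch only reports which pair of bits to look at next. It does not identify the instruction, and it ignores the cond = 1111 space.

```c
#include <stdint.h>

/* Hypothetical result type: which bits to examine at the next level. */
enum next_bits { NOT_GROUP_000, USE_BITS_24_23, USE_BITS_6_5 };

static enum next_bits arm_group000_next_bits(uint32_t insn)
{
    if (((insn >> 25) & 0x7u) != 0u)
        return NOT_GROUP_000;        /* bits 27..25 are not 0 0 0 */

    if (((insn >> 4) & 1u) == 0u)
        return USE_BITS_24_23;       /* bit 4 = 0 */

    if (((insn >> 7) & 1u) == 0u)
        return USE_BITS_24_23;       /* bit 4 = 1, bit 7 = 0 */

    return USE_BITS_6_5;             /* bit 4 = 1, bit 7 = 1 */
}
```

For example, 0xE1D010B0 (LDRH r1, [r0]) has bits 4 and 7 both set, so it lands in the bits 6 and 5 branch, while 0xE12FFF1E (BX lr) has bit 4 set and bit 7 clear and goes back to bits 24 and 23.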

    If bits 27, 26 and 25 are 0 1 0, then there is only one instruction: single data transfer.

    The list shows it the way the manual does, but in reality the encoding is:

    cccc010PUBWLnnnnttttxxxxxxxxxxxx  A1  A8.8.204

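    P = 1 offset or pre-indexed addressing (W selects which), otherwise post-indexing

    U = 1 offset is added, otherwise offset is subtracted

    B = 1 byte access, else word access

    W = 1 writeback (with P = 1), else no writeback; with P = 0 the base is always written back and W = 1 selects the unprivileged (LDRT/STRT) forms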

    L = 1 load, else store
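Pulling those fields out is just shifts and masks. Below is a small C sketch under my own naming (`ldst_fields`, `decode_ldst_imm` and `ldst_writes_pc` are hypothetical, not from any library), assuming the condition field is not 1111. The last helper ties this back to single stepping: a load with PC as the destination register changes the execution path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of cccc010PUBWLnnnntttt<imm12>. */
struct ldst_fields {
    bool p, u, b, w, l;
    unsigned rn, rt;
    uint32_t imm12;
};

/* Returns false if bits 27..25 are not 0 1 0. */
static bool decode_ldst_imm(uint32_t insn, struct ldst_fields *f)
{
    if (((insn >> 25) & 0x7u) != 0x2u)
        return false;

    f->p = (insn >> 24) & 1u;     /* offset/pre-indexed vs. post-indexed */
    f->u = (insn >> 23) & 1u;     /* add vs. subtract the offset */
    f->b = (insn >> 22) & 1u;     /* byte vs. word */
    f->w = (insn >> 21) & 1u;     /* writeback */
    f->l = (insn >> 20) & 1u;     /* load vs. store */
    f->rn = (insn >> 16) & 0xFu;  /* base register */
    f->rt = (insn >> 12) & 0xFu;  /* transfer register */
    f->imm12 = insn & 0xFFFu;     /* unsigned immediate offset */
    return true;
}

/* A load into r15 is a branch as far as the stub is concerned. */
static bool ldst_writes_pc(const struct ldst_fields *f)
{
    return f->l && f->rt == 15u;
}
```

For such a load the stub would still have to compute the address and read the new PC value from memory, which this sketch does not attempt.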

    It's a bit different with the media or special load/store instructions.

    Sometimes the bits above are not used as flags at all but are just part of the opcode, and sometimes the bit in the B position selects the offset type instead: in the halfword and doubleword forms listed above, 1 = immediate and 0 = register.

    BTW, there will be another update to the ARM instruction list and to the spreadsheet. I won't look into Thumb until I've learned to do this better by playing with the ARM instructions first.
