ARM/THUMB instructions that change execution path?

Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?

I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find the next instruction(s) where the next breakpoint instruction needs to be set.

There are three cases:

1) The current instruction doesn't change the execution path. The next instruction is the next word.

2) The current instruction is a jump. The operand defines the next instruction address.

3) The current instruction is a conditional branch. One possible next instruction is the next word; the other possible instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)

To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
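As a rough illustration of telling these cases apart for ARM-state (A32) instructions, here is a minimal sketch in C. The names `flow_kind`, `classify_a32` and `branch_imm_target_a32` are hypothetical and not taken from any existing stub, and the decoder deliberately covers only the common cases (B/BL/BLX, BX, data-processing with Rd = PC, LDR and LDM loading the PC). SVC and the rest of the unconditional space are treated as linear here, which a real stub may not want.

```c
#include <stdint.h>

/* Hypothetical classification for this sketch; not from any real stub. */
enum flow_kind {
    FLOW_LINEAR,      /* next instruction is at addr + 4 */
    FLOW_BRANCH_IMM,  /* B, BL, BLX <label>: target is encoded in the instruction */
    FLOW_PC_WRITE     /* BX/BLX <reg>, or a data-processing/load instruction that
                         writes the PC: target depends on registers or memory */
};

/* Classify one A32 instruction. Common cases only. */
static enum flow_kind classify_a32(uint32_t insn)
{
    uint32_t cond = insn >> 28;
    uint32_t rd   = (insn >> 12) & 0xF;          /* Rd / Rt field */

    if (((insn >> 25) & 0x7) == 0x5)             /* B, BL, and BLX <label> (cond == 0xF) */
        return FLOW_BRANCH_IMM;
    if (cond == 0xF)                             /* other unconditional encodings: not decoded */
        return FLOW_LINEAR;
    if ((insn & 0x0FFFFFD0) == 0x012FFF10)       /* BX Rm, BLX Rm */
        return FLOW_PC_WRITE;

    switch ((insn >> 26) & 0x3) {
    case 0: {                                    /* data-processing space */
        uint32_t opcode = (insn >> 21) & 0xF;
        int mul_or_extra_ldst = !(insn & (1u << 25)) && (insn & 0x90) == 0x90;
        int cmp_or_misc = (opcode & 0xC) == 0x8; /* TST/TEQ/CMP/CMN or miscellaneous */
        if (!mul_or_extra_ldst && !cmp_or_misc && rd == 15)
            return FLOW_PC_WRITE;                /* e.g. ADD pc, r0, r1 or MOV pc, lr */
        break;
    }
    case 1:                                      /* word LDR with Rt == PC (skip media space) */
        if (!((insn & (1u << 25)) && (insn & (1u << 4))) &&
            (insn & (1u << 20)) && !(insn & (1u << 22)) && rd == 15)
            return FLOW_PC_WRITE;
        break;
    case 2:                                      /* LDM with PC in the register list */
        if ((insn & (1u << 20)) && (insn & (1u << 15)))
            return FLOW_PC_WRITE;
        break;
    default:                                     /* coprocessor, SVC: treated as linear here */
        break;
    }
    return FLOW_LINEAR;
}

/* Target of B/BL/BLX <label> located at address addr.
   In ARM state the PC reads as addr + 8. */
static uint32_t branch_imm_target_a32(uint32_t insn, uint32_t addr)
{
    uint32_t off = (insn & 0x00FFFFFF) << 2;     /* imm24 * 4 */
    if (off & 0x02000000)                        /* sign-extend the 26-bit offset */
        off |= 0xFC000000;
    if ((insn >> 28) == 0xF)                     /* BLX <label>: H bit adds a halfword */
        off |= ((insn >> 24) & 1) << 1;
    return addr + 8 + off;
}
```

For `FLOW_BRANCH_IMM` on a conditional instruction, a stub would set breakpoints at both `addr + 4` and the computed target; for `FLOW_PC_WRITE` the target has to be worked out from the saved register state.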

I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years, or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.

Also, there don't seem to be many sources for ARM gdb servers or stubs that use software breakpoints.

Parents
  • A couple of observations...

    Depends on what you mean by "word".  Usually word means 4 bytes for ARM processors.  ARM instructions are word sized, Thumb instructions can be word or halfword sized.
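To make the halfword/word distinction concrete: in the Thumb-2 encoding, the size of an instruction can be read from the top five bits of its first halfword. The sketch below assumes that encoding; the function name `thumb_insn_size` is made up for illustration.

```c
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first halfword is hw1.
   A first halfword starting with 0b11101, 0b11110 or 0b11111 begins a
   32-bit instruction; anything else is a 16-bit instruction. */
static unsigned thumb_insn_size(uint16_t hw1)
{
    unsigned top5 = hw1 >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4u : 2u;
}
```

A stepping stub could use this to find the address of the "next" instruction in Thumb state, instead of assuming a fixed 4-byte step.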

    On (2/3), these categories cover a fairly hefty portion of the instruction set.  Most data processing and load instructions allow the PC as the destination.  For example "ADD pc, r0, r1" would cause the processor to branch to the address "r0+r1".
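For the "ADD pc, r0, r1" example, a stub that has the stopped program's registers saved could compute the branch address itself. The following is only a sketch for the register form of ADD with no shift applied; the `regs` array layout and both function names are assumptions made for illustration.

```c
#include <stdint.h>

/* regs[0..15] hold the stopped program's registers as saved by the stub
   (hypothetical layout); regs[15] is the address of the instruction itself.
   In ARM state, reading the PC as an operand yields that address + 8. */
static uint32_t read_reg_a32(const uint32_t regs[16], unsigned n)
{
    return n == 15 ? regs[15] + 8 : regs[n];
}

/* Value written to Rd by "ADD Rd, Rn, Rm" (register form, no shift).
   Only meaningful when insn really is that encoding. */
static uint32_t add_reg_result_a32(uint32_t insn, const uint32_t regs[16])
{
    unsigned rn = (insn >> 16) & 0xF;
    unsigned rm = insn & 0xF;
    return read_reg_a32(regs, rn) + read_reg_a32(regs, rm);
}
```

With r0 = 0x1000 and r1 = 0x20, the instruction 0xE080F001 ("ADD pc, r0, r1") gives 0x1020 as the next address. Shifted operands, immediates and the other data-processing opcodes would each need their own handling, and on ARMv7 and later bit 0 of the result can select Thumb state, so a complete stub would have to account for that as well.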

    There are also a number of special cases.  For example SVC, SMC and HVC all cause exceptions, which you could think of as a special kind of branch.  Similarly, "SUBS pc, lr, #4" would perform an exception return, which again you could think of as a special type of branch.

    Then there are other types of exception to think of.  Something like "VADD.F32 s0, s1, s2" would either perform a 32-bit floating point addition, or trigger an exception if the FPU wasn't enabled.  Any kind of load/store could trigger an exception due to MMU/MPU checks.

Children
  • Sorry about my sloppy use of 'words' (pun intended).

    It was just easier to talk about words (as 32-bit entities, like ARM instructions).

    I'm painfully aware of the abundance of instructions that change the instruction flow, but the main thing is single stepping (implementation of a gdb stub).

    Even if ADD can have PC as target (causing a jump), there are many adds that do not. Only those with PC as the target count. Similarly, interrupts and HW exceptions are not single stepped. SW exceptions _might_ be (although maybe later).

    Usually FP errors are considered HW faults in these kinds of situations. Also, conditional execution doesn't change the instruction flow, but just changes the instruction functionality (if condition is false, the instruction becomes NOP).
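Since a conditional instruction that fails its condition acts as a NOP, a stub can evaluate the condition against the saved CPSR flags to decide whether a branch or PC-writing instruction will actually be taken. A minimal sketch, assuming the standard ARM condition encoding and the N, Z, C, V flags in CPSR bits 31..28 (the function name `condition_passed` is chosen here for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

/* Evaluate a 4-bit ARM condition code against the saved CPSR flags. */
static bool condition_passed(uint32_t cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1, z = (cpsr >> 30) & 1;
    bool c = (cpsr >> 29) & 1, v = (cpsr >> 28) & 1;
    bool r;

    switch ((cond >> 1) & 0x7) {
    case 0: r = z; break;                /* EQ / NE */
    case 1: r = c; break;                /* CS / CC */
    case 2: r = n; break;                /* MI / PL */
    case 3: r = v; break;                /* VS / VC */
    case 4: r = c && !z; break;          /* HI / LS */
    case 5: r = (n == v); break;         /* GE / LT */
    case 6: r = (n == v) && !z; break;   /* GT / LE */
    default: return true;                /* AL (0b1110) and 0b1111 */
    }
    /* The low bit of the condition inverts the test. */
    return (cond & 1) ? !r : r;
}
```

If the condition fails, the next instruction is simply the following one, so only a single breakpoint is needed; otherwise the branch target is the only candidate.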

    There is such code in the gdb client, but it looks like it uses the disassembler code, so it's quite complicated.

  • On the Atari ST, I wrote a debugger, which could single-step in various ways.

    One of the ways included copying the instruction to a local buffer in RAM (because when the instruction is located in ROM, you can't set a breakpoint, since a breakpoint is an instruction).

    So I copied the instruction to the local buffer and placed a return-from-exception right after it, then executed the instruction by temporarily changing PC to that local buffer.

    Doing such things requires you to know the size and behaviour of the instruction. For instance, a relative jump would not be suitable for copying there.

    When executing conditional Thumb instructions (e.g. on Cortex-M), you may have to take the IT instruction into account. I have not experimented with this myself, so I do not have any experience, but I can imagine that you might want to set breakpoints on all of the (up to 4) instructions in the IT block. There will probably not be any problems with the IT state, though, because it's possible to have IT instructions inside an interrupt, so I believe the state is saved on the stack.
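To find out how many instructions an IT instruction covers, the low four bits (the mask) of its encoding can be inspected: the position of the lowest set bit gives the block length. This is a sketch under the assumption of the 16-bit Thumb IT encoding (0xBF00 | firstcond << 4 | mask, with a non-zero mask); the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if hw is a 16-bit IT instruction: 1011 1111 firstcond mask, mask != 0.
   With mask == 0 the same prefix encodes hints such as NOP. */
static bool is_it_insn(uint16_t hw)
{
    return (hw & 0xFF00) == 0xBF00 && (hw & 0x000F) != 0;
}

/* Number of instructions (1..4) in the IT block started by hw,
   or 0 if hw is not an IT instruction. */
static unsigned it_block_length(uint16_t hw)
{
    unsigned mask = hw & 0xF;
    unsigned len = 4;

    if (!is_it_insn(hw))
        return 0;
    while (!(mask & 1)) {   /* shift until the lowest set bit reaches bit 0 */
        mask >>= 1;
        len--;
    }
    return len;
}
```

A stub stepping over an IT instruction could use the length to decide which of the following instructions might need a breakpoint, rather than always covering four.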