ARM/THUMB instructions that change execution path?

Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?

I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find the next instruction(s) where the next breakpoint instruction needs to be set.

There are three cases:

1) current instruction doesn't change the execution path. Next instruction is the next word.

2) current instruction is a jump. The operand defines the next instruction address

3) current instruction is a conditional branch. One possible next instruction is the next word, the other possible instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)

To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
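
As a minimal sketch of the three cases, the following C fragment handles only the ARM-state B and BL instructions and treats everything else as case 1. The function name `next_addresses` and its output convention are hypothetical and not taken from any existing stub; a real stub would also have to cover the other ways of writing the PC (BX, loads and data-processing instructions with PC as the destination, and so on) and the Thumb encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: writes the possible next addresses of one ARM-state
 * instruction to out[] and returns how many there are (1 or 2).
 * Only B/BL are recognised; every other instruction is treated as linear,
 * which is a simplification a real stub must not make.
 */
static int next_addresses(uint32_t pc, uint32_t instr, uint32_t out[2])
{
    uint32_t cond = instr >> 28;

    assert(out != NULL);

    /* B/BL: bits 27-25 are 101; cond 1111 is a different encoding space. */
    if (cond != 0xFu && (instr & 0x0E000000u) == 0x0A000000u) {
        uint32_t imm24 = instr & 0x00FFFFFFu;
        uint32_t offset = imm24 << 2;
        uint32_t target;

        if (imm24 & 0x00800000u)
            offset |= 0xFC000000u;      /* sign-extend the 26-bit offset */
        target = pc + 8u + offset;      /* PC reads as instruction address + 8 */

        if (cond == 0xEu) {             /* case 2: always taken */
            out[0] = target;
            return 1;
        }
        out[0] = pc + 4u;               /* case 3: fall through ... */
        out[1] = target;                /* ... or branch */
        return 2;
    }

    out[0] = pc + 4u;                   /* case 1: next word */
    return 1;
}
```

For a conditional branch the stub would then plant a breakpoint at both returned addresses.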

I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years, or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.

Also, there don't seem to be many sources for ARM gdb servers or stubs around that use software breakpoints.

  • Good work, sir!

    As far as I know, you're a pioneer in this area (e.g. writing a bootloader debugger).

  • I was doing nicely until I had to start decoding that huge set of instructions.

    Pretty laborious and slow.

    Take the media instructions: I found that the unsigned and signed parallel addition and subtraction instructions can be handled similarly, like:

    Section   cond  bits 27-20       Rn  Rd  bits 11-8        bits 7-4  Rm  Assembly                     Encoding
    A8.8.158  cond  0 1 1 0 0 0 0 1  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  SADD16<c> <Rd>, <Rn>, <Rm>   A1
    A8.8.243  cond  0 1 1 0 0 1 0 1  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  UADD16<c> <Rd>, <Rn>, <Rm>   A1
    A8.8.135  cond  0 1 1 0 0 0 1 0  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  QADD16<c> <Rd>, <Rn>, <Rm>   A1
    A8.8.258  cond  0 1 1 0 0 1 1 0  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  UQADD16<c> <Rd>, <Rn>, <Rm>  A1
    A8.8.169  cond  0 1 1 0 0 0 1 1  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  SHADD16<c> <Rd>, <Rn>, <Rm>  A1
    A8.8.249  cond  0 1 1 0 0 1 1 1  Rn  Rd  (1) (1) (1) (1)  0 0 0 1   Rm  UHADD16<c> <Rd>, <Rn>, <Rm>  A1

    All the patterns are identical except for three bits: bit 22 = 1 means unsigned, and bits 21,20 select the variant: 00 = undefined, 01 = basic, 10 = Q (saturating) and 11 = H (halving).

    That seems to be the same with all the unsigned and signed parallel addition and subtraction instructions.
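
    A minimal sketch of that observation in C, covering only the six 16-bit addition encodings listed above. The type and function names (`padd_info`, `decode_padd16`) are made up for illustration; the field positions are the ones stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Variant selected by bits 21,20 (values as described in the post). */
typedef enum {
    VAR_UNDEF      = 0,  /* 00: no such instruction */
    VAR_BASIC      = 1,  /* 01: SADD16 / UADD16 */
    VAR_SATURATING = 2,  /* 10: QADD16 / UQADD16 */
    VAR_HALVING    = 3   /* 11: SHADD16 / UHADD16 */
} padd_variant;

typedef struct {
    int          is_unsigned;  /* bit 22 */
    padd_variant variant;      /* bits 21,20 */
} padd_info;

/*
 * Hypothetical decoder for the xADD16 family only.
 * Precondition: bits 27-23 = 01100, bits 11-8 = 1111, bits 7-4 = 0001.
 */
static padd_info decode_padd16(uint32_t instr)
{
    padd_info info;

    assert((instr & 0x0F800FF0u) == 0x06000F10u);
    info.is_unsigned = (int)((instr >> 22) & 1u);
    info.variant     = (padd_variant)((instr >> 20) & 3u);
    return info;
}
```

    One handler can then serve the whole family, with the two fields picking the behaviour.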

    It's easier to see the patterns when you can just search and copy the instructions to another file (like the above) and compare the bits of all the interesting instructions at once, and with patterns you can often unify the manipulation.

    It would be nice to get the tables into excel or ods for even more flexible manipulation (moving columns around and sorting lines by some columns). Maybe one day.

    [EDIT]

    The web page somehow makes the lines into a table and wraps some cells.

    It looks OK when writing/editing, but after posting it looks a bit weird.

    [/EDIT]

  • I once wrote a debugger for 68xxx (it was back in 1988 I think).

    There I had a mask and data for each kind of instruction.

    E.g.

        if ((instr & mask) == data) { /* found: this table entry matches */ }

    Here you will need to sort the table so that the most specific masks come first; otherwise a broader entry shadows the specific ones. In the extreme case, if your first entry's mask is 0x0000, you'll always get that same entry as the result.

    Of course it is a good idea to end the table with mask 0x0000 and data 0x0000; that entry matches everything, so the search always terminates and you'll find the instruction fairly quickly.

    You could make a similar table and add a 'type' entry. For instance, the type could be 'add' plus a positive/negative flag, in order to merge add and subtract into one handler (that might simplify and shorten the code).

    Having handler types and a mode field would perhaps also make it possible to simplify the code.

    Opcodes usually come in 'families'. It may be a good idea to find out which instruction family the opcode belongs to first and then handle the remaining part of the instruction decoding after that. However, depending on the number of instructions, it may be quicker to just use the mask and data system.
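
    A minimal sketch of such a mask/data table with a 'type' field, written here for 32-bit ARM encodings rather than 68xxx. The table contents and all names (`decode_entry`, `classify`, the `OP_*` values) are illustrative assumptions; the condition field (bits 31-28) is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_UNKNOWN, OP_BX, OP_GROUP_000, OP_BRANCH } op_type;

typedef struct {
    uint32_t mask;
    uint32_t data;
    op_type  type;
} decode_entry;

/* Most specific masks first; the all-zero entry at the end matches anything. */
static const decode_entry decode_table[] = {
    { 0x0FFFFFF0u, 0x012FFF10u, OP_BX },        /* BX Rm; must precede the next entry */
    { 0x0E000000u, 0x00000000u, OP_GROUP_000 }, /* coarse bucket: bits 27-25 = 000 (also matches BX) */
    { 0x0E000000u, 0x0A000000u, OP_BRANCH },    /* B/BL: bits 27-25 = 101 */
    { 0x00000000u, 0x00000000u, OP_UNKNOWN },   /* catch-all terminator */
};

static op_type classify(uint32_t instr)
{
    const size_t n = sizeof decode_table / sizeof decode_table[0];
    const decode_entry *e = decode_table;

    /* The loop below has no bounds check, so the terminator must be there. */
    assert(decode_table[n - 1].mask == 0u && decode_table[n - 1].data == 0u);
    while ((instr & e->mask) != e->data)
        e++;
    return e->type;
}
```

    If the first two entries were swapped, `bx lr` would be reported as the coarse bucket, which is the ordering problem described above.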

  • The table approach may not be very good with ARM, because the bits that define the instruction are not in constant places. Not even mostly, except for the 3 bits after the condition code, and sometimes a special register value turns the encoding into another instruction.

    Also, in this program, I don't care about the instruction as such, but just the 'next address' after the instruction.

    Sometimes it's easier to execute than to 'simulate' the code:

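    #include <stdint.h>

    // tmp1, tmp2 and iptr are file-scope variables declared elsewhere in the stub;
    // the asm below reaches tmp1 and tmp2 by symbol name, so they must be globals.
    unsigned int check_msr_reg(uint32_t instr)
    {
        unsigned int new_pc = rpi2_reg_context.reg.r15;

        // if user mode, then can't even guess
        tmp1 = (uint32_t) rpi2_reg_context.reg.cpsr;
        if ((tmp1 & 0x1f) == 0x10) // user mode (CPSR mode bits = 0b10000)
        {
            // UNPREDICTABLE - whether banked or not
            // the bits 15 - 0 are UNKNOWN
            new_pc = INSTR_ADDR_UNDEF;
        }
        else
        {
            // privileged mode - both reg and banked reg
            tmp2 = (instr & 0xffff0fff) | (1 << 12); // edit Rd = r1
            // patch the placeholder word at label mrs_regb in the asm below
            // (with caches enabled, the patched word needs cache maintenance before it runs)
            iptr = (uint32_t *) mrs_regb;
            *iptr = tmp2;
            asm(
                "push {r0, r1, r2}\n\t"
                "mrs r2, cpsr @ save cpsr in r2, not on the stack: a mode change switches sp\n\t"
                "ldr r0, =tmp1 @ address of tmp1\n\t"
                "ldr r0, [r0] @ value of tmp1 = target's cpsr\n\t"
                "msr cpsr, r0 @ set cpsr; note: user mode is already excluded\n\t"
                "mrs_regb: .word 0 @ execute instr with our registers\n\t"
                "ldr r0, =tmp2\n\t"
                "str r1, [r0] @ store result to tmp2\n\t"
                "msr cpsr, r2 @ restore cpsr (and with it our own sp)\n\t"
                "pop {r0, r1, r2}\n\t"
                );
            new_pc = (unsigned int) tmp2;
        }
        return new_pc;
    }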

    So far in this project I've learned about ARM (never really used it before), awk (never used it before) and inline assembly (accessing C variables). For some unknown reason, I haven't been keen to use the inline assembly extensions, though.

  • BTW, is your 68k debugger anywhere to be seen? I might like to take a look some time later when I have more time.

    I like Motorola/Freescale 6.8k/68k assembly languages. So wonderfully symmetrical.

  • Sadly, it was never released (and I don't have the sources here). I wrote it because MonST/MonTT could not debug a game I was participating in writing.

    Yes, 68xxx was very convenient and easy in many ways; unfortunately, the performance wasn't so impressive. Reading/writing 400KByte/sec.

  • Ah, yes. Things are coming back to me.

    My debugger usually ran on a 68000, but my MegaSTE had a 68010, so I wrote an instruction emulator. It could emulate 68010, 20, 30, 40 and CPU32 instructions (the latter was never tested, though).

    -Sometimes it's also a lot faster to simulate instruction execution.

  • Reading/writing 400KByte/sec.

    What kind of reading/writing are you talking about?

    I remember when I had plans to make a computer based on 68030.

    The plan was trashed by the fact that in those days home-made address decoding would have been so slow that it wasn't worth the while. I considered v22 PLAs and some FPGAs, but the delays were far too big. With a mask-programmed gate array it would have been a beast, but VERY expensive. I guess a mask cost something like $1,000,000 back then.

    68030 could do memory accesses in synchronous nibble mode in 55 ns, I recall.

    (Dynamic bus sizing was, as such, a really exciting idea.)

  • turboscrew wrote:

    Reading/writing 400KByte/sec.

    What kind of reading/writing are you talking about?

    On my Atari ST, I could reach 400KByte reading or writing per second as maximum (by using the movem.l instruction).

    The main reason for this was most likely Atari's bus architecture.

    Lucky me, any Cortex-M0, even if running only at 1 MHz, is faster.

  • I recall the Atari ST was quite a nice machine for its time. The 8088 wasn't that impressive either compared to any ARM.

    I had to settle for a Commodore 64 with the famous "washing machine processor" (6510).

  • Thunder and blistering!

    I may have to rewrite the media instruction decoding.

    I just realized that my decoding wasn't complete and that the non-existent instructions are UNDEFINED.

    It's one thing if the target raises an UND exception due to a bad instruction, but quite a different thing if the debugger's single stepping raises an UND exception while trying to find out the next address.

  • The table approach might be a good idea after all, at least in some instruction groups, like the media instructions.

    The encoding there is sparse, and the holes cause an UND exception, so those groups need to be decoded completely.
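
    A minimal sketch of such a complete check for the parallel addition and subtraction group, so that the stub can refuse to execute an encoding that falls into a hole. The function names are hypothetical. The list of valid op2 values (bits 7-5) is my reading of the encoding tables and is an assumption to verify against the manual; the should-be-one bits 11-8 are not checked.

```c
#include <assert.h>
#include <stdint.h>

/* Group test: bits 27-23 = 01100 and bit 4 = 1 (ARM state, cond != 1111). */
static int is_parallel_addsub(uint32_t instr)
{
    return (instr >> 28) != 0xFu && (instr & 0x0F800010u) == 0x06000010u;
}

/*
 * Returns 1 if the encoding names an existing instruction, 0 if it falls
 * into a hole and would raise an UND exception when executed.
 * Precondition: instr belongs to the parallel add/sub group.
 */
static int parallel_addsub_is_defined(uint32_t instr)
{
    /* Assumed op2 map: 000 ADD16, 001 ASX, 010 SAX, 011 SUB16, 100 ADD8, 111 SUB8 */
    static const unsigned char op2_defined[8] = { 1, 1, 1, 1, 1, 0, 0, 1 };
    uint32_t op1 = (instr >> 20) & 3u;  /* bits 21,20: 00 is a hole */
    uint32_t op2 = (instr >> 5) & 7u;   /* bits 7-5 */

    assert(is_parallel_addsub(instr));
    return op1 != 0u && op2_defined[op2];
}
```

    Only when this returns 1 would it be safe to go on and work out the next address for the instruction.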

  • Any idea about instructions marked as UNPREDICTABLE: can they then behave as UNDEFINED?

    In other words: UNDEFINED REQUIRES the instruction to cause an UND exception, but MAY an UNPREDICTABLE instruction do that too, or does it have to execute normally except that the result may be whatever?

    (I'm not yelling even if I used caps.)

  • I think I need to let the Cortex-A experts answer this question.

    Maybe Alban knows where to find information on this?