ARM/THUMB instructions that change execution path?

Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?

I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find

the next instruction(s) where the next breakpoint instruction needs to be set.

There are three cases:

1) current instruction doesn't change the execution path. Next instruction is the next word.

2) current instruction is a jump. The operand defines the next instruction address

3) current instruction is conditional branch. One possible next instruction is the next word, the other possible

instruction address is defined by the operand. (That includes conditional add with PC as the target, and the like).

To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.

I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,

or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.

Also, there doesn't seem to be lots of sources of ARM gdb servers or stubs around that use software breakpoints.

  • Meanwhile forging ahead forging a head.

    What the heck is "unsigned saturation of a signed value" (USAT)?

    Where does signed manipulation change to unsigned manipulation?

    operand = Shift(R[n], shift_t, shift_n, APSR.C); // APSR.C ignored

    (result, sat) = UnsignedSatQ(SInt(operand), saturate_to);

    R[d] = ZeroExtend(result, 32);

    (bits(N), boolean) UnsignedSatQ(integer i, integer N)

    if i > 2^N - 1 then

    result = 2^N - 1; saturated = TRUE;

    elsif i < 0 then

    result = 0; saturated = TRUE;

    else

    result = i; saturated = FALSE;

    return (result<N-1:0>, saturated);

    Is it just one-sided saturation / "rectification"?

  • This is truely a laborous project.

    I calculated that there are 264 ARM instructions to go through, and the encoding is not very 'canonical'.

    Then there seems to be 329 thumb instructions.

    Also the "holes" in the encoding needs to be checked, because there are lots of "UNDEFINED" in them,

    meaning that executing such instruction causes UND-exception.

    Then the instruction encoding is often described like:

    31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

        cond      0  0 op     op1         op2

    [EDIT]

    The page seems to have been playing tricks...

    'op' =bit 25, 'op1'=bits 24 - 20 and 'op2'=bits 7 - 4.

    [/EDIT]

    Table A5-2 shows the allocation of encodings in this space.

            Table A5-2 Data-processing and miscellaneous instructions

        op       op1          op2     Instruction or instruction class               Variant

        0        not 10xx0    xxx0    Data-processing (register) on page A5-197      -

                              0xx1    Data-processing (register-shifted register) on page A5-198     -

                 10xx0        0xxx    Miscellaneous instructions on page A5-207      -

                              1xx0    Halfword multiply and multiply accumulate on page A5-203       -

                 0xxxx        1001    Multiply and multiply accumulate on page A5-202                -

                 1xxxx        1001    Synchronization primitives on page A5-205      -

                 not 0xx1x    1011    Extra load/store instructions on page A5-203                   -

                              11x1    Extra load/store instructions on page A5-203                   -

                 0xx10        11x1    Extra load/store instructions on page A5-203                   -

                 0xx1x        1011    Extra load/store instructions, unprivileged on page A5-204     -

                 0xx11        11x1    Extra load/store instructions, unprivileged on page A5-204     -

        1        not 10xx0    -       Data-processing (immediate) on page A5-199     -

                 10000        -       16-bit immediate load, MOV (immediate) on page A8-484          v6T2

                 10100        -       High halfword 16-bit immediate load, MOVT on page A8-491       v6T2

                 10x10        -       MSR (immediate), and hints on page A5-206      -

    Really fun!

  • I once made some code, which generated cycle-accurate code at run-time for execution.

    Doing that on Cortex-M is not easy, because constants are chopped into small bits and pieces, plus they're moved around.

    Generating code on-the-fly is a challenge, because creating those constants may take too long, compared to how long it takes to execute the code.

    I would have expected that the MOVW instruction had a 16-bit immediate value that would go unmodified into the instruction, but it was not like that, thus I ended up with a different solution.

    BTW: if you want to format your edits, you can click the "Use advanced editor" in the top right-hand corner. Then switch to the HTML editor and change the font to ...

    Monaco, Courier, mono-space; after that, click "Use advanced editor" again, now the editor should allow you to make some fixed-font formatting.

    It takes some "getting used to", but it's possible to do it; I've done it on each of my documents, Writing your own startup code for Cortex-M was the first.

  • Doing that on Cortex-M is not easy, because constants are chopped into small bits and pieces, plus they're moved around.

    I can understand. I couldn't avoid seeing the Thumb encodings while going through ARM encodings in the ARMv7-A/R ARM.

    I'm sweating blood for just thinking about going through the Thumb instructions.

    (There are still a lot of ARM instructions to do.)

    BTW: if you want to format your edits, you can click the "Use advanced editor" in the top right-hand corner. Then switch to the HTML editor and change the font to ...

    Monaco, Courier, mono-space; after that, click "Use advanced editor" again, now the editor should allow you to make some fixed-font formatting.

    Thanks for the hint.

    (BTW, nice tutorial)

  • I wish you good wind on getting through the remaining instructions!

    -Great that you got the fixed font working (you can also use 'view source' on one of my tutorials, to see what HTML code I've used, because in some cases it was necessary to do special edits; like the Monaco font, because it's not available from the menu).

    There are a few other tutorials. I promised a series of tutorials on basic math on the Cortex-M: http://community.arm.com/docs/DOC-9653 .

    Hopefully I can find some time to release the next part soon.

    The idea is to provide something for both beginners and experienced programmers.

    Beginners will get the basics, experienced programmers might be able to find some useful tips and tricks now and then.

    You probably know the GNU Assembler well by now, but perhaps you can find something interesting in the article about the macros.

  • I wish you good wind on getting through the remaining instructions!

    There seems to be a great deal of wind, but unfortunately foul wind.

    (I just found another table according to which, there are groups of instructions in an area I thought I already finished.)

    Your tutorials seem interesting. I have used GAS yes, but my very first ARM assembly code is in the github (this project) for everyone to see and have some laughs.

  • I just had a look, and I don't think it's bad.

    You may want to add #4 to r4 and r6 when copying, because they're 32-bit pointers.

    -This can be done automatically by postfixing the ldr/str instructions by ,#4 ...

        ldr    r7,[r4],#4

        str    r7,[r6],#4

        cmp    r4, r5

        bmi    loop$

    -It should be slightly faster.

  • Yes, of course. Thanks. At the time I wrote it, I was probably so anxious I messed up bytes and words in my mind.

    I think ARM could have done the loop still, but slower due to "misalignment" and the bytes (most of them) would have been copied 4 times?

    In that (You checked the start.S?) code, it copies the stub at a higher memory leaving the default addresses for debuggee.

    The start1.c is not really needed, but I left it there as an example how to get symbol values from linker script and maybe (later) preparing the boot arguments. Maybe I'll add serial baud setting that way.

  • Correct. The data would have been copied and the misalignment would cause a slowdown.

    Unfortunately, the misalignment slowdown is not just 2 reads/writes, it's 3: b+h+b.

    Yes, it was start.S I had a look at, sorry, I forgot to mention that - but it was the only assembly language file I found.

  • Got a very clear answer here: Re: UPREDICTABLE instructions .

    UNPREDICTABLE may be implemented as UNDEFINED.

  • Now that I have the instruction tables...

    I don't think the table-method works very well. The "significant bits" seem to move around all the time:

    (With "significant bits" I mean bit position that have instruction specific '1' or '0' through the whole group of instuction encodings the instruction is checked against. In this "group" the "significant bits are 27, 26, 25 and 22. In reality, bit 22 is really not "significant, but B-bit indicating byte vs. word access. STR, LDR, STRB and LDRB are really aliases of one and the same instruction)

    cccc0111101wwwwwddddbbbbb101nnnn  SBFX<c> <Rd>,<Rn>,#<lsb>,#<width> A1A8.8.164
    cccc0111110bbbbbddddbbbbb0011111  BFC<c> <Rd>,#<lsb>,#<width> A1A8.8.19
    cccc0111110bbbbbddddbbbbb001nnnn  BFI<c> <Rd>,<Rn>,#<lsb>,#<width> A1A8.8.20
    cccc0111111wwwwwddddbbbbb101nnnn  UBFX<c> <Rd>,<Rn>,#<lsb>,#<width> A1A8.8.246
    cccc011PU0W0nnnnttttxxxxxTT0mmmm  STR<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} STR<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>} A1A8.8.205
    cccc011PU0W1nnnnttttxxxxxTT0mmmm  LDR<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} LDR<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>} A1A8.8.66
    cccc011PU1W0nnnnttttxxxxxTT0mmmm  STRB<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} STRB<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>} A1A8.8.208
    cccc011PU1W1nnnnttttxxxxxTT0mmmm  LDRB<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} LDRB<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>} A1A8.8.70
    cccc100000W0nnnnLLLLLLLLLLLLLLLL  STMDA<c> <Rn>{!},<registers> A1A8.8.200
    cccc100000W1nnnnLLLLLLLLLLLLLLLL  LDMDA<c> <Rn>{!},<registers> A1A8.8.59
    cccc100010111101LLLLLLLLLLLLLLLL  POP<c> <registers> A1A8.8.132
  • I'm happy to hear that the table method will work.

    Normally, instruction groups are very logical.

    Yes, you could say that the LDR/STR instruction is just a 'transfer instruction between memory and registers', where there's a flag indicating the direction.

  • Well, the table method doesn't work, I saw that when I got the spreadsheet about the instructions done.

    When you look at the sheet, you can see that with all the instructions only bits 27, 26 and 25 are significant (= not belonging to an operand or modifier of some instruction). When you take a group of instructions that have bits 27, 26 and 25 all zeros, the only significant bit is bit 4. With instruction groups that have bits 27 - 25 and 4 zero, the new significant bits are 24, 23 and 21. If bit 4 is one, the only new significant bit is bit 7 and so on.

    Makes one h*** of a decoding logic.

    The spreadsheet is pretty nice tool for something like this: you can sort the lines (instructions) by different combinations of columns to find out which decoding steps work.

    I got the arithmetic & logic instructions nicely organized by selecting the lines and organizing them as bit 25 as primary, bit 4 as secondary and bit 21 as tertiary sorting "key". The instructions become sorted by addressing modes.

    First all register operation, then register shifted register and last the immediates. You can't apply all the keys at the same time, unfortunately, but one by one.

    This is what it looks like:

    cccc0000000SnnnnddddxxxxxTT0mmmm  AND{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.14
    cccc0000010SnnnnddddxxxxxTT0mmmm  SUB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.223
    cccc0000100SnnnnddddxxxxxTT0mmmm  ADD{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.7
    cccc0000110SnnnnddddxxxxxTT0mmmm  SBC{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.162
    cccc0001100SnnnnddddxxxxxTT0mmmm  ORR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.123

    Sorting with bits 24 - 21 in that order gives:

    cccc0000000SnnnnddddxxxxxTT0mmmm  AND{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.14
    cccc0000000Snnnnddddssss0TT1mmmm  AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15
    cccc0010000Snnnnddddxxxxxxxxxxxx  AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13
    cccc0000001SnnnnddddxxxxxTT0mmmm  EOR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.47
    cccc0000001Snnnnddddssss0TT1mmmm  EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48
    cccc0010001Snnnnddddxxxxxxxxxxxx  EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46
    cccc0000010SnnnnddddxxxxxTT0mmmm  SUB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.223
    cccc0000010Snnnnddddssss0TT1mmmm  SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224
    cccc0010010Snnnnddddxxxxxxxxxxxx  SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222
    cccc0000011SnnnnddddxxxxxTT0mmmm  RSB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>} A1 A8.8.153
  • I still think it's possible to use the table, especially, if you're only decoding 32-bit instructions.

    Let's make a few rules:

    1: If a bit is known to be either 1 or 0, set the corresponding bit in mask to 1, otherwise set it to 0.

    2: If a bit is known, set keep the bit in data, otherwise set the corresponding bit in data to 0.

    That is ...

    Instr: cccc0000010Snnnnddddssss0TT1mmmm

    mask : 00001111111000000000000010010000

    data : 00000000010000000000000000010000

    Using this rule, we can convert the above table using the attached perl-script.

    We now get the following (slightly decorated):

    static const InstrTab sInstructionTable[] = {

        0x0fe00010, 0x00000000, "and", &and_shift,  /* AND{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.14 */

        0x0fe00090, 0x00000010, "and", &and_reg, /* AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15 */

        0x0fe00000, 0x02000000, "and", &and_imm, /* AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13 */

        0x0fe00010, 0x00200000, "eor", &eor_shift, /* EOR{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.47 */

        0x0fe00090, 0x00200010, "eor", &eor_reg, /* EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48 */

        0x0fe00000, 0x02200000, "eor", &eor_imm, /* EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46 */

        0x0fe00010, 0x00400000, "sub", &sub_shift, /* SUB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.223 */

        0x0fe00090, 0x00400010, "sub", &sub_reg, /* SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224 */

        0x0fe00000, 0x02400000, "sub", &sub_imm, /* SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222 */

        0x0fe00010, 0x00600000, "rsb", &rsb_shift, /* RSB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.153 */

        0x00000000, 0x00000000, "", &done, /* end of list */

    };

    Now, the problem with the table above, is that the order is incorrect. This can be fixed easily:

    static const InstrTab sInstructionTable[] = {

        0x0fe00090, 0x00000010, "and", &and_reg, /* AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15 */

        0x0fe00010, 0x00000000, "and", &and_shift,  /* AND{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.14 */

        0x0fe00000, 0x02000000, "and", &and_imm, /* AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13 */

        0x0fe00090, 0x00200010, "eor", &eor_reg, /* EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48 */

        0x0fe00010, 0x00200000, "eor", &eor_shift, /* EOR{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.47 */

        0x0fe00000, 0x02200000, "eor", &eor_imm, /* EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46 */

        0x0fe00090, 0x00400010, "sub", &sub_reg, /* SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224 */

        0x0fe00010, 0x00400000, "sub", &sub_shift, /* SUB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.223 */

        0x0fe00000, 0x02400000, "sub", &sub_imm, /* SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222 */

        0x0fe00010, 0x00600000, "rsb", &rsb_shift, /* RSB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.153 */

        0x00000000, 0x00000000, "", &done, /* end of list */

    };

    (First two AND were swapped, first two EOR were swapped and first two SUB were swapped).

    -Of course, the Perl-script could be enhanced to do this automatically, generating extra masks and data in those cases where that is needed.

    So the following should work:

        InstrTab *tab;
        uint32_t instr;
    
        instr = *pc;
        tab = &sInstructionTable;
        while((instr & tab->mask) != tab->data)
        {
            tab++;
        }
        tab->handle(instr,tab);
    
    
    
    

    If writing the code in assembly language, I believe it would be a good idea to re-arrange the contents of the table, so that the instruction name comes first, then the mask, the data and finally the handler.

    find_instr:

        ldmia   r4!,{r1-r3,r12}

        and     r2,r2,r0

        cmp     r3,r2

        bne     find_instr

        /* here r0 contains the instruction opcode, r1 contains the name and r12 contains the address of the handler */

        pop     { ... }          /* restore any saved registers */

        bx      r12              /* jump directly to the handler */


  • Hmm, I guess I've been working too hard with the project - I didn't come to think adding the handler in the table.

    (I don't like to use function pointers if not necessary - the execution path gets fuzzy and some debuggers can't handle them, but using an enum...) I had a different idea about what the table method meas, but I don't remember what I thought any more. What I remember is that I didn't think of comparing an instruction to all instructions of the instruction set.

    If applied to the whole instruction set, it may be a bit heavy for single stepping - on the average it means going through half of the table for each instruction each time.

    Then again writing the logic in C code takes a lot of time, and the code in not something you'd show to small children...

    (I made the pseudocode whith which I'm still not completely happy, but the C-coding of it is not started yet.)

    Maybe I should use the table approach now after all, and go for the C code logic later (if I still feel like it).

    I really have to rethink this.

    Especially now that I have the instruction set in a spreadsheet from which the instruction bit patterns are easily edited (bits into bytes, byte order, ...) and they can be printed out as a text file for simple editing (if any needed) into a table initializer data form.  Also the mask and data can be easily generated with spreadsheet formuli.

    The table is also quite compact form - the instruction set itself probably fits in 1 kB, and the actual handlers need to be written anyway.

    Too bad there is only 'like'-button. There should also be 'Halleluyah'-button.