Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?
I've been trying to figure out gdb-stub single stepping using software interrupts. To single step, you need to find
the address(es) of the possible next instruction(s), because that is where the next breakpoint instruction has to be set.
There are three cases:
1) current instruction doesn't change the execution path. Next instruction is the next word.
2) current instruction is a jump. The operand defines the next instruction address.
3) current instruction is a conditional branch. One possible next instruction is the next word, the other possible
instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)
To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
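As an illustration of the three cases, here is a minimal sketch that handles only the ARM-state B/BL instruction (bits 27-25 = 101). The names `next_addrs` and `next_after_branch` are made up for this example and are not from any existing stub. Everything else (BX, data processing with PC as destination, loads into PC, the 0b1111 condition space) is treated as linear here, which a real stub could not get away with.

```c
#include <stdint.h>

/* Hypothetical result type: one or two candidate addresses
 * where the next breakpoint would have to be placed. */
struct next_addrs {
    int count;          /* 1 or 2 */
    uint32_t addr[2];
};

/* Sign-extend the 24-bit offset of B/BL and scale it by 4.
 * Done in unsigned arithmetic so that wrap-around is well defined. */
static uint32_t branch_offset(uint32_t instr)
{
    uint32_t imm24 = instr & 0x00FFFFFFu;
    if (imm24 & 0x00800000u)
        imm24 |= 0xFF000000u;
    return imm24 << 2;
}

/* pc is the address of the instruction being stepped (ARM state). */
static struct next_addrs next_after_branch(uint32_t pc, uint32_t instr)
{
    struct next_addrs r = { 1, { pc + 4u, 0u } };   /* case 1: linear */
    uint32_t cond = instr >> 28;

    /* Only B/BL is recognised in this sketch. cond == 0xF is a
     * different encoding space and is NOT handled here. */
    if (((instr >> 25) & 7u) != 5u || cond == 0xFu)
        return r;

    /* In ARM state the PC reads as the instruction address + 8. */
    uint32_t target = pc + 8u + branch_offset(instr);

    if (cond == 0xEu) {
        r.addr[0] = target;      /* case 2: unconditional jump */
    } else {
        r.count = 2;             /* case 3: next word or target */
        r.addr[1] = target;
    }
    return r;
}
```

For example, `next_after_branch(0x8000, 0x1A000002)` (a BNE) yields the two candidates 0x8004 and 0x8010.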
I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,
or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.
Also, there don't seem to be many sources for ARM gdb servers or stubs around that use software breakpoints.
This is getting "interesting".
I generated the masks and data from the Excel sheet, then exported the data (only mask data, no instruction names) as .csv.
Then I ran sort and then uniq.
The result was 475 lines. The data before uniq was 498 lines.
=> there are 23 lines (instructions) dropped.
Then I ran uniq -cd and searched for the non-unique instruction data (I added the instruction names at the ends of the lines):
I hope it's not very complicated to tell apart the instructions with the same data.
There will certainly be at least as many data values as masks. If you had more masks than data values, you would have a problem.
It's expected that you have shared masks as well. For instance, if you take the 'singles' instruction group, you'd have something like WFI and WFE sharing the same mask, but they have different data. These instructions do not take any parameters.
Those are the data bits. And they are shared.
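A minimal sketch of the mask/data idea, assuming a table-driven decoder: an instruction matches an entry when `(instr & mask) == data`. The names `pattern`, `instr_kind` and `classify` are invented for this example. The two entries use the ARM-state encodings of WFE (0xE320F002 with condition AL) and WFI (0xE320F003), which share one mask and differ only in the data.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical instruction kinds for this example. */
enum instr_kind { KIND_UNKNOWN, KIND_WFE, KIND_WFI };

struct pattern {
    uint32_t mask;          /* which bits are fixed */
    uint32_t data;          /* the values of those fixed bits */
    enum instr_kind kind;
};

/* The mask leaves out the condition field (bits 31..28).
 * The 0b1111 condition space would need a separate check. */
static const struct pattern patterns[] = {
    { 0x0FFFFFFFu, 0x0320F002u, KIND_WFE },
    { 0x0FFFFFFFu, 0x0320F003u, KIND_WFI },
};

/* Return the kind of the first table entry that matches. */
static enum instr_kind classify(uint32_t instr)
{
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        if ((instr & patterns[i].mask) == patterns[i].data)
            return patterns[i].kind;
    }
    return KIND_UNKNOWN;
}
```

Two entries with the same mask and the same data would be indistinguishable by this test alone, which is the situation the uniq run above exposes.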
I managed to find the instructions and the way to tell them apart though, so all's well now.
I just noticed that I've been struggling with the ARM instruction set for a month!
And now I think I've finally tamed the ARM instruction set. (Although surprises have appeared before...)
Back to C-coding.
I think I'll return to the Thumb-instruction set later, and see if I get at least something working with the ARM instruction set first.
Also updates to github can wait a while.
Uhm, I was about to object.
But I believe that you split WFE and WFI for instance and handle them as one "kind" of instruction, then test their bit in the handler (if necessary).
Actually, WFE and WFI should of course be treated the same way (they're almost identical in behaviour from a debugger's point of view), so no need to do special checking there.
Writing a debugger gives you a good solid foundation in an instruction set / assembly language. It forces you through each instruction and you'll know much better what each instruction is capable of and not capable of.
I think that you may quickly become more experienced with some of the instructions than programmers who have worked with them for a while.
Yes, my lazy nature...
If I do a full decoding and then still handle instructions in groups, I'll have a decoder "skeleton" that I can use for something else if I happen to need one.
case arm_xtra_hint:
    // WFE, WFI
    // neither changes the program flow
    retval = set_addr_lin();
    break;
Then again:
case arm_xtra_cmode:
    // Check cmode to see if it's VBIC (imm) or VMVN (imm)
    if (bitrng(instr, 11, 9) != 7)
    {
        // either VBIC or VMVN
        if (bitrng(instr, 11, 10) == 3)
        {
            // VMVN
        }
    }
    else if (bit(instr, 8))
    {
        // VBIC
    }
    else
    {
        // else UNDEFINED
    }
Well, haven't got too far in this yet.
(And I need to change the enums used for the switch-cases here.)
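The snippets above call `bit()` and `bitrng()` without showing them. Their real definitions are not in this thread, so the following is only a guess at what such helpers might look like, assuming `bitrng(instr, hi, lo)` returns the inclusive bit range hi..lo shifted down to bit 0.

```c
#include <stdint.h>

/* Assumed helper: value (0 or 1) of bit n of instr. */
static inline uint32_t bit(uint32_t instr, unsigned n)
{
    return (instr >> n) & 1u;
}

/* Assumed helper: bits hi..lo (inclusive) of instr, right-aligned.
 * Requires 31 >= hi >= lo. */
static inline uint32_t bitrng(uint32_t instr, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1u;
    /* Avoid shifting by 32, which is undefined for a 32-bit type. */
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (instr >> lo) & mask;
}
```

With these, `bitrng(instr, 11, 9)` extracts a 3-bit field, so comparing it against 7 tests whether bits 11, 10 and 9 are all set.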
I sure hope so. Otherwise I only get lots of headaches, short nights and medium-level mental health problems.
Plus a badly managed household.
A-ha, the replies didn't end up under the postings they were replying to...
It appears there's a maximum indent level. Perhaps threads weren't meant to contain long discussions.
The LD/ST encodings are "interesting":
There are basically 4 main groups of them, selected by bits 27-25:
0 0 0:
    if bit 6 = 0
        bit 20 = LD/ST: 0=ST, 1=LD
        bit 5 = EX/H: 0=EX, 1=H (STREX/STRH)
            EX: bits 22-21: 00=REX, 01=REXD, 10=REXB, 11=REXH
            H: bit 22: 1=imm, 0=reg
               bit 21 = writeback
    if bit 6 = 1
        bit 20 = 0: LDRD/STRD
            bit 5: 0=LD, 1=ST
            bit 21 = writeback
            bit 22: 0=reg, 1=imm
        bit 20 = 1:
            if bit 5 = 0 LDRSB
            if bit 5 = 1 LDRSH
            (there is no STRSB or STRSH)
0 1 0: LD/ST imm
    bits 24-20: PUBWL
        P=indexing (1=offset or pre-indexed, 0=post-indexed)
        U=immediate sign (1=added, 0=subtracted)
        B=access (0=word, 1=byte)
        W=writeback (1=writeback, 0=no writeback)
        L: 1=load, 0=store
        (special case: P=0, W=1 => unprivileged)
0 1 1: LD/ST reg
    same as LD/ST imm
1 0 0: LDM/STM
    bits 24-20: BIMWL
        B=before (0=after, 1=before)
        I=increment (0=decrement, 1=increment)
        M=mode (0=current mode, 1=user mode)
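For single stepping, the interesting question in these groups is whether a load writes the PC. Below is a rough sketch for the three groups 010, 011 and 100 only. The names `ldst_bits`, `decode_pubwl` and `load_writes_pc` are made up for this example, the 000 group and the 0b1111 condition space are not handled, and computing the loaded address itself is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical holder for bits 24..20 of LD/ST imm/reg (PUBWL). */
struct ldst_bits { bool p, u, b, w, l; };

static struct ldst_bits decode_pubwl(uint32_t instr)
{
    struct ldst_bits f;
    f.p = (instr >> 24) & 1u;   /* 1 = offset/pre-indexed, 0 = post-indexed */
    f.u = (instr >> 23) & 1u;   /* 1 = offset added, 0 = subtracted */
    f.b = (instr >> 22) & 1u;   /* 1 = byte, 0 = word */
    f.w = (instr >> 21) & 1u;   /* 1 = writeback */
    f.l = (instr >> 20) & 1u;   /* 1 = load, 0 = store */
    return f;
}

/* True if the instruction is a load that writes the PC,
 * i.e. it leaves the linear instruction stream. */
static bool load_writes_pc(uint32_t instr)
{
    uint32_t group = (instr >> 25) & 7u;
    uint32_t rt = (instr >> 12) & 0xFu;     /* destination register */

    if (((instr >> 20) & 1u) == 0u)
        return false;                       /* stores never write the PC */

    switch (group) {
    case 2:                                 /* 0 1 0: LD/ST imm */
        return rt == 15u;
    case 3:                                 /* 0 1 1: LD/ST reg */
        /* bit 4 set means a media instruction, not LD/ST reg */
        return ((instr >> 4) & 1u) == 0u && rt == 15u;
    case 4:                                 /* 1 0 0: LDM */
        return ((instr >> 15) & 1u) != 0u;  /* PC in register list */
    default:
        return false;
    }
}
```

For instance, `load_writes_pc(0xE8BD8010)` (LDMIA sp!, {r4, pc}) is true, while `load_writes_pc(0xE5910000)` (LDR r0, [r1]) is false.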