Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?
I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find
the next instruction(s) where the next breakpoint instruction needs to be set.
There are three cases:
1) current instruction doesn't change the execution path. Next instruction is the next word.
2) current instruction is a jump. The operand defines the next instruction address
3) current instruction is conditional branch. One possible next instruction is the next word, the other possible
instruction address is defined by the operand. (That includes conditional add with PC as the target, and the like).
To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
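For what it's worth, the three cases can be sketched in C for the simplest ARM-state branch, B/BL with an immediate offset. This is only a minimal sketch under stated assumptions, not a complete stepper: the names `NextAddrs` and `arm_next_addrs` are made up for illustration, and every other PC-changing instruction (BX, loads into PC, data processing with PC as destination, and so on) is ignored here and falls through as case 1.

```c
#include <stdint.h>

/* Hypothetical result type: the address(es) where the next breakpoint goes. */
typedef struct {
    int      count;   /* 1 = one successor, 2 = conditional branch */
    uint32_t first;   /* always valid */
    uint32_t second;  /* valid only when count == 2 */
} NextAddrs;

/* Assumption: ARM (not Thumb) state; only B/BL <imm24> is recognized. */
static NextAddrs arm_next_addrs(uint32_t pc, uint32_t instr)
{
    NextAddrs n = { 1, pc + 4u, 0u };            /* case 1: next word */
    uint32_t cond = instr >> 28;

    if (cond != 0xFu && (instr & 0x0E000000u) == 0x0A000000u) {
        /* Sign-extend the 24-bit word offset without signed shifts. */
        int32_t words = (int32_t)((instr & 0x00FFFFFFu) ^ 0x00800000u) - 0x00800000;
        /* In ARM state the PC reads as the instruction address + 8. */
        uint32_t target = pc + 8u + (uint32_t)words * 4u;

        if (cond == 0xEu) {
            n.first = target;                    /* case 2: unconditional jump */
        } else {
            n.second = target;                   /* case 3: two candidates */
            n.count = 2;
        }
    }
    return n;
}
```

For a conditional branch the stub could either plant a breakpoint at both addresses or evaluate the condition against the saved CPSR and plant only one; the sketch leaves that choice open.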
I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,
or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.
Also, there doesn't seem to be lots of sources of ARM gdb servers or stubs around that use software breakpoints.
Just to let possibly interested people know:
I generated a list of encodings (both ARM and Thumb) for Cortex-A7.
The lists may still contain errors, like missing decodings or the like - they were awk-script generated from ARMv7-A/R ARM.
Since I couldn't find that kind of list on the net, I decided to make them.
The lists can be downloaded from https://github.com/turboscrew/rpi_stub (the files ARM_instructions.txt and Thumb_instructions.txt ).
They are text files to allow easier manipulation.
Good work, sir!
As far as I know, you're a pioneer in this area (eg. writing a bootloader debugger).
Ouch. It's good that you found out, though!
Any idea about instructions marked as UNPREDICTABLE: can it then be UNDEFINED?
In other words: UNDEFINED REQUIRES the instruction to cause UND-exception, but
MAY UNPREDICTABLE do that, or does it have to execute normally except that the result may be whatever?
(I'm not yelling even if I used caps.)
I think I need to let the Cortex-A experts answer this question.
Maybe Alban knows where to find information on this ?
Meanwhile forging ahead forging a head.
What the heck is "unsigned saturation of a signed value" (USAT)?
Where does signed manipulation change to unsigned manipulation?
operand = Shift(R[n], shift_t, shift_n, APSR.C); // APSR.C ignored
(result, sat) = UnsignedSatQ(SInt(operand), saturate_to);
R[d] = ZeroExtend(result, 32);
(bits(N), boolean) UnsignedSatQ(integer i, integer N)
if i > 2^N - 1 then
result = 2^N - 1; saturated = TRUE;
elsif i < 0 then
result = 0; saturated = TRUE;
else
result = i; saturated = FALSE;
return (result<N-1:0>, saturated);
Is it just one-sided saturation / "rectification"?
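Read as C, the pseudocode above does look like a plain clamp: the input is interpreted as signed (that is the SInt), while the allowed output range is the unsigned range 0 .. 2^N - 1. A minimal sketch, assuming N is 0..31 and leaving out the shift step; `SatResult` and `unsigned_sat_q` are names I made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair mirroring the (result, saturated) tuple of the pseudocode. */
typedef struct {
    uint32_t value;
    bool     saturated;
} SatResult;

/* Clamp a signed value to the unsigned range 0 .. 2^n - 1 (assumes n <= 31). */
static SatResult unsigned_sat_q(int32_t i, unsigned n)
{
    uint32_t max = (1u << n) - 1u;
    SatResult r;

    if (i < 0) {                       /* negative input: clamp to zero */
        r.value = 0u;
        r.saturated = true;
    } else if ((uint32_t)i > max) {    /* too large: clamp to 2^n - 1 */
        r.value = max;
        r.saturated = true;
    } else {                           /* already in range */
        r.value = (uint32_t)i;
        r.saturated = false;
    }
    return r;
}
```

So with N = 8, -5 comes out as 0, 300 comes out as 255, and 100 passes through unchanged; only the first two report saturation.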
Got a very clear answer here: Re: UPREDICTABLE instructions .
UNPREDICTABLE may be implemented as UNDEFINED.
Now that I have the instruction tables...
I don't think the table-method works very well. The "significant bits" seem to move around all the time:
(By "significant bits" I mean bit positions that hold an instruction-specific '1' or '0' throughout the whole group of instruction encodings the instruction is checked against. In this "group" the "significant bits" are 27, 26, 25 and 22. In reality, bit 22 is not really "significant": it is the B bit, indicating byte vs. word access. STR, LDR, STRB and LDRB are really aliases of one and the same instruction.)
I'm happy to hear that the table method will work.
Normally, instruction groups are very logical.
Yes, you could say that the LDR/STR instruction is just a 'transfer instruction between memory and registers', where there's a flag indicating the direction.
Well, the table method doesn't work, I saw that when I got the spreadsheet about the instructions done.
When you look at the sheet, you can see that with all the instructions only bits 27, 26 and 25 are significant (= not belonging to an operand or modifier of some instruction). When you take a group of instructions that have bits 27, 26 and 25 all zeros, the only significant bit is bit 4. With instruction groups that have bits 27 - 25 and 4 zero, the new significant bits are 24, 23 and 21. If bit 4 is one, the only new significant bit is bit 7 and so on.
Makes one h*** of a decoding logic.
The spreadsheet is pretty nice tool for something like this: you can sort the lines (instructions) by different combinations of columns to find out which decoding steps work.
I got the arithmetic & logic instructions nicely organized by selecting the lines and organizing them as bit 25 as primary, bit 4 as secondary and bit 21 as tertiary sorting "key". The instructions become sorted by addressing modes.
First all register operation, then register shifted register and last the immediates. You can't apply all the keys at the same time, unfortunately, but one by one.
This is what it looks like:
Sorting with bits 24 - 21 in that order gives:
I still think it's possible to use the table, especially, if you're only decoding 32-bit instructions.
Let's make a few rules:
1: If a bit is known to be either 1 or 0, set the corresponding bit in mask to 1, otherwise set it to 0.
2: If a bit is known, keep the bit in data, otherwise set the corresponding bit in data to 0.
That is ...
Instr: cccc0000010Snnnnddddssss0TT1mmmm
mask : 00001111111000000000000010010000
data : 00000000010000000000000000010000
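The two rules are mechanical enough to write down in a few lines of C. This is a sketch of the rule only, not the Perl script itself; `MaskData` and `pattern_to_mask_data` are hypothetical names, and it assumes the pattern is exactly 32 characters, most significant bit first, with anything other than '0' or '1' counting as "unknown".

```c
#include <stdint.h>

/* Hypothetical pair holding the two words derived from one encoding line. */
typedef struct {
    uint32_t mask;
    uint32_t data;
} MaskData;

/* Assumption: pattern holds 32 characters, bit 31 first. */
static MaskData pattern_to_mask_data(const char *pattern)
{
    MaskData md = { 0u, 0u };

    for (int i = 0; i < 32 && pattern[i] != '\0'; i++) {
        md.mask <<= 1;
        md.data <<= 1;
        if (pattern[i] == '0' || pattern[i] == '1') {
            md.mask |= 1u;                            /* rule 1: known bit */
            md.data |= (uint32_t)(pattern[i] - '0');  /* rule 2: keep its value */
        }
    }
    return md;
}
```

Feeding it the Instr line above gives mask 0x0fe00090 and data 0x00400010, the same two words as the binary strings shown.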
Using this rule, we can convert the above table using the attached perl-script.
We now get the following (slightly decorated):
static const InstrTab sInstructionTable[] = {
0x0fe00010, 0x00000000, "and", &and_shift, /* AND{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.14 */
0x0fe00090, 0x00000010, "and", &and_reg, /* AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15 */
0x0fe00000, 0x02000000, "and", &and_imm, /* AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13 */
0x0fe00010, 0x00200000, "eor", &eor_shift, /* EOR{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.47 */
0x0fe00090, 0x00200010, "eor", &eor_reg, /* EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48 */
0x0fe00000, 0x02200000, "eor", &eor_imm, /* EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46 */
0x0fe00010, 0x00400000, "sub", &sub_shift, /* SUB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.223 */
0x0fe00090, 0x00400010, "sub", &sub_reg, /* SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224 */
0x0fe00000, 0x02400000, "sub", &sub_imm, /* SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222 */
0x0fe00010, 0x00600000, "rsb", &rsb_shift, /* RSB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.153 */
0x00000000, 0x00000000, "", &done, /* end of list */
};
Now, the problem with the table above, is that the order is incorrect. This can be fixed easily:
(First two AND were swapped, first two EOR were swapped and first two SUB were swapped).
-Of course, the Perl-script could be enhanced to do this automatically, generating extra masks and data in those cases where that is needed.
So the following should work:
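const InstrTab *tab;
uint32_t instr;

instr = *pc; /* fetch the instruction word */
tab = sInstructionTable; /* the array name already points at the first entry; no '&' needed */
while ((instr & tab->mask) != tab->data) {
tab++; /* the all-zero end-of-list entry matches anything, so the loop always stops */
}
tab->handle(instr, tab);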
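Here is a self-contained version of the same lookup that can be compiled and poked at. It is only a sketch: the table is cut down to a few of the entries listed above, an enum stands in for the handler pointers, and `InstrId`, `sDemoTable` and `find_instr` are names I made up for the example.

```c
#include <stdint.h>

/* Hypothetical IDs standing in for the handler pointers. */
typedef enum { ID_NONE, ID_AND_SHIFT, ID_AND_REG, ID_AND_IMM, ID_SUB_REG } InstrId;

typedef struct {
    uint32_t    mask;   /* 1 = bit is known */
    uint32_t    data;   /* value of the known bits */
    const char *name;
    InstrId     id;
} InstrTab;

static const InstrTab sDemoTable[] = {
    { 0x0fe00010u, 0x00000000u, "and", ID_AND_SHIFT },
    { 0x0fe00090u, 0x00000010u, "and", ID_AND_REG   },
    { 0x0fe00000u, 0x02000000u, "and", ID_AND_IMM   },
    { 0x0fe00090u, 0x00400010u, "sub", ID_SUB_REG   },
    { 0x00000000u, 0x00000000u, "",    ID_NONE      },  /* end of list */
};

/* Linear scan; the all-zero sentinel matches every word, so it terminates. */
static const InstrTab *find_instr(uint32_t instr)
{
    const InstrTab *tab = sDemoTable;

    while ((instr & tab->mask) != tab->data) {
        tab++;
    }
    return tab;
}
```

With this table, 0xE0001002 (and r1,r0,r2) lands on the first entry, 0xE0410312 (sub r0,r1,r2,lsl r3) on the sub entry, and anything unlisted ends up at the sentinel.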
If writing the code in assembly language, I believe it would be a good idea to re-arrange the contents of the table, so that the instruction name comes first, then the mask, the data and finally the handler.
find_instr:
ldmia r4!,{r1-r3,r12}
and r2,r2,r0
cmp r3,r2
bne find_instr
/* here r0 contains the instruction opcode, r1 contains the name and r12 contains the address of the handler */
pop { ... } /* restore any saved registers */
bx r12 /* jump directly to the handler */
Hmm, I guess I've been working too hard with the project - I didn't come to think adding the handler in the table.
(I don't like to use function pointers if not necessary: the execution path gets fuzzy and some debuggers can't handle them. But using an enum...) I had a different idea about what the table method means, but I don't remember any more what I thought. What I do remember is that I didn't think of comparing an instruction against all the instructions of the instruction set.
If applied to the whole instruction set, it may be a bit heavy for single stepping - on the average it means going through half of the table for each instruction each time.
Then again, writing the logic in C code takes a lot of time, and the code is not something you'd show to small children...
(I wrote the pseudocode, with which I'm still not completely happy, but the C coding of it hasn't started yet.)
Maybe I should use the table approach now after all, and go for the C code logic later (if I still feel like it).
I really have to rethink this.
Especially now that I have the instruction set in a spreadsheet from which the instruction bit patterns are easily edited (bits into bytes, byte order, ...) and they can be printed out as a text file for simple editing (if any is needed) into a table initializer data form. Also the mask and data can easily be generated with spreadsheet formulas.
The table is also quite compact form - the instruction set itself probably fits in 1 kB, and the actual handlers need to be written anyway.
Too bad there is only 'like'-button. There should also be 'Halleluyah'-button.
When I wrote my 68xxx debugger, the table was fine (though these were only 16-bit words).
-But ARM's instruction set is not too complicated either. I have not had a look at Cortex-A yet, but the time will come.
In case the table gets too large, you have an extra approach: To split the words into two 16-bit words, so you find the "main part", which leads to a "sub-tree".
Remember: The execution unit in the processor does the job very, very quickly, so I am convinced that ARM designed the instruction set, so it should be easy to dispatch (even by using code).
Yes, the hard part is to find out how.
The good thing about using pointers, is that you can make your lookup-routine in assembly language, and it can jump directly to your C routine.
You can then call it as a C-function, because it uses "goto-style", thus it'll be completely transparent and your C-code will behave like a normal subroutine-call; just very quickly.
The above look-up example can be unrolled easily; this will save a few clock cycles on each iteration:
.rept 7
ldmiane r4!,{r1-r3,r12}
andne r2,r2,r0
cmpne r3,r2
.endr
-Change the '.rept' count as you like... make it 15 or 31, adjust it to suit your needs. Perhaps a large number may start to cause longer execution time, but it's a question of balance.
If you're lucky, you can place instructions that are used often in the beginning of the table (I did that with my debugger, and it started to become quite quick at disassembling).
This kind of code is something I really like. The table-lookup, masks and AND stuff - it brings out good memories too.
-But of course, sometimes it might be easier or shorter or quicker to write a switch-statement and use enumerations for handling each instruction type.
Some instruction types could be handled by the same handler; eg. AND/ORR/EOR and ADD/SUB.
In many cases, it's useful to think of instructions as being in "instruction groups". Eg. LDR/STR is a good example, AND/ORR/EOR, ASL/ASR/LSL/LSR/ROR, etc.
If you're lucky, you can place instructions that are used often in the beginning of the table
You read my mind.
But ARM's instruction set is not too complicated either.
Assembly is not, but the encoding is.
and:
Oh, and for small assembly routines I've been using inline asm, like
void rpi2_trap_handler()
{
// IRQs need to be enabled for serial I/O
asm volatile (
"push {r0}\n\t"
"mrs r0, cpsr\n\t"
"bic r0, #128 @ enable irqs\n\t"
"msr cpsr, r0\n\t"
"pop {r0}\n\t"
);
gdb_trap_handler();
}
:
AAARRGGHH!
I happened to encounter an instruction that looked unfamiliar. I started searching for it in the ARMv7-A/R ARM and...
...found a truckload of new instructions (fp + vector).
I had to regenerate the list all over again (updated in git). Next the spreadsheet.
I really hope I got them all this time. There seems to be 483 ARM instructions in the list, although some of them
are aliases. It's good I didn't spend much time with the Thumb instructions yet - They would have needed to be regenerated as well.
That's no fun at all. I hope you've got them all now.
When I wrote my disassembler/debugger, I recall how I went through each page of the book; actually I took all the integer instructions first.
When I was done, I had a break and worked on other things, then it struck me that I could make a floating point emulator, so I got the FFP library and added the entire FPU instruction set.
It probably took a few months before I had all instructions.
Hopefully the script will take some of the burden off.
... When you arrange the table entries and there are instructions where one has a known bit in one place and the other has a known bit in another place, it will be necessary to find out which states are valid and which are invalid. The one with invalid states should go after the one that does not have invalid states.
For instance, this instruction is invalid:
add pc,pc,pc
-So ARM decided to recycle the opcode space (because there isn't a lot of opcode space left, so this is a good thing, though writing tools become more complex).
The above instruction contains an invalid combination of registers (basically, I believe PC is not allowed in opcode2, but I might be wrong; it might only be when PC is the destination).
So the instruction which takes the seat from add pc,pc,pc, should go before the add instruction.
I think it might be a good idea to modify the script for the following checks:
1: is PC in opcode2.
2: is PC the destination register.
3: is SP in opcode2.
4: is SP the destination register.
I've forgotten other rules, but the above seem to be used a few times.
Also ... some bitfield instructions are not allowed.
Rule: BFI and BFC: Start+Length must be 32 or less.
Eg. BFI r3,r6,#23,#16 is illegal
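As a check this is tiny. Below is a hedged sketch in C: `bitfield_range_ok` is the rule exactly as stated, on assembler-level operands; `bfi_fields_ok` applies the equivalent test to an encoded ARM-state word, assuming (from my reading of the A1 encoding, so please verify against the manual) that msb sits in bits 20..16 and lsb in bits 11..7. Both function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* The rule as stated: start + length must be 32 or less. */
static bool bitfield_range_ok(unsigned lsb, unsigned width)
{
    return width >= 1u && lsb <= 31u && lsb + width <= 32u;
}

/* Same idea on an encoded word. The encoding stores msb = lsb + width - 1
   in a 5-bit field, so a bad range can only show up as msb < lsb.
   Assumption: msb in bits 20..16, lsb in bits 11..7 (ARM encoding A1). */
static bool bfi_fields_ok(uint32_t instr)
{
    unsigned msb = (instr >> 16) & 0x1Fu;
    unsigned lsb = (instr >> 7) & 0x1Fu;

    return msb >= lsb;
}
```

With these, `bitfield_range_ok(23, 16)` is false (23 + 16 = 39), while `bitfield_range_ok(23, 9)` is true.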
When you reach the thumb instruction set and thumb2, it's important to read about the "restrictions" for each instruction.
The 16-bit thumb instructions only allow operations on r0...r7, except for very few instructions:
ADD r7,r7,r10 /* note: destination must be the same as operand1 (the opcode actually only has room for 2 registers) */
MOV r3,r11
CMP r2,r9
The rest of them do not allow operations on r8...r15, except for ADD and SUB with SP and PC (but that's a special case. ADD and SUB #imm also allow a different range on those two registers).
I don't know everything about the instruction sets, but I'll try and write whatever I remember.
I really feel like writing a disassembler, but unfortunately, I do not have the time. :/
Have to check "add pc,pc,pc", but basically add with PC as a destination is a special instruction:
The SUBS PC, LR, #<const> instruction provides an exception return without the use of the stack. It subtracts the
immediate constant from LR, branches to the resulting address, and also copies the SPSR to the CPSR.
...
Encoding A2 ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC, <Rn>, <Rm>{, <shift>}
<opc2>S<c> PC, <Rm>{, <shift>}
<opc3>S<c> PC, <Rn>, #<const>
RRXS<c> PC, <Rn>
SUBS{<c>}{<q>} PC, LR, #<const> Encoding A1
<opc1>S{<c>}{<q>} PC, <Rn>, #<const> Encoding A1
<opc1>S{<c>}{<q>} PC, <Rn>, <Rm> {, <shift>} Encoding A2, deprecated
<opc2>S{<c>}{<q>} PC, #<const> Encoding A1, deprecated
<opc2>S{<c>}{<q>} PC, <Rm> {, <shift>} Encoding A2
<opc3>S{<c>}{<q>} PC, <Rn>, #<const> Encoding A2, deprecated
RRXS{<c>}{<q>} PC, <Rn> Encoding A2, deprecated
<opc1> The operation. <opc1> is one of ADC, ADD, AND, BIC, EOR, ORR, RSB, RSC, SBC, and SUB. ARM deprecates
the use of all of these operations except SUB.
<opc2> The operation. <opc2> is MOV or MVN. ARM deprecates the use of MVN.
<opc3> The operation. <opc3> is ASR, LSL, LSR, or ROR. ARM deprecates the use of all of these operations.
Also, I'm not sure if assembler lets instructions with lsb + length > 32 through.
Also, I'm allowing more than just user level. That drops quite some restrictions.
In the "assembly" group I got a hint that I also should do UNPREDICTABLEs, but warning about that would be nice.
That, I heard, is the convention with debuggers.
I'd like to explain a little better what I mean by the BFI and BFC:
If you come across an opcode, where the start + length is > 32, then it's not a BFI or BFC instruction.
That means that any opcodes with those values, must be handled before the BFI and BFC opcodes.
In other words: The mask+data for those opcodes must be preceding the mask+data for the BFI and BFC.
turboscrew wrote:
These need to be recognized as not UNDEFINED.
If they're all valid, then just handle them before you check for UNDEFINED.
No, not again!
It'll take another day or two to figure these out!
1 1 1 1 0 0 1 0 0 D f z n n n n d d d d 1 1 0 1 N Q M 1 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.337
1 1 1 1 0 0 1 0 0 D f z n n n n d d d d 1 1 1 1 N Q M 0 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.335
1 1 1 1 0 0 1 0 1 D z z n n n n d d d d 0 f 1 1 N 1 M 0 m m m m VQD<op><c>.<dt>_<Qd>,_<Dn>,_<Dm[x]> T2/A2 A8.8.371
1 1 1 1 0 0 1 0 1 D z z n n n n d d d d 1 0 f 1 N 0 M 0 m m m m VQD<op><c>.<dt>_<Qd>,_<Dn>,_<Dm> T1/A1 A8.8.371
1 1 1 1 0 0 1 1 0 D f f n n n n d d d d 0 0 0 1 N Q M 1 m m m m V<op><c>_<Qd>,_<Qn>,_<Qm>_V<op><c>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.290
1 1 1 1 0 0 1 1 0 D f z n n n n d d d d 1 1 1 0 N Q M 1 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.281
1 1 1 1 0 0 1 1 0 D f z n n n n d d d d 1 1 1 1 N Q M 0 m m m m VP<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.366
1 1 1 1 0 0 1 1 1 D 1 1 n n n n d d d d 1 0 z z N f M 0 m m m m V<op><c>.8_<Dd>,_<list>,_<Dm> T1/A1 A8.8.419
1 1 1 1 0 0 1 Q 1 D z z n n n n d d d d 0 f 0 F N 1 M 0 m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Dm[x]>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm[x]> T1/A1 A8.8.338
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 0 0 f 0 N Q M 0 m m m m VH<op><c>_<Qd>,_<Qn>,_<Qm>_VH<op><c>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.319
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 0 1 1 0 N Q M f m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Qm>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.334
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 1 0 1 0 N Q M f m m m m VP<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.365
1 1 1 1 0 0 1 U 1 D z z n n n n d d d d 0 f 1 0 N 1 M 0 m m m m V<op>L<c>.<dt>_<Qd>,_<Dn>,_<Dm[x]> T2/A2 A8.8.338
1 1 1 1 0 0 1 U 1 D z z n n n n d d d d 1 0 f 0 N 0 M 0 m m m m V<op>L<c>.<dt>_<Qd>,_<Dn>,_<Dm> T2/A2 A8.8.336
1 1 1 1 0 0 1 f 0 D z z n n n n d d d d 1 0 0 1 N Q M 0 m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Qm>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.336
I'm very close to losing my mind, and I'm surely getting very tired of fighting this.
I really had to do some work to find out the right 'MOV' in:
Encoding A2 ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC, <Rn>, <Rm>{, <shift>}
<opc2>S<c> PC, <Rm>{, <shift>}
<opc3>S<c> PC, <Rn>, #<const>
RRXS<c> PC, <Rn>
<opc2> The operation. <opc2> is MOV or MVN. ARM deprecates the use of MVN.
c c c c 0 0 0 1 1 0 1 S (0) (0) (0) (0) 1 1 1 1 0 0 0 0 0 0 0 0 m m m m MOV{S}<c>_PC,_<Rm>_(=_LSL{S}<c>_PC,_<Rm>,_#0) A2 B9.3.20
(That is LSL (reg) with Rd=PC and immediate = 0).
[EDIT]
I frigging knew it (bolding is mine):
A8.8.290 VBIF, VBIT, VBSL
Encoding T1/A1 Advanced SIMD
V<op><c> <Qd>, <Qn>, <Qm>
V<op><c> <Dd>, <Dn>, <Dm>
if op == ‘00’ then SEE VEOR;
if op == ‘01’ then operation = VBitOps_VBSL;
if op == ‘10’ then operation = VBitOps_VBIT;
if op == ‘11’ then operation = VBitOps_VBIF;
and
A8.8.281 VACGE, VACGT, VACLE, VACLT
Encoding T1/A1 Advanced SIMD (UNDEFINED in integer-only variant)
V<op><c>.F32 <Qd>, <Qn>, <Qm>
V<op><c>.F32 <Dd>, <Dn>, <Dm>
Assembler syntax
where:
<op> The operation. It must be one of:
ACGE Absolute Compare Greater than or Equal, encoded as op = 0.
ACGT Absolute Compare Greater Than, encoded as op = 1.
What!
What happened to ACLE and ACLT?
What's the phone number of Sherlock Holmes?
Aha:
VACLE (Vector Absolute Compare Less Than or Equal) is a pseudo-instruction, equivalent to a VACGE instruction with
the operands reversed. Disassembly produces the VACGE instruction.
VACLT (Vector Absolute Compare Less Than) is a pseudo-instruction, equivalent to a VACGT instruction with the
operands reversed. Disassembly produces the VACGT instruction.
[/EDIT]
Yep. All instructions not matching the table are considered UNDEFINED.
It means that all those 'new' instructions must be added to the table too. (Sigh.)
Oh well, it's just a couple of hundred instructions more...
When I wrote my disassembler, there were illegal instructions, which occupied parts of legal instruction space.
In some cases, I had to make a special "illegal instruction" handling; eg. place that before the actual decoded instruction.
I'm quite impressed with all your work. You've absolutely done a lot in very little time!
I just took a look at the ARM_instructions.txt ...
If ignoring the condition-codes, you got 8 bits, which are almost always known.
Speed-wise it might be a real good idea to do this:
index = 0xff & (opcode >> 20); /* isolate instruction group */
handleGroup[index](opcode); /* jump directly to group handler */
-That means you'll shave several clock cycles off your execution time, without really sacrificing anything.
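Spelled out a bit more, the idea could look like the sketch below. It is only an illustration: the group handlers are stubs that return an ID, only two of the 256 slots are filled in, and the names `GroupId`, `GroupHandler` and `dispatch` are made up. Empty slots fall back to "unknown" through a NULL check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group IDs; real handlers would decode further. */
typedef enum { GROUP_UNKNOWN, GROUP_MUL_AND, GROUP_UMAAL_SUB } GroupId;
typedef GroupId (*GroupHandler)(uint32_t opcode);

static GroupId group_mul_and(uint32_t opcode)   { (void)opcode; return GROUP_MUL_AND; }
static GroupId group_umaal_sub(uint32_t opcode) { (void)opcode; return GROUP_UMAAL_SUB; }

/* One slot per value of bits 27..20; unlisted slots are NULL. */
static const GroupHandler handleGroup[256] = {
    [0x00] = group_mul_and,
    [0x04] = group_umaal_sub,
};

static GroupId dispatch(uint32_t opcode)
{
    uint32_t index = 0xffu & (opcode >> 20);    /* isolate instruction group */
    GroupHandler h = handleGroup[index];

    return (h != NULL) ? h(opcode) : GROUP_UNKNOWN;
}
```

Each group handler would then only have to tell apart the few instructions sharing those eight bits, for example with a small mask/data table of its own.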
In assembly language it could of course be just a simple jump-table; r0 = opcode:
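handle_group:
ubfx r1,r0,#20,#8 /* r1 = bits 27...20 of the opcode */
adr r2,table /* r2 = address of the jump table */
ldr pc,[r2,r1,lsl#2] /* jump through the 4-byte table entry */
table: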
.4byte MUL_AND_Group
.4byte MLA_EOR_Group
.4byte UMAAL_SUB_Group
.4byte SUB_Group
.4byte MLS_RSB_Group
.4byte RSB_Group
A 256-entry table is fairly small on a RasPi
You can do this for "bits which are always known", but you can even extend it to include "bits which are often known"
Bits which are often known, could include bit 4 and perhaps bits 8...11; but it might be a good idea to wait determining what bits to include, till you have the complete table.
The above assembly code can then be declared as a function like this ...
void handle_group(uint32_t aOpcode);
and called that way; it'll indirectly jump to a C function, spending just a few clock cycles in total.
After that, you can probably focus on the low 20 bits, but in 16-bit thumb, there might be needs for modification, because 16-bit thumb does not have the 4-bit condition code field.
Unfortunately it's not so simple. This kind of thing will mess it up, causing "false positives":
The only bit that is either '0' or '1' (instruction specific) when the first 3 bits (27, 26, 25) after the condition code field are 0 0 0, is bit 4.
If bit 4 = 0 then you can use the next bits (24, 23) but if bit 4 is '1', you have to check from bit 7 what are the next bits.
If bit 7 is '0', then next bits are 24 and 23, if bit 7 is '1', the next bits are 6 and 5.
And so on.
If bits 27, 26 and 25 are 0 1 0, then there is only one instruction, single data transfer:
In the list it's listed as in the manual, but in reality it's:
P = 1, pre-indexing, otherwise post-indexing or offset
U = 1 offset is added, otherwise offset is subtracted
B = 1 byte access, else word access
W = 1 writeback, else no writeback
L = 1 load, else store
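In C, pulling those five flags out of the instruction word could look like this. A minimal sketch; the bit positions (P = 24, U = 23, B = 22, W = 21, L = 20) are the usual single data transfer layout rather than something stated above, and `XferFlags` / `decode_xfer_flags` are names invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the five modifier bits. */
typedef struct {
    bool pre_index;   /* P, bit 24 */
    bool add;         /* U, bit 23 */
    bool byte;        /* B, bit 22 */
    bool writeback;   /* W, bit 21 */
    bool load;        /* L, bit 20 */
} XferFlags;

static XferFlags decode_xfer_flags(uint32_t instr)
{
    XferFlags f;

    f.pre_index = (instr >> 24) & 1u;
    f.add       = (instr >> 23) & 1u;
    f.byte      = (instr >> 22) & 1u;
    f.writeback = (instr >> 21) & 1u;
    f.load      = (instr >> 20) & 1u;
    return f;
}
```

For example, 0xE5910004 (ldr r0,[r1,#4]) decodes as P = 1, U = 1, B = 0, W = 0, L = 1, and 0xE4432001 (strb r2,[r3],#-1) as all zeros except B.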
It's a bit different with the media- or special LD/ST-instructions.
Sometimes the above bits are not used as flags at all, but are just part of the opcode; sometimes B = 1 means register, else immediate(?).
BTW, there will be another update to the ARM instruction list, and to the spreadsheet. I won't look into the Thumb instructions until I've learned to do this better by playing with the ARM instructions first.
I think a good way might be:
check condition code
if it's 1 1 1 1 then specials
else normal
check bits 27 -25 (with both specials and normal)
then apply table if instruction subset contains several instructions
This way the huge amount of instructions is split into 16 subsets some of them only having a couple of instructions.
(NOTE: the floating point and vector instructions are in the special instructions - and there are lots of them.)
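The first two steps of that plan boil down to one small function. A sketch only, with a made-up name (`arm_subset`): it returns 0..7 for "normal" instructions and 8..15 for the "specials", and the caller would then pick the table belonging to that subset.

```c
#include <stdint.h>

/* Subset number 0..15: bit 3 is set when the condition field is 1111
   ("specials"), and the low three bits are instruction bits 27..25. */
static unsigned arm_subset(uint32_t instr)
{
    unsigned special = ((instr >> 28) == 0xFu) ? 8u : 0u;

    return special | ((instr >> 25) & 7u);
}
```

For example, 0xEAFFFFFE (a branch) falls in subset 5, and a word starting with 0xF2 falls in subset 9, among the specials.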
And if that's still far too much, I'll put them in a hash-table! That should, frigging, do it!
It would really be excellent, if ARM provided the instruction set as an XML file.
No matter which path you take, make sure you create some kind of automation script; it's tedious to do the actual code and table by hand.
Perl is very good for processing text-files (because of the excellent RegEx). It's currently my preferred choice, especially because you don't have to wait forever for it to compile.
I've been using awk, sed, sort and geany's regex + hand editing first. Then I load the file as CSV into LibreOffice Calc and do some editing there too. It's also tedious to go through the instructions (499 ARM instructions, I don't dare to think about Thumb instructions yet) and assign a handler to them - what handlers do I need and which handler to which instruction.
I'm not eager to crash-learn Perl at this point.
Oh, and I committed new versions of the text file and spreadsheet of the ARM instructions.
All the instructions are (I really hope) there.
The 16-bit thumb would be fairly short. I don't know how many instructions the 32-bit thumb provides.
There's a picture in this document, which gives you a quick overview: http://community.arm.com/docs/DOC-7034
Perl is very much C-like, but you don't have to learn it if the other tools can do what you need.
The hard part in Perl is probably the RegEx. The rest looks very much like C.
Generated 'raw' thumb instruction file the same way I created the ARM instruction file (all thumb instructions
should be there - both 16- and 32-bit) and it has 521 instructions.
And there are still 13 'V<op>'-kind of instructions. When they are expanded to real instructions, I guess 20 - 30 instructions more giving about 550 Thumb instructions.
Wish me luck and long life.
That's a lot more than I expected; but don't the thumb2 instructions share space with the instructions you already processed?
(I always had the impression that the Cortex-A7 was using the Thumb2 architecture, but I most likely need some correction here).
Oh, and ... of course: "Good Health, Strong Body, Clear Mind and Many Years".
I'm not sure what you mean by "...share space with the instructions you already processed".
They are re-using the instruction bits. You can't tell if it's ARM instruction or thumb instruction without checking the 'T'-bit in the CPSR.
VST1<c>.<size> <list>, [<Rn>{:<align>}]{!}
VST1<c>.<size> <list>, [<Rn>{:<align>}], <Rm>
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
(The site does something funny to the hand-formatted text. The editor seems to treat some stuff as tables - although misformatted.)
I think, for a bit pattern of the thumb instruction, there is an ARM instruction to match it that has nothing to do with the thumb instruction, and vice versa.
That is: a separate table is needed for Thumbs.
From the manual:
ARMv7 contains two main instruction sets, the ARM and Thumb instruction sets.
The two instruction sets differ in how instructions are encoded:
• Thumb instructions are either 16-bit or 32-bit, and are aligned on a two-byte boundary. 16-bit and 32-bit
instructions can be intermixed freely.
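For a stub this means two checks before any table lookup: which state the core was in, and, in Thumb state, how long the instruction is. A minimal sketch, assuming the usual rule that a first halfword whose top five bits are 11101, 11110 or 11111 starts a 32-bit instruction (ThumbEE and Jazelle are ignored); the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR bit 5 is the T bit: set when the core was executing Thumb code. */
static bool cpsr_is_thumb(uint32_t cpsr)
{
    return ((cpsr >> 5) & 1u) != 0u;
}

/* Length in bytes of the Thumb instruction whose first halfword is hw1. */
static unsigned thumb_instr_len(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;

    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

So 0x4770 (bx lr) is a 16-bit instruction, while a halfword such as 0xF000 is the first half of a 32-bit one and the next halfword has to be fetched too.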
Alright, you gave me excellent news today.
I did not expect the Cortex-A7 to support the ARM instruction set. I was only expecting it to support Thumb2.
So this day just got better.
... Yes, I know that there's a table-bug, but I've found that if I paste the text into my text editor (i.e. text-only, no formatting), then re-copy it and finally paste it into one of the JIVE editors, it works better.
Each of the JIVE editors has several bugs. Some won't let the caret move past the # symbol, some won't let the caret move past an empty line, some don't recognize the Delete key, and some find it amusing to remove a space now and then.
I know this has been reported to the authors, but I'm not sure they're able to fix it, so I've chosen to live with it and re-edit until my documents look the way I want them. Hey, we've got more than 80 characters per line.
And I've seen the train coming ... hey wait, it's some other light at the end of the tunnel...
It looks like chapter 4 of the ARMv7-A/R ARM gives a good starting point for figuring out the handler functions.