Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?
I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find
the next instruction(s) where the next breakpoint instruction needs to be set.
There are three cases:
1) current instruction doesn't change the execution path. Next instruction is the next word.
2) current instruction is a jump. The operand defines the next instruction address
3) current instruction is conditional branch. One possible next instruction is the next word, the other possible
instruction address is defined by the operand. (That includes conditional add with PC as the target, and the like).
To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
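A minimal sketch of the three cases for the simplest ARM-state example, B/BL with an immediate offset. The names next_addrs_t and arm_next_addrs are made up for illustration and are not from any existing stub. It assumes ARM state (not Thumb), where the branch offset is relative to the instruction address plus 8, and it treats every other instruction as case 1, which a real stub could not do.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: up to two candidate addresses for the next breakpoint. */
typedef struct {
    int      count;      /* number of valid entries in addr[]: 1 or 2 */
    uint32_t addr[2];
} next_addrs_t;

/* ARM-state B or BL: bits 27..25 == 101, condition field not 1111. */
static bool is_arm_branch_imm(uint32_t instr)
{
    return ((instr >> 28) != 0xFu) && ((instr & 0x0E000000u) == 0x0A000000u);
}

/* Candidate next addresses for the ARM-state instruction `instr` located at `pc`. */
next_addrs_t arm_next_addrs(uint32_t pc, uint32_t instr)
{
    next_addrs_t r = { 1, { pc + 4u, 0u } };        /* case 1: linear flow */

    if (is_arm_branch_imm(instr)) {
        /* imm24 is a signed word offset relative to pc + 8 */
        uint32_t off = (instr & 0x00FFFFFFu) << 2;
        if (off & 0x02000000u)
            off |= 0xFC000000u;                     /* sign-extend from 26 bits */
        uint32_t target = pc + 8u + off;

        if ((instr >> 28) == 0xEu) {                /* AL: case 2, always taken */
            r.addr[0] = target;
        } else {                                    /* case 3: taken or not taken */
            r.addr[1] = target;
            r.count = 2;
        }
    }
    return r;
}
```

For a conditional branch the stub would then plant a breakpoint at both returned addresses, or evaluate the condition against the saved CPSR and plant only one.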
I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,
or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.
Also, there don't seem to be many sources of ARM gdb servers or stubs around that use software breakpoints.
Just to let possibly interested people know:
I generated a list of encodings (both ARM and Thumb) for Cortex-A7.
The lists may still contain errors, like missing decodings or the like - they were awk-script generated from ARMv7-A/R ARM.
Since I couldn't find lists like that on the net, I decided to make them.
The lists can be downloaded from https://github.com/turboscrew/rpi_stub (the files ARM_instructions.txt and Thumb_instructions.txt ).
They are text files to allow easier manipulation.
Good work, sir!
As far as I know, you're a pioneer in this area (eg. writing a bootloader debugger).
I was doing nicely until I had to start decoding that huge set of instructions.
Pretty laborious and slow.
Like the media instructions: I found that the unsigned and signed parallel addition and subtraction instructions can be
handled similarly, like:
All the patterns are the same except for bit 22 (1 = unsigned) and bits 21,20 (00 = undef, 01 = basic, 10 = Q and 11 = H).
That seems to hold for all the unsigned and signed parallel addition and subtraction instructions.
It's easier to see the patterns when you can just search and copy the instructions to another file (like the above)
and compare the bits of all the interesting instructions at once, and with patterns you can often unify the manipulation.
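A minimal sketch of how that pattern could be unified in code. The function name decode_parallel_addsub is hypothetical. The bit 22 and bits 21,20 meanings are the ones described above; the mapping of bits 7..5 to ADD16/ASX/SAX/SUB16/ADD8/SUB8 is my reading of the ARMv7-A/R ARM and should be checked against the generated list. The condition field is ignored here.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes a mnemonic such as "UQADD8" into out and returns 1,
   or returns 0 if instr is not a parallel add/sub instruction. */
int decode_parallel_addsub(uint32_t instr, char *out, size_t outlen)
{
    /* row: bit 22 (0 = signed, 1 = unsigned); column: bits 21,20 */
    static const char *const prefix[2][4] = {
        { NULL, "S", "Q",  "SH" },
        { NULL, "U", "UQ", "UH" },
    };
    /* indexed by bits 7..5; NULL marks combinations treated as undefined */
    static const char *const op[8] = {
        "ADD16", "ASX", "SAX", "SUB16", "ADD8", NULL, NULL, "SUB8"
    };

    /* family check: bits 27..23 == 01100 and bit 4 == 1 */
    if ((instr & 0x0F800010u) != 0x06000010u)
        return 0;

    unsigned u   = (instr >> 22) & 1u;
    unsigned op1 = (instr >> 20) & 3u;
    unsigned op2 = (instr >> 5) & 7u;

    if (prefix[u][op1] == NULL || op[op2] == NULL)
        return 0;

    snprintf(out, outlen, "%s%s", prefix[u][op1], op[op2]);
    return 1;
}
```

For single stepping the mnemonic itself is not needed; the point is only that one mask test plus two small lookups covers the whole family.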
It would be nice to get the tables into excel or ods for even more flexible manipulation (moving columns around and sorting lines by some columns). Maybe one day.
[EDIT]
The web page somehow makes the lines into a table and wraps some cells.
It looks OK when writing/editing, but after posting it looks a bit weird.
[/EDIT]
I once wrote a debugger for 68xxx (it was back in 1988 I think).
There I had a mask and data for each kind of instruction.
Eg.
if((instr & mask) == data){ found; }
Here you will need to sort the table, so that masking will work correctly; otherwise, if your first entry's mask is 0x0000, then you'll always get the same result.
Of course it is a good idea to end the table with mask 0x0000 and data 0x0000; that entry always matches, so the search is guaranteed to stop there.
You could make a similar table and add a 'type' entry. For instance type could be 'add' and then a positive/negative, in order to merge add and subtract into one handler (that might simplify and shorten the code).
Having handler types and a mode field would perhaps also make it possible to simplify the code.
Opcodes usually come in 'families'. It may be a good idea to first find out which instruction family the opcode belongs to, and then handle the rest of the instruction decoding. However, depending on the number of instructions, it may be quicker to just use the mask and data system.
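A minimal sketch of such a mask/data table with a 'type' entry, using a few 32-bit ARM patterns instead of 68k ones. All names (instr_kind_t, decode_entry_t, classify) are made up for illustration, the table is nowhere near complete, and the condition field is ignored. It shows why the order matters: the BX entry has to come before the broader "bits 27..26 == 00" entry, and the all-zero entry at the end matches everything.

```c
#include <stdint.h>

typedef enum {
    KIND_BX,
    KIND_BRANCH,
    KIND_DATA_OR_MISC,
    KIND_UNKNOWN
} instr_kind_t;

typedef struct {
    uint32_t     mask;
    uint32_t     data;
    instr_kind_t kind;
} decode_entry_t;

/* Sorted from most specific to least specific. */
static const decode_entry_t decode_table[] = {
    { 0x0FFFFFF0u, 0x012FFF10u, KIND_BX           },  /* BX Rm */
    { 0x0E000000u, 0x0A000000u, KIND_BRANCH       },  /* B / BL */
    { 0x0C000000u, 0x00000000u, KIND_DATA_OR_MISC },  /* bits 27..26 == 00 */
    { 0x00000000u, 0x00000000u, KIND_UNKNOWN      },  /* always matches */
};

instr_kind_t classify(uint32_t instr)
{
    const decode_entry_t *e = decode_table;

    /* The last entry has mask 0 and data 0, so this loop always stops. */
    while ((instr & e->mask) != e->data)
        e++;
    return e->kind;
}
```

If the BX entry were moved below the "bits 27..26 == 00" entry, BX would be reported as KIND_DATA_OR_MISC, which is the sorting problem described above.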
A table approach may not work very well with ARM, because the bits that define the instruction are not in fixed positions. Not even mostly, except for the 3 bits after the condition code, and sometimes a special register value turns it into another instruction.
Also, in this program, I don't care about the instruction as such, but just the 'next address' after the instruction.
Sometimes it's easier to execute than to 'simulate' the code:
unsigned int check_msr_reg(uint32_t instr)
{
    // note: tmp1 and tmp2 must be file-scope (global) variables,
    // because the assembly below refers to them by name
    unsigned int new_pc = rpi2_reg_context.reg.r15;

    // if user mode, then can't even guess
    tmp1 = (uint32_t) rpi2_reg_context.reg.cpsr;
    if ((tmp1 & 0x1f) == 0x10) // user mode (mode bits are 0b10000)
    {
        // UNPREDICTABLE - whether banked or not
        // the bits 15 - 0 are UNKNOWN
        new_pc = INSTR_ADDR_UNDEF;
    }
    else
    {
        // privileged mode - both reg and banked reg
        tmp2 = (instr & 0xffff0fff) | (1 << 12); // edit Rd = r1
        iptr = (uint32_t *) mrs_regb;
        *iptr = tmp2; // patch the instruction into the slot below
        asm(
            "push {r0, r1}\n\t"
            "mrs r1, cpsr @ save cpsr\n\t"
            "push {r1}\n\t"
            "ldr r0, =tmp1 @ address of tmp1\n\t"
            "ldr r0, [r0] @ value of tmp1 = cpsr to set\n\t"
            "msr cpsr, r0 @ note: user mode is already excluded\n\t"
            "mrs_regb: .word 0 @ execute instr with our registers\n\t"
            "ldr r0, =tmp2\n\t"
            "str r1, [r0] @ store result to tmp2\n\t"
            "pop {r1} @ restore cpsr\n\t"
            "msr cpsr, r1\n\t"
            "pop {r0, r1}\n\t"
        );
        new_pc = (unsigned int) tmp2;
    }
    return new_pc;
}
So far in this project I've learned about ARM (never really used it before), awk (never used it before) and inline assembly (accessing C variables). For some unknown reason, I haven't been keen to use the inline assembly extensions, though.
BTW, is your 68k debugger anywhere to be seen? I might like to take a look some time later when I have more time.
I like Motorola/Freescale 6.8k/68k assembly languages. So wonderfully symmetrical.
Sadly, it was never released (and I don't have the sources here). I wrote it because MonST/MonTT could not debug a game I was participating in writing.
Yes, 68xxx was very convenient and easy in many ways; unfortunately, the performance wasn't so impressive. Reading/writing 400KByte/sec.
Ah, yes. things are coming back to me.
My debugger usually ran on a 68000, but my MegaSTE had a 68010, so I wrote an instruction emulator. It could emulate 68010, 20, 30, 40 and CPU32 instructions (the latter was never tested, though).
-Sometimes it's also a lot faster to simulate instruction execution.
"Reading/writing 400KByte/sec."
What kind of reading/writing are you talking about?
I remember when I had plans to make a computer based on 68030.
The plan was trashed by the fact that those days home made address decoding would have been so slow that it wasn't worth the while. I considered v22 PLAs and some FPGAs, but the delays were far too big. With mask-programmed gate array it would have been a beast, but VERY expensive. I guess a mask cost like $1 000 000 back then.
68030 could do memory accesses in synchronous nibble mode in 55 ns, I recall.
(The dynamical bus sizing was, as such, really exciting idea.)
On my Atari ST, I could reach 400KByte reading or writing per second as maximum (by using the movem.l instruction).
The main reason for this was most likely Atari's bus architecture.
Lucky me, any Cortex-M0, even if running only at 1 MHz, is faster.
I recall the Atari ST was quite a nice machine for its time. The 8088 wasn't that impressive either compared to any ARM.
I had to settle with Commodore 64 with the famous "washing machine processor" (6510).
Thunder and blistering!
I may have to rewrite the media instruction decoding.
I just realized that my decoding wasn't complete, and that the non-existent instructions are UNDEFINED.
It's one thing if the target raises an UND exception because of a bad instruction, but quite a different thing
if the debugger's single stepping raises an UND exception while trying to find out the next address.
Ouch. It's good that you found out, though!
Any idea about instructions marked as UNPREDICTABLE: can it then be UNDEFINED?
In other words: UNDEFINED REQUIRES the instruction to cause UND-exception, but
MAY UNPREDICTABLE do that, or does it have to execute normally except that the result may be whatever?
(I'm not yelling even if I used caps.)
I think I need to let the Cortex-A experts answer this question.
Maybe Alban knows where to find information on this ?
Meanwhile forging ahead forging a head.
What the heck is "unsigned saturation of a signed value" (USAT)?
Where does signed manipulation change to unsigned manipulation?
operand = Shift(R[n], shift_t, shift_n, APSR.C); // APSR.C ignored
(result, sat) = UnsignedSatQ(SInt(operand), saturate_to);
R[d] = ZeroExtend(result, 32);

(bits(N), boolean) UnsignedSatQ(integer i, integer N)
    if i > 2^N - 1 then
        result = 2^N - 1; saturated = TRUE;
    elsif i < 0 then
        result = 0; saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);
Is it just one-sided saturation / "rectification"?
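A direct C transcription of the UnsignedSatQ pseudocode above may make the behaviour easier to see. The function name unsigned_sat_q is made up, and the sketch assumes N is in the range 0..31 (the range of saturate_to for USAT). As far as I can tell, the input is interpreted as a signed integer and clamped into the unsigned range 0 .. 2^N - 1: negative values become 0 and values that are too large become 2^N - 1.

```c
#include <stdint.h>
#include <stdbool.h>

/* Clamp the signed value i into the unsigned range 0 .. 2^n - 1.
   *saturated is set to true if clamping changed the value.
   Assumes n <= 31. */
uint32_t unsigned_sat_q(int64_t i, unsigned n, bool *saturated)
{
    int64_t max = ((int64_t)1 << n) - 1;

    if (i > max) {
        *saturated = true;
        return (uint32_t)max;
    }
    if (i < 0) {
        *saturated = true;
        return 0u;
    }
    *saturated = false;
    return (uint32_t)i;
}
```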
Got a very clear answer here: Re: UPREDICTABLE instructions .
UNPREDICTABLE may be implemented as UNDEFINED.
AAARRGGHH!
I happened to encounter an instruction that looked unfamiliar. I started searching for it in the ARMv7-A/R ARM and...
...found a truckload of new instructions (fp + vector).
I had to regenerate the list all over again (updated in git). Next the spreadsheet.
I really hope I got them all this time. There seems to be 483 ARM instructions in the list, although some of them
are aliases. It's good I didn't spend much time with the Thumb instructions yet - They would have needed to be regenerated as well.
That's no fun at all. I hope you've got them all now.
When I wrote my disassembler/debugger, I recall how I went through each page of the book; actually I took all the integer instructions first.
When I was done, I had a break and worked on other things, then it struck me that I could make a floating point emulator, so I got the FFP library and added the entire FPU instruction set.
It probably took a few months before I had all instructions.
Hopefully the script will take some of the burden off.
... When you arrange the table entries and there are instructions where one has a known bit in one place and the other has a known bit in another place, it will be necessary to find out which states are valid and which are invalid. The one with invalid states should go after the one that does not have invalid states.
For instance, this instruction is invalid:
add pc,pc,pc
So ARM decided to recycle the opcode space (there isn't a lot of opcode space left, so this is a good thing, though it makes writing tools more complex).
The above instruction contains an invalid combination of registers (I believe PC is not allowed in operand2, but I might be wrong; it might only be when PC is the destination).
So the instruction that takes the seat of add pc,pc,pc should go before the add instruction in the table.
I think it might be a good idea to modify the script to make the following checks:
1: is PC in operand2.
2: is PC the destination register.
3: is SP in operand2.
4: is SP the destination register.
I've forgotten the other rules, but the above seem to be used a few times.
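The four checks could look roughly like this for the ARM data-processing (register) form. This is only a sketch: the helper names are hypothetical, the is_dp_reg test (bits 27..25 == 000 and bit 4 == 0) is an approximation that also lets some miscellaneous instructions through, and it assumes Rd sits in bits 15..12 and Rm in bits 3..0.

```c
#include <stdint.h>
#include <stdbool.h>

#define REG_SP 13u
#define REG_PC 15u

static unsigned field_rd(uint32_t instr) { return (instr >> 12) & 0xFu; }
static unsigned field_rm(uint32_t instr) { return instr & 0xFu; }

/* Rough test for data-processing (register, immediate shift). */
bool is_dp_reg(uint32_t instr)
{
    return (instr & 0x0E000010u) == 0u;
}

/* 1: is PC in operand 2 */
bool pc_in_operand2(uint32_t instr)
{
    return is_dp_reg(instr) && field_rm(instr) == REG_PC;
}

/* 2: is PC the destination register */
bool pc_is_dest(uint32_t instr)
{
    return is_dp_reg(instr) && field_rd(instr) == REG_PC;
}

/* 3: is SP in operand 2 */
bool sp_in_operand2(uint32_t instr)
{
    return is_dp_reg(instr) && field_rm(instr) == REG_SP;
}

/* 4: is SP the destination register */
bool sp_is_dest(uint32_t instr)
{
    return is_dp_reg(instr) && field_rd(instr) == REG_SP;
}
```

For single stepping, check 2 is the interesting one: any instruction that writes PC needs its result computed to find the next address.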
Also ... some bitfield instructions are not allowed.
Rule: BFI and BFC: Start+Length must be 32 or less.
Eg. BFI r3,r6,#23,#16 is illegal
When you reach the thumb instruction set and thumb2, it's important to read about the "restrictions" for each instruction.
The 16-bit thumb instructions only allow operations on r0...r7, except for very few instructions:
ADD r7,r7,r10 /* note: destination must be the same as operand1 (the opcode actually only has room for 2 registers) */
MOV r3,r11
CMP r2,r9
The rest of them do not allow operations on r8...r15, except for ADD and SUB with SP and PC (but that's a special case. ADD and SUB #imm also allow a different range on those two registers).
I don't know everything about the instruction sets, but I'll try and write whatever I remember.
I really feel like writing a disassembler, but unfortunately, I do not have the time. :/
Have to check "add pc,pc,pc", but basically add with PC as a destination is a special instruction:
The SUBS PC, LR, #<const> instruction provides an exception return without the use of the stack. It subtracts the
immediate constant from LR, branches to the resulting address, and also copies the SPSR to the CPSR.
...
Encoding A2 ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC, <Rn>, <Rm>{, <shift>}
<opc2>S<c> PC, <Rm>{, <shift>}
<opc3>S<c> PC, <Rn>, #<const>
RRXS<c> PC, <Rn>
SUBS{<c>}{<q>} PC, LR, #<const> Encoding A1
<opc1>S{<c>}{<q>} PC, <Rn>, #<const> Encoding A1
<opc1>S{<c>}{<q>} PC, <Rn>, <Rm> {, <shift>} Encoding A2, deprecated
<opc2>S{<c>}{<q>} PC, #<const> Encoding A1, deprecated
<opc2>S{<c>}{<q>} PC, <Rm> {, <shift>} Encoding A2
<opc3>S{<c>}{<q>} PC, <Rn>, #<const> Encoding A2, deprecated
RRXS{<c>}{<q>} PC, <Rn> Encoding A2, deprecated
<opc1> The operation. <opc1> is one of ADC, ADD, AND, BIC, EOR, ORR, RSB, RSC, SBC, and SUB. ARM deprecates
the use of all of these operations except SUB.
<opc2> The operation. <opc2> is MOV or MVN. ARM deprecates the use of MVN.
<opc3> The operation. <opc3> is ASR, LSL, LSR, or ROR. ARM deprecates the use of all of these operations.
Also, I'm not sure if assembler lets instructions with lsb + length > 32 through.
Also, I'm allowing more than just user level. That drops quite some restrictions.
In the "assembly" group I got a hint that I also should do UNPREDICTABLEs, but warning about that would be nice.
That, I heard, is the convention with debuggers.
I'd like to explain a little better what I mean by the BFI and BFC:
If you come across an opcode, where the start + length is > 32, then it's not a BFI or BFC instruction.
That means that any opcodes with those values, must be handled before the BFI and BFC opcodes.
In other words: The mask+data for those opcodes must be preceding the mask+data for the BFI and BFC.
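A small sketch of what that check could look like on the encoded instruction. One caveat, which is my reading of the A1 encoding and should be verified against the ARM ARM: the opcode does not store start and length directly, but lsb (bits 11..7) and msb (bits 20..16, where msb = lsb + width - 1), so an out-of-range combination shows up in the opcode as msb < lsb. The names bf_check_t and check_bfi_bfc are made up.

```c
#include <stdint.h>

typedef enum {
    BF_NOT_BITFIELD,   /* not a BFI/BFC pattern at all */
    BF_OK,             /* BFI/BFC with msb >= lsb */
    BF_BAD_RANGE       /* BFI/BFC pattern, but msb < lsb */
} bf_check_t;

bf_check_t check_bfi_bfc(uint32_t instr)
{
    /* BFI/BFC: bits 27..21 == 0111110 and bits 6..4 == 001 */
    if ((instr & 0x0FE00070u) != 0x07C00010u)
        return BF_NOT_BITFIELD;

    unsigned msb = (instr >> 16) & 0x1Fu;
    unsigned lsb = (instr >> 7) & 0x1Fu;

    return (msb >= lsb) ? BF_OK : BF_BAD_RANGE;
}
```

In a mask+data table this range test cannot be expressed as a single mask, so it would have to be a separate check made before the BFI/BFC entry is accepted.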
turboscrew wrote: These need to be recognized as not UNDEFINED.
If they're all valid, then just handle them before you check for UNDEFINED.
No, not again!
It'll take another day or two to figure these out!
1 1 1 1 0 0 1 0 0 D f z n n n n d d d d 1 1 0 1 N Q M 1 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.337
1 1 1 1 0 0 1 0 0 D f z n n n n d d d d 1 1 1 1 N Q M 0 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.335
1 1 1 1 0 0 1 0 1 D z z n n n n d d d d 0 f 1 1 N 1 M 0 m m m m VQD<op><c>.<dt>_<Qd>,_<Dn>,_<Dm[x]> T2/A2 A8.8.371
1 1 1 1 0 0 1 0 1 D z z n n n n d d d d 1 0 f 1 N 0 M 0 m m m m VQD<op><c>.<dt>_<Qd>,_<Dn>,_<Dm> T1/A1 A8.8.371
1 1 1 1 0 0 1 1 0 D f f n n n n d d d d 0 0 0 1 N Q M 1 m m m m V<op><c>_<Qd>,_<Qn>,_<Qm>_V<op><c>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.290
1 1 1 1 0 0 1 1 0 D f z n n n n d d d d 1 1 1 0 N Q M 1 m m m m V<op><c>.F32_<Qd>,_<Qn>,_<Qm>_V<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.281
1 1 1 1 0 0 1 1 0 D f z n n n n d d d d 1 1 1 1 N Q M 0 m m m m VP<op><c>.F32_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.366
1 1 1 1 0 0 1 1 1 D 1 1 n n n n d d d d 1 0 z z N f M 0 m m m m V<op><c>.8_<Dd>,_<list>,_<Dm> T1/A1 A8.8.419
1 1 1 1 0 0 1 Q 1 D z z n n n n d d d d 0 f 0 F N 1 M 0 m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Dm[x]>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm[x]> T1/A1 A8.8.338
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 0 0 f 0 N Q M 0 m m m m VH<op><c>_<Qd>,_<Qn>,_<Qm>_VH<op><c>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.319
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 0 1 1 0 N Q M f m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Qm>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.334
1 1 1 1 0 0 1 U 0 D z z n n n n d d d d 1 0 1 0 N Q M f m m m m VP<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.365
1 1 1 1 0 0 1 U 1 D z z n n n n d d d d 0 f 1 0 N 1 M 0 m m m m V<op>L<c>.<dt>_<Qd>,_<Dn>,_<Dm[x]> T2/A2 A8.8.338
1 1 1 1 0 0 1 U 1 D z z n n n n d d d d 1 0 f 0 N 0 M 0 m m m m V<op>L<c>.<dt>_<Qd>,_<Dn>,_<Dm> T2/A2 A8.8.336
1 1 1 1 0 0 1 f 0 D z z n n n n d d d d 1 0 0 1 N Q M 0 m m m m V<op><c>.<dt>_<Qd>,_<Qn>,_<Qm>_V<op><c>.<dt>_<Dd>,_<Dn>,_<Dm> T1/A1 A8.8.336
I'm very close to losing my mind, and I'm surely getting very tired of fighting this.
I really had to do some work to find out the right 'MOV' in:
Encoding A2 ARMv4*, ARMv5T*, ARMv6*, ARMv7
<opc1>S<c> PC, <Rn>, <Rm>{, <shift>}
<opc2>S<c> PC, <Rm>{, <shift>}
<opc3>S<c> PC, <Rn>, #<const>
RRXS<c> PC, <Rn>
<opc2> The operation. <opc2> is MOV or MVN. ARM deprecates the use of MVN.
c c c c 0 0 0 1 1 0 1 S (0) (0) (0) (0) 1 1 1 1 0 0 0 0 0 0 0 0 m m m m MOV{S}<c>_PC,_<Rm>_(=_LSL{S}<c>_PC,_<Rm>,_#0) A2 B9.3.20
(That is LSL (reg) with Rd=PC and immediate = 0).
I frigging knew it (bolding is mine):
A8.8.290 VBIF, VBIT, VBSL
Encoding T1/A1 Advanced SIMD
V<op><c> <Qd>, <Qn>, <Qm>
V<op><c> <Dd>, <Dn>, <Dm>
if op == ‘00’ then SEE VEOR;
if op == ‘01’ then operation = VBitOps_VBSL;
if op == ‘10’ then operation = VBitOps_VBIT;
if op == ‘11’ then operation = VBitOps_VBIF;
and
A8.8.281 VACGE, VACGT, VACLE, VACLT
Encoding T1/A1 Advanced SIMD (UNDEFINED in integer-only variant)
V<op><c>.F32 <Qd>, <Qn>, <Qm>
V<op><c>.F32 <Dd>, <Dn>, <Dm>
Assembler syntax
where:
<op> The operation. It must be one of:
ACGE Absolute Compare Greater than or Equal, encoded as op = 0.
ACGT Absolute Compare Greater Than, encoded as op = 1.
What!
What happened to ACLE and ACLT?
What's the phone number of Sherlock Holmes?
Aha:
VACLE (Vector Absolute Compare Less Than or Equal) is a pseudo-instruction, equivalent to a VACGE instruction with
the operands reversed. Disassembly produces the VACGE instruction.
VACLT (Vector Absolute Compare Less Than) is a pseudo-instruction, equivalent to a VACGT instruction with the
operands reversed. Disassembly produces the VACGT instruction.
Yep. All instructions not matching the table are considered UNDEFINED.
It means that all those 'new' instructions must be added to the table too. (Sigh.)
Oh well, it's just a couple of hundred instructions more...
When I wrote my disassembler, there were illegal instructions, which occupied parts of legal instruction space.
In some cases, I had to make a special "illegal instruction" handling; eg. place that before the actual decoded instruction.
I'm quite impressed with all your work. You've absolutely done a lot in very little time!
I just took a look at the ARM_instructions.txt ...
If ignoring the condition-codes, you got 8 bits, which are almost always known.
Speed-wise it might be a real good idea to do this:
index = 0xff & (opcode >> 20); /* isolate instruction group */
handleGroup[index](opcode); /* jump directly to group handler */
-That means you'll shave several clock cycles off your execution time, without really sacrificing anything.
In assembly language it could of course be just a simple jump-table; r0 = opcode:
handle_group:
ubfx r1,r0,#20,#8 /* r1 = opcode bits 27..20 */
adr r2,table
ldr pc,[r2,r1,lsl#2] /* jump through the 4-byte table entry */
table:
.4byte MUL_AND_Group
.4byte MLA_EOR_Group
.4byte UMAAL_SUB_Group
.4byte SUB_Group
.4byte MLS_RSB_Group
.4byte RSB_Group
A 256-entry table is fairly small on a RasPi.
You can do this for "bits which are always known", but you can even extend it to include "bits which are often known".
Bits which are often known could include bit 4 and perhaps bits 8...11; but it might be a good idea to wait until you have the complete table before deciding which bits to include.
The above assembly code can then be declared as a function like this ...
void handle_group(uint32_t aOpcode);
and called that way; it'll indirectly jump to a C function, spending just a few clock cycles in total.
After that, you can probably focus on the low 20 bits, but in 16-bit thumb, there might be needs for modification, because 16-bit thumb does not have the 4-bit condition code field.
Unfortunately it's not so simple - things like this will mess it up, causing "false positives":
The only bit that is either '0' or '1' (instruction specific) when the first 3 bits (27, 26, 25) after condition code field is 0 0 0, is bit 4.
If bit 4 = 0 then you can use the next bits (24, 23) but if bit 4 is '1', you have to check from bit 7 what are the next bits.
If bit 7 is '0', then next bits are 24 and 23, if bit 7 is '1', the next bits are 6 and 5.
And so on.
If bits 27, 26 and 25 are 0 1 0, then there is only one instruction: single data transfer:
In the list it's shown as in the manual, but in reality it's:
P = 1 offset or pre-indexed addressing, otherwise post-indexed
U = 1 offset is added, otherwise offset is subtracted
B = 1 byte access, else word access
W = 1 writeback, else no writeback
L = 1 load, else store
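As a sketch, the five flag bits can be pulled out of a single data transfer opcode like this. The struct and function names are hypothetical; the bit positions (P = 24, U = 23, B = 22, W = 21, L = 20) are the standard ones for the ARM load/store word and unsigned byte, immediate offset form.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool p;   /* 1 = offset or pre-indexed, 0 = post-indexed */
    bool u;   /* 1 = offset added, 0 = offset subtracted */
    bool b;   /* 1 = byte access, 0 = word access */
    bool w;   /* 1 = writeback */
    bool l;   /* 1 = load, 0 = store */
} ldst_flags_t;

/* Single data transfer with immediate offset: bits 27..25 == 010. */
bool is_ldst_imm(uint32_t instr)
{
    return (instr & 0x0E000000u) == 0x04000000u;
}

ldst_flags_t decode_ldst_flags(uint32_t instr)
{
    ldst_flags_t f;

    f.p = ((instr >> 24) & 1u) != 0u;
    f.u = ((instr >> 23) & 1u) != 0u;
    f.b = ((instr >> 22) & 1u) != 0u;
    f.w = ((instr >> 21) & 1u) != 0u;
    f.l = ((instr >> 20) & 1u) != 0u;
    return f;
}
```

For single stepping, only loads (L = 1) with PC as the destination register change the execution path; the other flags are needed to work out the address the new PC is loaded from.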
It's a bit different with the media- or special LD/ST-instructions.
Sometimes not all of the above are used as such but are just part of the opcode; sometimes B = 1 means register, else immediate(?).
BTW, there will be another update to the ARM instruction list, and to the spreadsheet. I won't look into Thumb until I've learned to do it better by playing with the ARM instructions first.
I think a good way might be:
check condition code
if it's 1 1 1 1 then specials
else normal
check bits 27 -25 (with both specials and normal)
then apply table if instruction subset contains several instructions
This way the huge amount of instructions is split into 16 subsets some of them only having a couple of instructions.
(NOTE: the floating point and vector instructions are in the special instructions - and there are lots of them.)
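A minimal sketch of that first split, assuming the subset number is built from one "special" flag (condition code 1 1 1 1) plus bits 27 - 25. The function name arm_subset_index is made up, and the per-subset tables or handlers are left out.

```c
#include <stdint.h>

/* Hypothetical subset index 0..15:
   bit 3     = 1 if the condition field is 1111 (specials), else 0
   bits 2..0 = instruction bits 27..25 */
unsigned arm_subset_index(uint32_t instr)
{
    unsigned special = ((instr >> 28) == 0xFu) ? 1u : 0u;
    unsigned top3    = (instr >> 25) & 0x7u;

    return (special << 3) | top3;
}
```

The result could index a 16-entry array of handlers or of smaller mask/data tables, so that each table only has to tell apart the instructions within its own subset.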
And if that's still far too much, I'll put them in a hash-table! That should, frigging, do it!