Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?
I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find
the next instruction(s) where the next breakpoint instruction needs to be set.
There are three cases:
1) current instruction doesn't change the execution path. Next instruction is the next word.
2) current instruction is a jump. The operand defines the next instruction address
3) current instruction is a conditional branch. One possible next instruction is the next word, the other possible
instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)
To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
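For the plain B/BL case, the three cases above could be sketched roughly like this. It is only a sketch under assumptions, not code from any existing stub: the name arm_b_bl_next and the out[] convention are made up for illustration, it covers ARM-state B and BL only (bits 27-25 = 101, cond not 1111), and it returns 0 for everything else so the caller knows the instruction still needs real decoding.

```c
#include <stdint.h>

/* Hypothetical helper: fills out[] with the candidate next addresses
 * and returns how many there are (1 or 2), or 0 if instr is not an
 * ARM-state B/BL that this sketch understands. */
int arm_b_bl_next(uint32_t instr, uint32_t pc, uint32_t out[2])
{
    uint32_t cond = instr >> 28;

    /* B/BL have bits 27:25 == 0b101; cond 0b1111 is a different
     * instruction (BLX immediate) and is left out here. */
    if (cond == 0xFu || ((instr >> 25) & 0x7u) != 0x5u)
        return 0;

    /* imm24 is a signed word offset relative to pc + 8 */
    uint32_t offset = (instr & 0x00FFFFFFu) << 2;
    if (offset & 0x02000000u)
        offset |= 0xFC000000u;          /* sign-extend from 26 bits */
    uint32_t target = pc + 8u + offset;

    if (cond == 0xEu) {                 /* AL: case 2, always taken */
        out[0] = target;
        return 1;
    }
    out[0] = pc + 4u;                   /* case 3: not taken ...    */
    out[1] = target;                    /* ... or taken             */
    return 2;
}
```

With two candidates, a breakpoint can be placed at both addresses. Since the stub has the saved CPSR, the condition could also be evaluated to pick just one.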
I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,
or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.
Also, there don't seem to be many sources for ARM gdb servers or stubs around that use software breakpoints.
This is truly a laborious project.
I calculated that there are 264 ARM instructions to go through, and the encoding is not very 'canonical'.
Then there seem to be 329 Thumb instructions.
Also the "holes" in the encoding need to be checked, because there are lots of "UNDEFINED" in them,
meaning that executing such an instruction causes an UND exception.
Then the instruction encoding is often described like:
31-28 | 27 26 | 25 | 24-20 | 19-8 | 7-4 | 3-0
cond  |  0  0 | op |  op1  |      | op2 |
[EDIT]
The page seems to have been playing tricks...
'op' =bit 25, 'op1'=bits 24 - 20 and 'op2'=bits 7 - 4.
[/EDIT]
Table A5-2 shows the allocation of encodings in this space.
Table A5-2 Data-processing and miscellaneous instructions
op  op1        op2   Instruction or instruction class                                  Variant
0   not 10xx0  xxx0  Data-processing (register) on page A5-197                         -
0   not 10xx0  0xx1  Data-processing (register-shifted register) on page A5-198        -
0   10xx0      0xxx  Miscellaneous instructions on page A5-207                         -
0   10xx0      1xx0  Halfword multiply and multiply accumulate on page A5-203          -
0   0xxxx      1001  Multiply and multiply accumulate on page A5-202                   -
0   1xxxx      1001  Synchronization primitives on page A5-205                         -
0   not 0xx1x  1011  Extra load/store instructions on page A5-203                      -
0   not 0xx1x  11x1  Extra load/store instructions on page A5-203                      -
0   0xx10      11x1  Extra load/store instructions on page A5-203                      -
0   0xx1x      1011  Extra load/store instructions, unprivileged on page A5-204        -
0   0xx11      11x1  Extra load/store instructions, unprivileged on page A5-204        -
1   not 10xx0  -     Data-processing (immediate) on page A5-199                        -
1   10000      -     16-bit immediate load, MOV (immediate) on page A8-484             v6T2
1   10100      -     High halfword 16-bit immediate load, MOVT on page A8-491          v6T2
1   10x10      -     MSR (immediate), and hints on page A5-206                         -
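To get a feel for how Table A5-2 turns into code, here is a rough sketch of a first-level classifier. The enum and function names are made up for illustration, and it assumes the caller has already checked that bits 27-26 are 00 and that cond is not 1111. It only picks the row of the table; each class still has its own sub-table, with its own UNDEFINED holes, to decode afterwards.

```c
#include <stdint.h>

/* Hypothetical names for the rows of Table A5-2. */
typedef enum {
    DP_REGISTER, DP_REG_SHIFTED_REG, MISCELLANEOUS, HALFWORD_MULTIPLY,
    MULTIPLY, SYNC_PRIMITIVES, EXTRA_LDST, EXTRA_LDST_UNPRIV,
    DP_IMMEDIATE, MOV_IMM16, MOVT_IMM16, MSR_IMM_AND_HINTS
} dp_class_t;

dp_class_t classify_dp_misc(uint32_t instr)
{
    uint32_t op  = (instr >> 25) & 0x1u;    /* bit 25      */
    uint32_t op1 = (instr >> 20) & 0x1Fu;   /* bits 24-20  */
    uint32_t op2 = (instr >> 4)  & 0xFu;    /* bits 7-4    */
    int op1_is_10xx0 = (op1 & 0x19u) == 0x10u;

    if (op == 1u) {
        if (!op1_is_10xx0) return DP_IMMEDIATE;
        if (op1 == 0x10u)  return MOV_IMM16;      /* 10000 */
        if (op1 == 0x14u)  return MOVT_IMM16;     /* 10100 */
        return MSR_IMM_AND_HINTS;                 /* 10x10 */
    }
    if (op2 == 0x9u)                              /* 1001 */
        return (op1 & 0x10u) ? SYNC_PRIMITIVES : MULTIPLY;
    if (op2 == 0xBu)                              /* 1011 */
        return ((op1 & 0x12u) == 0x02u) ? EXTRA_LDST_UNPRIV : EXTRA_LDST;
    if ((op2 & 0xDu) == 0xDu)                     /* 11x1 */
        return ((op1 & 0x13u) == 0x03u) ? EXTRA_LDST_UNPRIV : EXTRA_LDST;
    /* Only op2 = xxx0 or 0xx1 can reach this point. */
    if (op1_is_10xx0)
        return (op2 & 0x8u) ? HALFWORD_MULTIPLY : MISCELLANEOUS;
    return (op2 & 0x1u) ? DP_REG_SHIFTED_REG : DP_REGISTER;
}
```

For example, BX lr (0xE12FFF1E) lands in the miscellaneous class, which is one of the places a single-stepper has to look at more closely.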
Really fun!
Meanwhile forging ahead forging a head.
What the heck is "unsigned saturation of a signed value" (USAT)?
Where does signed manipulation change to unsigned manipulation?
operand = Shift(R[n], shift_t, shift_n, APSR.C); // APSR.C ignored
(result, sat) = UnsignedSatQ(SInt(operand), saturate_to);
R[d] = ZeroExtend(result, 32);

(bits(N), boolean) UnsignedSatQ(integer i, integer N)
    if i > 2^N - 1 then
        result = 2^N - 1; saturated = TRUE;
    elsif i < 0 then
        result = 0; saturated = TRUE;
    else
        result = i; saturated = FALSE;
    return (result<N-1:0>, saturated);
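Transcribing the pseudocode above into C may make it easier to see what happens. This is only a sketch: the name unsigned_sat_q is made up, and n is assumed to be in the range 0..31. As far as I can read it, the input is taken as a signed value, and the result is clamped into the unsigned range 0 .. 2^N - 1, so negative inputs become 0 and too-large inputs become 2^N - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of UnsignedSatQ(i, N): i is the signed input (SInt(operand)),
 * n is the saturate_to width, assumed to be 0..31. */
uint32_t unsigned_sat_q(int32_t i, unsigned n, bool *saturated)
{
    int64_t max = ((int64_t)1 << n) - 1;   /* 2^N - 1 */

    if (i > max) {                /* too large: clamp to the top   */
        *saturated = true;
        return (uint32_t)max;
    }
    if (i < 0) {                  /* negative: clamp to zero       */
        *saturated = true;
        return 0u;
    }
    *saturated = false;           /* already inside 0 .. 2^N - 1   */
    return (uint32_t)i;
}
```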
Is it just one-sided saturation / "rectification"?
I think I need to let the Cortex-A experts answer this question.
Maybe Alban knows where to find information on this ?
Any idea about instructions marked as UNPREDICTABLE: can it then be UNDEFINED?
In other words: UNDEFINED REQUIRES the instruction to cause UND-exception, but
MAY UNPREDICTABLE do that, or does it have to execute normally except that the result may be whatever?
(I'm not yelling even if I used caps.)
Ouch. It's good that you found out, though!
The table way might be a good idea after all. At least in some instruction groups, like media instructions.
The encoding is that sparse and the holes cause UND-exception. They need to be decoded completely.
Thunder and blistering!
I may have to rewrite the media instruction decoding.
I just realized that my decoding wasn't complete and the non-existent instructions are UNDEFINED.
It's one thing if the target takes an UND exception due to a bad instruction, but quite a different thing
if the debugger's single stepping takes an UND exception while trying to find out the next address.
I recall the Atari ST was quite a nice machine for its time. The 8088 wasn't that impressive either compared to any ARM.
I had to settle with Commodore 64 with the famous "washing machine processor" (6510).
turboscrew wrote:
Reading/writing 400KByte/sec.
What kind of reading/writing are you talking about?
On my Atari ST, I could reach 400KByte reading or writing per second as maximum (by using the movem.l instruction).
The main reason for this was most likely Atari's bus architecture.
Lucky me, any Cortex-M0, even if running only at 1 MHz, is faster.
I remember when I had plans to make a computer based on 68030.
The plan was trashed by the fact that those days home made address decoding would have been so slow that it wasn't worth the while. I considered v22 PLAs and some FPGAs, but the delays were far too big. With mask-programmed gate array it would have been a beast, but VERY expensive. I guess a mask cost like $1 000 000 back then.
68030 could do memory accesses in synchronous nibble mode in 55 ns, I recall.
(The dynamic bus sizing was, as such, a really exciting idea.)
Ah, yes. Things are coming back to me.
My debugger usually ran on a 68000, but my MegaSTE had a 68010, so I wrote an instruction emulator. It could emulate 68010, 20, 30, 40 and CPU32 instructions (the latter was never tested, though).
Sometimes it's also a lot faster to simulate instruction execution.
Sadly, it was never released (and I don't have the sources here). I wrote it because MonST/MonTT could not debug a game I was participating in writing.
Yes, 68xxx was very convenient and easy in many ways; unfortunately, the performance wasn't so impressive. Reading/writing 400KByte/sec.
BTW, is your 68k debugger anywhere to be seen? I might like to take a look some time later when I have more time.
I like Motorola/Freescale 6.8k/68k assembly languages. So wonderfully symmetrical.
A table approach may not be very good with ARM, because the instruction-defining bits are not in constant places. Not even mostly, except the 3 bits after the condition code, and sometimes a special register value turns the encoding into another instruction.
Also, in this program, I don't care about the instruction as such, but just the 'next address' after the instruction.
Sometimes it's easier to execute than to 'simulate' the code:
// tmp1, tmp2 and iptr are assumed to be file-scope variables (the asm names
// tmp1/tmp2 as symbols), and mrs_regb is the label defined in the asm below.
unsigned int check_msr_reg(uint32_t instr)
{
    unsigned int new_pc = rpi2_reg_context.reg.r15;

    // if user mode, then can't even guess
    tmp1 = (uint32_t) rpi2_reg_context.reg.cpsr;
    if ((tmp1 & 0x1f) == 0x10) // user mode (CPSR.M == 0b10000)
    {
        // UNPREDICTABLE - whether banked or not
        // the bits 15 - 0 are UNKNOWN
        new_pc = INSTR_ADDR_UNDEF;
        return new_pc; // don't fall through to the privileged path
    }
    // privileged mode - both reg and banked reg
    tmp2 = (instr & 0xffff0fff) | (1 << 12); // edit Rd = r1
    iptr = (uint32_t *) mrs_regb;
    *iptr = tmp2; // patch the placeholder word in the asm below
    asm(
        "push {r0, r1, r2}\n\t"
        "mrs r2, cpsr @ save cpsr in r2: sp is banked, so no stack use while in the other mode\n\t"
        "ldr r0, =tmp1\n\t"
        "ldr r0, [r0] @ load the value of tmp1, not its address\n\t"
        "msr cpsr, r0 @ set cpsr, note: user mode is already excluded\n\t"
        "mrs_regb: .word 0 @ execute instr with our registers\n\t"
        "ldr r0, =tmp2\n\t"
        "str r1, [r0] @ store result to tmp2\n\t"
        "msr cpsr, r2 @ restore cpsr\n\t"
        "pop {r0, r1, r2}\n\t"
    );
    new_pc = (unsigned int) tmp2;
    return new_pc;
}
In this project so far I've learned about ARM (never really used it before), awk (never used it before) and inline assembly (accessing C variables). For some unknown reason, I haven't been keen to use the inline assembly extensions though.
I once wrote a debugger for 68xxx (it was back in 1988 I think).
There I had a mask and data for each kind of instruction.
E.g.
if((instr & mask) == data){ found; }
Here you will need to order the table so that the more specific masks come first; otherwise, if your first entry's mask is 0x0000, it matches everything and you'll always get the same result.
Of course it is a good idea to end the table with mask 0x0000 and data 0x0000; this way the search always ends with a match.
You could make a similar table and add a 'type' entry. For instance type could be 'add' and then a positive/negative, in order to merge add and subtract into one handler (that might simplify and shorten the code).
Having handler types and a mode field would perhaps also make it possible to simplify the code.
Opcodes usually come in 'families'. It may be a good idea to first find out which instruction family the opcode belongs to, then handle the remaining part of the instruction decoding after that. However, depending on the number of instructions, it may be quicker to just use the mask and data system.
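The mask/data table with a 'type' entry could look roughly like this in C. It is only a sketch of the idea applied to a few ARM encodings, not the old 68xxx debugger's code: the type names and the choice of entries are made up for illustration, and the cond = 1111 space is ignored.

```c
#include <stdint.h>

/* Hypothetical handler types. */
typedef enum { K_UNKNOWN, K_BX, K_BRANCH, K_SVC } kind_t;

typedef struct {
    uint32_t mask;
    uint32_t data;
    kind_t   kind;
} entry_t;

/* More specific masks come first; the catch-all must be last. */
static const entry_t table[] = {
    { 0x0FFFFFF0u, 0x012FFF10u, K_BX },       /* BX Rm            */
    { 0x0E000000u, 0x0A000000u, K_BRANCH },   /* B / BL           */
    { 0x0F000000u, 0x0F000000u, K_SVC },      /* SVC              */
    { 0x00000000u, 0x00000000u, K_UNKNOWN },  /* matches anything */
};

kind_t lookup(uint32_t instr)
{
    const entry_t *e = table;

    /* The terminating entry always matches, so the loop needs no
     * separate end-of-table check. */
    while ((instr & e->mask) != e->data)
        e++;
    return e->kind;
}
```

The caller can then switch on the returned type, so that several opcodes share one handler.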