Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?
I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find
the next instruction(s) where the next breakpoint instruction needs to be set.
There are three cases:
1) The current instruction doesn't change the execution path. The next instruction is the next word.
2) The current instruction is a jump. The operand defines the next instruction address.
3) The current instruction is a conditional branch. One possible next instruction is the next word; the other possible
instruction address is defined by the operand. (That includes a conditional add with PC as the target, and the like.)
To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.
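As a rough illustration of the three cases, here is a minimal C sketch that handles only the ARM-state B and BL (immediate) encodings. The names flow_kind and classify_arm_b_bl are made up for this example. Everything else that can change the PC (BX, BLX, loads into PC, data-processing with PC as the destination, and all Thumb encodings) is deliberately left out and reported as "linear", which a real stub must not do.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum flow_kind {
    FLOW_LINEAR,  /* case 1: the next instruction is at addr + 4      */
    FLOW_JUMP,    /* case 2: only the branch target can follow        */
    FLOW_COND     /* case 3: either addr + 4 or the branch target     */
};

/*
 * Classify one ARM-state instruction word located at 'addr'.
 * Only B/BL (immediate) is recognised: bits [27:25] == 0b101 with a
 * condition field other than 0b1111. Anything else is reported as
 * FLOW_LINEAR, which is NOT correct for other PC-changing instructions.
 */
static enum flow_kind classify_arm_b_bl(uint32_t addr, uint32_t instr,
                                        uint32_t *target)
{
    uint32_t cond = instr >> 28;

    if ((instr & 0x0E000000u) != 0x0A000000u || cond == 0xFu)
        return FLOW_LINEAR;

    /* imm24 is a signed word offset: sign-extend it and scale by 4. */
    uint32_t offset = (instr & 0x00FFFFFFu) << 2;
    if (instr & 0x00800000u)
        offset |= 0xFC000000u;

    /* In ARM state the PC reads as the instruction address plus 8. */
    *target = addr + 8u + offset;

    /* Condition 0b1110 (AL) means the branch is always taken. */
    return (cond == 0xEu) ? FLOW_JUMP : FLOW_COND;
}
```

For FLOW_COND, a stub would presumably place temporary breakpoints at both addr + 4 and *target, and remove both when either one is hit.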
I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,
or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.
Also, there don't seem to be many sources for ARM gdb servers or stubs around that use software breakpoints.
I'm playing with Cortex A7, and I've been reading The ARMv7-AR Architecture Reference Manual (downloaded) and Cortex-A Series Programmer’s Guide (downloaded).
It's just that going through every instruction (both ARM and Thumb) and figuring out the 'rules' to tell where to put the next breakpoint instruction(s) would take quite a long time. First I'd go through all instructions and pick those that could change the address where the next instruction is fetched. Then I'd go through the decodings of the picked instructions and find the 'common factors'. I'm quite new to the ARM world, and I don't remember the instructions. I have to look all of them up in the manuals.
BTW, jensbauer, you might remember me asking about a standalone gdb-stub. I decided to write it, and I have resume-from-breakpoint and single-stepping missing from my initial code. Most of the commands (for an almost-minimal stub), exception handling and serial I/O are written. When I get the missing parts cleared and coded, then I get to try running it and see where the smoke comes out.
Nice 'blinky'? (Except that it doesn't blink anything.)
bash-4.2$ wc *.[chS]
1201 3820 28965 gdb.c
18 33 244 gdb.h
28 62 443 io_dev.h
36 74 520 loader.c
396 1561 10754 rpi2.c
70 238 1659 rpi2.h
458 1605 10316 serial.c
29 71 569 serial.h
41 115 666 start.S
24 55 348 start1.c
228 730 4101 util.c
29 116 861 util.h
2558 8480 59446 total
Great work! I'm convinced that you'll get there soon.
It might be a good idea to put the instruction size in the table-entry I mentioned earlier.
You should probably also have in mind that Cortex-A can optionally be Big Endian. This does not mean that the instruction set changes, but I think it means that the load/store changes for 16/32/64 bit access.
-Thus if you load an instruction with a mask from memory, I believe you would be safe as long as you do not use two (or more) instructions to construct the 'immediate' values.
Eg. If you use MOV to load the value into a register and then use LDR to read from memory, you may get the value byte-reversed when using the LDR, but the value that MOV loaded would not be byte-reversed, thus there would be no match when you expect it to.
-So in this case, I believe loading from a table is the best solution.
I trust C-compiler knows its business.
But I probably have to use some kind of table. Depends on the complexity of the instruction handling needed.
Most C-compilers would store the 32-bit words in the literal pool, however, if turning on optimizing this part would break on Big Endian platforms.
I highly recommend the table. That would make things easier and also make sure it would work on any Endian machine.
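A table of that kind could look roughly like the sketch below. The struct layout, the field names and find_pattern are hypothetical, and only three ARM-state encodings are listed. The condition field (bits 31 to 28) is masked out, so the caller would still have to treat cond 0b1111 separately.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table layout; field and entry names are illustrative. */
struct instr_pattern {
    uint32_t    mask;   /* bits that take part in the comparison   */
    uint32_t    value;  /* expected value of the masked bits       */
    uint8_t     size;   /* instruction size in bytes               */
    const char *name;
};

/* A few ARM-state encodings; more specific patterns come first. */
static const struct instr_pattern arm_patterns[] = {
    { 0x0FFFFFF0u, 0x012FFF10u, 4, "BX"             },
    { 0x0FFFFFF0u, 0x012FFF30u, 4, "BLX (register)" },
    { 0x0E000000u, 0x0A000000u, 4, "B/BL"           },
};

/* Return the first matching entry, or NULL if nothing matches. */
static const struct instr_pattern *find_pattern(uint32_t instr)
{
    size_t n = sizeof arm_patterns / sizeof arm_patterns[0];

    for (size_t i = 0; i < n; i++) {
        if ((instr & arm_patterns[i].mask) == arm_patterns[i].value)
            return &arm_patterns[i];
    }
    return NULL;
}
```

Because the masks and values live in memory as data, they are loaded the same way as the instruction word they are compared against.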
To check at runtime whether your code is running on big or little endian, you could do this:
#include <stdint.h>

static const uint32_t isLittleEndian = 0x00000001;

if (*(const uint8_t *)&isLittleEndian)
{
    // little endian: the low byte is stored first
}
else
{
    // big endian: the low byte is stored last
}
On little endian, the low byte comes first; on big endian, the low byte comes last. Thus the result is 1 on little endian and 0 on big endian.
-You could make it an inline function if necessary - or just a global.
I know. I once had to rewrite an endianness test for autotools (a compile-time test), because we were cross-compiling and the default test needed to be run. The Intel processor didn't run PowerPC code too well...
This is news to me, though:
"Most C-compilers would store the 32-bit words in the literal pool, however, if turning on optimizing this part would break on Big Endian platforms."
Actually, I better correct this. It only applies if you use the same binary on a multi-endian platform.
Imagine that your code is built for - say - Little Endian. It's now moved to a platform, where you do not know if the platform is running big or little endian. This can be switched by hardware; usually setting a pin high or low at boot.
Thus your program would need to determine the endianness at runtime.
If the masks and data constants are stored in table entries, they will match what you read from memory, no matter whether your load instruction swaps the data or not.
But MOVW and MOVT do not swap the data (here the data is fixed).
This means that the data would not be correct if your architecture's endianness does not match your binary. Thus you would need two binaries.
If you're running an operating system, such as Linux, then you would not have that problem, because it would most likely allow only loading .elf files where the endianness matches the architecture.
Another caveat is when you load your data and bit-shift / mask out bits, you'll probably need to load byte-by-byte and insert the data into a 32-bit word and then do the comparison; again because the load instruction swaps the bytes on loading from memory on one type of endianness compared to the other type of endianness.
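The byte-by-byte load could be sketched like this. The helper name read_le32 is made up, and the sketch assumes that the word in memory is stored least significant byte first; whether that holds for a given target and endianness setting needs to be checked against the manuals.

```c
#include <stdint.h>

/*
 * Assemble a 32-bit value from four consecutive bytes, assuming the
 * least significant byte is stored first. Because only byte loads are
 * used, the result does not depend on the data endianness of the CPU.
 */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The resulting word can then be masked and compared against constants no matter how those constants were created.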
A few days ago, I had to make some code, which I wrote for PowerPC (I'm still on PPC) work on an Intel-Mac, and I really got in trouble because it seems Apple changed the picture format for 32-bit ARGB (offscreen) pictures.
In addition, they made some changes to how caching works, and finally I had to fight against endian-problems on bit-shifting.
This was really a brain-twisting experience I won't recommend!
-I'm pretty convinced that there are still some bugs hidden in my code, because I ended up not knowing what I was doing, but it does work for now...
I think I have to try to get the thing together with a subset of the instructions first.
I've been going through the encoding of A1 (been doing it for a couple of days) and I still have quite some instructions to go through. There doesn't seem to be a single document that says it all - I've been reading 3 documents in parallel, and I still have done some guesswork too. The documents are "ARM® Cortex™-A Series, Version: 4.0, Programmer’s Guide",
"ARM® Architecture Reference Manual, ARMv7-A and ARMv7-R edition, Issue C.c" and "ARM Architecture Reference Manual, Issue I". Figuring out enough about all the instructions will take a couple of weeks still - probably longer than everything else together. Quite tiring and frustrating work.
Funny how some info seems to be dropped in updates. Like the bits PUNWL for LDC/LDC2/STC/STC2. I couldn't find the explanation in the ARMv7-A ARM anywhere, and the main instruction encoding table in ARM ARM could have been nice in ARMv7-A ARM too.
To not lose all I've done this far, I put my effort in the github. It compiles, but quite some code is still missing.
My struggle with the instructions is there in the file: instr.txt in case someone is interested.
The code is still "initial draft" so don't shoot me.
The repo is turboscrew/rpi_stub on GitHub.
It looks like figuring out the ARM ISA on bit level is becoming the most tedious and time consuming task.
When (if?) I get it figured out, I hope I still remember there was a project it was done for.
It doesn't help that aliases and pseudo instructions are treated in the document just like the 'native' instructions.
I think I just have to go through the instructions in the ARMv7-A ARM one by one and manually list all instructions and the bit patterns of all encodings in a text file for easier manipulation and sort them out there.
The HTML-pages are slow for that, and the copying works funny with PDFs.
The time estimate to finish the project just got fourfold (at least).
I know OpenOCD does single-stepping too. Perhaps this can be of some help to you ?
In the file cortex_a.c there is a breakpoint setting function and single stepping function, but they get the address as a parameter. I still haven't found where the address is decided, but I think it's somewhere there, because in the file /src/server/gdb_server.c the function fetch_packet implements the remote serial protocol - that's what I've been working on.
Looks very helpful. Thanks, jensbauer.
(There seems to be no 'helpful answer' button, so I clicked 'correct answer' even if I'm not yet sure if this solves my problem. Odds look good though, so it can't go very wrong.)
This is truly a laborious project.
I calculated that there are 264 ARM instructions to go through, and the encoding is not very 'canonical'.
Then there seem to be 329 Thumb instructions.
Also the "holes" in the encoding need to be checked, because there are lots of "UNDEFINED" in them,
meaning that executing such an instruction causes an UND exception.
Then the instruction encoding is often described like:
31 30 29 28 | 27 26 | 25 | 24 23 22 21 20 | 19 18 17 16 15 14 13 12 11 10 9 8 | 7 6 5 4 | 3 2 1 0
   cond     |  0  0 | op |      op1       |                                   |   op2   |
[EDIT]
The page seems to have been playing tricks...
'op' =bit 25, 'op1'=bits 24 - 20 and 'op2'=bits 7 - 4.
[/EDIT]
Table A5-2 shows the allocation of encodings in this space.
Table A5-2 Data-processing and miscellaneous instructions
op  op1        op2   Instruction or instruction class                            Variant
0   not 10xx0  xxx0  Data-processing (register) on page A5-197                   -
0   not 10xx0  0xx1  Data-processing (register-shifted register) on page A5-198  -
0   10xx0      0xxx  Miscellaneous instructions on page A5-207                   -
0   10xx0      1xx0  Halfword multiply and multiply accumulate on page A5-203    -
0   0xxxx      1001  Multiply and multiply accumulate on page A5-202             -
0   1xxxx      1001  Synchronization primitives on page A5-205                   -
0   not 0xx1x  1011  Extra load/store instructions on page A5-203                -
0   not 0xx1x  11x1  Extra load/store instructions on page A5-203                -
0   0xx10      11x1  Extra load/store instructions on page A5-203                -
0   0xx1x      1011  Extra load/store instructions, unprivileged on page A5-204  -
0   0xx11      11x1  Extra load/store instructions, unprivileged on page A5-204  -
1   not 10xx0  -     Data-processing (immediate) on page A5-199                  -
1   10000      -     16-bit immediate load, MOV (immediate) on page A8-484       v6T2
1   10100      -     High halfword 16-bit immediate load, MOVT on page A8-491    v6T2
1   10x10      -     MSR (immediate), and hints on page A5-206                   -
Really fun!
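To show how rows like those in Table A5-2 might be turned into code, here is a small C sketch that extracts op, op1 and op2 and matches them against the bit patterns as written in the table. The function names are hypothetical, only some of the rows are covered, and the caller is assumed to have already checked that bits 27 and 26 are 0 and that cond is not 0b1111.

```c
#include <stdint.h>
#include <string.h>

/* Compare the low bits of 'value' with a pattern such as "10xx0".
 * The pattern is written most significant bit first; 'x' is a wildcard. */
static int bits_match(uint32_t value, const char *pattern)
{
    size_t n = strlen(pattern);

    for (size_t i = 0; i < n; i++) {
        char c = pattern[n - 1 - i];
        if (c != 'x' && (uint32_t)(c - '0') != ((value >> i) & 1u))
            return 0;
    }
    return 1;
}

/* Returns the instruction class, or NULL for rows not handled here. */
static const char *classify_dp_misc(uint32_t instr)
{
    uint32_t op  = (instr >> 25) & 0x01u;  /* bit 25       */
    uint32_t op1 = (instr >> 20) & 0x1Fu;  /* bits 24 - 20 */
    uint32_t op2 = (instr >> 4)  & 0x0Fu;  /* bits 7 - 4   */

    if (op == 1u) {
        if (!bits_match(op1, "10xx0"))
            return "Data-processing (immediate)";
        return NULL;  /* MOV/MOVT 16-bit immediate, MSR and hints */
    }
    if (bits_match(op1, "10xx0")) {
        if (bits_match(op2, "0xxx"))
            return "Miscellaneous instructions";
        if (bits_match(op2, "1xx0"))
            return "Halfword multiply and multiply accumulate";
    } else {
        if (bits_match(op2, "xxx0"))
            return "Data-processing (register)";
        if (bits_match(op2, "0xx1"))
            return "Data-processing (register-shifted register)";
    }
    if (bits_match(op2, "1001"))
        return bits_match(op1, "0xxxx") ? "Multiply and multiply accumulate"
                                        : "Synchronization primitives";
    return NULL;  /* the extra load/store rows */
}
```

String patterns are slow compared to precomputed mask/value pairs, but they can be copied from the manual almost verbatim, which may help while the decoding is still being worked out.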
I once made some code, which generated cycle-accurate code at run-time for execution.
Doing that on Cortex-M is not easy, because constants are chopped into small bits and pieces, plus they're moved around.
Generating code on-the-fly is a challenge, because creating those constants may take too long, compared to how long it takes to execute the code.
I would have expected that the MOVW instruction had a 16-bit immediate value that would go unmodified into the instruction, but it was not like that, thus I ended up with a different solution.
BTW: if you want to format your edits, you can click the "Use advanced editor" in the top right-hand corner. Then switch to the HTML editor and change the font to ...
Monaco, Courier, mono-space; after that, click "Use advanced editor" again, now the editor should allow you to make some fixed-font formatting.
It takes some "getting used to", but it's possible to do it; I've done it on each of my documents, Writing your own startup code for Cortex-M was the first.
I can understand. I couldn't avoid seeing the Thumb encodings while going through ARM encodings in the ARMv7-A/R ARM.
I'm sweating blood for just thinking about going through the Thumb instructions.
(There are still a lot of ARM instructions to do.)
Thanks for the hint.
(BTW, nice tutorial)
I wish you good wind on getting through the remaining instructions!
-Great that you got the fixed font working (you can also use 'view source' on one of my tutorials, to see what HTML code I've used, because in some cases it was necessary to do special edits; like the Monaco font, because it's not available from the menu).
There are a few other tutorials. I promised a series of tutorials on basic math on the Cortex-M: http://community.arm.com/docs/DOC-9653 .
Hopefully I can find some time to release the next part soon.
The idea is to provide something for both beginners and experienced programmers.
Beginners will get the basics, experienced programmers might be able to find some useful tips and tricks now and then.
You probably know the GNU Assembler well by now, but perhaps you can find something interesting in the article about the macros.
There seems to be a great deal of wind, but unfortunately foul wind.
(I just found another table according to which, there are groups of instructions in an area I thought I already finished.)
Your tutorials seem interesting. I have used GAS yes, but my very first ARM assembly code is in the github (this project) for everyone to see and have some laughs.
I just had a look, and I don't think it's bad.
You may want to add #4 to r4 and r6 when copying, because they're pointers to 32-bit words.
-This can be done automatically with post-indexed addressing, by appending ,#4 to the ldr/str instructions ...
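loop$:
    ldr r7,[r4],#4    @ load a word from [r4], then r4 += 4
    str r7,[r6],#4    @ store it to [r6], then r6 += 4
    cmp r4, r5        @ compare the source pointer with r5
    bmi loop$         @ loop while r4 - r5 is negative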
-It should be slightly faster.
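For comparison, the same loop could be written in C roughly as below. The function and parameter names are hypothetical, and the mapping to registers (r4 as source, r6 as destination, r5 as the end of the source) is an assumption based on the assembly above.

```c
#include <stdint.h>

/*
 * Copy 32-bit words from src up to (not including) src_end into dst.
 * Like the assembly loop, it copies at least one word before testing.
 * Each pointer advances by one word (4 bytes) per iteration.
 */
static void copy_words(uint32_t *dst, const uint32_t *src,
                       const uint32_t *src_end)
{
    do {
        *dst++ = *src++;
    } while (src < src_end);
}
```

A compiler may turn the *dst++ = *src++ idiom into post-indexed ldr/str on its own, but that depends on the compiler and the optimization level.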