Hello experts,
I only have the code from the .text section of an ELF file.
I want to write a disassembler, but I don't know which parts are Thumb code and which are ARM code.
How can I distinguish ARM from Thumb? I cannot read the CPSR register; I only have the code.
Thanks.
Hi 박주병,
I believe that, in an ELF file, the LSB of an ARM function's start address is 0 and the LSB of a Thumb function's start address is 1.
HTH,
Yasuhiko Koumoto.
For example:
However, I don't know how to tell that the code at 0x801c is ARM code.

a32.c
int t1(int);
int a1(int x) { return x+x; }
int main(void) { return t1(10); }

t32.c
int a1(int);
int t1(int x) { return a1(x*x); }

$ arm-none-eabi-gcc -c -O3 -mcpu=cortex-a7 a32.c
$ arm-none-eabi-gcc -c -O3 -mcpu=cortex-a7 -mthumb t32.c
$ arm-none-eabi-gcc -o ta32.out -mcpu=cortex-a7 a32.o t32.o -nostdlib
$ arm-none-eabi-objdump -D ta32.out
ta32.out: file format elf32-littlearm
Disassembly of section .text:
00008000 <main>:
    8000:  e3a0000a   mov   r0, #10
    8004:  ea000005   b     8020 <__t1_from_arm>

00008008 <a1>:
    8008:  e1a00080   lsl   r0, r0, #1
    800c:  e12fff1e   bx    lr

00008010 <t1>:
    8010:  fb00 f000  mul.w r0, r0, r0
    8014:  f000 b800  b.w   8018 <__a1_from_thumb>

00008018 <__a1_from_thumb>:
    8018:  4778       bx    pc
    801a:  46c0       nop   ; (mov r8, r8)
    801c:  eafffff9   b     8008 <a1>

00008020 <__t1_from_arm>:
    8020:  e51ff004   ldr   pc, [pc, #-4]   ; 8024 <__t1_from_arm+0x4>
    8024:  00008011   andeq r8, r0, r1, lsl r0
---[snip]---

$ od -t x4 ta32.out
0000000 464c457f 00010101 00000000 00000000
0000020 00280002 00000001 00008000 00000034
0000040 0000835c 05000200 00200034 00280001
0000060 00040007 00000001 00000000 00000000
0000100 00000000 00008028 00008028 00000005
0000120 00010000 00000000 00000000 00000000
0000140 00000000 00000000 00000000 00000000
---[snip]---
0100320 00000003 00000000 00000000 00000000
0100340 00000000 00000000 00008000 00000000
0100360 00010003 00000000 00000000 00000000
0100400 00020003 00000000 00000000 00000000
0100420 00030003 00000001 00000000 00000000
0100440 fff10004 00000007 00008008 00000000
0100460 00010000 00000007 00008000 00000000
0100500 00010000 0000000a 00000000 00000000
0100520 fff10004 00000010 00008010 00000000
0100540 00010000 00000013 00008019 00000008
0100560 00010002 00000010 00008018 00000000
0100600 00010000 00000007 0000801c 00000000
0100620 00010000 00000023 00008020 00000008
0100640 00010002 00000007 00008020 00000000
0100660 00010000 00000031 00008024 00000000
0100700 00010000 00000046 00018028 00000000
0100720 00010010 00000034 00018028 00000000
0100740 00010010 00000042 00008008 00000008
0100760 00010012 00000045 00018028 00000000
0101000 00010010 00000086 00000000 00000000
0101020 00000010 00000051 00008011 00000008
0101040 00010012 00000054 00018028 00000000
0101060 00010010 00000060 00008000 00000008
0101100 00010012 00000065 00018028 00000000
0101120 00010010 0000006d 00018028 00000000
0101140 00010010 00000074 00018028 00000000
0101160 00010010 00000079 00080000 00000000
0101200 00030010 00000080 00018028 00000000
---[snip]---
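To make the dump easier to read, here is a minimal Python sketch (not part of the original post) that decodes three of the 16-byte symbol-table entries shown above and applies the bit-0 rule. It assumes the standard little-endian Elf32_Sym layout. The words are copied from the od output, the pairing with a1, t1 and __a1_from_thumb is inferred by matching addresses against the objdump listing, and decode_sym is a hypothetical helper name.

```python
import struct

STT_FUNC = 2  # ELF symbol type for functions

# (st_name, st_value, st_size, packed st_info/st_other/st_shndx) as 32-bit
# words copied from the od dump. The trailing comment is the objdump label
# that matches the address (an inference, since the string table is not shown).
RAW_SYMBOLS = [
    (0x00000042, 0x00008008, 0x00000008, 0x00010012),  # a1
    (0x00000051, 0x00008011, 0x00000008, 0x00010012),  # t1
    (0x00000013, 0x00008019, 0x00000008, 0x00010002),  # __a1_from_thumb
]


def decode_sym(words):
    """Decode one little-endian Elf32_Sym entry given as four 32-bit words."""
    raw = struct.pack("<4I", *words)
    st_name, st_value, st_size, st_info, st_other, st_shndx = struct.unpack(
        "<IIIBBH", raw
    )
    is_func = (st_info & 0xF) == STT_FUNC
    # For function symbols, bit 0 of st_value is the Thumb flag rather than
    # part of the address, so it is masked off to get the real start address.
    mode = ("thumb" if st_value & 1 else "arm") if is_func else None
    address = (st_value & ~1) if is_func else st_value
    return {
        "name_offset": st_name,
        "address": address,
        "size": st_size,
        "is_func": is_func,
        "mode": mode,
    }


for words in RAW_SYMBOLS:
    sym = decode_sym(words)
    print(hex(sym["address"]), sym["mode"])
```

Here the entry with value 0x8011 decodes as Thumb code starting at 0x8010, while the entry with value 0x8008 decodes as ARM code. This rule only covers function symbols, so by itself it does not explain the ARM instruction at 0x801c.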
Aha, thanks. I will check it.
This is from an IDA scan of libc.
Sometimes ARM code and Thumb code are mixed.
How can I distinguish them in that case?
.text:0000DD8C 08 48 LDR R0, =(unk_4C540 - 0xDD96)
.text:0000DD8E 09 49 LDR R1, =(sub_E02C+1 - 0xDD98)
.text:0000DD90 08 B5 PUSH {R3,LR}
.text:0000DD92 78 44 ADD R0, PC ; unk_4C540
.text:0000DD94 79 44 ADD R1, PC ; sub_E02C
.text:0000DD96 01 F0 92 E9 BLX pthread_once
.text:0000DD9A 40 B1 CBZ R0, locret_DDAE
.text:0000DD9C 06 49 LDR R1, =(aMalloc_leak_ch - 0xDDA6)
.text:0000DD9E 06 20 MOVS R0, #6
.text:0000DDA0 06 4A LDR R2, =(aUnableToInitia - 0xDDA8)
.text:0000DDA2 79 44 ADD R1, PC ; "malloc_leak_check"
.text:0000DDA4 7A 44 ADD R2, PC ; "Unable to initialize malloc_debug compo"...
.text:0000DDA6 BD E8 08 40 POP.W {R3,LR}
.text:0000DDAA 04 F0 FF BD B.W sub_129AC
Although I am not familiar with the ELF format, I think the boundary addresses between ARM and Thumb code are recorded in the ELF file.
In my previous example, the boundaries would be 0x8000, 0x8008, 0x8010, 0x8018, 0x801c, 0x8020 and 0x8024. All of these addresses appear in the ELF file (see the od dump above).
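These boundaries correspond to what the ARM ELF ABI calls mapping symbols: $a marks the start of ARM code, $t the start of Thumb code and $d the start of data. Below is a minimal Python sketch (not from the thread) of how a disassembler could use them. The addresses are taken from the symbol-table dump earlier in the thread. Which symbol sits at each address is an assumption inferred from the objdump listing, because the string table is not shown, and state_at is a hypothetical helper.

```python
import bisect

# (address, mapping symbol) pairs. Addresses come from the symbol-table dump
# of ta32.out; the $a/$t/$d labels are ASSUMED from the objdump listing.
MAPPING_SYMBOLS = sorted([
    (0x8000, "$a"), (0x8008, "$a"), (0x8010, "$t"), (0x8018, "$t"),
    (0x801C, "$a"), (0x8020, "$a"), (0x8024, "$d"),
])
KIND = {"$a": "arm", "$t": "thumb", "$d": "data"}
ADDRESSES = [addr for addr, _ in MAPPING_SYMBOLS]


def state_at(address):
    """Return 'arm', 'thumb' or 'data' for an address, or None if unknown."""
    # The governing mapping symbol is the last one at or below the address.
    i = bisect.bisect_right(ADDRESSES, address) - 1
    if i < 0:
        return None
    return KIND[MAPPING_SYMBOLS[i][1]]


print(state_at(0x801C))
```

With this table, 0x801c falls in an ARM region even though it sits inside the Thumb veneer __a1_from_thumb. Mapping symbols live in the symbol table, so this approach is only available when the file has not been stripped.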
Best regards,
Hmm... so tools such as IDA or objdump (disassemblers) may parse the ELF format and look for BX or BLX instructions,
and from those guess whether the instructions are ARM or not,
and whether the branched-to address range is Thumb or ARM... hmm...
Thanks for your opinion. I will study this further.
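As a small illustration of following interworking branches (a sketch only, not a description of how IDA or objdump actually work), the Python snippet below recognises the Thumb "bx pc" at 0x8018 in the earlier example and reports where ARM code must start. It relies on two facts from the ARM architecture manuals: the 16-bit Thumb BX encoding is 0x4700 with the register number in bits 6:3, and reading PC in Thumb state yields the instruction address plus 4. thumb_bx_pc_target is a hypothetical helper, and only the word-aligned case is handled.

```python
def thumb_bx_pc_target(insn_addr, halfword):
    """If halfword is a word-aligned Thumb 'BX PC', return (target, 'arm').

    Returns None for any other instruction.
    """
    is_bx = (halfword & 0xFF87) == 0x4700  # BX Rm, ignoring the Rm field
    rm = (halfword >> 3) & 0xF             # register number in bits 6:3
    if not is_bx or rm != 15 or insn_addr % 4 != 0:
        return None
    # PC reads as the instruction address + 4 in Thumb state. Bit 0 of that
    # value is 0, so the branch switches the core to ARM state.
    return (insn_addr + 4, "arm")


# 0x4778 is the halfword at 0x8018 in the objdump listing.
print(thumb_bx_pc_target(0x8018, 0x4778))
```

For the veneer in the example, this gives an ARM entry point at 0x801c, which matches the ARM instruction that objdump shows there.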
This may be a bit late, but other users might be interested in the info. I will list the steps needed to decode an ELF file in general (some details may be specific to the processor architecture or manufacturer).
1. You need to decode the ELF file header to locate all the other headers and sections in your file. It is located at offset 0 in the ELF file. There are a number of manuals online that describe the structure of the ELF header, which is standard.
2. The most important is the program header, which describes the loadable program segment(s).
3. There is a bunch of other information that you may need to extract, such as the start address of the heap. It is only needed if you actually want to emulate your processor environment. For just debugging the code you can proceed directly to the loadable program segment(s).
4. Once you get to the loadable program segment, you can start decoding the instructions by reading 16-bit halfwords (Thumb) or 32-bit words (ARM) at a time and deciphering the opcode bits. The opcode reference can be found on the ARM website. Thumb-2 instructions (32-bit instructions within the Thumb ISA) are assigned specific encodings: in the Thumb ISA, if the top five bits of the first halfword are 0b11101, 0b11110 or 0b11111 (0x1D, 0x1E or 0x1F), the instruction is a 32-bit instruction.
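Step 4 can be sketched for Thumb code as follows. This is a minimal Python example under stated assumptions: the bytes are already known to be Thumb code, they start on an instruction boundary, and they are little-endian. The length rule used is the one from the ARM architecture manuals (a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction). The function names are hypothetical, and the sample bytes are the first seven instructions of the IDA listing earlier in the thread.

```python
def thumb_insn_size(first_halfword):
    """Return the size in bytes (2 or 4) of a Thumb instruction."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_thumb(code):
    """Split little-endian Thumb bytes into (offset, size) pairs."""
    result = []
    offset = 0
    while offset + 2 <= len(code):
        halfword = int.from_bytes(code[offset:offset + 2], "little")
        size = thumb_insn_size(halfword)
        if offset + size > len(code):
            break  # truncated 32-bit instruction at the end of the buffer
        result.append((offset, size))
        offset += size
    return result


# Bytes at 0xDD8C..0xDD9B from the IDA listing (LDR, LDR, PUSH, ADD, ADD, BLX, CBZ).
SAMPLE = bytes.fromhex("08 48 09 49 08 B5 78 44 79 44 01 F0 92 E9 40 B1")

for offset, size in split_thumb(SAMPLE):
    print(hex(0xDD8C + offset), size)
```

The only 4-byte instruction in the sample is the BLX at 0xDD96, which agrees with the listing. This only determines instruction lengths. It does not tell you whether a region is ARM or Thumb in the first place.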
If you haven't done this before, it is quite a challenge. Alternatively, all of this functionality is already implemented in the GNU Binutils source code (the home of objdump). You can make use of it if you have the time to dig deep into that code.
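For anyone taking the do-it-yourself route, step 1 (decoding the ELF header) can be sketched in Python as below. The input is the first 0x34 bytes of ta32.out, rebuilt from the 32-bit words in the od dump earlier in this thread. The layout is the standard little-endian Elf32_Ehdr, and parse_elf32_header is a hypothetical helper name.

```python
import struct

EM_ARM = 40  # e_machine value for ARM (0x28)

# First 13 words (52 bytes) of ta32.out as printed by "od -t x4".
HEADER_WORDS = [
    0x464C457F, 0x00010101, 0x00000000, 0x00000000,
    0x00280002, 0x00000001, 0x00008000, 0x00000034,
    0x0000835C, 0x05000200, 0x00200034, 0x00280001,
    0x00040007,
]

FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data):
    """Parse a little-endian ELF32 file header from the start of data."""
    values = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)
    header = dict(zip(FIELDS, values))
    if header["e_ident"][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return header


header = parse_elf32_header(struct.pack("<13I", *HEADER_WORDS))
print(hex(header["e_entry"]), hex(header["e_phoff"]), header["e_phnum"])
```

For this file the header reports an ARM machine type, one program header at offset 0x34 and an entry point of 0x8000. Bit 0 of the entry point is clear, which is consistent with main being ARM code in the objdump listing.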
Good luck!