How can I tell Thumb code from ARM code?

Hello experts,

I only have the code from the .text section of an ELF file.

I want to write a disassembler, but I don't know which parts are Thumb code and which are ARM code.

How can I distinguish ARM from Thumb? I cannot read the CPSR register; I only have the code.

Thanks

  • Hi 박주병,

    I guess that, in the symbol table of an ELF file, the LSB of an ARM function's start address is 0 and the LSB of a Thumb function's start address is 1.

    HTH,

    Yasuhiko Koumoto.

    For example (though I don't know how to tell from this that the code at 0x801c is ARM code):

    a32.c

    int t1(int);
    int a1(int x)
    {
     return x+x;
    }
    int main(void)
    {
     return t1(10);
    }

    t32.c

    int a1(int);
    int t1(int x)
    {
     return a1(x*x);
    }
    

    $ arm-none-eabi-gcc -c -O3 -mcpu=cortex-a7 a32.c
    $ arm-none-eabi-gcc -c -O3 -mcpu=cortex-a7 -mthumb t32.c
    $ arm-none-eabi-gcc -o ta32.out -mcpu=cortex-a7 a32.o t32.o -nostdlib
    $ arm-none-eabi-objdump -D ta32.out

    ta32.out:     file format elf32-littlearm

    Disassembly of section .text:

    00008000 <main>:
        8000:       e3a0000a        mov     r0, #10
        8004:       ea000005        b       8020 <__t1_from_arm>

    00008008 <a1>:
        8008:       e1a00080        lsl     r0, r0, #1
        800c:       e12fff1e        bx      lr

    00008010 <t1>:
        8010:       fb00 f000       mul.w   r0, r0, r0
        8014:       f000 b800       b.w     8018 <__a1_from_thumb>

    00008018 <__a1_from_thumb>:
        8018:       4778            bx      pc
        801a:       46c0            nop                     ; (mov r8, r8)
        801c:       eafffff9        b       8008 <a1>

    00008020 <__t1_from_arm>:
        8020:       e51ff004        ldr     pc, [pc, #-4]   ; 8024 <__t1_from_arm+0x4>
        8024:       00008011        andeq   r8, r0, r1, lsl r0
    ---[snip]---
    $ od -t x4 ta32.out
    0000000 464c457f 00010101 00000000 00000000
    0000020 00280002 00000001 00008000 00000034
    0000040 0000835c 05000200 00200034 00280001
    0000060 00040007 00000001 00000000 00000000
    0000100 00000000 00008028 00008028 00000005
    0000120 00010000 00000000 00000000 00000000
    0000140 00000000 00000000 00000000 00000000
    ---[snip]---
    0100320 00000003 00000000 00000000 00000000
    0100340 00000000 00000000 00008000 00000000
    0100360 00010003 00000000 00000000 00000000
    0100400 00020003 00000000 00000000 00000000
    0100420 00030003 00000001 00000000 00000000
    0100440 fff10004 00000007 00008008 00000000
    0100460 00010000 00000007 00008000 00000000
    0100500 00010000 0000000a 00000000 00000000
    0100520 fff10004 00000010 00008010 00000000
    0100540 00010000 00000013 00008019 00000008
    0100560 00010002 00000010 00008018 00000000
    0100600 00010000 00000007 0000801c 00000000
    0100620 00010000 00000023 00008020 00000008
    0100640 00010002 00000007 00008020 00000000
    0100660 00010000 00000031 00008024 00000000
    0100700 00010000 00000046 00018028 00000000
    0100720 00010010 00000034 00018028 00000000
    0100740 00010010 00000042 00008008 00000008
    0100760 00010012 00000045 00018028 00000000
    0101000 00010010 00000086 00000000 00000000
    0101020 00000010 00000051 00008011 00000008
    0101040 00010012 00000054 00018028 00000000
    0101060 00010010 00000060 00008000 00000008
    0101100 00010012 00000065 00018028 00000000
    0101120 00010010 0000006d 00018028 00000000
    0101140 00010010 00000074 00018028 00000000
    0101160 00010010 00000079 00080000 00000000
    0101200 00030010 00000080 00018028 00000000
    ---[snip]---
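To make the LSB idea concrete, here is a minimal Python sketch that unpacks 16-byte `Elf32_Sym` records and classifies function symbols by bit 0 of `st_value`. The helper names (`parse_elf32_sym`, `function_state`) are made up for illustration. The four records are rebuilt from 32-bit words of the `od` dump above, on the assumption that the symbol entries start 4 bytes into each 16-byte `od` row; the values 0x8008, 0x8011 and 0x8000 correspond to the addresses the disassembly labels `a1`, `t1` and `main`.

```python
import struct

STT_FUNC = 2  # ELF symbol type: function


def parse_elf32_sym(entry: bytes) -> dict:
    """Unpack one 16-byte little-endian Elf32_Sym record."""
    name, value, size, info, other, shndx = struct.unpack("<IIIBBH", entry)
    return {"name_off": name, "value": value, "size": size,
            "type": info & 0xF, "shndx": shndx}


def function_state(sym: dict):
    """Return (state, address) for a function symbol, or None for other symbols."""
    if sym["type"] != STT_FUNC:
        return None
    state = "thumb" if sym["value"] & 1 else "arm"
    return state, sym["value"] & ~1  # clear bit 0 to get the real address


# (st_name, st_value, st_size, packed st_info/st_other/st_shndx) as 32-bit words,
# copied from the od dump under the alignment assumption stated above.
words = [
    (0x42, 0x8008, 8, 0x00010012),
    (0x51, 0x8011, 8, 0x00010012),
    (0x60, 0x8000, 8, 0x00010012),
    (0x07, 0x801C, 0, 0x00010000),  # type 0 (no type), so not a function
]
results = [function_state(parse_elf32_sym(struct.pack("<4I", *w))) for w in words]
for (_, value, _, _), result in zip(words, results):
    print(hex(value), result)
```

Note that this only covers symbols of type function. The entry at 0x801c has no function type, so bit 0 says nothing about it.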

  • This is an IDA scan of libc.

    Sometimes ARM code and Thumb code are mixed.

    How can I distinguish them?

    .text:0000DD8C 08 48          LDR     R0, =(unk_4C540 - 0xDD96)
    .text:0000DD8E 09 49          LDR     R1, =(sub_E02C+1 - 0xDD98)
    .text:0000DD90 08 B5          PUSH    {R3,LR}
    .text:0000DD92 78 44          ADD     R0, PC ; unk_4C540
    .text:0000DD94 79 44          ADD     R1, PC ; sub_E02C
    .text:0000DD96 01 F0 92 E9    BLX     pthread_once
    .text:0000DD9A 40 B1          CBZ     R0, locret_DDAE
    .text:0000DD9C 06 49          LDR     R1, =(aMalloc_leak_ch - 0xDDA6)
    .text:0000DD9E 06 20          MOVS    R0, #6
    .text:0000DDA0 06 4A          LDR     R2, =(aUnableToInitia - 0xDDA8)
    .text:0000DDA2 79 44          ADD     R1, PC  ; "malloc_leak_check"
    .text:0000DDA4 7A 44          ADD     R2, PC  ; "Unable to initialize malloc_debug compo"...
    .text:0000DDA6 BD E8 08 40    POP.W   {R3,LR}
    .text:0000DDAA 04 F0 FF BD    B.W     sub_129AC
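One way a disassembler can follow mixed code is to decode branch targets and note which state the target runs in: in Thumb code, `B.W` and `BL` with an immediate stay in Thumb, while `BLX` with an immediate always switches to ARM. The Python sketch below decodes these three 32-bit Thumb encodings. The function name `decode_thumb_branch` is made up for illustration, and the two halfword pairs are taken from the listing above (bytes are little-endian within each halfword).

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_thumb_branch(addr: int, hw1: int, hw2: int):
    """Decode Thumb B.W / BL / BLX (immediate).

    Returns (mnemonic, target_address, target_state), or None if the halfword
    pair is not one of these three encodings.
    """
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0x8000) == 0:
        return None
    s = (hw1 >> 10) & 1
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    i1 = 1 - (j1 ^ s)  # I1 = NOT(J1 XOR S)
    i2 = 1 - (j2 ^ s)  # I2 = NOT(J2 XOR S)
    raw = (s << 24) | (i1 << 23) | (i2 << 22) | ((hw1 & 0x3FF) << 12) | ((hw2 & 0x7FF) << 1)
    offset = sign_extend(raw, 25)
    pc = addr + 4  # in Thumb state, PC reads as the instruction address + 4
    kind = (hw2 >> 12) & 0x5  # keep bits 14 and 12 of the second halfword
    if kind == 0x1:
        return "b.w", pc + offset, "thumb"
    if kind == 0x5:
        return "bl", pc + offset, "thumb"
    if kind == 0x4:
        # BLX (immediate): word-align PC; the target executes in ARM state.
        return "blx", (pc & ~3) + offset, "arm"
    return None


bw = decode_thumb_branch(0xDDAA, 0xF004, 0xBDFF)
blx = decode_thumb_branch(0xDD96, 0xF001, 0xE992)
print(bw[0], hex(bw[1]), bw[2])
print(blx[0], hex(blx[1]), blx[2])
```

The `B.W` result agrees with the `sub_129AC` label in the listing, which is a useful sanity check on the decoding. Register-based branches such as `BX` cannot be resolved this way without tracking register values.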

  • Hi 박주병,

    Although I am not familiar with the ELF format, I think the boundary addresses between ARM and Thumb code are recorded in the ELF file.

    In my previous example, the boundaries would be 0x8000, 0x8008, 0x8010, 0x8018, 0x801c, 0x8020 and 0x8024. All of these addresses appear in the ELF file (in the symbol table part of the dump in my previous reply).

    Best regards,

    Yasuhiko Koumoto.
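The ARM ELF specification defines mapping symbols for exactly this purpose: `$a` marks the start of ARM code, `$t` the start of Thumb code, and `$d` the start of data. Below is a minimal Python sketch of turning such symbols into an address-to-state lookup. The symbol names attached to each address are an assumption: the string table is not shown in the dump, so the names are inferred from how the disassembly decodes each region. `build_map` and `state_at` are illustrative helper names.

```python
import bisect

STATE = {"$a": "arm", "$t": "thumb", "$d": "data"}


def build_map(mapping_symbols):
    """Turn (name, address) mapping symbols into sorted address and state lists."""
    entries = sorted(
        (addr, STATE[name[:2]])       # names may carry a suffix such as "$a.1"
        for name, addr in mapping_symbols
        if name[:2] in STATE
    )
    return [a for a, _ in entries], [s for _, s in entries]


def state_at(addrs, states, address):
    """Return the state of the region containing address, or None if before the first symbol."""
    i = bisect.bisect_right(addrs, address) - 1
    return states[i] if i >= 0 else None


# Boundary addresses from the example; the $a/$t/$d names are assumed (see above).
symbols = [
    ("$a", 0x8000), ("$a", 0x8008), ("$t", 0x8010), ("$t", 0x8018),
    ("$a", 0x801C), ("$a", 0x8020), ("$d", 0x8024),
]
addrs, states = build_map(symbols)
for probe in (0x8004, 0x8014, 0x801A, 0x801C, 0x8024):
    print(hex(probe), state_at(addrs, states, probe))
```

Under that assumption, the lookup reports 0x801c as ARM code even though it sits inside the `__a1_from_thumb` veneer, whose function symbol has the Thumb bit set.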

  • This may be a bit late, but other users might be interested in the info. I will list the steps that are needed to decode an ELF file in general (there may be some specifics depending on the processor architecture/manufacturer).

    1. You need to decode the ELF file header to locate all the different headers and sections in your file. It is located at offset 0 in the ELF file. There are a number of manuals online that describe the structure of the ELF header, which is standard.

    2. The most important is the program header table, which describes the loadable program segment(s).

    3. There is a bunch of other information that you will need to extract, such as the start address of the heap, etc. This is only needed if you actually want to emulate your processor environment. For just debugging the code you can proceed directly to the loadable program segment(s).

    4. Once you get to the loadable program segment you can start decoding the instructions. ARM instructions are always 32 bits wide; Thumb code is read one 16-bit halfword at a time, and the top five bits of the halfword tell you whether it is a complete 16-bit instruction or the first half of a 32-bit one. The encoding reference can be found on the ARM website. Thumb-2 instructions (32-bit instructions within the Thumb ISA) are assigned specific values in those bits: if bits [15:11] of the first halfword are 0x1D, 0x1E or 0x1F (0b11101, 0b11110 or 0b11111), the instruction is a 32-bit instruction.
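The width check in step 4 can be sketched in a few lines of Python. This assumes the whole buffer is already known to be Thumb code (it does not detect ARM regions or embedded data), and the function names are illustrative. The test bytes are the 34 bytes of the libc listing posted earlier in this thread, starting at 0xDD8C.

```python
def thumb_insn_size(first_halfword: int) -> int:
    """Return 4 if the halfword starts a 32-bit Thumb instruction, else 2."""
    return 4 if (first_halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def split_thumb(code: bytes):
    """Return (offset, size) pairs for a buffer assumed to hold only Thumb code."""
    out = []
    offset = 0
    while offset + 2 <= len(code):
        halfword = int.from_bytes(code[offset:offset + 2], "little")
        size = thumb_insn_size(halfword)
        out.append((offset, size))
        offset += size
    return out


# Raw bytes from the libc listing, 0xDD8C up to (not including) 0xDDAE.
code = bytes.fromhex(
    "0848 0949 08b5 7844 7944 01f092e9 40b1"
    "0649 0620 064a 7944 7a44 bde80840 04f0ffbd"
)
base = 0xDD8C
insns = split_thumb(code)
for offset, size in insns:
    print(hex(base + offset), size)
```

The three 4-byte results line up with the `BLX`, `POP.W` and `B.W` lines of that listing.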

    If you haven't done this before, it is quite a challenge. Alternatively, all these functions are included in the GNU binutils source code (the objdump disassembler). You can make use of them if you have the time to dig deep into the source.

    Good luck!