ARM/THUMB instructions that change execution path?

Has anybody come across a list of ARM & THUMB instructions that cause deviation from the linear instruction stream?

I've been trying to figure out gdb-stub single stepping using software interrupts, and in single stepping you need to find

the next instruction(s) where the next breakpoint instruction needs to be set.

There are three cases:

1) current instruction doesn't change the execution path. Next instruction is the next word.

2) current instruction is a jump. The operand defines the next instruction address

3) current instruction is conditional branch. One possible next instruction is the next word, the other possible

instruction address is defined by the operand. (That includes conditional add with PC as the target, and the like).

To implement single stepping, I need to tell those cases apart and figure out how to find out the possible branching address.

I could go through manuals of numerous processors instruction by instruction and maybe I'd be done within the next couple of years,

or I could find a list of instructions to check, or a paper that explains how to "decode" the instructions in a useful way.

Also, there doesn't seem to be lots of sources of ARM gdb servers or stubs around that use software breakpoints.

Parents

0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

I once wrote a debugger for 68xxx (it was back in 1988 I think).
There I had a mask and data for each kind of instruction.
Eg.
if((instr & mask) == data){ found; }
Here you will need to sort the table, so that masking will work correctly; otherwise, if your first entry's mask is 0x0000, then you'll always get the same result.
Of course it is a good idea ending the table with mask 0x0000 and data 0x0000; this way you'll find the instruction fairly quickly.
You could make a similar table and add a 'type' entry. For instance type could be 'add' and then a positive/negative, in order to merge add and subtract into one handler (that might simplify and shorten the code).
Having handler types and a mode field would perhaps also make it possible to simplify the code.
Opcodes usually come in 'families'. It may be a good idea to find out which instruction family the opcode belongs to first, then handle the remaining part of the instruction decoding after that, however, it may be quicker to just use the mask and data system, depending on the number of instructions.
Cancel
Up 0 Down

Reply

Accept answer

Cancel

Reply

0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

I once wrote a debugger for 68xxx (it was back in 1988 I think).
There I had a mask and data for each kind of instruction.
Eg.
if((instr & mask) == data){ found; }
Here you will need to sort the table, so that masking will work correctly; otherwise, if your first entry's mask is 0x0000, then you'll always get the same result.
Of course it is a good idea ending the table with mask 0x0000 and data 0x0000; this way you'll find the instruction fairly quickly.
You could make a similar table and add a 'type' entry. For instance type could be 'add' and then a positive/negative, in order to merge add and subtract into one handler (that might simplify and shorten the code).
Having handler types and a mode field would perhaps also make it possible to simplify the code.
Opcodes usually come in 'families'. It may be a good idea to find out which instruction family the opcode belongs to first, then handle the remaining part of the instruction decoding after that, however, it may be quicker to just use the mask and data system, depending on the number of instructions.
Cancel
Up 0 Down

Reply

Accept answer

Cancel

Children

0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

Table approach may not be very good with ARM, because the instruction defining bits are not in constant places. Not even mostly, except the 3 bits after condition code, and sometimes a special register value makes another instruction.
Also, in this program, I don't care about the instruction as such, but just the 'next address' after the instruction.
Sometimes it's easier to execute than to 'simulate' the code:
unsigned int check_msr_reg(uint32_t instr)
{
    unsigned int new_pc = rpi2_reg_context.reg.r15;
    // if user mode, then can't even guess
    tmp1 = (uint32_t) rpi2_reg_context.reg.cpsr;
    if ((tmp1 & 0x1f) == 0) // user mode
    {
        // UNPREDICTABLE - whether banked or not
        // the bits 15 - 0 are UNKNOWN
        new_pc = INSTR_ADDR_UNDEF;
    }
    else
    {
        // privileged mode - both reg and banked reg
        tmp2 = (instr & 0xffff0fff) | (1 << 12); // edit Rd = r1
        iptr = (uint32_t *) mrs_regb;
        *iptr = tmp2;
        asm(
            "push {r0, r1}\n\t"
            "mrs r1, cpsr @ save cpsr\n\t"
            "push {r1}\n\t"
            "ldr r0, =tmp1 @ set cpsr\n\t"
            "msr cpsr, r0 @ note: user mode is already excluded\n\t"
            "mrs_regb: .word 0 @ execute instr with our registers\n\t"
            "ldr r0, =tmp2\n\t"
            "str r1, [r0] @ store result to tmp2\n\t"
            "pop {r1} @ restore cpsr\n\t"
            "msr cpsr, r1\n\t"
            "pop {r0, r1}\n\t"
            );
        new_pc = (unsigned int) tmp2;
    }
    return new_pc;
}
In this project this far I've learned about ARM (never really used before), awk (never used it before) and inline assembly (accessing C variables). For some unknown reason, I haven't been keen to use the inline assembly extensions though.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

BTW, is your 68k debugger anywhere to be seen? I might like to take a look some time later when I have more time.
I like Motorola/Freescale 6.8k/68k assembly languages. So wonderfully symmetrical.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

Sadly, it was never released (and I don't have the sources here). I wrote it because MonST/MonTT could not debug a game I was participating in writing.
Yes, 68xxx was very convenient and easy in many ways; unfortunately, the performance wasn't so impresive. Reading/writing 400KByte/sec.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

Ah, yes. things are coming back to me.
My debugger usually ran on a 68000, but my MegaSTE had a 68010, so I wrote an instruction emulator. It could emulate 68010, 20, 30, 40 and CPU32 instructions (the latter was never tested, though).
-Sometimes it's also a lot faster to simulate instruction execution.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

Reading/writing 400KByte/sec.

What kind of reading/writing are you talking about?
I remember when I had plans to make a computer based on 68030.
The plan was trashed by the fact that those days home made address decoding would have been so slow that it wasn't worth the while. I considered v22 PLAs and some FPGAs, but the delays were far too big. With mask-programmed gate array it would have been a beast, but VERY expensive. I guess a mask cost like $1 000 000 back then.
68030 could do memory accesses in synchronous nibble mode in 55 ns, I recall.
(The dynamical bus sizing was, as such, really exciting idea.)
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

turboscrew wrote:

Reading/writing 400KByte/sec.

What kind of reading/writing are you talking about?

On my Atari ST, I could reach 400KByte reading or writing per second as maximum (by using the movem.l instruction).
The main reason for this was most likely Atari's bus architecture.
Lucky me, any Cortex-M0, even if running only at 1 MHz, is faster.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

I recall Atari ST was quite nice machine of it's time. 8088 wasn't that impressive either compared to any ARM.
I had to settle with Commodore 64 with the famous "washing machine processor" (6510).
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

The table way might be a good idea after all. At least in some instruction groups, like media instructions.
The encoding is that sparse and the holes cause UND-exception. They need to be decoded completely.
Cancel
Up 0 Down

Reply

Accept answer

Cancel

0 Juha Aaltonen over 9 years ago in reply to Juha Aaltonen

Now that I have the instruction tables...

I don't think the table-method works very well. The "significant bits" seem to move around all the time:

(With "significant bits" I mean bit position that have instruction specific '1' or '0' through the whole group of instuction encodings the instruction is checked against. In this "group" the "significant bits are 27, 26, 25 and 22. In reality, bit 22 is really not "significant, but B-bit indicating byte vs. word access. STR, LDR, STRB and LDRB are really aliases of one and the same instruction)

SBFX<c> <Rd>,<Rn>,#<lsb>,#<width>

A8.8.164

BFC<c> <Rd>,#<lsb>,#<width>

A8.8.19

BFI<c> <Rd>,<Rn>,#<lsb>,#<width>

A8.8.20

UBFX<c> <Rd>,<Rn>,#<lsb>,#<width>

A8.8.246

STR<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} STR<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>}

A8.8.205

LDR<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} LDR<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>}

A8.8.66

STRB<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} STRB<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>}

A8.8.208

LDRB<c> <Rt>,[<Rn>,+/-<Rm>{,<sift>}]{!} LDRB<c> <Rt>,[<Rn>],+/-<Rm>{,<sift>}

A8.8.70

STMDA<c> <Rn>{!},<registers>

A8.8.200

LDMDA<c> <Rn>{!},<registers>

A8.8.59

POP<c> <registers>

A8.8.132

0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

I'm happy to hear that the table method will work.
Normally, instruction groups are very logical.
Yes, you could say that the LDR/STR instruction is just a 'transfer instruction between memory and registers', where there's a flag indicating the direction.
Cancel
Up 0 Down

Reply

Accept answer

Cancel

0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

Well, the table method doesn't work, I saw that when I got the spreadsheet about the instructions done.

When you look at the sheet, you can see that with all the instructions only bits 27, 26 and 25 are significant (= not belonging to an operand or modifier of some instruction). When you take a group of instructions that have bits 27, 26 and 25 all zeros, the only significant bit is bit 4. With instruction groups that have bits 27 - 25 and 4 zero, the new significant bits are 24, 23 and 21. If bit 4 is one, the only new significant bit is bit 7 and so on.

Makes one h*** of a decoding logic.

The spreadsheet is pretty nice tool for something like this: you can sort the lines (instructions) by different combinations of columns to find out which decoding steps work.

I got the arithmetic & logic instructions nicely organized by selecting the lines and organizing them as bit 25 as primary, bit 4 as secondary and bit 21 as tertiary sorting "key". The instructions become sorted by addressing modes.

First all register operation, then register shifted register and last the immediates. You can't apply all the keys at the same time, unfortunately, but one by one.

This is what it looks like:

AND{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.14

SUB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.223

ADD{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.7

SBC{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.162

ORR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.123

Sorting with bits 24 - 21 in that order gives:

AND{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.14

AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.15

AND{S}<c> <Rd>,<Rn>,#<const>

A8.8.13

EOR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.47

EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.48

EOR{S}<c> <Rd>,<Rn>,#<const>

A8.8.46

SUB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.223

SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.224

SUB{S}<c> <Rd>,<Rn>,#<const>

A8.8.222

RSB{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.153

0 Jens Bauer over 9 years ago in reply to Juha Aaltonen
I still think it's possible to use the table, especially, if you're only decoding 32-bit instructions.
Let's make a few rules:
1: If a bit is known to be either 1 or 0, set the corresponding bit in mask to 1, otherwise set it to 0.
2: If a bit is known, set keep the bit in data, otherwise set the corresponding bit in data to 0.
That is ...
Instr: cccc0000010Snnnnddddssss0TT1mmmm
mask : 00001111111000000000000010010000
data : 00000000010000000000000000010000
Using this rule, we can convert the above table using the attached perl-script.
We now get the following (slightly decorated):
static const InstrTab sInstructionTable[] = {
    0x0fe00010, 0x00000000, "and", &and_shift, /* AND{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.14 */
    0x0fe00090, 0x00000010, "and", &and_reg, /* AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15 */
    0x0fe00000, 0x02000000, "and", &and_imm, /* AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13 */
    0x0fe00010, 0x00200000, "eor", &eor_shift, /* EOR{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.47 */
    0x0fe00090, 0x00200010, "eor", &eor_reg, /* EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48 */
    0x0fe00000, 0x02200000, "eor", &eor_imm, /* EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46 */
    0x0fe00010, 0x00400000, "sub", &sub_shift, /* SUB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.223 */
    0x0fe00090, 0x00400010, "sub", &sub_reg, /* SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224 */
    0x0fe00000, 0x02400000, "sub", &sub_imm, /* SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222 */
    0x0fe00010, 0x00600000, "rsb", &rsb_shift, /* RSB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.153 */
    0x00000000, 0x00000000, "", &done, /* end of list */
};
Now, the problem with the table above, is that the order is incorrect. This can be fixed easily:
static const InstrTab sInstructionTable[] = {
    0x0fe00090, 0x00000010, "and", &and_reg, /* AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.15 */
    0x0fe00010, 0x00000000, "and", &and_shift, /* AND{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.14 */
    0x0fe00000, 0x02000000, "and", &and_imm, /* AND{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.13 */
    0x0fe00090, 0x00200010, "eor", &eor_reg, /* EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.48 */
    0x0fe00010, 0x00200000, "eor", &eor_shift, /* EOR{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.47 */
    0x0fe00000, 0x02200000, "eor", &eor_imm, /* EOR{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.46 */
    0x0fe00090, 0x00400010, "sub", &sub_reg, /* SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs> A1 A8.8.224 */
    0x0fe00010, 0x00400000, "sub", &sub_shift, /* SUB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.223 */
    0x0fe00000, 0x02400000, "sub", &sub_imm, /* SUB{S}<c> <Rd>,<Rn>,#<const> A1 A8.8.222 */
    0x0fe00010, 0x00600000, "rsb", &rsb_shift, /* RSB{S}<c> <Rd>,<Rn>,<Rm>{,<shift>} A1 A8.8.153 */
    0x00000000, 0x00000000, "", &done, /* end of list */
};
(First two AND were swapped, first two EOR were swapped and first two SUB were swapped).
-Of course, the Perl-script could be enhanced to do this automatically, generating extra masks and data in those cases where that is needed.
So the following should work:
    InstrTab *tab;     uint32_t instr;     instr = *pc;     tab = &sInstructionTable;     while((instr & tab->mask) != tab->data)     {         tab++;     }     tab->handle(instr,tab);
If writing the code in assembly language, I believe it would be a good idea to re-arrange the contents of the table, so that the instruction name comes first, then the mask, the data and finally the handler.
find_instr:
    ldmia   r4!,{r1-r3,r12}
    and     r2,r2,r0
    cmp     r3,r2
    bne     find_instr
    /* here r0 contains the instruction opcode, r1 contains the name and r12 contains the address of the handler */
    pop     { ... }          /* restore any saved registers */
    bx      r12              /* jump directly to the handler */
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

Hmm, I guess I've been working too hard with the project - I didn't come to think adding the handler in the table.
(I don't like to use function pointers if not necessary - the execution path gets fuzzy and some debuggers can't handle them, but using an enum...) I had a different idea about what the table method meas, but I don't remember what I thought any more. What I remember is that I didn't think of comparing an instruction to all instructions of the instruction set.
If applied to the whole instruction set, it may be a bit heavy for single stepping - on the average it means going through half of the table for each instruction each time.
Then again writing the logic in C code takes a lot of time, and the code in not something you'd show to small children...
(I made the pseudocode whith which I'm still not completely happy, but the C-coding of it is not started yet.)
Maybe I should use the table approach now after all, and go for the C code logic later (if I still feel like it).
I really have to rethink this.
Especially now that I have the instruction set in a spreadsheet from which the instruction bit patterns are easily edited (bits into bytes, byte order, ...) and they can be printed out as a text file for simple editing (if any needed) into a table initializer data form. Also the mask and data can be easily generated with spreadsheet formuli.
The table is also quite compact form - the instruction set itself probably fits in 1 kB, and the actual handlers need to be written anyway.
Too bad there is only 'like'-button. There should also be 'Halleluyah'-button.
Cancel
Up 0 Down

Reply

Accept answer

Cancel
0 Jens Bauer over 9 years ago in reply to Juha Aaltonen

When I wrote my 68xxx debugger, the table was fine (though these were only 16-bit words).
-But ARM's instruction set is not too complicated either. I have not had a look at Cortex-A yet, but the time will come.
In case the table gets too large, you have an extra approach: To split the words into two 16-bit words, so you find the "main part", which leads to a "sub-tree".
Remember: The execution unit in the processor does the job very, very quickly, so I am convinced that ARM designed the instruction set, so it should be easy to dispatch (even by using code).
Yes, the hard part is to find out how.
The good thing about using pointers, is that you can make your lookup-routine in assembly language, and it can jump directly to your C routine.
You can then call it as a C-function, because it uses "goto-style", thus it'll be completely transparent and your C-code will behave like a normal subroutine-call; just very quickly.
The above look-up example can be unrolled easily; this will save a few clock cycles on each iteration:
find_instr:
    ldmia   r4!,{r1-r3,r12}
    and     r2,r2,r0
    cmp     r3,r2
    .rept   7
    ldmiane r4!,{r1-r3,r12}
    andne   r2,r2,r0
    cmpne   r3,r2
    .endr
    bne     find_instr
    /* here r0 contains the instruction opcode, r1 contains the name and r12 contains the address of the handler */
    pop     { ... }          /* restore any saved registers */
    bx      r12              /* jump directly to the handler */

-Change the '.rept' count as you like... make it 15 or 31, adjust it to suit your needs. Perhaps a large number may start to cause longer execution time, but it's a question of balance.
If you're lucky, you can place instructions that are used often in the beginning of the table (I did that with my debugger, and it started to become quite quick at disassembling).
This kind of code is something I really like. The table-lookup, masks and AND stuff - it brings out good memories too.
-But of course, sometimes it might be easier or shorter or quicker to write a switch-statement and use enumerations for handling each instruction type.
Some instruction types could be handled by the same handler; eg. AND/ORR/EOR and ADD/SUB.
In many cases, it's useful to think of instructions as being in "instruction groups". Eg. LDR/STR is a good example, AND/ORR/EOR, ASL/ASR/LSL/LSR/ROR, etc.
Cancel
Up 0 Down

Reply

Accept answer

Cancel

0 Juha Aaltonen over 9 years ago in reply to Jens Bauer

If you're lucky, you can place instructions that are used often in the beginning of the table

You read my mind.

But ARM's instruction set is not too complicated either.

Assembly is not, but the encoding is.

(0)

MUL{S}<c> <Rd>,<Rn>,<Rm>

A8.8.114

AND{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.15

AND{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.14

MLA{S}<c> <Rd>,<Rn>,<Rm>,<Ra>

A8.8.100

EOR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.48

EOR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}

A8.8.47

UMAAL<c> <RdLo>,<RdHi>,<Rn>,<Rm>

A8.8.255

SUB{S}<c> <Rd>,SP,<Rm>{,<sift>}

A8.8.226

SUB{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>

A8.8.224

and:

c	c	c	c	1	1	0	S	n	n	n	n	d	d	d	d	s	s	s	s	0	T	T	1	m	m	m	m	ORR{S}<c> <Rd>,<Rn>,<Rm>,<type>,<Rs>	A1	A8.8.124
c	c	c	c	1	1	0	S	n	n	n	n	d	d	d	d	x	x	x	x	x	T	T	0	m	m	m	m	ORR{S}<c> <Rd>,<Rn>,<Rm>{,<sift>}	A1	A8.8.123
c	c	c	c	1	1	1	0	n	n	n	n	d	d	d	d	(1)	(1)	(1)	(1)	1	0	0	1	t	t	t	t	STREXD<c> <Rd>,<Rt>,<Rt2>,[<Rn>]	A1	A8.8.214
c	c	c	c	1	1	1	1	n	n	n	n	t	t	t	t	(1)	(1)	(1)	(1)	1	0	0	1	(1)	(1)	(1)	(1)	LDREXD<c> <Rt>,<Rt2>,[<Rn>]	A1	A8.8.77
c	c	c	c	1	1	1	S	(0)	(0)	(0)	(0)	d	d	d	d	0	0	0	0	0	0	0	0	m	m	m	m	MOV{S}<c> <Rd>,<Rm>	A1	A8.8.104
c	c	c	c	1	1	1	S	(0)	(0)	(0)	(0)	d	d	d	d	0	0	0	0	0	1	1	0	m	m	m	m	RRX{S}<c> <Rd>,<Rm>	A1	A8.8.151
c	c	c	c	1	1	1	S	(0)	(0)	(0)	(0)	d	d	d	d	m	m	m	m	0	0	0	1	n	n	n	n	LSL{S}<c> <Rd>,<Rn>,<Rm>	A1	A8.8.95
c	c	c	c	1	1	1	S	(0)	(0)	(0)	(0)	d	d	d	d	m	m	m	m	0	0	1	1	n	n	n	n	LSR{S}<c> <Rd>,<Rn>,<Rm>	A1	A8.8.97
c	c	c	c	1	1	1	S	(0)	(0)	(0)	(0)	d	d	d	d	m	m	m	m	0	1	0	1	n	n	n	n	ASR{S}<c> <Rd>,<Rn>,<Rm>	A1	A8.8.17

Oh, and for small assembly routines I've been using inline asm, like

void rpi2_trap_handler()

{

// IRQs need to be enabled for serial I/O

asm volatile (

"push {r0}\n\t"

"mrs r0, cpsr\n\t"

"bic r0, #128 @ enable irqs\n\t"

"msr cpsr, r0\n\t"

"pop {r0}\n\t"

);

gdb_trap_handler();

}