This discussion has been locked.

You can no longer post new replies to this discussion. If you have a question you can start a new discussion

ARM Compiler creates a bug in code - how to fix?

Note: This was originally posted on 17th September 2009 at http://forums.arm.com

I wouldn't have thought ARMCC could create an explicit bug, but it seems here it is.

In this case, what it's doing is placing an item on the stack in a position which will certainly corrupt the stack.

Looking through the disassembly for the offending function, I can see that the first line (as expected) is

PUSH     {r2-r4,r6-r11,lr}

But then later in the code, and with NO other stack operations in the meanwhile (nor any function calls or other branches out of the current function context) it has

ADD      r2,r0,r4,LSL #2
LDR      r0,|L1.3728|
STR      r2,[sp,#4]

As I understand it, since the stack is full-descending, this means that the value of r2 currently in the stack will get corrupted, because the STR instruction will store the (changed) current r2 value in the position occupied by the original r2. The function in concern is declared

void receivePacket(void)

so it's not expecting to return values in r2 or take arguments in r2.

Why would armcc do something so glaringly illegal? Is there something I can do to fix this?

As an aside, I'll mention that this bad stacking is actually occuring in the setup for a subroutine call, that happens a few lines lower down. The routine in concern uses r1, r2 and r3 for its arguments and returns in r0 and r1. As it happens, the value that it's stacking from r2 is actually the address that will take the return value in r0 - really, what the compiler *should* do, I think, is place the address in some register rx (x > 3) then do an STR r0, [rx, #0] on return.

AlexRast over 12 years ago

Note: This was originally posted on 18th September 2009 at http://forums.arm.com

Given the information you have provided, the compiler doesn't appear to have done anything illegal.

s.

is this because of this comment?

"Registers r0 to r3 and r12 may be corrupted between in function calls, so the compiler is fine to do this. The function must, however, ensure that r4 to r11 and sp are preserved. See page 15 of the ARM Procedure Call Standard.

But if that's the case, why is the compiler stacking registers r2 and r3 in the first place? If these can be corrupted arbitrarily then it is a complete waste of time to stack them under essentially any situation.

I have problems with the way the ARM procedure call standard is written, because it seems in a lot of cases to be very ambiguous in its choice of language. For instance, for registers what they state is this:

"The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
Register r12 (IP) may be used by a linker as a scratch register between a routine and any subroutine it calls (for details, see §5.3.1.1, Use of IP by the linker). It can also be used within a routine to hold intermediate values between subroutine calls.
The role of register r9 is platform specific. A virtual platform may assign any role to this register and must document this usage. For example, it may designate it as the static base (SB) in a position-independent data model, or it may designate it as the thread register (TR) in an environment with thread-local storage. The usage of this register may require that the value held is persistent across all calls. A virtual platform that has no need for such a special register may designate r9 as an additional callee-saved variable register, v6.
Typically, the registers r4-r8, r10 and r11 (v1-v5, v7 and v8) are used to hold the values of a routine's local variables. Of these, only v1-v4 can be used uniformly by the whole Thumb instruction set, but the AAPCS does not require that Thumb code only use those registers.
A subroutine must preserve the contents of the registers r4-r8, r10, r11 and SP (and r9 in PCS variants that designate r9 as v6)."

There are several parts of this that I find confusing. "...hold intermediate values within a routine (but in general only between subroutine calls)". As that statement is worded what I would think that would mean is: registers r0-r3 may be used in a routine, such that that routine calls subroutines, so that the values held within those registers map directly into variables in the lower-level subroutine. In other words, if you have int foo(int x, int y, int z) and int bar(int a, int b), if the intention is that values x and y from foo are to be used directly in bar as a and b, it's OK for the ATPCS to keep the values in r0 and r1, then simply branch into bar with those values intact. The value bar returned in r0 could then form the return value from foo as well, simply by keeping r0 in foo until it returns. This would seem to me at least reasonably sensible - i.e. no need to shuffle registers around if you're going to pass them right into a subroutine anyway. That, however, is clearly not the sense in which ARM actually uses these registers and I wish they would use more explicit language to describe exactly what is, and is not, allowed.

If the statement above, "Registers r0 to r3 and r12 may be corrupted..." is correct, would it be accurate then to describe the situation as:

Registers r0-r3 and r12 may be used within a routine only to the extent that the programmer can guarantee no branch or exception from within the routine can result in a context switch that would require these registers to have the same value upon return from the branch that they had before the branch.

If this statement is correct then I'm happy to accept that the compiler is doing nothing likely to break higher-level functions, although that still begs the question of why in that case it would bother to stack r2 and r3 anyway - especially r2 since it could know that the stacked value would be corrupted later. Meanwhile, if this summarises the position, why doesn't the ATPCS/AAPCS use a statement with similar such language that makes it explicitly clear what it's going to do in local situations?
Cancel
Vote up 0 Vote down

Cancel
AlexRast over 12 years ago

Note: This was originally posted on 21st September 2009 at http://forums.arm.com

Yep, that's entirely sensible, and precisely what APCS lets you do.

What is means by "only in between subroutine calls" is that as soon as you call a subroutine that subroutine is free to clobber r0-r3 and r12 (in accordance with the APCS).

That's what I was expecting - since we know that r0-r3 are used for function parameters it would be natural to assume a function (subroutine) passed those parameters is (at least potentially) going to modify them - why else would you have given them to the function as parameters?

As written the spec leaves a bit of an ambiguity that I wasn't sure about before trying some things - namely if you have a function that doesn't have as many parameters as the number of registers set aside for possible parameter use, e.g. int foo(int x, int y), whether the compiler then interprets the "missing" registers as ones that need to be preserved, and whether the compiler allocates argument registers completely sequentially. For instance it could assign foo's arguments as: return in r0, x = r1, y = r2. Meanwhile what would it do with r3? I was able to discover right away that it will actually assign arguments as: return in r0, x = r0, y = r2. Now I think you've clarified that it will NOT necessarily stack r3 - thus any higher-level function must not have active values in r3 that must not be corrupted by foo.

If I've got this understood correctly this also means that if you want a function that returns a modified value of its arguments: e.g. int foo(int x, int y) {x = x + y; return x} then you can make the compiler's job easier by adept ordering of your parameter list.

So if the caller has stored any intermediate value in any of these clobberable registers, and it wants to use the value after the subroutine it either has to move the value in to r4-r11 or r13-14 or write it to the stack. So you cannot store intermediates "across subroutine calls" in the parameter passing or IP register ... otherwise you risk loosing the value.

My feeling is that this should have been more clearly spelled out in the spec. Actually, the words "between subroutine calls" and "across subroutine calls" are themselves ambiguous, I think, and either they should use more explicit language or define at the beginning of the spec exactly what these terms mean in their case.

In any case, thanks for the good explanation.

Most modern compilers operate on a "move the stack pointer once per function" (ignoring stack push and pop wrapping subroutines), so will essentially reserve enough space on the stack to ensure that any intermediate space on the stack for storing scratch values is allocated on function entry and freed on function exit. In your case, as you pointed out in your first post, the compiler stores r2 back to the stack as a scratch variable; the initial "push" of r2 and r3 is just to reserve the space without needing a second instruction to increment SP.

Interesting and a bit surprising. What's the theory behind doing that? My feeling would be that for efficiency's sake there's no sense messing about with the stack unless you're actually achieving something *necessary*. In ARM's case in particular, since the instruction set provides address preincrement/postincrement with any memory function anyway, placing an additional stack push wastes a machine cycle that you didn't need to do - because the ARM must do the intial pop at the beginning of the function and then must still actually store the value it wants to at the time it stacks the value.

If, on the other hand, the idea of this is to treat the stack as some sort of extension of the register space, so that each extended word corresponds to a fixed "stack" position, it isn't a stack anymore - at which point why not just read and write from heap or any other free memory area?

The APCS also requires the stack to be 8 byte aligned on function call, so r3 is probably pushed just to ensure an even number of registers are pushed, although your function may use that space for scratch stack too...

Ah, right, that's a possibility.

Thanks once again for the enlightening explanations.
Cancel
Vote up 0 Vote down

Cancel
china over 12 years ago

Note: This was originally posted on 22nd September 2009 at http://forums.arm.com

l am a vistor
Cancel
Vote up 0 Vote down

Cancel
Peter Harris over 12 years ago

Note: This was originally posted on 18th September 2009 at http://forums.arm.com

This would seem to me at least reasonably sensible - i.e. no need to shuffle registers around if you're going to pass them right into a subroutine anyway. That, however, is clearly not the sense in which ARM actually uses these registers and I wish they would use more explicit language to describe exactly what is, and is not, allowed.

Yep, that's entirely sensible, and precisely what APCS lets you do.

What is means by "only in between subroutine calls" is that as soon as you call a subroutine that subroutine is free to clobber r0-r3 and r12 (in accordance with the APCS). So if the caller has stored any intermediate value in any of these clobberable registers, and it wants to use the value after the subroutine it either has to move the value in to r4-r11 or r13-14 or write it to the stack. So you cannot store intermediates "across subroutine calls" in the parameter passing or IP register ... otherwise you risk loosing the value.

But if that's the case, why is the compiler stacking registers r2 and r3 in the first place? If these can be corrupted arbitrarily then it is a complete waste of time to stack them under essentially any situation.

Most modern compilers operate on a "move the stack pointer once per function" (ignoring stack push and pop wrapping subroutines), so will essentially reserve enough space on the stack to ensure that any intermediate space on the stack for storing scratch values is allocated on function entry and freed on function exit. In your case, as you pointed out in your first post, the compiler stores r2 back to the stack as a scratch variable; the initial "push" of r2 and r3 is just to reserve the space without needing a second instruction to increment SP.

The APCS also requires the stack to be 8 byte aligned on function call, so r3 is probably pushed just to ensure an even number of registers are pushed, although your function may use that space for scratch stack too...
Cancel
Vote up 0 Vote down

Cancel
Peter Harris over 12 years ago

Note: This was originally posted on 22nd September 2009 at http://forums.arm.com

Now I think you've clarified that it will NOT necessarily stack r3?

Correct. In this case foo() knows that only r0 and r1 contain valid values (it's parameters), and that it is free to clobber (or let subroutines clobber) r2, r3 and r12. So it can directly pass through to bar() without doing any stack operations.

What's the theory behind doing that?

The theory is really just implementation simplicity - if you have to spill registers to the stack you have to be able to use SP relative addressing to retrieve the value when you need it later in the function. If SP moves, tracking all possible offsets is a pretty complex task, and in reality doesn't buy you very much performance - so keep it simple stupid wins =).

I think it also has some impact on exception handling in C++, but that's outside of my area of knowledge...

If, on the other hand, the idea of this is to treat the stack as some sort of extension of the register space, so that each extended word corresponds to a fixed "stack" position, it isn't a stack anymore - at which point why not just read and write from heap or any other free memory area?

Using stack means that your "register file" is effectively infinite, as you can extend it on demand for any function in the call stack. If you had a fixed number of "extended virtual registers" at a fixed location in memory you get the same problem you have with physical registers - they are shared across function calls, and if foo is using virtual register 64, then bar has to put is somewhere if it wants to use it - so it would still have to get pushed to the stack...
Cancel
Vote up 0 Vote down

Cancel
Simon Craske over 12 years ago

Note: This was originally posted on 18th September 2009 at http://forums.arm.com

Paraphrasing Isogen,

Your function is using a shed load of registers, plus two words of stack;
the PUSH provides the space for the two words of stack space without
the need for an addition "SUB SP"; presumably your code is optimised
to reduce code size, rather than execution time.

hth
s.
Cancel
Vote up 0 Vote down

Cancel
Simon Craske over 12 years ago

Note: This was originally posted on 17th September 2009 at http://forums.arm.com

Why would armcc do something so glaringly illegal?

Given the information you have provided, the compiler doesn't appear to have done anything illegal.

s.
Cancel
Vote up 0 Vote down

Cancel
Bhaveet Shah over 12 years ago

Note: This was originally posted on 17th September 2009 at http://forums.arm.com

In this case, what it's doing is placing an item on the stack in a position which will certainly corrupt the stack.

Looking through the disassembly for the offending function, I can see that the first line (as expected) is

PUSH     {r2-r4,r6-r11,lr}

But then later in the code, and with NO other stack operations in the meanwhile (nor any function calls or other branches out of the current function context) it has

ADD      r2,r0,r4,LSL #2
LDR      r0,|L1.3728|
STR      r2,[sp,#4]

As I understand it, since the stack is full-descending, this means that the value of r2 currently in the stack will get corrupted, because the STR instruction will store the (changed) current r2 value in the position occupied by the original r2. The function in concern is declared

void receivePacket(void)

so it's not expecting to return values in r2 or take arguments in r2.

Why would armcc do something so glaringly illegal? Is there something I can do to fix this?

Registers r0 to r3 and r12 may be corrupted between in function calls, so the compiler is fine to do this. The function must, however, ensure that r4 to r11 and sp are preserved. See page 15 of the ARM Procedure Call Standard.
Cancel
Vote up 0 Vote down

Cancel
guestposter guestposter over 12 years ago

Note: This was originally posted on 14th November 2009 at http://forums.arm.com

You made some good points there. I did a search on the topic and found most people will agree with your blog.
Cancel
Vote up 0 Vote down

Cancel
guestposter guestposter over 12 years ago

Note: This was originally posted on 6th October 2009 at http://forums.arm.com

Thank you for bring such a topic. Very informative, I am learning a lot. Thank you.
Cancel
Vote up 0 Vote down

Cancel