Hi all,
If the processor tries to return from the test() function to the main() function, the program counter always gets the wrong address, so the undef handler is entered (the processor is looking for program code in the SDRAM). The whole program is stored in an external NOR flash device (0x10000000).
void test(void)
{
    //do some stuff...
    return;
}

void main(void)
{
    test();
}
The user stack size is 0x800. Below is a short part of the disassembly where the jump to the SDRAM occurs (on the next step after 0x100012D4). According to the map file, the function test() occupies memory up to 0x100012D8 and the next function starts at 0x10001304. I'm not sure what the space between these two addresses is. All these values are also stored in the NOR flash device...
   522:         return;
0x100012C8  E7980185  LDR       R0,[R8,R5,LSL #3]
0x100012CC  E3C00001  BIC       R0,R0,#0x00000001
0x100012D0  E7880185  STR       R0,[R8,R5,LSL #3]
   523: }
0x100012D4  E8BD8FF8  LDMIA     R13!,{R3-R11,PC}
0x100012D8  544F5250  DD        0x544F5250
0x100012DC  0050495F  DD        0x0050495F
0x100012E0  0A50490A  DD        0x0A50490A
0x100012E4  00000000  DD        0x00000000
0x100012E8  606E6F64  DD        0x606E6F64
0x100012EC  6E6B2074  DD        0x6E6B2074
0x100012F0  7420776F  DD        0x7420776F
0x100012F4  20736968  DD        0x20736968
0x100012F8  6B636170  DD        0x6B636170
0x100012FC  000A7465  DD        0x000A7465
0x10001300  68746520  DD        0x68746520
0x10001304  00000020  DD        0x00000020
    39:         pbuf->hdr->flag = used;
best regards Howard
I think something is wrong in that the processor doesn't place new stack variables at the end of the current stack location.
It could be stack pointer corruption then. If at some point in the program the stack pointer register is not updated properly (or updated improperly) then all subsequent stack operations would reference wrong memory locations. I don't think it's possible to corrupt the stack pointer by means of the C language only. It would involve pieces written in assembly, or improper handling of CPU mode switching, improper handling of interrupts etc...
But it sounds like you almost found the exact place in the program where stack corruption takes place.
It is only an example: nearly every statement that uses some stack space causes other variables to be overwritten.
I think something is wrong in that the processor doesn't place new stack variables at the end of the current stack location. For example, I have a simple unsigned int i variable in main() whose value is 0x00, but after the processor does some work in testfunction2() the value is 0x20000041....
It is also a little strange that I couldn't see any value in the memory window (at the start of the stack location) that resembles the values shown in the stack window. I only see that the area from 0x314000 down to 0x313F00 changes its values.
...another function (testfunction2) in which a memcmp call is executed, all previous values are overwritten
I'm not sure why you are mentioning memcmp - this function compares two memory regions, it shouldn't alter their contents. But it sounds like you almost found the exact place in the program where stack corruption takes place. Try locating the exact line of the source code which does it.
Ok, now I was able to do some tests. The printf() function uses a lot of stack, but that's not the problem.
I opened the Call Stack window, where main() is shown at the start. When the processor calls testfunction() and then another function (testfunction2) in which a memcmp call is executed, all previous values are overwritten (e.g. in the main function)....
Call Stack Window:
testfunction2()
testfunction()
main()

ARM_LIB_HEAP  0x312000 EMPTY  0x1000 { }
ARM_LIB_STACK 0x314000 EMPTY -0x1000 { }
So the stack is large enough, but the program only uses 0x200 of the whole stack (starting at 0x314000). If the program needs more stack space, it starts again at the beginning of the stack at 0x314000 and overwrites all values (for example those belonging to the first function, main()).
That means that when the program tries to return from testfunction() to main(), the correct return address is no longer on the stack.
I hope you could give me some advice to solve this problem.
That's probably it. I would expect printf() to easily consume 1K bytes of stack (I don't have data to back this up, though.)
That's a lot.... Do you know a better way to transmit messages to the debug (rs232) interface containing the values of variables?
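One lighter-weight option for printing variable values is a small hand-written hex formatter instead of printf(). The following is only a minimal sketch: u32_to_hex, debug_put_u32 and the put_char callback are hypothetical names, and the callback stands in for whatever routine writes one character to the RS232 port on your board. It should need only a few words of stack, but that has not been measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* Write exactly 8 uppercase hex digits plus a terminating NUL into out. */
void u32_to_hex(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[value & 0xFu];  /* lowest nibble goes to the rightmost slot */
        value >>= 4;
    }
    out[8] = '\0';
}

/* Emit "label=0xXXXXXXXX\r\n" through a caller-supplied output routine.
 * put_char is an assumption: replace it with your own UART send function. */
void debug_put_u32(const char *label, uint32_t value, void (*put_char)(char))
{
    char hex[9];

    while (*label != '\0') {
        put_char(*label++);
    }
    put_char('=');
    put_char('0');
    put_char('x');

    u32_to_hex(value, hex);
    for (const char *p = hex; *p != '\0'; p++) {
        put_char(*p);
    }
    put_char('\r');
    put_char('\n');
}
```

A call such as debug_put_u32("i", i, my_uart_putc) would then replace printf("i=0x%08X\r\n", i), where my_uart_putc is your own function.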
I also have installed two large buffers with 256 Bytes - one for a small receive ethernet frame and the other one to transmit the next ethernet frame....
Assuming that these large buffers and the printf() function overflow the stack, what options do I have to avoid this? When I call another function, I always pass only a pointer to the buffer, not the buffer itself. My stack size is 0x1000 = 4 KB, which is also a lot....
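If the two 256-byte buffers are automatic variables, one common way to take them off the stack is to give them static storage. The sketch below assumes exactly that; rx_frame, tx_frame, handle_frame and last_tx_frame are hypothetical names, and the "processing" is just an echo for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BUF_SIZE 256u  /* buffer size mentioned in the post */

/* Static storage: the linker places these in the data region,
 * so the 512 bytes no longer count against the stack. */
static uint8_t rx_frame[FRAME_BUF_SIZE];
static uint8_t tx_frame[FRAME_BUF_SIZE];

/* Hypothetical handler: stores the received frame and prepares an echo reply.
 * Returns the number of bytes queued, or 0 if the frame does not fit. */
size_t handle_frame(const uint8_t *data, size_t len)
{
    if (len > FRAME_BUF_SIZE) {
        return 0;  /* reject instead of writing past the buffer */
    }
    memcpy(rx_frame, data, len);
    memcpy(tx_frame, rx_frame, len);
    return len;
}

/* Accessor so other modules can read the prepared reply. */
const uint8_t *last_tx_frame(void)
{
    return tx_frame;
}
```

The trade-off is that static buffers are shared by every call, so the handler is not reentrant; if it can also run from an interrupt, it needs its own protection.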
maybe the printf() function is using a huge amount of the stack...
That's probably it. I would expect printf() to easily consume 1K bytes of stack (I don't have data to back this up, though.) As for required stack size for a typical program, it really strongly depends on the program. Deeply nested function calls and lots of large automatic variables can consume tons of stack space.
I think that's what the Call Stack Unwinder is for. You could also analyze the disassembly and derive expected stack usage from that.
Could you show me a small example? The Call Stack Unwinder only displays the c-functions with their variables (and their values... there's no error) - but I can't find the relation between the Stack Unwinder window and the memory window displaying the top of the stack.
One way is to let the program run for a while (stress-test it,) then stop it and inspect stack space
The test() function will be called if a new packet was received by ethernet. If this happens, the error occurs.... If I reduce the code within the test() function everything seems to work. Therefore I think there would be something wrong with the stack - but the stack size seems to be big enough. I'm not sure what a common size of the stack would be... maybe the printf() function is using a huge amount of the stack...
Do you know if the printf() function uses the stack or the heap section?
I don't know. I suspect that printf() doesn't use heap, because there are applications where you'd use printf() and would want to avoid using heap, and the standard library should support that.
But how could I inspect the stack with Keil?
It seems that the area isn't correctly initialized...
This seems normal. You would expect the used part of the stack to contain non-zeros, and the yet untouched part to be all-zeros.
Moreover, do you know whether it is possible to see if the sizes of the different stacks (IRQ, user, supervisor stack...) are big enough?
One way is to let the program run for a while (stress-test it,) then stop it and inspect stack space. The untouched part of the stack is all-zeros. This should give you an idea of how much stack was used.
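The same idea can be automated with a "painted" stack. This is only a sketch under a few assumptions: it fills the region with a recognizable pattern rather than relying on zeros (zero is also a common stored value), the stack is full-descending so the untouched words are at the low addresses, and STACK_FILL, stack_paint and stack_peak_bytes are hypothetical names. On the target, painting would have to happen before the stack is in use, for example in the startup code, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu  /* hypothetical paint pattern */

/* Fill a stack region with the pattern. base is the lowest address,
 * e.g. Image$$ARM_LIB_STACK$$Base (0x00313000 in the map file above). */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Return the peak usage in bytes of a descending stack.
 * Scans upward from the base until the first word that no longer
 * holds the pattern; everything from there to the top was used. */
size_t stack_peak_bytes(const uint32_t *base, size_t words)
{
    size_t untouched = 0;

    while (untouched < words && base[untouched] == STACK_FILL) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

After a stress test, a result close to the full region size (0x1000 bytes here) would suggest the stack is too small. A result of exactly the full size means the pattern was overwritten all the way down to the base, so an overflow is likely.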
Hi Mike, thank you very much for your answer!
If you write the contents of this space in ASCII, you'll see: _IP IP don't know this packet eth
Yes, these are the strings of the printf() calls used in this C function test(). Very interesting that they are located here. Do you know if the printf() function uses the stack or the heap section?
I'm not sure if this is a question. You probably want to know what could cause this. I would start debugging this by stopping the program at the return statement and inspecting the stack.
Ok, that makes sense. But how can I inspect the stack with Keil? I just took a look at the SRAM location where the stack resides. It seems that the area isn't correctly initialized, because the upper area of the stack is filled with 0x00 and the lower part is full of data (variables).
scatter-loading file:
map-file:
Image$$Relocate_region$$ZI$$Limit   0x003036e8   Number   0   anon$$obj.o(linker$$defined$$symbols)
Image$$ARM_LIB_HEAP$$Base           0x00312000   Number   0   anon$$obj.o(linker$$defined$$symbols)
Image$$ARM_LIB_HEAP$$ZI$$Limit      0x00313000   Number   0   anon$$obj.o(linker$$defined$$symbols)
Image$$ARM_LIB_STACK$$Base          0x00313000   Number   0   anon$$obj.o(linker$$defined$$symbols)
Image$$ARM_LIB_STACK$$ZI$$Limit     0x00314000   Number   0   anon$$obj.o(linker$$defined$$symbols)
the program counter gets always the wrong address - so that an undef handler occurs
I'm not sure if this is a question. You probably want to know what could cause this. It could well be that the part where it says '//do some stuff' corrupts the stack. I would start debugging this by stopping the program at the return statement and inspecting the stack. Then I would try to find the exact point at which the stack gets corrupted. The good news is that it happens always: the bug is easily reproduced, so hopefully it is also easily fixed.
I'm not sure about the space within these two points?
If you write the contents of this space in ASCII, you'll see: _IP IP don't know this packet eth Apparently, these are literal constants used in your program.
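The decoding can be reproduced with a few lines of C. This is a minimal sketch: words_to_ascii and literal_pool are hypothetical names, the words are copied from the DD lines in the disassembly, and the bytes are taken least significant first on the assumption that the target is little-endian. Non-printable bytes are shown as '.'.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert 32-bit words to text, least significant byte first.
 * out must have room for 4 * count + 1 characters. */
void words_to_ascii(const uint32_t *words, size_t count, char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        for (unsigned shift = 0; shift < 32; shift += 8) {
            unsigned char c = (unsigned char)((words[i] >> shift) & 0xFFu);
            /* keep printable ASCII, replace everything else with '.' */
            out[n++] = (c >= 0x20 && c <= 0x7E) ? (char)c : '.';
        }
    }
    out[n] = '\0';
}

/* The DD words from 0x100012D8 to 0x10001304 in the disassembly. */
static const uint32_t literal_pool[12] = {
    0x544F5250u, 0x0050495Fu, 0x0A50490Au, 0x00000000u,
    0x606E6F64u, 0x6E6B2074u, 0x7420776Fu, 0x20736968u,
    0x6B636170u, 0x000A7465u, 0x68746520u, 0x00000020u
};
```

Calling words_to_ascii(literal_pool, 12, text) with a 49-byte buffer yields the text PROT_IP, IP, a "don't know this packet" message (stored with a 0x60 backtick instead of an apostrophe) and eth, separated by the NUL and newline bytes.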