I was trying to write register context save/restore code when I came across some weird behaviour.
My code (sorry, tried to format tens of times, but the editor WANTS to make asm a table):
When the exception is returned from, the calling function:
msg = "returned from SVC\r\n";
serial_io.put_string(msg, util_str_len(msg)+1);
asm volatile (
);
msg = "cpsr = ";
util_word_to_hex(scratchpad, tmp1);
serial_io.put_string(scratchpad, 9);
serial_io.put_string("\r\n", 3);
outputs "returned from SVC" and "cpsr = 60000013".
Why the "00000002"? The barriers don't seem to have any effect.
And another weirdness, this time with gcc:
The source:
// rpi2_svc_handler2() // - No C in naked function
The disassembly:
1f000cf8: e1a0000d mov r0, sp
1f000cfc: e2001007 and r1, r0, #7
1f000d00: e0400001 sub r0, r0, r1
1f000d04: e1a0d000 mov sp, r0
1f000d08: e92d0003 push {r0, r1}
1f000d0c: e1a0000d mov r0, sp
1f000d10: e1a0100e mov r1, lr
1f000d14: e92d000f push {r0, r1, r2, r3}
1f000d18: ebffff51 bl 1f000a64 <rpi2_svc_handler2>
1f000d1c: e8bd000f pop {r0, r1, r2, r3}
Where's the "pop {r0, r1}"?
1f000d20: e0800001 add r0, r0, r1
1f000d24: e1a0d000 mov sp, r0
The stack fix pop in "restore"-part ("pop {r0, r1}\n\t") is missing from the disassembly!
OK, the push and pop around call to rpi2_svc_handler2 are needless, but still - the stack effect...
The compiler shouldn't "optimize" in a way that leaves the stack unbalanced and makes the data read back wrong.
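To make the stack effect concrete, here is a minimal sketch in portable C that replays the register and stack operations from the disassembly above on a toy memory model. The names `cpu_t`, `push_regs`, `pop_regs` and `run_wrapper` are made up for illustration, and the call to `rpi2_svc_handler2` is assumed to be stack neutral. It only shows what the listed instructions do to SP with and without the "pop {r0, r1}"; it says nothing about why the instruction is absent from the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a full-descending stack: RAM for addresses 0..255, r0-r3, sp. */
typedef struct {
    uint32_t mem[64];
    uint32_t r[4];
    uint32_t sp;
} cpu_t;

/* push {r0..r(n-1)}: the lowest register lands at the lowest address. */
static void push_regs(cpu_t *c, int n) {
    c->sp -= 4u * (uint32_t)n;
    for (int i = 0; i < n; i++) c->mem[c->sp / 4 + i] = c->r[i];
}

/* pop {r0..r(n-1)} */
static void pop_regs(cpu_t *c, int n) {
    for (int i = 0; i < n; i++) c->r[i] = c->mem[c->sp / 4 + i];
    c->sp += 4u * (uint32_t)n;
}

/* Replays the listed sequence and returns the final sp.
 * with_fix_pop = 1 adds the "pop {r0, r1}" that the disassembly lacks. */
uint32_t run_wrapper(uint32_t sp_in, uint32_t lr, int with_fix_pop) {
    cpu_t c = {0};
    assert(sp_in % 4 == 0 && sp_in >= 32 && sp_in <= 252);
    c.sp = sp_in;
    c.r[0] = c.sp;                      /* mov r0, sp        */
    c.r[1] = c.r[0] & 7u;               /* and r1, r0, #7    */
    c.r[0] -= c.r[1];                   /* sub r0, r0, r1    */
    c.sp = c.r[0];                      /* mov sp, r0        */
    push_regs(&c, 2);                   /* push {r0, r1}     */
    c.r[0] = c.sp;                      /* mov r0, sp        */
    c.r[1] = lr;                        /* mov r1, lr        */
    push_regs(&c, 4);                   /* push {r0-r3}      */
    /* bl rpi2_svc_handler2: assumed to leave sp unchanged */
    pop_regs(&c, 4);                    /* pop {r0-r3}       */
    if (with_fix_pop) pop_regs(&c, 2);  /* pop {r0, r1}      */
    c.r[0] += c.r[1];                   /* add r0, r0, r1    */
    c.sp = c.r[0];                      /* mov sp, r0        */
    return c.sp;
}
```

With the extra pop, the function returns the SP it started with. Without it, r0 and r1 still hold the saved SP and LR from the inner push, so the "add" produces an address that has nothing to do with the original stack.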
But after "msr cpsr_fsxc, r2" I do "msr spsr_fsxc, r3" before reading "mrs r3, spsr".
Setting and reading the SPSR should happen in the same mode.
Ah, stupid me. That's what you meant - no writeback -> no write back...
"dbg_out is allowed to corrupt r2 and r3 as well"
Yes, allowed, but it doesn't - checked the disassembly.
"But remember to restore SP..."
I wrote them as a pair. That way you don't forget, and it's easier to write one as a "mirror image" of the other.
Hello turboscrew,
do you wonder why "00000002" was read as SPSR?
If so, I think "msr cpsr_fsxc, r2" is what affected it.
Changing the CPSR changed the execution mode.
After that, the SPSR would be read from the new execution mode.
It would probably hold an unknown value.
I guess you restored the CPSR to SVC mode after reading the SPSR.
Therefore, the correct CPSR was read in the main function.
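The banking effect described above can be sketched in C. This is only a model, not code for the target: `psr_model_t` and the `model_*` helpers are hypothetical names, and the model gives every 5-bit mode value its own SPSR slot, whereas a real core banks the SPSR only for exception modes (User and System mode have none). The mode numbers are the ARMv7-A encodings of the CPSR M field.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-A CPSR mode field values (bits 4:0). */
enum { MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
       MODE_ABT = 0x17, MODE_UND = 0x1b, MODE_SYS = 0x1f };

/* Extract the mode field and the NZCV flags (bits 31:28) from a PSR value. */
uint32_t psr_mode(uint32_t psr)  { return psr & 0x1fu; }
uint32_t psr_flags(uint32_t psr) { return psr >> 28; }

/* Simplified model: one SPSR slot per mode value. */
typedef struct {
    uint32_t cpsr;
    uint32_t spsr_bank[32];
} psr_model_t;

/* Switch mode by rewriting only the mode field, like "msr cpsr_c, rN". */
void model_set_mode(psr_model_t *m, uint32_t mode) {
    assert(m != 0 && mode < 32u);
    m->cpsr = (m->cpsr & ~0x1fu) | mode;
}

/* SPSR accesses always go to the slot of the mode that is current right now. */
void model_write_spsr(psr_model_t *m, uint32_t value) {
    m->spsr_bank[psr_mode(m->cpsr)] = value;
}

uint32_t model_read_spsr(const psr_model_t *m) {
    return m->spsr_bank[psr_mode(m->cpsr)];
}
```

For the value printed in the question, `psr_mode(0x60000013)` gives 0x13 (SVC) and `psr_flags(0x60000013)` gives 0x6 (Z and C set). In the model, an SPSR written in SVC mode is not what a read returns after the mode field has been changed; it becomes visible again once the mode is switched back.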
Best regards,
Yasuhiko Koumoto.
Uhm, what I mean is that a single LDM is approximately twice as fast as the two instructions POP + PUSH.
Thus you do not have to use POP+PUSH; you can just read the contents of the stack and avoid writing to it.
The registers need to be loaded from the stack, that is correct, because they do not contain any defined value on interrupt entry.
Thus instead of ...
pop {r0-r3}
push {r0-r3}
... you can write ...
ldm sp,{r0-r3}
... which only reads the registers.
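The difference between the two variants can be illustrated with a small C model. Everything here (`stack_model_t`, `model_init`, `model_pop_push`, `model_ldm`) is hypothetical and exists only for this sketch. It counts memory stores, which is the part that can be stated with certainty; it makes no claim about cycle counts on any particular core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 words of stack memory, r0-r3, sp, and a store counter. */
typedef struct {
    uint32_t mem[16];
    uint32_t r[4];
    uint32_t sp;       /* byte address into mem, always a multiple of 4 */
    unsigned stores;   /* number of words written to memory */
} stack_model_t;

/* Put four saved words on the stack and fill r0-r3 with junk. */
void model_init(stack_model_t *m, const uint32_t saved[4]) {
    assert(m != 0 && saved != 0);
    m->sp = 32;
    m->stores = 0;
    for (int i = 0; i < 16; i++) m->mem[i] = 0;
    for (int i = 0; i < 4; i++) {
        m->mem[m->sp / 4 + i] = saved[i];
        m->r[i] = 0xdeadbeefu;
    }
}

/* pop {r0-r3} followed by push {r0-r3} */
void model_pop_push(stack_model_t *m) {
    for (int i = 0; i < 4; i++) m->r[i] = m->mem[m->sp / 4 + i];
    m->sp += 16;
    m->sp -= 16;
    for (int i = 0; i < 4; i++) {
        m->mem[m->sp / 4 + i] = m->r[i];
        m->stores++;
    }
}

/* ldm sp, {r0-r3}: no writeback, so sp and memory are left alone. */
void model_ldm(stack_model_t *m) {
    for (int i = 0; i < 4; i++) m->r[i] = m->mem[m->sp / 4 + i];
}
```

Both routines leave r0-r3 holding the saved words and SP where it was; the POP + PUSH pair additionally writes the same four words back to memory.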
dbg_out is allowed to corrupt r2 and r3 as well, which I think is why you see the strange value you mentioned earlier.
I did not know about the 8-byte alignment requirement.
But remember to restore SP to its original value before you return, either by saving the entire value of SP or by adding the difference back; otherwise you'll get a crash.
It's hard to believe that LDM saves clock cycles compared to POP - at least on Cortex-A7:
c c c c 1 0 0 0 1 0 W 1 n n n n r r r r r r r r r r r r r r r r arm_cldstm_ldm arm_core_ldstm LDM<c> <Rn>{!}, <registers> A1 A8.8.58
c c c c 1 0 0 0 1 0 1 1 1 1 0 1 r r r r r r r r r r r r r r r r arm_cldstm_pop arm_core_ldstm POP<c> <registers>
(The 'W' is writeback, and nnnn is the base register - SP is 1101.)
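The two bit patterns above can be checked against real instruction words with a few masks. The following is a small C sketch; the function names are invented for this example and do not come from the decoder the table rows belong to. It treats POP as the special case of LDM with W = 1 and base register SP, exactly as the patterns show.

```c
#include <assert.h>
#include <stdint.h>

/* LDM (A1): cccc 1000 10W1 nnnn rrrr rrrr rrrr rrrr, W is "don't care". */
int is_ldm(uint32_t insn) {
    return (insn & 0x0fd00000u) == 0x08900000u;
}

/* POP: the same pattern with W = 1 and Rn = 1101 (SP). */
int is_pop(uint32_t insn) {
    return (insn & 0x0fff0000u) == 0x08bd0000u;
}

/* Field extractors; only meaningful when is_ldm() is true. */
unsigned ldm_writeback(uint32_t insn) { assert(is_ldm(insn)); return (insn >> 21) & 1u; }
unsigned ldm_base(uint32_t insn)      { assert(is_ldm(insn)); return (insn >> 16) & 0xfu; }
unsigned ldm_reglist(uint32_t insn)   { assert(is_ldm(insn)); return insn & 0xffffu; }
```

The word e8bd000f from the disassembly ("pop {r0, r1, r2, r3}") matches both patterns, while e89d000f, the encoding of "ldm sp, {r0-r3}" without writeback, matches only the LDM pattern. The two words differ in the W bit alone.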
The pushing and popping is for debugging.
I have stuff on the stack, and if I suspect the registers may have been corrupted, I pop and push back to "fix" the registers.
And actually, the 'dbg_out' corrupts r0 and r1.
Another weirdness in today's debug:
The first 'dbg_out' prints 1f012658, the second prints 00000000 and the third 1f012658...
The stack has to be 'corrected' because when you call assembly from C, the stack may be 4-byte but not 8-byte aligned, and GCC requires that when you call C code from 'outside' (like assembly), the stack is 8-byte aligned.
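The alignment fix itself is plain arithmetic, shown here as a C sketch with hypothetical helper names (`align_down8`, `restore_sp`). It mirrors the "and r1, r0, #7" / "sub r0, r0, r1" pair in the disassembly and the later "add r0, r0, r1", assuming a full-descending stack so that aligning means moving SP down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round sp down to a multiple of 8 and report how far it moved (0..7). */
uint32_t align_down8(uint32_t sp, uint32_t *diff) {
    assert(diff != NULL);
    *diff = sp & 7u;        /* and r1, r0, #7 */
    return sp - *diff;      /* sub r0, r0, r1 */
}

/* Undo the alignment: add the saved difference back. */
uint32_t restore_sp(uint32_t aligned_sp, uint32_t diff) {
    return aligned_sp + diff;   /* add r0, r0, r1 */
}
```

A 4-byte aligned SP such as 0x8004 becomes 0x8000 with a difference of 4, and an SP that is already 8-byte aligned is left alone with a difference of 0. The difference has to survive the call into C, which is why it is kept on the stack in the code under discussion.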
I think that 'bl dbg_out' is free to change r0-r3, r12 and lr.
Popping and then pushing r0-r3 looks strange to me, probably because an LDM sp,{r0-r3} would do the job just as well but save a bunch of clock cycles.
I recommend avoiding pushing the values back onto the stack; if you need to change a single register, I think it would be better to just do an indexed store (STR rN,[sp,#offset]).