Weird SPSR behaviour

I was trying to write register context save/restore code when I came across some weird behaviour.

My code (sorry, I tried to format it tens of times, but the editor insists on turning the asm into a table):

asm volatile (
...
"pop {r0 - r3}\n\t"
"push {r0 - r3}\n\t"
"mov r0, r3\n\t"
"bl dbg_out\n\t"              /* outputs 60000013 */
"pop {r0 - r3}\n\t"
"msr cpsr_fsxc, r2\n\t"
"@dsb\n\t"
"@isb\n\t"
"msr spsr_fsxc, r3\n\t"       /* set value */
"@dsb\n\t"
"@isb\n\t"
"mov lr, r1\n\t"
"mov sp, r0\n\t"
"push {r0 - r4, lr}\n\t"
"mov r0, lr\n\t"
"bl dbg_out\n\t"
"mov r0, sp\n\t"
"bl dbg_out\n\t"
"mrs r2, cpsr\n\t"
"mrs r3, spsr\n\t"            /* read value */
"mov r0, r2\n\t"
"bl dbg_out\n\t"
"mov r0, r3\n\t"              /* outputs 00000002 */
"bl dbg_out\n\t"
...
);

After the exception returns, the calling function continues with:

    asm volatile ("svc #0\n\t");
    msg = "returned from SVC\r\n";
    serial_io.put_string(msg, util_str_len(msg)+1);
    asm volatile (
        "mrs %[retreg], cpsr\n\t"
        :[retreg] "=r" (tmp1) ::
    );
    msg = "cpsr = ";
    serial_io.put_string(msg, util_str_len(msg)+1);
    util_word_to_hex(scratchpad, tmp1);
    serial_io.put_string(scratchpad, 9);
    serial_io.put_string("\r\n", 3);

This outputs "returned from SVC" and "cpsr = 60000013".

Why does the SPSR read back as "00000002" when the value written to it was 60000013? The barriers don't seem to have any effect.
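
For reference when reading the two printed values, here is a minimal C sketch that splits an ARMv7-A PSR word into its mode field and condition flags. The helper names (`psr_mode`, `psr_mode_name`, `psr_nzcv`) are made up for this illustration and are not part of the code above. With it, 60000013 decodes as Supervisor mode with Z and C set, while 00000002 does not correspond to any defined mode encoding.

```c
#include <stdint.h>

/* Mode field M[4:0] of an ARMv7-A CPSR/SPSR value. */
static unsigned psr_mode(uint32_t psr)
{
    return psr & 0x1Fu;
}

/* Hypothetical helper: short name for the mode field, or "invalid". */
static const char *psr_mode_name(uint32_t psr)
{
    switch (psr_mode(psr)) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x16: return "mon";
    case 0x17: return "abt";
    case 0x1A: return "hyp";
    case 0x1B: return "und";
    case 0x1F: return "sys";
    default:   return "invalid";
    }
}

/* Condition flags N, Z, C, V occupy bits 31..28 (N is the top bit). */
static unsigned psr_nzcv(uint32_t psr)
{
    return psr >> 28;
}
```

Calling `psr_mode_name` on each value that `dbg_out` prints makes it easier to tell a plausible PSR from a word that came from somewhere else.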

Parents
  • It's hard to believe that LDM saves clock cycles compared to POP - at least on Cortex-A7:

    cccc 1000 10W1 nnnn rrrr rrrr rrrr rrrr    arm_cldstm_ldm    arm_core_ldstm    LDM<c> <Rn>{!}, <registers>    A1    A8.8.58

    cccc 1000 1011 1101 rrrr rrrr rrrr rrrr    arm_cldstm_pop    arm_core_ldstm    POP<c> <registers>

      (cccc is the condition, 'W' is writeback, nnnn is the base register - SP is 1101 - and the r bits are the register list.)
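
The two bit patterns above can be checked mechanically. The following is a small C sketch, with function names invented for this illustration, that tests whether a 32-bit ARM instruction word matches the LDM A1 pattern and whether it is the special case that disassembles as POP (writeback set, base register SP). It ignores the condition field and does not handle the single-register POP encoding.

```c
#include <stdint.h>

/* LDM/LDMIA, encoding A1: cond 1000 10W1 Rn register_list.
   Bit 21 (W) is masked out so both writeback forms match. */
static int is_ldm_a1(uint32_t insn)
{
    return (insn & 0x0FD00000u) == 0x08900000u;
}

/* POP, encoding A1: the same pattern with W = 1 and Rn = 1101 (SP). */
static int is_pop_a1(uint32_t insn)
{
    return (insn & 0x0FFF0000u) == 0x08BD0000u;
}

/* Base register number (the nnnn field, bits 19..16). */
static unsigned ldm_base_reg(uint32_t insn)
{
    return (insn >> 16) & 0xFu;
}

/* Writeback bit (W, bit 21). */
static unsigned ldm_writeback(uint32_t insn)
{
    return (insn >> 21) & 1u;
}
```

For example, 0xE8BD000F (`pop {r0 - r3}`) satisfies both predicates, while 0xE8B00006 (`ldmia r0!, {r1, r2}`) satisfies only `is_ldm_a1`.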

    The pushing and popping is about debug.

    I have stuff on the stack, and if I suspect the registers may have been corrupted, I pop and push it back to "fix" the registers.

    And actually, the 'dbg_out' corrupts r0 and r1.

    Another weirdness in today's debug:

       "@ align SP\n\t"
       "mov r0, sp\n\t"
       "bl dbg_out\n\t"
       "mov r0, sp\n\t"
       "and r1, r0, #7\n\t"
       "push {r0, r1}\n\t"
       "mov r0, r1\n\t"
       "bl dbg_out\n\t"
       "pop {r0, r1}\n\t"
       "sub r0, r1\n\t"
       "mov sp, r0\n\t"
       "push {r0,r1} @ stack correction"
       "mov r0, r1\n\t"
       "bl dbg_out\n\t"

    The first 'dbg_out' prints 1f012658, the second prints 00000000 and the third 1f012658...
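
One possible explanation, offered as a guess from the listing as posted: the "push {r0,r1} @ stack correction" string has no "\n\t", so C string concatenation glues the following "mov r0, r1" onto the same assembler line, after the '@' comment marker. The third 'dbg_out' would then print r0 (the new SP) rather than r1. The C sketch below reproduces just the string concatenation; `is_commented_out` is a helper invented for this illustration.

```c
#include <string.h>

/* The fragments as written in the post; the second one lacks "\n\t". */
static const char asm_text[] =
    "mov sp, r0\n\t"
    "push {r0,r1} @ stack correction"
    "mov r0, r1\n\t"
    "bl dbg_out\n\t";

/* Returns 1 if the first occurrence of `needle` sits after an '@' on the
   same line, i.e. inside a GNU as comment for ARM; 0 otherwise. */
static int is_commented_out(const char *text, const char *needle)
{
    const char *hit = strstr(text, needle);
    if (hit == NULL)
        return 0;
    /* Walk backwards to the start of the line, looking for '@'. */
    for (const char *p = hit; p > text; ) {
        --p;
        if (*p == '\n')
            return 0;
        if (*p == '@')
            return 1;
    }
    return 0;
}
```

Here `is_commented_out(asm_text, "mov r0, r1")` returns 1, while the check for "bl dbg_out" returns 0. The second printout of 00000000 is also what `1f012658 & 7` gives, so the stack appears to have been 8-byte aligned already.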

    The stack needs to be 'corrected' because, when assembly is called from C, the stack may be 4-byte but not 8-byte aligned, and GCC requires the stack to be 8-byte aligned when C code is called from 'outside' (such as from assembly).
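
The arithmetic behind the `and r1, r0, #7` / `sub r0, r1` pair can be sketched in C as follows. The function names are made up for this illustration; the sketch only models the address calculation, not the actual SP update.

```c
#include <stdint.h>

/* Number of bytes by which sp exceeds the previous 8-byte boundary
   (what "and r1, r0, #7" computes). */
static uint32_t misalignment_8(uint32_t sp)
{
    return sp & 7u;
}

/* sp rounded down to an 8-byte boundary
   (what "sub r0, r1" then produces). */
static uint32_t align_down_8(uint32_t sp)
{
    return sp - misalignment_8(sp);
}
```

For the value printed above, `misalignment_8(0x1f012658)` is 0, so the correction leaves the stack pointer unchanged; a 4-byte-aligned value such as 0x1f01265c would be moved down to 0x1f012658.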
