We are running a survey to help us improve the experience for all of our members. If you see the survey appear, please take the time to tell us about your experience if you can.
Yep, that's entirely sensible, and precisely what APCS lets you do. What is means by "only in between subroutine calls" is that as soon as you call a subroutine that subroutine is free to clobber r0-r3 and r12 (in accordance with the APCS).
So if the caller has stored any intermediate value in any of these clobberable registers, and it wants to use the value after the subroutine it either has to move the value in to r4-r11 or r13-14 or write it to the stack. So you cannot store intermediates "across subroutine calls" in the parameter passing or IP register ... otherwise you risk loosing the value.
Most modern compilers operate on a "move the stack pointer once per function" (ignoring stack push and pop wrapping subroutines), so will essentially reserve enough space on the stack to ensure that any intermediate space on the stack for storing scratch values is allocated on function entry and freed on function exit. In your case, as you pointed out in your first post, the compiler stores r2 back to the stack as a scratch variable; the initial "push" of r2 and r3 is just to reserve the space without needing a second instruction to increment SP.
The APCS also requires the stack to be 8 byte aligned on function call, so r3 is probably pushed just to ensure an even number of registers are pushed, although your function may use that space for scratch stack too...