Resetting the stack pointer in "noreturn" functions?

Architecture: Cortex-M0

Toolchain: gcc-arm-none-eabi-6-2017-q2-update, gcc-arm-none-eabi-8-2018-q4-major

In an attempt to mitigate the possibility of stack overflow, I would like to reset the stack pointer after entering a function that will never return. There are two cases in my code where this occurs: main() and a shutdown() ISR that saves data to flash and enters deep sleep. I use LTO to make the code fit, so main() ends up being quite a large function that needs to allocate part of the stack for local variables. My first attempt was to use the "noreturn" attribute combined with a call to __builtin_unreachable(), but that does not change the generated assembly in any way. I then created an always-inlined function with inline assembly that resets the stack pointer to the last SRAM address:

inline __attribute__((always_inline)) void NO_RETURN (void)
{
    extern const uint32_t __stack_top__;
    asm volatile ("ldr r3, %[stack_top]\n"
                  "mov sp, r3\n"
                  : /* no outputs */
                  : [stack_top] "m" (__stack_top__)
                  : /* no clobbers */
    );
}

I then call this at the very beginning of main and the shutdown ISR:

int main (void)
{
    NO_RETURN();

    /* rest of the code here... */
}

void shutdown_immediate (void)
{
    NO_RETURN();
}

This generates seemingly correct code for the ISR:

00007f60 <shutdown_immediate>:
    7f60:	b570      	push	{r4, r5, r6, lr}
    7f62:	4b21      	ldr	r3, [pc, #132]	; (7fe8 <shutdown_immediate+0x88>)
    7f64:	681b      	ldr	r3, [r3, #0]  ; Why is this instruction inserted by the compiler?
    7f66:	469d      	mov	sp, r3
; ...
    7fe8:	00202000 	eoreq	r2, r0, r0 ; last SRAM address

For main(), however, the "mov sp, r3" happens after stack space has been allocated for local variables etc. This will fail once main() starts branching.

00001180 <main>:
    1180:	b5f0      	push	{r4, r5, r6, r7, lr}
    1182:	4be7      	ldr	r3, [pc, #924]	; (1520 <main+0x3a0>)
    1184:	b097      	sub	sp, #92	; 0x5c ; This SUB must be _after_ 0x1188!
    1186:	681b      	ldr	r3, [r3, #0]
    1188:	469d      	mov	sp, r3
; ...
    1520:	00202000 	eoreq	r2, r0, r0 ; Last SRAM address
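As a side note for reading the two listings above: the address objdump prints in parentheses after each "ldr r3, [pc, #imm]" can be checked by hand. In Thumb state the base for a PC-relative load is the instruction address plus 4, rounded down to a multiple of 4, and the immediate is added to that. The small host-side sketch below only redoes that arithmetic with the values from the listings; thumb_literal_address() and check_listing_addresses() are hypothetical helper names, not part of the firmware.

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a Thumb "ldr rX, [pc, #imm]" located at insn_addr.
 * The PC value used as base is (insn_addr + 4) rounded down to a
 * multiple of 4. Hypothetical helper, for illustration only. */
static uint32_t thumb_literal_address(uint32_t insn_addr, uint32_t imm)
{
    uint32_t base = (insn_addr + 4u) & ~UINT32_C(3);
    return base + imm;
}

/* Re-checks the two literal-pool addresses shown in the listings. */
static void check_listing_addresses(void)
{
    /* shutdown_immediate: 7f62: ldr r3, [pc, #132] ; (7fe8) */
    assert(thumb_literal_address(0x7f62u, 132u) == 0x7fe8u);
    /* main: 1182: ldr r3, [pc, #924] ; (1520) */
    assert(thumb_literal_address(0x1182u, 924u) == 0x1520u);
}
```

So in both functions the first ldr fetches the word stored in the literal pool (0x00202000), and the second ldr then uses that word as an address.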

Does anyone have any tricks for how this can be done correctly? I could always create a second variant of the NO_RETURN() function that takes the stack allocation size as an argument, then compile, disassemble, insert the required "sub sp, #nn" after the "mov sp, r3" and compile again, but that is a messy solution.

Bonus question: Why does the compiler generate the "ldr r3, [r3, #0]" instruction? "Load r3 into r3 with zero offset" sounds like a nop?