Implementing a loader for a PIE executable on Cortex-M — and questions about PIE support in Arm GNU toolchain

Hello everyone,
I’m working on a project (for learning purposes) where I want to implement a loader for a position-independent executable (PIE) on a Cortex-M microcontroller. My goal is to load the binary at runtime into an arbitrary memory location and let it run as a PIE.

I’m using the Arm GNU toolchain, compiling the modules with -fPIE and linking with -pie. So far, all my builds have worked and appear to behave correctly as position-independent code once relocated by my loader. However, I would like to understand whether PIE is officially supported by this toolchain on Cortex-M targets, or whether I’ve just been lucky so far. Are there any known limitations or caveats in relying on -fPIE/-pie for bare-metal Cortex-M use cases?

About the loader

I’m now trying to focus on what the loader should actually do, and whether my current approach is correct and sufficient.

At runtime, I load the raw binary image, which includes all sections marked with ALLOC (text, data, bss, etc.). I have external information that gives me the offsets, addresses, and sizes of these sections (manually extracted from the ELF, but this could be automated).

I have noticed:

The .rel.dyn section contains relocation entries, all of which have type R_ARM_RELATIVE (0x17, i.e. 23) in the low byte of r_info. My understanding is that for each of these relocations, I just need to add the load base address to the 32-bit word stored at the entry's r_offset.
The .got section contains pointers used by the code. There is also a .got.plt section, but I see no relocation entries referring to it in .rel.dyn. Do I need to manually initialize .got.plt? Or can I ignore it, since I’m not using dynamic linking?
The .dynamic section is present as well. Since I’m not doing dynamic linking (and there’s no dynamic linker on Cortex-M), can this section be ignored? Or does it contain information useful for relocation (e.g., addresses, counts for .rel.dyn entries)?
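
To make the .dynamic question concrete, here is a minimal sketch of how a loader could read the relocation table's location and size from .dynamic instead of relying on externally extracted values. The struct layout and the DT_* tag values come from the ELF specification; the names rel_table_info and find_rel_table are hypothetical, and the sketch assumes the image was linked at base address 0, so that the DT_REL value can be treated as an offset from the start of the loaded image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ELF32 dynamic entry: a tag followed by a 32-bit value/pointer union. */
typedef struct {
    int32_t  d_tag;
    uint32_t d_val;   /* stands in for d_un.d_val / d_un.d_ptr */
} Elf32_Dyn;

static_assert(sizeof(Elf32_Dyn) == 8, "Elf32_Dyn must be 8 bytes");

/* Tag values defined by the ELF specification. */
enum { DT_NULL = 0, DT_REL = 17, DT_RELSZ = 18, DT_RELENT = 19 };

/* Hypothetical helper type: where the REL table is and how big it is. */
typedef struct {
    uint32_t rel_offset;   /* DT_REL: link-time address of the table  */
    uint32_t rel_size;     /* DT_RELSZ: total table size in bytes     */
    uint32_t rel_entsize;  /* DT_RELENT: size of one entry (expect 8) */
} rel_table_info;

/* Walk .dynamic until DT_NULL (or max_entries as a safety bound).
 * Returns 0 if all three tags were found, -1 otherwise. */
static int find_rel_table(const Elf32_Dyn *dyn, size_t max_entries,
                          rel_table_info *out)
{
    unsigned seen = 0;

    for (size_t i = 0; i < max_entries && dyn[i].d_tag != DT_NULL; i++) {
        switch (dyn[i].d_tag) {
        case DT_REL:    out->rel_offset  = dyn[i].d_val; seen |= 1u; break;
        case DT_RELSZ:  out->rel_size    = dyn[i].d_val; seen |= 2u; break;
        case DT_RELENT: out->rel_entsize = dyn[i].d_val; seen |= 4u; break;
        default: break;  /* other tags are not needed for this purpose */
        }
    }
    return seen == 7u ? 0 : -1;
}
```

With this, the number of relocation entries would be rel_size / rel_entsize. Whether this is preferable to passing the section offsets in from outside is a design choice; it mainly removes one manually maintained input.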

What I think the loader must do

Here’s the basic sequence of what my loader currently does:

Copy/load the code and data sections to RAM.
Iterate over the .rel.dyn entries and apply the base address relocation (for R_ARM_RELATIVE).
Optionally initialize the stack pointer and jump to the entry point.
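
The relocation step in the sequence above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the image is assumed to be linked at base address 0 (so r_offset is an offset into the copied image), the table uses REL entries (the addend is the word already stored in place), and apply_relative_relocs is a hypothetical name. The Elf32_Rel layout and the R_ARM_RELATIVE value (23) come from the ELF and Arm ELF specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF32 REL entry: no explicit addend, it lives in the relocated word. */
typedef struct {
    uint32_t r_offset;   /* location to patch (offset, given link base 0) */
    uint32_t r_info;     /* symbol index << 8 | relocation type           */
} Elf32_Rel;

static_assert(sizeof(Elf32_Rel) == 8, "Elf32_Rel must be 8 bytes");

#define ELF32_R_TYPE(info) ((uint8_t)(info))
#define R_ARM_RELATIVE     23u   /* 0x17 */

/* Patch every R_ARM_RELATIVE entry by adding load_base to the word at
 * image + r_offset. Returns 0 on success, -1 on an unexpected relocation
 * type or an out-of-range offset. Entries processed before a failure
 * stay patched, so a caller should treat -1 as fatal for this image. */
static int apply_relative_relocs(uint8_t *image, uint32_t image_size,
                                 uint32_t load_base,
                                 const Elf32_Rel *rel, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t off = rel[i].r_offset;
        uint32_t word;

        if (ELF32_R_TYPE(rel[i].r_info) != R_ARM_RELATIVE)
            return -1;   /* this sketch handles only one relocation type */
        if (image_size < sizeof word || off > image_size - sizeof word)
            return -1;   /* the patched word must lie inside the image */

        /* memcpy avoids assumptions about the alignment of the target. */
        memcpy(&word, image + off, sizeof word);
        word += load_base;
        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

On the target, load_base would typically be (uint32_t)(uintptr_t)image, i.e. the address the image was copied to. Rejecting any type other than R_ARM_RELATIVE is deliberate: if a future build emits a different relocation, the loader fails loudly instead of running a half-relocated image.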

Is this complete? Am I missing anything fundamental, especially related to .got.plt or .dynamic sections?

Summary

Is PIE support (-fPIE/-pie) in the Arm GNU toolchain (in particular with newlib-nano) for Cortex-M an officially supported feature? Or is its correctness a side effect?
Is the approach I described for the loader sufficient to correctly relocate and run a PIE binary on Cortex-M?
Any guidance or references on how to build robust PIE loaders on bare-metal Arm microcontrollers would be greatly appreciated!