Why does the compiler bother with the __ARM_common_call_via_r0 (_r1, _r2, etc.) functions? Each one consists of a single Thumb instruction, "BX Rn", which is shorter than the call needed to reach it. If I switch to ARM mode exclusively, they aren't used. How can I eliminate them from my Thumb code?