Problem when dynamically loading code (Cortex-M4)

Hello everyone.

I want to implement dynamic loading of functions into RAM on a Cortex-M4. It partially works, but only with a small hack, which is not ideal.

First I create the binary data for the function by compiling my desired function:

int f1(int a, int b){
	return a+b;
}

Compilation Flags: -fPIC -msingle-pic-base -mcpu=cortex-m4 -mthumb

From the compiled file I used the scripts from this link to obtain the assembly code for my binary function.

00000000 <__bd19836dd__f1>:
   0:   4408            add     r0, r1
   2:   4770            bx      lr

00000004 <f1>:
   4:   e92d 4200       stmdb   sp!, {r9, lr}
   8:   b403            push    {r0, r1}
   a:   f04f 011c       mov.w   r1, #28
   e:   6809            ldr     r1, [r1, #0]
  10:   4678            mov     r0, pc
  12:   4788            blx     r1
  14:   4681            mov     r9, r0
  16:   bc03            pop     {r0, r1}
  18:   f7ff fff2       bl      0 <__bd19836dd__f1>
  1c:   e8bd 8200       ldmia.w sp!, {r9, pc}

From this I create the binary function data for my main application:

static const unsigned char binFunction[] = {
    0x00, 0x00, 0x08, 0x44, 0x70, 0x47, 0x2D, 0xE9, 0x00, 0x42, 0x03,
    0xB4, 0x4F, 0xF0, 0x1C, 0x01, 0x09, 0x68, 0x78, 0x46, 0x81, 0x46,
    0x03, 0xBC, 0xFF, 0xF7, 0xF2, 0xFF, 0xBD, 0xE8, 0x00, 0x82
};
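The array can be cross-checked against the disassembly: Thumb instructions are stored as little-endian 16-bit halfwords, so `4408` in the listing appears as `0x08, 0x44` in the array. A minimal host-side sketch of that check follows; the helper name `thumb_halfword_at` is made up for illustration and is not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one little-endian 16-bit halfword from a code image.
 * Hypothetical helper for comparing array bytes with a disassembly listing. */
uint16_t thumb_halfword_at(const unsigned char *buf, size_t len, size_t offset)
{
    assert(buf != NULL);
    assert(offset + 1 < len);  /* both bytes must lie inside the buffer */
    return (uint16_t)(buf[offset] | ((unsigned)buf[offset + 1] << 8));
}
```

Walking the array two bytes at a time with this helper and comparing each value with the listing makes it easy to spot a missing or extra halfword.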

The actual code to execute the binary function from RAM is the following:

const size_t CODE_LEN = sizeof(binFunction) / sizeof(binFunction[0]);
void *p = malloc(CODE_LEN);  //Allocate RAM
memcpy(p, binFunction, CODE_LEN); //Copy code to RAM
int (*fPtr)(int,int); //function ptr
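// Byte offset of f1 in the array (6) plus 1 to set the Thumb bit.
// Ideally this would come from a table of symbols stored with the compiled function.
int startPositionCode = 7;
// Do the arithmetic on an integer address, not on the function pointer (uintptr_t needs <stdint.h>)
fPtr = (int (*)(int,int))((uintptr_t)p + startPositionCode);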
int result= fPtr(5,3);
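On a Cortex-M, a function pointer must have bit 0 set so the core stays in Thumb state, which is why the start position is odd even though the code itself is halfword-aligned. A small sketch of deriving the entry address from a base and a symbol offset, instead of hard-coding the sum; the name `thumb_entry` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a callable Thumb address from the load address and the
 * byte offset of the function inside the loaded image. */
uintptr_t thumb_entry(uintptr_t base, size_t offset)
{
    /* Thumb instructions are halfword-aligned, so the raw address is even. */
    assert(((base + offset) & 1u) == 0);
    return (base + offset) | 1u;  /* set bit 0: Thumb state */
}
```

With the array above, f1 sits at byte offset 6 (two leading bytes plus offset 4 from the listing), so `thumb_entry((uintptr_t)p, 6)` gives the same address as adding 7 to p.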

After debugging the code, I found that this only works if I manually delete the instruction blx r1 from the assembly code in my binFunction variable (i.e. remove 0x88, 0x47 from the array, which the array shown above already reflects).
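One thing to watch when deleting a halfword by hand: everything after it moves two bytes, and the `bl` that follows is PC-relative. The sketch below decodes the offset from the `f7ff fff2` encoding in the listing, following the Thumb-2 BL (encoding T1) bit layout. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the signed byte offset of a Thumb-2 BL (encoding T1). */
int32_t thumb_bl_offset(uint16_t hw1, uint16_t hw2)
{
    /* Fixed bits of BL: hw1 = 11110..., hw2 = 11x1... */
    assert((hw1 & 0xF800u) == 0xF000u);
    assert((hw2 & 0xD000u) == 0xD000u);

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1    = !(j1 ^ s);  /* I1 = NOT(J1 XOR S) */
    uint32_t i2    = !(j2 ^ s);  /* I2 = NOT(J2 XOR S) */

    /* Assemble the 25-bit value S:I1:I2:imm10:imm11:0. */
    uint32_t imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);

    /* Sign-extend from bit 24 without relying on wrap-around casts. */
    return (int32_t)imm - (s ? (int32_t)0x2000000 : 0);
}

/* Target = address of the BL + 4 (the PC value it sees) + offset. */
uint32_t thumb_bl_target(uint32_t insn_addr, uint16_t hw1, uint16_t hw2)
{
    return insn_addr + 4u + (uint32_t)thumb_bl_offset(hw1, hw2);
}
```

For `f7ff fff2` the offset is -28, so at offset 0x18 the branch lands on offset 0 of the listing. The same encoding moved to 0x16 would land two bytes before the start of the listing.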

I noticed I get a HardFault because r1 holds zero when blx r1 executes, so the program jumps to address 0x00 instead of continuing with the remaining instructions of the binary function:

14:   4681            mov     r9, r0
16:   bc03            pop     {r0, r1}
18:   f7ff fff2       bl      0 <__bd19836dd__f1>
1c:   e8bd 8200       ldmia.w sp!, {r9, pc}
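A possible reading of the fault, stated as an assumption rather than a diagnosis: `mov.w r1, #28` followed by `ldr r1, [r1, #0]` loads the word at absolute address 0x1C. If the vector table sits at address 0, that slot is a reserved vector entry on Cortex-M, which would explain the zero. A `blx` to an address with bit 0 clear then asks for ARM state, which a Cortex-M4 does not support, so a UsageFault is raised and escalates to a HardFault when the UsageFault handler is not enabled. A small guard like the one below (hypothetical names) can trap the bad pointer before the branch while debugging.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if addr is usable as a bx/blx target on a Thumb-only core:
 * it must be non-zero and have bit 0 (the Thumb bit) set. */
int is_thumb_target(uint32_t addr)
{
    return addr != 0u && (addr & 1u) != 0u;
}

/* Debug-build guard: fail in a known place instead of in the fault handler. */
uint32_t checked_thumb_target(uint32_t addr)
{
    assert(is_thumb_target(addr));
    return addr;
}
```

Checking the value read from address 0x1C with such a guard before the `blx` would show whether the table entry the generated stub expects was ever populated.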

I am not an expert in assembly or dynamic loading, but I would be grateful for any hints on how I can debug this issue.

It seems the problem lies in the code the script generates to locate the data section and load its address into register R9 (see line 11 of the generated assembly code).
