Optimizing gcc to use lr / r14 in leaf functions...?

Note: This was originally posted on 5th November 2009 at http://forums.arm.com

Hi everyone,

I'm having difficulty convincing gcc to use r14 when cross-compiling C code for my most critical leaf function - arguably the oldest ARM hand-coding trick in the book (yes, I had an Archimedes), yet it's just not happening here. :-(

Is there some kind of secret gcc option-fu trickery that enables this? Or are there any special conditions which make gcc think that non-function-calling functions should not be treated as leaf functions?
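
One condition worth ruling out is a hidden runtime-library call. As a hedged illustration (the function names below are hypothetical and are not taken from the code in question): ARMv5TE cores such as the arm926ej-s have no integer divide instruction, so a division by a non-constant value is normally compiled into a call to a libgcc helper (`__divsi3` or `__aeabi_idiv`, depending on the ABI). That `bl` overwrites lr, so the function is no longer a leaf even though its C source calls nothing.

```c
/* Hypothetical examples; the names are illustrative only. */

/* Looks like a leaf, but on a core without hardware divide the '/' below
   is typically lowered to a call into libgcc, so lr has to be preserved. */
int average(int total, int count)
{
    return total / count;
}

/* Stays a leaf: a shift needs no helper call.
   Assumes total is non-negative and shift is smaller than the width of int. */
int average_pow2(int total, unsigned shift)
{
    return total >> shift;
}
```

Compiling with -S and searching the generated assembly for `bl` instructions should show whether anything like this is happening in a given function.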

Any suggestions or comments much appreciated!

Cheers, ....Nick Pelling....

PS: this is what "arm-elf-gcc --v" returns, in case this is a silent side-effect of one of the preconfigured options:-

Configured with: ../gcc-4.4.1/configure --target=arm-elf --prefix=/home/yagarto/install --disable-nls --disable-shared --disable-threads --with-gcc --with-gnu-ld --with-gnu-as --with-dwarf2 --enable-languages=c,c++ --enable-interwork --enable-multilib --with-newlib --with-headers=../newlib-1.17.0/newlib/libc/include --disable-libssp --disable-libstdcxx-pch --disable-libmudflap --disable-libgomp -v

Thread model: single
gcc version 4.4.1 (GCC)


...and this is the basic command line I'm using (note that I've tried both -O2 and -O3 optimization levels without success)...

arm-elf-gcc -mtune=arm926ej-s -Wall -g -O2 -c -o obj\main.o main.c


PPS: might it conceivably be that the arm926ej-s does not allow r14 to be used as a completely general-purpose register?
  • Note: This was originally posted on 6th November 2009 at http://forums.arm.com

    Are you sure it's not just a feature of the code you're giving it? The following (admittedly contrived) code uses LR as one of the loop iterators when compiled with gcc 4.2.1, but only at -O3:

    int foo(int r0) {
      /* r1..r14 are ordinary C variables; the register-like names are only for illustration */
      int r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14;

      r1 = 0;

      /* 13 nested loops keep 13 counters live at once, which forces register pressure */
      for(r2=0;r2<r0;r2++)
      for(r3=0;r3<r0;r3++)
      for(r4=0;r4<r0;r4++)
      for(r5=0;r5<r0;r5++)
      for(r6=0;r6<r0;r6++)
      for(r7=0;r7<r0;r7++)
      for(r8=0;r8<r0;r8++)
      for(r9=0;r9<r0;r9++)
      for(r10=0;r10<r0;r10++)
      for(r11=0;r11<r0;r11++)
      for(r12=0;r12<r0;r12++)
      for(r13=0;r13<r0;r13++)
      for(r14=0;r14<r0;r14++)
        r1++;

      return r1;
    }


    hth
    s.
  • Note: This was originally posted on 11th November 2009 at http://forums.arm.com

    If you compile that same code under gcc 4.4.1 with -O3 -mtune=arm926ej-s, it doesn't use r14 at all!

    Can I please ask you to compile it under 4.2.1 with -mtune=arm926ej-s? That should show up where the key difference lies.

    Thanks, ....Nick Pelling....

    PS: I hacked in a bit of intrinsic inline assembler to make use of r14 (saving and restoring it outside the inner loop, of course), but was mysteriously unable to get the (very simple) code to work on the target machine. This makes me suspect that some curious r14-related behaviour may be at play on the arm926ej-s...
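
    For comparison, here is a minimal sketch of the more usual GCC idiom for borrowing r14 in inline assembler. Rather than saving and restoring r14 by hand, it lists "lr" in the clobber list, so the compiler knows the register is modified and preserves the return address itself. Hand-saving r14 without telling the compiler is one way such code can misbehave, because gcc may be keeping a live value there; whether that applies to the code described above is not known. `sum_to` is a hypothetical function, and the non-ARM branch is only a portable fallback so the file also builds on a host machine.

    ```c
    /* Hypothetical example: sums n + (n-1) + ... + 1, using lr as the loop counter. */
    int sum_to(int n)
    {
        int total = 0;
    #if defined(__arm__) && !defined(__thumb__)
        /* ARM-state only: relies on conditional execution (addgt/subgt). */
        __asm__ volatile(
            "mov   lr, %1\n\t"      /* lr = n (the loop counter)          */
            "1:\n\t"
            "cmp   lr, #0\n\t"
            "addgt %0, %0, lr\n\t"  /* total += lr while lr > 0           */
            "subgt lr, lr, #1\n\t"  /* no S suffix, so flags are kept     */
            "bgt   1b\n\t"
            : "+r"(total)
            : "r"(n)
            : "lr", "cc");          /* tell gcc that lr and flags change  */
    #else
        /* Portable fallback with the same result on non-ARM hosts. */
        int i;
        for (i = n; i > 0; i--)
            total += i;
    #endif
        return total;
    }
    ```

    With "lr" in the clobber list, gcc should save the return address in the enclosing function's prologue and restore it on exit, which the generated assembly (-S) can confirm.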
  • Note: This was originally posted on 12th November 2009 at http://forums.arm.com

    Nick,

    I've compared 4.2.1 and 4.3.2; 4.2.1 does use LR as an iterator in the example code I provided (even with -mtune=arm926ej-s); 4.3.2 always appears to be using stack space rather than LR.

    hth
    s.
  • Note: This was originally posted on 12th November 2009 at http://forums.arm.com

    Thanks for that, much appreciated!

    I'll go hunting for whatever gcc patch between 4.2.1 and 4.3.2 caused r14 to fall out of favour...