Optimizer removes ssub16 used to set GE flags

Note: This was originally posted on 27th April 2009 at http://forums.arm.com

I'm using the RVCT 3.0 compiler with optimization enabled (e.g., -O3). My code has inline assembly.

Here's my problem: If the inline assembly uses a parallel subtract instruction (e.g., SSUB16) to set the GE flags for use by the SEL instruction, the optimizer removes the SSUB16.

It seems that the optimizer removes the SSUB16 because it doesn't see the register result being used, even though the GE results are indeed used.

Here's an example:
__inline int MAX16(int a1_a0, int b1_b0)
{
   register int maxVal16;
   __asm
   {
      ssub16  maxVal16, a1_a0, b1_b0
      sel     maxVal16, a1_a0, b1_b0
   };
   return(maxVal16);
}
int findMax(int a, int b )
{
   return (MAX16(a, b ));
}


The dump file has
    findMax
    $a
    .text
        0x00000000:    e6800fb1    ....    SEL      r0,r0,r1
        0x00000004:    e12fff1e    ../.    BX       lr
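
For reference, here is a minimal portable C sketch of what the SSUB16/SEL pair is meant to compute, which is a per-halfword signed maximum. The names max16_model and sext16 are hypothetical and exist only for this illustration; the sketch models the architectural behaviour (SSUB16 sets a GE flag pair per halfword when the signed difference is non-negative, SEL then picks bytes from the first or second operand accordingly) and is not a replacement for the inline assembly.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of h to a 32-bit signed value. */
static int32_t sext16(uint32_t h)
{
    return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical reference model of:
 *     ssub16  tmp, a, b   ; GE[1:0] / GE[3:2] set if the halfword difference >= 0
 *     sel     res, a, b   ; take bytes from a where GE is set, else from b
 */
static uint32_t max16_model(uint32_t a, uint32_t b)
{
    /* GE flags as SSUB16 would set them (signed, non-saturating difference). */
    int ge_lo = (sext16(a) - sext16(b)) >= 0;
    int ge_hi = (sext16(a >> 16) - sext16(b >> 16)) >= 0;

    /* SEL: choose each halfword from a or b based on its GE flag pair. */
    uint32_t lo = (ge_lo ? a : b) & 0x0000FFFFu;
    uint32_t hi = (ge_hi ? a : b) & 0xFFFF0000u;
    return hi | lo;
}
```

With the SSUB16 removed, the SEL in the dump above selects on whatever GE flags happen to be live, so its result no longer matches this model.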



Is there a way to prevent this problem without adding cycles? I can fix the problem by using a volatile pointer for the ssub16 result, but that adds extra cycles.

Thanks in advance for any help.
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    Sim,

    Thanks for the idea of using pragma push, pragma O0, pragma pop.

    Unfortunately, although the pragmas were a great idea for the simplified example I gave, they don't address my real situation.

    In reality, I have a routine with many inline asm calls mixed with C code.

    If I put the pragmas around the main routine per your example, I lose C-level optimization across the routine.

    If I put the pragmas within the inline asm routines themselves, or around the inline asm calls in the C routine, those inline asm routines mysteriously disappear from the dump file output, as if they were optimized out. That is, it doesn't seem like you can mix optimization levels within a routine, which seems understandable.

    So I'm still faced with the original problem.

    Thanks again.
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    I can reproduce the issue on the latest build of RVCT 4.0, so it doesn't look like a patch is available.

    You might want to raise this formally with ARM support (support@arm.com) if you have a support contract for the tools.

    Iso
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    #pragma O0


    Nice - I never knew you could do that =) Learn something new every day...
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    It sounds like you could break up your single C function into smaller pieces and only use the pragma O0 for the problematic part(s).

    The extreme solution is to simply dump the assembler into a separate file using fromelf, fix the optimization bug by hand, and put the result through armasm.
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    Interesting! Looks like a compiler bug to me. Are you able to try a newer version of RVCT?

    You could work around the problem by using an embedded assembler function, rather than inline assembler, but I don't think you can inline them. For example:
    __asm int MAX16(int a1_a0, int b1_b0)
    {
        mov     r2, r0
        ssub16  r0, r2, r1
        sel     r0, r2, r1
        bx      lr
    }


    Also, GCC allows you to use the volatile keyword to indicate that an asm block should not be optimized. I'm not sure if RVCT provides that or not, but if it does it would solve your problem. Try doing "__asm volatile" in place of just "__asm" in your code.
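
    For comparison, a minimal sketch of the GCC-style form, assuming a GCC-compatible compiler targeting a core with the SIMD32 instructions (checked here through the ACLE macro __ARM_FEATURE_SIMD32). This is GNU extended-asm syntax, not RVCT syntax, and the names max16_gnu and sext16 are hypothetical. Keeping both instructions in one asm statement means the compiler cannot drop the SSUB16 on its own; on any other target the sketch falls back to plain C so it still builds.

    ```c
    #include <stdint.h>

    /* Sign-extend the low 16 bits of h (used only by the portable fallback). */
    static int32_t sext16(uint32_t h)
    {
        return (int32_t)((h & 0xFFFFu) ^ 0x8000u) - 0x8000;
    }

    /* Hypothetical helper: per-halfword signed max via SSUB16 + SEL. */
    static inline uint32_t max16_gnu(uint32_t a, uint32_t b)
    {
    #if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_SIMD32)
        uint32_t result;
        /* "=&r": result is written by ssub16 before sel reads a and b,
         * so it must not share a register with either input.
         * "cc" is listed conservatively because the flags change. */
        __asm__ volatile (
            "ssub16 %0, %1, %2\n\t"
            "sel    %0, %1, %2"
            : "=&r" (result)
            : "r" (a), "r" (b)
            : "cc");
        return result;
    #else
        /* Portable fallback with the same result, for non-ARM builds. */
        uint32_t lo = ((sext16(a) - sext16(b)) >= 0 ? a : b) & 0x0000FFFFu;
        uint32_t hi = ((sext16(a >> 16) - sext16(b >> 16)) >= 0 ? a : b) & 0xFFFF0000u;
        return hi | lo;
    #endif
    }
    ```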

    Thanks,
    Jacob
  • Note: This was originally posted on 28th April 2009 at http://forums.arm.com

    nashau,

    The following pragma usage appears to generate the desired code:

    __inline int MAX16(int a1_a0, int b1_b0)
    {
      int maxVal16;
      __asm {
        ssub16  maxVal16, a1_a0, b1_b0
        sel     maxVal16, a1_a0, b1_b0
      };
      return(maxVal16);
    }

    #pragma push
    #pragma O0
    int findMax(int a, int b )
    {
      return (MAX16(a, b));
    }
    #pragma pop


    hth
    s.