
Can RealView compiler generate SMLAL/UMLAL instructions?

Is there any way the RealView compiler can take advantage of the multiply-and-accumulate instructions of the Cortex M3?

I wrote a simple MAC loop and the compiler didn't generate any SMLAL or UMLAL instructions, which was disappointing.

Thanks,
Andrew Queisser
HP

  • Yes, the compiler does generate SMLAL/UMLAL instructions.

    Simple test code (UMLAL):

    unsigned long long mac_test (unsigned long *a, unsigned long *b, int cnt) {
      unsigned long long res = 0;
    
      while (cnt--) {
        res += (unsigned long long)*a++ * (unsigned long long)*b++;
      }
      return (res);
    }
    

    Compiler output:

                      mac_test PROC
    ;;;1      unsigned long long mac_test (unsigned long *a, unsigned long *b, int cnt) {
    000000  b570              PUSH     {r4-r6,lr}
    000002  4603              MOV      r3,r0
    000004  460c              MOV      r4,r1
    000006  2000              MOVS     r0,#0
    000008  4601              MOV      r1,r0
    ;;;2        unsigned long long res = 0;
    ;;;3
    ;;;4        while (cnt--) {
    00000a  e005              B        |L1.24|
                      |L1.12|
    ;;;5          res += (unsigned long long)*a++ * (unsigned long long)*b++;
    00000c  cb20              LDM      r3!,{r5}
    00000e  cc40              LDM      r4!,{r6}
    000010  fba56506          UMULL    r6,r5,r5,r6
    000014  1830              ADDS     r0,r6,r0
    000016  4169              ADCS     r1,r1,r5
                      |L1.24|
    000018  1e52              SUBS     r2,r2,#1              ;4
    00001a  d2f7              BCS      |L1.12|
    ;;;6        }
    ;;;7        return (res);
    ;;;8      }
    00001c  bd70              POP      {r4-r6,pc}
                              ENDP
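
    For the SMLAL half of the question, a signed counterpart of the test above can be tried the same way. This is only a minimal sketch: `mac_test_signed` is a hypothetical name, and whether armcc emits SMLAL or a separate SMULL plus ADDS/ADCS pair appears to depend on the compiler version and options, so the listing still has to be checked.

```c
#include <stdint.h>

/* Signed 32x32 -> 64-bit multiply-accumulate (hypothetical test function).
   Widening one operand before the multiply keeps the full 64-bit product;
   the other operand is converted to int64_t automatically. */
int64_t mac_test_signed(const int32_t *a, const int32_t *b, int cnt)
{
    int64_t res = 0;

    while (cnt-- > 0) {
        res += (int64_t)*a++ * *b++;
    }
    return res;
}
```

    If the cast is dropped, the multiply is done in 32 bits and the upper half of the product is lost, so a long multiply is no longer what the source asks for.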
    

  • When you use a High-Level Language (HLL) - any HLL - you delegate the choice of machine instructions to the compiler.

    If the use of specific machine instructions is important to you, then you should not be using an HLL - any HLL!

    While you might happen to stumble upon some specific HLL construct that generates certain machine instructions under certain conditions, you cannot rely upon it to always continue to do so!
    You will always have to re-check the generated code to be sure.

    Therefore, if it really is important, write an assembler module to call from the HLL.

    See: http://www.keil.com/forum/17991/

  • 000010  fba56506          UMULL    r6,r5,r5,r6
    000014  1830              ADDS     r0,r6,r0
    000016  4169              ADCS     r1,r1,r5
    


    Ironically, this compiler output doesn't show UMLAL being used at all: the multiply (UMULL) and the 64-bit accumulate (ADDS/ADCS) are separate instructions. But with my local versions of armcc (RVCT4.0 [Build 677], RVCT4.0 [Build 821]), UMLAL is indeed generated.

    Regards
    Marcus
    http://www.doulos.com/arm/

  • Hi Andy,

    Totally agree about HLLs and specific machine instructions. However, in this case I'm not using the compiler as an HLL but as a front end to the assembler. That way I don't have to remember the calling conventions and exact syntax of the assembler. Once I've forced the compiler to generate something close to what I want, in this case the MAC instructions, I throw away the C code and tweak the assembly. The ASM file is what goes into source control.

    Andrew

  • Hi Marcus,

    Thanks for the tip - I'm using armcc 4.1 Build 561 from the Keil UV4 installation. What are the command line options you use to see the UMLAL instructions?

    Thanks,
    Andrew

    You mean other than "--cpu=cortex-m3"? Nothing, in the case of my quick-and-dirty test. I think the compiler selects "-O2 -Ospace" by default.

    Regarding your statement

    > Once I've forced the compiler to generate close to what I want, in this case the MAC
    > instructions, I throw away the C-code and tweak the assembly.

    May I ask why? The RealView compiler is fairly good at generating very efficient code. Implementing things in C increases the chance that you will remember what you did two weeks from now. I don't find many places where I could have outsmarted the compiler.

    --
    Marcus

  • >> Once I've forced the compiler to generate close to what I want, in this case the MAC
    >> instructions, I throw away the C-code and tweak the assembly.

    > May I ask why? The RealView compiler is fairly good at generating very efficient code.
    > Implementing things in C increases the chance that you will remember what you did two
    > weeks from now. I don't find many places where I could have outsmarted the compiler.

    Exactly. In this particular case, if the compiler uses the MAC instructions I'm happy. Otherwise I might grudgingly resort to assembly but only if our profiling shows that optimizing this particular operation is worthwhile. Since our application is very power sensitive we want to get the RMS calculation done as quickly as possible so we can put the CPU back to sleep.

    Andrew
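
    The RMS calculation mentioned here boils down to a sum of squares in a 64-bit accumulator, which is the pattern a long multiply-accumulate fits. A minimal sketch in plain C, assuming 32-bit signed samples whose squares sum to less than 2^64; `rms_int` and `isqrt64` are hypothetical names, not code from this thread, and the instructions actually generated still need to be checked in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Integer square root: largest r with r*r <= x (bit-by-bit method). */
static uint32_t isqrt64(uint64_t x)
{
    uint64_t r = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */

    while (bit > x) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)r;
}

/* Truncated integer RMS of n samples.
   Assumption: the sum of squares fits in 64 bits (not checked here). */
uint32_t rms_int(const int32_t *x, size_t n)
{
    uint64_t acc = 0;
    size_t i;

    if (n == 0) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        /* 32x32 -> 64-bit product, accumulated in 64 bits */
        acc += (uint64_t)((int64_t)x[i] * x[i]);
    }
    return isqrt64(acc / n);
}
```

    Keeping the accumulate inside the loop and the divide and square root outside it means only the multiply-accumulate runs once per sample.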

  • Andrew,

    Is it worth the effort? The ARM compiler is so good that you can hardly beat it. My suggestion is to write your application in an HLL, move only the time-critical (CPU-intensive) functions into an assembly module, and optimize those. You might even discover that there is no need for the assembly at all.

    Keeping an overview of the project, later maintenance, and so on are all much easier in an HLL.

    Franc

  • Hi Franc,

    Reading through my messages in this thread, I realize I wasn't really clear. The vast majority of our code is in C and we only resort to assembly for small time-critical chunks, just as you suggest.

    I agree that assembly is hard to maintain, although at this point ARM is so prevalent (the last three projects I was on used it) that learning ARM assembly seems to be a good investment.

    Thanks,
    Andrew