This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Compiler produces inefficient assembly code?

I and a co-worker are programming an 8051 uController in C. Last week we were struggling with a poor perfomance time of our program. So I talked to the assembly guy and we figured that our way of software timer handling was too slow. We were using 16 bit variables. Now I tweaked the timer software a bit and it runs about 4x faster, which is fast enough.

Digging a little deeper in the produced assembly code we found a 'persistent nuisance'. As a test I wrote these lines in code:

uint8 j;
        for(j=10;j--;){
                rightTorqueArray[j] = j; }


The array is an unsigned char array but when we observe the assembly

        MOV     R7,#0AH
?C0001:
        MOV     R6,AR7
        DEC     R7
        MOV     A,R6
        JZ      ?C0002
;               rightTorqueArray[j] = j; }
                        ; SOURCE LINE # 52
        MOV     A,#LOW (rightTorqueArray)
        ADD     A,R7
        MOV     DPL,A
        CLR     A
        ADDC    A,#HIGH (rightTorqueArray)
        MOV     DPH,A
        MOV     A,R7
        MOVX    @DPTR,A
        SJMP    ?C0001
?C0002:

We noticed that the array is adressed with LOW and HIGH so apparantly it is treated as a 16 bit variable. But my assembly-nese is not so well, so please correct me if I am wrong.

I set the Code Optimalization at level 8: reuse Common entry code and the emphasis at Favor speed.

The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.

Parents
  • We noticed that the array is adressed with LOW and HIGH so apparantly it is treated as a 16 bit variable.

    If that "we" included aforementioned "assembly guy", you need a better person to fill that job. No, your array is not "treated as a 16-bit variable". It's treated as an array living in a 16-bit address space, because of either a) how you explicitly declared that array, or b) your choice of memory model, which had the same effect implicitly. Since you didn't show your array's declaration, nobody know which it is.

    That declaration is the actual reason this loop is so slow. You basically got exactly what you asked for --- you just didn't know how to ask for what you actually wanted.

    The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.

    That may have been counter-productive. #pragma SRC has a side effect of turning off some optimizations, because those are not really expressible in assembly.

    You really should look at list files from normal compilation, and/or post-linking, "absolute" list file instead.

Reply
  • We noticed that the array is adressed with LOW and HIGH so apparantly it is treated as a 16 bit variable.

    If that "we" included aforementioned "assembly guy", you need a better person to fill that job. No, your array is not "treated as a 16-bit variable". It's treated as an array living in a 16-bit address space, because of either a) how you explicitly declared that array, or b) your choice of memory model, which had the same effect implicitly. Since you didn't show your array's declaration, nobody know which it is.

    That declaration is the actual reason this loop is so slow. You basically got exactly what you asked for --- you just didn't know how to ask for what you actually wanted.

    The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.

    That may have been counter-productive. #pragma SRC has a side effect of turning off some optimizations, because those are not really expressible in assembly.

    You really should look at list files from normal compilation, and/or post-linking, "absolute" list file instead.

Children
  • I did mention that the array was an unsigned char type. It is declared as follows

    xuint8 leftTorqueArray[10];

    xuint8 is defined as unsigned char xdata. I am not aware of other ways of declaring arrays.

    The for-mentioned guy was not the assembly guy. The assembly guy has programmed nearly all of our machines and they are rather complex.

    These lines


    result in

            ;       leftTorqueArray[j] = 5;
            MOV     A,#LOW (leftTorqueArray)
            ADD     A,R7
            MOV     DPL,A
            CLR     A
            ADDC    A,#HIGH (leftTorqueArray)
            MOV     DPH,A
            MOV     A,#05H
            MOVX    @DPTR,A
    
            ;       leftTorqueArray[5] = 5;
            MOV     DPTR,#leftTorqueArray+05H
            MOVX    @DPTR,A
    


    I don't understand why there is such a huge difference between

    leftTorqueArray[j] = 5;
    


    and

    leftTorqueArray[5] = 5;
    


    How can I make

    leftTorqueArray[j] = 5;
    


    as quick as the other?

    I have been told and I have read on several websites about that decrementing variables is quicker than incrementing them. That's why some of my for-loops run backwards instead of forwards.

    The difference in assembly code is

    ;       uint8 j;
    ;       for(j=0;j<10;j++){
                            ; SOURCE LINE # 51
    ;---- Variable 'j?040' assigned to Register 'R7' ----
            CLR     A
            MOV     R7,A
    ?C0001:
    ;               rightTorqueArray[j] = j; }
                            ; SOURCE LINE # 52
            MOV     A,#LOW (rightTorqueArray)
            ADD     A,R7
            MOV     DPL,A
            CLR     A
            ADDC    A,#HIGH (rightTorqueArray)
            MOV     DPH,A
            MOV     A,R7
            MOVX    @DPTR,A
            INC     R7
            MOV     A,R7
            CJNE    A,#0AH,?C0001
    ?C0002:
    
    
    ;       uint8 j=10;
                            ; SOURCE LINE # 50
    ;---- Variable 'j?040' assigned to Register 'R7' ----
    ;       for(j=0;j--;){
                            ; SOURCE LINE # 51
            CLR     A
            MOV     R7,A
    ?C0001:
            MOV     R6,AR7
            DEC     R7
            MOV     A,R6
            JZ      ?C0002
    ;               rightTorqueArray[j] = j; }
                            ; SOURCE LINE # 52
            MOV     A,#LOW (rightTorqueArray)
            ADD     A,R7
            MOV     DPL,A
            CLR     A
            ADDC    A,#HIGH (rightTorqueArray)
            MOV     DPH,A
            MOV     A,R7
            MOVX    @DPTR,A
            SJMP    ?C0001
    ?C0002:
    


    I honestly can't you tell which is actual faster. The code size is however 3 bytes smaller with incrementing for-loops.

  • Do you understand the meaning & significance of the xdata keyword in that?


    "I don't understand why there is such a huge difference between

    leftTorqueArray[j] = 5;
    

    and

    leftTorqueArray[5] = 5;
    

    "

    Thnk about it: one is a fixed address which can be computed at compile-time; the other has to be evaluated and accessed at runtime...

  • I have been told and I have read on several websites about that decrementing variables is quicker than incrementing them. That's why some of my for-loops run backwards instead of forwards.

    I remember reading that when I was at primary school of the programming. It is being wrong to think always the same. it is depending on the processor achitecture and situations.

    if you using the assembler with the 8051 a simple 8 bit decrement counter is being very efficient with the DJNZ instruction but the compiler cannot always use it.

    To writing efficient code of the 8051 you must goodly understand the processor. it is old and it is lots of limitations. The KEIL C51 compiler is surprising good but it is not magic and cannot make your bad understanding code very fast. you must help to make it good code.

  • "To writing efficient code of the 8051 you must goodly understand the processor. it is old and it is lots of limitations. The KEIL C51 compiler is surprising good but it is not magic and cannot make your bad understanding code very fast. you must help to make it good code."

    @Kalib

    I am asking for precisely this. I am specifcally asking you guys to help me to help the keil C51 to produce better code.

    Can I make adressing array elements faster? And if so, how can I?

  • Fundamental to answering that question is understanding the meaning & significance of the xdata keyword - hence my question above.

    Surely, your "assembly guy" can help you with this ... ?

    Have you looked at Appendix C, "Writing Optimum Code", in the Compiler manual ?

  • try incrementing 'j' (I HATE these nondescriptive variables) there is as stated above an INC DPTR and I would hope it is used by the compiler.

  • Not looking at the assembler posted above.

    It is computing the DPTR value for each individual element access - rather than pointing the DPTR at the start of the array, and then incrementing it for each element.

    This is where the "inefficiency" lies here.

    Maybe incrementing a pointer - rather than using an index - would yield better results in this case ... ?

  • The absolutely foremost cause of inefficiency is that it's an XDATA array. Once that decision has been made, worrying about execution speed is, to a large extent, just a waste of time.

  • @Neil The "assembly guy" cannot help, he is 66 years, worked his entire live at this company and most importantly: he does not know anything about C or Keil. The one programming language he ever learned is assembly.

    Well a funny thing. If I remove xdata there is no difference in the ammount of assembly lines. We (me and he co-worker who isn't the assembly guy) already suspected it.

    We only have 64 bytes memory for non-xdata memory and over 65kB of xdata, which I personally call "unlimited variable memory".

    Co-worker already tried using pointers but his attempt produced the same problem. According to the link I sent: Optimal C Constructs for 8051 Microcontrollers by Nigel Jones.

    Accessing xdata via for-loop supposed to be faster than using pointers. I will however examin appendix C first. See what that files has to say over the matter

    I have another smaller question. For my software timers I removed most loops and I used constants instead. It uses more code but should have a quicker execution timer. Most of the arrays are still members of structs. Does it benefit if I were to remove the structs and use separate arrays instead?

  • But, surely, he can explain the assembler to you? And explain why this is inherent in the 8051 architecture?

    "Well a funny thing. If I remove xdata there is no difference in the ammount of assembly lines"

    That's not funny at all - it is to be expected if you are using a Memory Model which defaults to XDATA.

    As already noted, XDATA memory access is inherently slow & "inefficient".

    "Co-worker already tried using pointers but his attempt produced the same problem"

    So show what, exactly, he tried - and what, exactly, was the result.

    As the "inefficiency" is inherent in the underlying hardware architecture, it is bound to happen!

    "Accessing xdata via for-loop supposed to be faster than using pointers"

    Says who?

    "Does it benefit if I were to remove the structs and use separate arrays instead?"

    Most likely not. More detail needed to answer specifically.

  • 
    void test(void)
    {
       uint8 rightTorqueArray[10], j;
          for(j = 10;j > 0; --j)
          {
             rightTorqueArray[j] = j;
          }
    }
    /////////////////////////////////////////////
        88: void test(void)
        89: {
        90:    uint8 rightTorqueArray[10], j;
    
        91:       for(j = 10; j > 0; --j)
    C:0x199C    7F0A     MOV      R7,#0x0A
        92:       {
        93:          rightTorqueArray[j] = j;
    C:0x199E    7453     MOV      A,#0x53
    C:0x19A0    2F       ADD      A,R7
    C:0x19A1    F8       MOV      R0,A
    C:0x19A2    A607     MOV      @R0,0x07
        93:       }
    C:0x19A4    DFF8     DJNZ     R7,C:199E
    C:0x19A6    22       RET
    

  • But that example is not using XDATA - is it?

  • Yes, it is in data memory.
    It was done intentionally, to demonstrate different memory and access type in x51.
    It was not any restriction for it in the beginning of the topic.

  • OK - That's true.

    A fundamental problem seems to be that the OP hasn't understood (or didn't originally understand) the implications of using XDATA - hence my early question about the significance of the xdata keyword.