
Compiler produces inefficient assembly code?

A co-worker and I are programming an 8051 uController in C. Last week we were struggling with poor performance in our program. I talked to the assembly guy and we figured out that our way of handling software timers was too slow: we were using 16-bit variables. I have since tweaked the timer software a bit and it now runs about 4x faster, which is fast enough.

Digging a little deeper in the produced assembly code we found a 'persistent nuisance'. As a test I wrote these lines in code:

uint8 j;
        for(j=10;j--;){
                rightTorqueArray[j] = j; }


The array is an unsigned char array, but this is the assembly we observe:

        MOV     R7,#0AH
?C0001:
        MOV     R6,AR7
        DEC     R7
        MOV     A,R6
        JZ      ?C0002
;               rightTorqueArray[j] = j; }
                        ; SOURCE LINE # 52
        MOV     A,#LOW (rightTorqueArray)
        ADD     A,R7
        MOV     DPL,A
        CLR     A
        ADDC    A,#HIGH (rightTorqueArray)
        MOV     DPH,A
        MOV     A,R7
        MOVX    @DPTR,A
        SJMP    ?C0001
?C0002:

We noticed that the array is addressed with LOW and HIGH, so apparently it is treated as a 16-bit variable. My assembly is not that good, though, so please correct me if I am wrong.

I set the Code Optimization to level 8 (Reuse Common Entry Code) and the emphasis to Favor speed.

The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.

  • Rather than collecting the data into arrays, and then running through those arrays to sum them - could you not just sum the data as it arrives ... ?

  • I could, but I'd have to subtract the value of the sample from 3 cycles ago. So I still have to remember all 4 samples either way.

    The torque measurement is a continuous process. One of the four samples gets swapped for a new sample, and then the calculation over the 4 samples must be done.

    I have yet to try out the pointers but I am currently busy with making some other changes.
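The "subtract the old sample, add the new one" idea from this reply could be sketched roughly as below. This is only an illustration, not the poster's code: the names `samples`, `oldest`, `runningSum` and `torque_add_sample` are made up, and the window size of 4 is taken from the thread. Note that the variable index into `samples[]` still costs an address calculation like the one in the assembly listing above if the array lives in XDATA.

```c
#include <stdint.h>

#define SAMPLE_COUNT 4u  /* assumed window size: the 4 samples mentioned in the thread */

static uint8_t  samples[SAMPLE_COUNT]; /* last 4 samples, zero-initialised */
static uint8_t  oldest;                /* index of the slot to overwrite next */
static uint16_t runningSum;            /* always equals the sum of samples[] */

/* Replace the oldest sample with a new one and return the updated sum. */
uint16_t torque_add_sample(uint8_t newSample)
{
    runningSum -= samples[oldest];   /* drop the sample from 4 polls ago */
    runningSum += newSample;         /* add the fresh one */
    samples[oldest] = newSample;

    /* Wrap the index with a mask; this only works for power-of-two sizes. */
    oldest = (uint8_t)((oldest + 1u) & (SAMPLE_COUNT - 1u));
    return runningSum;
}
```

Each new sample then costs one subtraction and one addition, however many samples are in the window, at the price of keeping the sum in a 16-bit variable.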

  • Are you doing a running average, then?

    Here's an implementation which doesn't require keeping the old samples:

    www.daycounter.com/.../Moving-Average.phtml
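For readers who cannot follow the link: a common way to approximate a moving average without storing old samples is to keep an accumulator holding roughly N times the average, and on every sample subtract one N-th of it and add the new value. The sketch below assumes that this is the kind of method meant, with N = 4; the names `acc`, `AVG_SHIFT` and `approx_average` are hypothetical. It is an approximation (closer to an exponential filter), so it does not give exactly the same result as averaging the last 4 samples.

```c
#include <stdint.h>

#define AVG_SHIFT 2u  /* assumed: window of 4 samples = 2^2, so divide by shifting */

static uint16_t acc;  /* holds roughly 4 * average; starts at 0 */

/* Feed one sample, get back the current approximate average. */
uint8_t approx_average(uint8_t sample)
{
    acc -= acc >> AVG_SHIFT;             /* remove one "average" sample */
    acc += sample;                       /* add the new sample */
    return (uint8_t)(acc >> AVG_SHIFT);  /* scale back down to a sample value */
}
```

With a power-of-two window the division becomes a shift, so the per-sample cost is two shifts, one subtraction and one addition on a 16-bit value. Starting from zero, the result needs a number of samples to settle on the input level.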

  • Interesting mathematics behind it. But I cannot imagine that executing that method is actually faster, at least not significantly. At the moment I have to take 1 sample and add 4 unsigned char variables to 1 unsigned int variable before the calculation. From the description at that link I would have to do 1 division, 1 subtraction and 2 shift operations. Because I doubted it would be quicker, I did not translate the calculation to C.

    Currently I use a switch with 4 cases, one for each sample (x2). From what I learned here and from what I see in the assembly output, this method is relatively quick. The addresses are fixed at compile time and that makes a difference.

    switch(torqueIndex){
                    case 0: firstPollLeft   = leftServoTorque;      firstPollRight  = rightServoTorque; break;
                    case 1: secondPollLeft  = leftServoTorque;      secondPollRight = rightServoTorque; break;
                    case 2: thirdPollLeft   = leftServoTorque;      thirdPollRight  = rightServoTorque; break;
                    case 3: fourthPollLeft  = leftServoTorque;      fourthPollRight = rightServoTorque; break;}
            torqueIndex++;
            if(torqueIndex==SAMPLE_AMMOUNT) torqueIndex=0;
    

  • You may well be right.

    As you're only taking 8 samples, can you fit them in DATA? That would certainly be faster than XDATA ...

    Or PDATA?
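A minimal sketch of what the DATA suggestion could look like, assuming the Keil C51 memory-type keywords (`data`, `pdata`, `xdata`) and its predefined `__C51__` macro. The array and function names are hypothetical, and the empty `#define`s exist only so the fragment also builds with an ordinary host compiler.

```c
/* Keil C51 defines __C51__. On any other compiler the memory-type
   keywords are mapped to nothing so the sketch still compiles. */
#ifndef __C51__
#define data
#define pdata
#define xdata
#endif

#define SAMPLE_COUNT 4u  /* assumed: 4 samples per servo, 8 bytes in total */

/* Hypothetical sample buffers placed in directly addressable internal RAM. */
unsigned char data leftTorqueSamples[SAMPLE_COUNT];
unsigned char data rightTorqueSamples[SAMPLE_COUNT];

/* Sum the left samples using constant indices, so every address is
   known at compile time and no pointer arithmetic is needed at run time. */
unsigned int left_torque_sum(void)
{
    return (unsigned int)leftTorqueSamples[0]
         + leftTorqueSamples[1]
         + leftTorqueSamples[2]
         + leftTorqueSamples[3];
}
```

Swapping `data` for `pdata` or `xdata` in the two declarations is all it takes to compare the generated code for the other memory areas in the .SRC output.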