
Compiler produces inefficient assembly code?

A co-worker and I are programming an 8051 uController in C. Last week we were struggling with the poor performance of our program. I talked to the assembly guy and we figured out that our software timer handling was too slow because we were using 16-bit variables. I have since tweaked the timer software a bit and it now runs about 4x faster, which is fast enough.

Digging a little deeper into the generated assembly code, we found a 'persistent nuisance'. As a test, I wrote these lines:

uint8 j;
        for(j=10;j--;){
                rightTorqueArray[j] = j; }

The array is an unsigned char array, but look at the assembly it produces:

        MOV     R7,#0AH
?C0001:
        MOV     R6,AR7
        DEC     R7
        MOV     A,R6
        JZ      ?C0002
;               rightTorqueArray[j] = j; }
                        ; SOURCE LINE # 52
        MOV     A,#LOW (rightTorqueArray)
        ADD     A,R7
        MOV     DPL,A
        CLR     A
        ADDC    A,#HIGH (rightTorqueArray)
        MOV     DPH,A
        MOV     A,R7
        MOVX    @DPTR,A
        SJMP    ?C0001
?C0002:

We noticed that the array is addressed with LOW and HIGH, so apparently it is treated as a 16-bit variable. But my assembly is not that good, so please correct me if I am wrong.
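For reference: MOVX @DPTR,A stores the single byte held in A, so each element written here is 8 bits. The LOW/HIGH pair appears because DPTR is a 16-bit register (XDATA spans 64 KB), so the element's *address* has to be built as a 16-bit value. The sketch below mimics in portable C what the ADD/ADDC sequence in the listing computes; the function name and the base addresses in the usage are made-up illustration values, not taken from the project.

```c
#include <stdint.h>

/* Mimics the listing: the index j is added to the low byte of the array's
 * base address (ADD), and the carry out of that add is propagated into the
 * high byte (ADDC). The result is what ends up in DPH:DPL before
 * MOVX @DPTR,A writes one 8-bit element. */
static uint16_t xdata_element_address(uint16_t base, uint8_t j)
{
    unsigned int low = (base & 0xFFu) + j;           /* MOV A,#LOW(...); ADD A,R7 */
    uint8_t dpl = (uint8_t)(low & 0xFFu);            /* MOV DPL,A */
    uint8_t carry = (uint8_t)(low >> 8);             /* carry flag set by the ADD */
    uint8_t dph = (uint8_t)((base >> 8) + carry);    /* CLR A; ADDC A,#HIGH(...); MOV DPH,A */
    return (uint16_t)(((uint16_t)dph << 8) | dpl);
}
```

For example, a hypothetical base of 0x01F8 with j = 9 gives 0x0201: the low-byte add overflows, and the carry bumps the high byte.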

I set Code Optimization to level 8 (Reuse Common Entry Code) with the emphasis on 'Favor speed'.

The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.


Andrey Shemet over 7 years ago in reply to Andy Neil

The code may be a little faster if the arrays are located in PDATA memory (the first 256-byte page of XDATA). The compiler can then use the @R0 and @R1 addressing modes.
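A minimal sketch of what that declaration could look like. It assumes Keil C51's `pdata` memory-type specifier and the compiler's predefined `__C51__` macro; the `PDATA` shim, the `fill_test_pattern` function and the array size of 10 are hypothetical, chosen so the same file also builds on a PC compiler where the specifier expands to nothing.

```c
/* Hypothetical portability shim: with Keil C51 the array goes into the
 * 256-byte PDATA page, so the compiler may address it with MOVX @R0/@R1
 * and only needs an 8-bit offset. Elsewhere PDATA expands to nothing. */
#ifdef __C51__
#define PDATA pdata
#else
#define PDATA
#endif

#define TORQUE_SAMPLES 10   /* assumed size, matching the test loop in the question */

/* All PDATA variables together must fit in that single 256-byte page. */
unsigned char PDATA rightTorqueArray[TORQUE_SAMPLES];

/* Same loop as in the question: count down and store the index. */
static void fill_test_pattern(void)
{
    unsigned char j;
    for (j = TORQUE_SAMPLES; j--; ) {
        rightTorqueArray[j] = j;
    }
}
```

Comparing the .SRC output for this loop against the XDATA version should show whether the compiler actually switches to the @R0/@R1 form.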

Andy Neil over 7 years ago in reply to Andrey Shemet

Indeed - good one!

Although this is probably another area where the OP will need help from the "Assembly Guy" (to explain the concept & operation; not write the code).

See http://www.keil.com/support/docs/1848.htm for starters.

It might even be that each of the arrays could be given its own page in PDATA ...
Andy Neil over 7 years ago in reply to Andy Neil

Rather than collecting the data into arrays and then running through those arrays to sum them, could you not just sum the data as it arrives?
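A minimal sketch of "sum as it arrives", assuming a simple batch average is wanted; all names here are hypothetical. Each sample is added to a running total immediately, so nothing has to be stored and walked through afterwards.

```c
/* unsigned int is 16 bits on the 8051 (C51): room for 257 full-scale
 * 8-bit samples before the total can overflow. */
static unsigned int  torqueSum = 0;
static unsigned char torqueCount = 0;

/* Called once per new sample, e.g. from the measurement routine. */
static void torque_sample_arrived(unsigned char sample)
{
    torqueSum += sample;
    torqueCount++;
}

/* Returns the average of the samples collected so far and starts a new batch. */
static unsigned char torque_batch_average(void)
{
    unsigned char avg = torqueCount ? (unsigned char)(torqueSum / torqueCount) : 0;
    torqueSum = 0;
    torqueCount = 0;
    return avg;
}
```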
sebastiaan knippels over 7 years ago in reply to Andy Neil

I could, but I'd have to subtract the value of the sample from 3 cycles ago, so I still have to store all 4 samples either way.

The torque measurement is a continuous process. Each time, one of the four samples gets swapped for a new sample, and then the calculation over the 4 samples must be done.
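With the four samples stored anyway, the swap described above can also keep the sum up to date: subtract the sample being replaced, add the new one. That is one subtraction and one addition per new sample instead of four additions. A minimal single-channel sketch; the ring-buffer names are hypothetical and the window length of 4 is taken from this post.

```c
#define SAMPLE_COUNT 4   /* window length, matching the four samples */

static unsigned char window[SAMPLE_COUNT];   /* the last four samples */
static unsigned char windowIndex = 0;        /* slot holding the oldest sample */
static unsigned int  windowSum = 0;          /* sum of everything in window[] */

/* Replace the oldest sample with the new one, adjust the sum by the
 * difference and return the updated sum of the four samples. */
static unsigned int window_push(unsigned char sample)
{
    windowSum -= window[windowIndex];   /* never underflows: the sum includes this slot */
    windowSum += sample;
    window[windowIndex] = sample;
    windowIndex++;
    if (windowIndex == SAMPLE_COUNT)
        windowIndex = 0;
    return windowSum;
}
```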

I have yet to try out the pointers, but I am currently busy making some other changes.
Andy Neil over 7 years ago in reply to sebastiaan knippels

Are you doing a running average, then?

Here's an implementation which doesn't require keeping the old samples:

www.daycounter.com/.../Moving-Average.phtml
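The link above is truncated, so this is not necessarily the exact method it describes. One widely used scheme that keeps no old samples is the exponential (approximate) moving average: store S, roughly N times the average, and update it as S = S - S/N + x. With N = 4 the division becomes a shift. Note that it weights older samples with decaying weights rather than equally, so it is not identical to the plain 4-sample average. A minimal sketch with hypothetical names:

```c
#define AVG_SHIFT 2   /* N = 1 << AVG_SHIFT = 4 */

/* Holds roughly N * average; at most 4 * 255 + 3 = 1023, fits in 16 bits. */
static unsigned int scaledAverage = 0;

/* One update: drop 1/N of the stored value, add the new sample,
 * and return the current average (stored value / N). */
static unsigned char ema_push(unsigned char sample)
{
    scaledAverage -= scaledAverage >> AVG_SHIFT;
    scaledAverage += sample;
    return (unsigned char)(scaledAverage >> AVG_SHIFT);
}
```

Starting from zero, the output ramps up over roughly 20 samples of a constant input before settling; preloading `scaledAverage` with N times the first sample avoids that start-up ramp.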
sebastiaan knippels over 7 years ago in reply to Andy Neil

Interesting mathematics behind it, but I cannot imagine that executing that method is actually faster, or at least not significantly so. At the moment I take 1 sample and add 4 unsigned char variables into 1 unsigned int variable before the calculation. Going by the description at that link, I would have to do 1 division, 1 subtraction and 2 shift operations. Because I doubted it would be quicker, I did not translate the calculation to C.

Currently I use a switch statement with 4 cases, each storing one sample for both the left and the right side. From what I learned here and from what I see in the assembly output, this method is relatively quick: the addresses are fixed at compile time, and that makes a difference.

switch (torqueIndex) {
    case 0:
        firstPollLeft   = leftServoTorque;
        firstPollRight  = rightServoTorque;
        break;
    case 1:
        secondPollLeft  = leftServoTorque;
        secondPollRight = rightServoTorque;
        break;
    case 2:
        thirdPollLeft   = leftServoTorque;
        thirdPollRight  = rightServoTorque;
        break;
    case 3:
        fourthPollLeft  = leftServoTorque;
        fourthPollRight = rightServoTorque;
        break;
}
torqueIndex++;
if (torqueIndex == SAMPLE_AMMOUNT)
    torqueIndex = 0;
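The calculation after the switch is not shown. Assuming it is a plain average of the four slots, summed into an unsigned int as described above, it could look like this sketch (left channel only; the function name is hypothetical). Because every slot is a separately named variable, each load uses an address fixed at compile time.

```c
/* Sample slots as in the switch above (left channel; the right is identical). */
static unsigned char firstPollLeft, secondPollLeft, thirdPollLeft, fourthPollLeft;

/* Adds the four 8-bit slots into a 16-bit sum, then divides by four with a shift. */
static unsigned char left_torque_average(void)
{
    unsigned int sum = (unsigned int)firstPollLeft + secondPollLeft
                     + thirdPollLeft + fourthPollLeft;
    return (unsigned char)(sum >> 2);
}
```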
Andy Neil over 7 years ago in reply to sebastiaan knippels

You may well be right.

As you're only taking 8 samples, can you fit them in DATA? That would certainly be faster than XDATA ...

Or PDATA?