A co-worker and I are programming an 8051 uController in C. Last week we were struggling with the poor performance of our program. So I talked to the assembly guy and we figured out that our way of handling software timers was too slow. We were using 16 bit variables. Now I have tweaked the timer software a bit and it runs about 4x faster, which is fast enough.
Digging a little deeper into the generated assembly code, we found a 'persistent nuisance'. As a test I wrote these lines of code:
uint8 j;
for (j = 10; j--; ) {
    rightTorqueArray[j] = j;
}
The array is an unsigned char array, but this is the assembly we get:
        MOV     R7,#0AH
?C0001:
        MOV     R6,AR7
        DEC     R7
        MOV     A,R6
        JZ      ?C0002
;         rightTorqueArray[j] = j; }
                        ; SOURCE LINE # 52
        MOV     A,#LOW (rightTorqueArray)
        ADD     A,R7
        MOV     DPL,A
        CLR     A
        ADDC    A,#HIGH (rightTorqueArray)
        MOV     DPH,A
        MOV     A,R7
        MOVX    @DPTR,A
        SJMP    ?C0001
?C0002:
We noticed that the array is addressed with LOW and HIGH, so apparently it is treated as a 16 bit variable. But my assembly is not that good, so please correct me if I am wrong.
I set the code optimization to level 8 (Reuse Common Entry Code) with the emphasis on Favor speed.
The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.
I could, but I'd have to subtract the value of the sample from 3 cycles ago. So I still have to remember all 4 samples either way.
The torque measurement is a continuous process. One of the four samples gets swapped for a new sample, and then the calculation over the 4 samples must be done.
I have yet to try out the pointers but I am currently busy with making some other changes.
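The subtract-the-oldest-sample idea discussed above could look roughly like the sketch below. It still keeps all 4 samples, as noted, but only touches one slot per update instead of re-adding all four. The names TorqueWindow, torque_window_push and SAMPLE_COUNT are made up for illustration and do not come from the original program; the wrap-around with a mask assumes the sample count stays a power of two.

```c
#include <stdint.h>

#define SAMPLE_COUNT 4u /* hypothetical name; must be a power of two for the mask below */

typedef struct {
    uint8_t  samples[SAMPLE_COUNT]; /* the last four readings */
    uint16_t sum;                   /* running sum of everything in samples[] */
    uint8_t  index;                 /* slot holding the oldest reading */
} TorqueWindow;

/* Swap the oldest sample for a new one and return the updated sum. */
uint16_t torque_window_push(TorqueWindow *w, uint8_t sample)
{
    /* Remove the oldest reading from the sum and add the new one. */
    w->sum = (uint16_t)(w->sum - w->samples[w->index] + sample);
    w->samples[w->index] = sample;

    /* Advance to the next slot, wrapping from 3 back to 0. */
    w->index = (uint8_t)((w->index + 1u) & (SAMPLE_COUNT - 1u));
    return w->sum;
}
```

Start with a zero-initialised TorqueWindow; the sum covers a full window once four samples have been pushed. Whether this beats the fixed-address approach on an 8051 would have to be checked in the generated assembly, since the indexed access may cost more than it saves.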
Are you doing a running average, then?
Here's an implementation which doesn't require keeping the old samples:
www.daycounter.com/.../Moving-Average.phtml
Interesting mathematics behind it. But I cannot imagine that this method actually executes faster, at least not significantly. At the moment I have to take 1 sample and add 4 unsigned char variables into 1 unsigned int variable before the calculation. From the description at that link I would have to do 1 division, 1 subtraction and 2 shift operations. Because I doubted it would be quicker, I did not translate the calculation to C.
Currently I use a switch with 4 cases, one for each sample (x2). From what I learned here and from what I see in the assembly output, this method is relatively quick. The addresses are fixed at compile time and that makes a difference.
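For comparison, here is a minimal sketch of a history-free running average of the kind the linked article appears to describe. This is an assumption about the method, not a translation of the article's code: the accumulator holds roughly N times the average, and each update removes one N-th of it before adding the new sample. With N = 4 the division becomes a shift. The names approx_average_update and AVG_SHIFT are invented for this example.

```c
#include <stdint.h>

#define AVG_SHIFT 2u /* N = 4, so dividing by N is a right shift by 2 (assumed window size) */

/*
 * acc approximates N times the running average.
 * Each call drops acc/N from the accumulator, adds the new sample,
 * and returns the current average estimate.
 */
uint8_t approx_average_update(uint16_t *acc, uint8_t sample)
{
    *acc = (uint16_t)(*acc - (*acc >> AVG_SHIFT) + sample);
    return (uint8_t)(*acc >> AVG_SHIFT);
}
```

Note that this is not an exact mean of the last 4 samples. It behaves like a simple low-pass filter, so old samples fade out gradually instead of dropping out after 4 updates. If the calculation needs the exact 4-sample sum, this sketch does not replace it.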
switch (torqueIndex) {
case 0:
    firstPollLeft  = leftServoTorque;
    firstPollRight = rightServoTorque;
    break;
case 1:
    secondPollLeft  = leftServoTorque;
    secondPollRight = rightServoTorque;
    break;
case 2:
    thirdPollLeft  = leftServoTorque;
    thirdPollRight = rightServoTorque;
    break;
case 3:
    fourthPollLeft  = leftServoTorque;
    fourthPollRight = rightServoTorque;
    break;
}
torqueIndex++;
if (torqueIndex == SAMPLE_AMMOUNT) torqueIndex = 0;
You may well be right.
As you're only taking 8 samples, can you fit them in DATA? That would certainly be faster than XDATA ...
Or PDATA?
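A rough sketch of what the DATA suggestion might look like, assuming the Keil C51 compiler and its data memory-type specifier. The macro TORQUE_MEM and the names leftTorque, rightTorque, torqueSlot, store_torque and left_torque_sum are hypothetical; the macro falls back to nothing on other compilers so the sketch still builds elsewhere. With the arrays in internal RAM the compiler should only need a one-byte address for the indexed access, but that would have to be confirmed in the .SRC output.

```c
#ifdef __C51__
#define TORQUE_MEM data   /* Keil C51: place the object in directly addressable internal RAM */
#else
#define TORQUE_MEM        /* other compilers: ordinary storage, so the sketch still compiles */
#endif

#define TORQUE_SAMPLES 4u /* 4 samples per servo, 8 bytes in total */

static unsigned char TORQUE_MEM leftTorque[TORQUE_SAMPLES];
static unsigned char TORQUE_MEM rightTorque[TORQUE_SAMPLES];
static unsigned char torqueSlot; /* next slot to overwrite */

/* Store one new sample per servo, overwriting the oldest pair. */
void store_torque(unsigned char left, unsigned char right)
{
    leftTorque[torqueSlot]  = left;
    rightTorque[torqueSlot] = right;
    torqueSlot++;
    if (torqueSlot == TORQUE_SAMPLES) {
        torqueSlot = 0;
    }
}

/* Sum the four left samples using constant indices (fixed addresses). */
unsigned int left_torque_sum(void)
{
    return (unsigned int)leftTorque[0] + leftTorque[1]
         + leftTorque[2] + leftTorque[3];
}
```

The sum uses constant indices on purpose, so those addresses can still be resolved at compile time; only the store uses a variable index.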