A co-worker and I are programming an 8051 microcontroller in C. Last week we were struggling with the poor performance of our program. So I talked to the assembly guy and we figured out that our way of handling software timers was too slow: we were using 16-bit variables. I have now tweaked the timer software a bit and it runs about 4x faster, which is fast enough.
Digging a little deeper into the generated assembly code, we found a 'persistent nuisance'. As a test I wrote these lines of code:
uint8 j;
for(j=10;j--;){
    rightTorqueArray[j] = j;
}
The array is an unsigned char array, but when we look at the assembly:
        MOV     R7,#0AH
?C0001:
        MOV     R6,AR7
        DEC     R7
        MOV     A,R6
        JZ      ?C0002
; rightTorqueArray[j] = j; }
                        ; SOURCE LINE # 52
        MOV     A,#LOW (rightTorqueArray)
        ADD     A,R7
        MOV     DPL,A
        CLR     A
        ADDC    A,#HIGH (rightTorqueArray)
        MOV     DPH,A
        MOV     A,R7
        MOVX    @DPTR,A
        SJMP    ?C0001
?C0002:
We noticed that the array is addressed with LOW and HIGH, so apparently it is treated as a 16-bit variable. But my assembly is not that good, so please correct me if I am wrong.
I set the code optimization to level 8 (Reuse Common Entry Code), with the emphasis on Favor speed.
The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.
But that example is not using XDATA - is it?
Yes, it is in data memory. That was done intentionally, to demonstrate the different memory and access types in the x51. No restriction on the memory type was stated at the beginning of the topic.
OK - That's true.
A fundamental problem seems to be that the OP hasn't understood (or didn't originally understand) the implications of using XDATA - hence my early question about the significance of the xdata keyword.
If the above code is the actual code, then the array actually contains fixed values over time, so it can be used as a code space array with predetermined byte values.
It isn't - the OP stated: "As a test I wrote these lines in code".
"the array actually contains fixed values over time, so it can be used as a code space array with predetermined byte values"
Indeed - Although fetching from CODE space isn't necessarily very efficient, either.
In fact, you wouldn't even need the array at all - as each element value is equal to its index, you could just use the index!
But, as noted, it's not the real code anyhow.
This is the torque measurement function, with the old way using arrays and my attempt to make it quicker. One problem is that I cannot reliably monitor the execution time of any code on the oscilloscope because of an opto-coupler circuit. By toggling an output I can measure cycle times by taking the time between 2 rising edges and dividing it by 2, but measuring the 'high time' of one block of code cannot be done.
I wrote a detailed description of what the function does, but this forum claimed I used a certain spam word SxAxLxE even though I could not find the string in question.
void updateTorque() {
    uint8 j=0;
    static uint8 firstPollLeft, secondPollLeft, thirdPollLeft, fourthPollLeft,
                 firstPollRight, secondPollRight, thirdPollRight, fourthPollRight;
    uint16 totalTorqueLeft=0, totalTorqueRight=0;

    // OLD CODE USING ARRAYS
    leftTorqueArray[torqueIndex] = leftServoTorque; // stores ADC values in array
    rightTorqueArray[torqueIndex] = rightServoTorque;
    do {
        totalTorqueLeft += leftTorqueArray[j];
        totalTorqueRight += rightTorqueArray[j];
        j++;
    } while(j<SAMPLE_AMMOUNT);
    torqueIndex++;
    if(torqueIndex==SAMPLE_AMMOUNT) torqueIndex=0; // keeps track of the index so it remains 0 - 3*/

    // NEW CODE WITHOUT ARRAYS
    switch(torqueIndex){
    case 0: firstPollLeft = leftServoTorque; firstPollRight = rightServoTorque; break;
    case 1: secondPollLeft = leftServoTorque; secondPollRight = rightServoTorque; break;
    case 2: thirdPollLeft = leftServoTorque; thirdPollRight = rightServoTorque; break;
    case 3: fourthPollLeft = leftServoTorque; fourthPollRight = rightServoTorque; break;}
    totalTorqueLeft = firstPollLeft + secondPollLeft + thirdPollLeft + fourthPollLeft;
    totalTorqueRight = firstPollRight + secondPollRight + thirdPollRight + fourthPollRight;

    if(monitorLeftTorque==true) torqueLeft = totalTorqueLeft * 25 / TORQUEFACTORLEFT; // if we need to monitor torque values..
    else torqueLeft = 0;
    if(monitorRightTorque==true) torqueRight = totalTorqueRight * 25 / TORQUEFACTORRIGHT;
    else torqueRight = 0;
}
Yes, this forum's spam handling is complete rubbish!
do {
    totalTorqueLeft += leftTorqueArray[j];
    totalTorqueRight += rightTorqueArray[j];
    j++;
} while(j<SAMPLE_AMMOUNT);
So here you have not only one but two XDATA arrays!
Unless you have 2 DPTRs, that means the compiler is going to have to keep reloading the DPTR - so bound to be very "inefficient".
I guess you could try:
uint8 data j;               // Index in DATA space
uint8 xdata * data pTorque; // Pointer in DATA to items in XDATA

j = SAMPLE_AMMOUNT;
pTorque = leftTorqueArray;
do {
    totalTorqueLeft += *pTorque++;
    j--;
} while(j>0);

j = SAMPLE_AMMOUNT;
pTorque = rightTorqueArray;
do {
    totalTorqueRight += *pTorque++;
    j--;
} while(j>0);
You should also have the sum variables - totalTorqueLeft and totalTorqueRight - in DATA.
If performance remains an issue, get your assembler guy to write it in assembler for you, and explain it. Then call his assembler from your 'C'.
Here's how to create a 'C'-callable assembler function: www.8052mcu.com/.../149030
The code may be a little bit faster if the arrays are located in pdata memory (a 256-byte page of XDATA, usually the first). The compiler can then use the @R0 and @R1 addressing modes.
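As a rough illustration of the pdata suggestion, the declarations might look like the sketch below. The typedefs, the value 4 for SAMPLE_AMMOUNT and the function name sumTorque are assumptions made for the example, not the OP's actual code; the #ifndef block only exists so the same text also builds with a non-Keil compiler, where the memory-space keywords mean nothing.

```c
typedef unsigned char  uint8;   /* assumed definitions of the thread's type names */
typedef unsigned short uint16;

#ifndef __C51__                 /* not Keil C51: drop the memory-space keywords */
#define pdata
#define data
#endif

#define SAMPLE_AMMOUNT 4        /* spelling as in the thread; value assumed */

/* Sample buffers in the paged part of XDATA. */
uint8 pdata leftTorqueArray[SAMPLE_AMMOUNT];
uint8 pdata rightTorqueArray[SAMPLE_AMMOUNT];

/* Sums kept in internal DATA. */
uint16 data totalTorqueLeft;
uint16 data totalTorqueRight;

/* sumTorque is a hypothetical helper: add up both buffers in one pass. */
void sumTorque(void)
{
    uint8 data j;

    totalTorqueLeft  = 0;
    totalTorqueRight = 0;
    for (j = 0; j < SAMPLE_AMMOUNT; j++) {
        totalTorqueLeft  += leftTorqueArray[j];
        totalTorqueRight += rightTorqueArray[j];
    }
}
```

Whether this really beats the XDATA version depends on the compiler and the hardware setup, so it is worth checking the generated .SRC file again.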
Indeed - good one!
Although this is probably another area where the OP will need help from the "Assembly Guy" (to explain the concept & operation; not write the code).
See http://www.keil.com/support/docs/1848.htm for starters.
It might even be that each of the arrays could be given its own page in PDATA ...
Rather than collecting the data into arrays, and then running through those arrays to sum them - could you not just sum the data as it arrives ... ?
I could, but I'd have to subtract the value of the sample from 3 cycles ago. So I still have to store all 4 samples either way.
The torque measurement is a continuous process. One of the four samples gets swapped for a new sample, and then the calculation over the 4 samples must be done.
I have yet to try out the pointers but I am currently busy with making some other changes.
Are you doing a running average, then?
Here's an implementation which doesn't require keeping the old samples:
www.daycounter.com/.../Moving-Average.phtml
Interesting mathematics behind it. But I cannot imagine that this method actually executes faster, at least not significantly. At the moment I have to take 1 sample and add 4 unsigned char variables into 1 unsigned int variable before the calculation. From the description at that link I would have to do 1 division, 1 subtraction and 2 shift operations. Because I doubted it would be quicker, I did not translate the calculation to C.
Currently I use a switch with 4 cases, one for each sample (x2). From what I learned here and from what I see in the assembly output, this method is relatively quick. The addresses are fixed at compile time and that makes a difference.
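For comparison, a C translation might look something like the sketch below. It assumes the linked page describes the usual approximation in which the accumulator loses one "average" sample and gains the new one on every update; with N = 4 the division becomes a shift. The names leftAccum and filterLeft are invented for the example. Note that this is not the exact mean of the last 4 samples: old samples fade out gradually instead of dropping out after 4 cycles.

```c
typedef unsigned char  uint8;   /* assumed definitions of the thread's type names */
typedef unsigned short uint16;

static uint16 leftAccum;        /* roughly 4x the filtered value */

/* Hypothetical helper: feed one sample, get the filtered value back. */
uint8 filterLeft(uint8 sample)
{
    leftAccum -= leftAccum >> 2;     /* remove one "average" sample (accum / 4) */
    leftAccum += sample;             /* add the new sample */
    return (uint8)(leftAccum >> 2);  /* filtered value = accum / 4 */
}
```

No sample buffer and no index are needed, which is where any speed gain would have to come from.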
switch(torqueIndex){
case 0: firstPollLeft = leftServoTorque; firstPollRight = rightServoTorque; break;
case 1: secondPollLeft = leftServoTorque; secondPollRight = rightServoTorque; break;
case 2: thirdPollLeft = leftServoTorque; thirdPollRight = rightServoTorque; break;
case 3: fourthPollLeft = leftServoTorque; fourthPollRight = rightServoTorque; break;}
torqueIndex++;
if(torqueIndex==SAMPLE_AMMOUNT) torqueIndex=0;
You may well be right.
As you're only taking 8 samples, can you fit them in DATA? That would certainly be faster than XDATA ...
Or PDATA?
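A sketch of the DATA variant, keeping the arrays but placing them in internal RAM. The typedefs and the names leftSamples, rightSamples, storeSamples, sumLeft and sumRight are assumptions for the example, and the #ifndef block is only there so it also builds outside Keil C51. The sums use constant indices, so the compiler should be able to resolve those addresses at compile time, much like the separate variables in the switch version.

```c
typedef unsigned char  uint8;   /* assumed definitions of the thread's type names */
typedef unsigned short uint16;

#ifndef __C51__                 /* not Keil C51: drop the memory-space keyword */
#define data
#endif

#define SAMPLE_AMMOUNT 4        /* spelling as in the thread; 4 samples */

/* All 8 sample bytes in internal DATA. */
static uint8 data leftSamples[SAMPLE_AMMOUNT];
static uint8 data rightSamples[SAMPLE_AMMOUNT];
static uint8 data torqueIndex;

/* Hypothetical helper: store one pair of ADC readings. */
void storeSamples(uint8 left, uint8 right)
{
    leftSamples[torqueIndex]  = left;
    rightSamples[torqueIndex] = right;
    if (++torqueIndex == SAMPLE_AMMOUNT)
        torqueIndex = 0;
}

/* Unrolled sums with constant indices. */
uint16 sumLeft(void)
{
    return (uint16)leftSamples[0] + leftSamples[1]
         + leftSamples[2] + leftSamples[3];
}

uint16 sumRight(void)
{
    return (uint16)rightSamples[0] + rightSamples[1]
         + rightSamples[2] + rightSamples[3];
}
```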