Does anyone have a good hint on how to implement the parity calculation of a byte in the most efficient way on a C51 device?
"excellent example of the cost of portability"
Actually, it's a poor example; other examples would demonstrate the cost much better. On the non-8051 architecture where this is used, the expression compiles to the same 6-instruction sequence as the hand-optimized assembly.
I am away from my C51 toolchain at the moment, so I can't include the compiler-generated assembly. However, if you compare against the function at http://www.keil.com/support/docs/1619.htm, with its parameter-passing and call/return overhead, you will find that the non-portable function and the portable expression compare quite closely.
That's why, if you want to shave off a few bytes and cycles, you should write it in assembly, but do it inline within a larger assembly module that uses the parity; otherwise you still pay the overhead of a function call without any benefit.
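The portable expression under discussion isn't quoted in this excerpt. As an assumption, a common portable way to compute byte parity is XOR folding. The sketch below uses a hypothetical function name, `parity8`, and returns 1 for an odd number of set bits, the same sense as the 8051 PSW.P flag. It makes no claim about what any particular compiler generates for it.

```c
#include <stdint.h>

/* Portable byte parity by XOR folding (hypothetical helper, not from the thread).
   Returns 1 if x has an odd number of set bits, 0 if even, matching PSW.P. */
static uint8_t parity8(uint8_t x)
{
    x ^= (uint8_t)(x >> 4);   /* fold high nibble onto low nibble */
    x ^= (uint8_t)(x >> 2);   /* fold remaining 4 bits onto 2 */
    x ^= (uint8_t)(x >> 1);   /* fold 2 bits onto bit 0 */
    return (uint8_t)(x & 1u);
}
```

For example, `parity8(0x54)` is 1 (three bits set) and `parity8(0x55)` is 0 (four bits set).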
Dan, you are right; however, I see a possible misunderstanding of your statement, so, without any malice, let me rephrase it:
"but do it within a larger assembly routine that uses the parity"
On my first read, I took 'inline' to mean "inline assembly in a C module".
Erik
Once again I agree substantially with what is said above about mixed-mode development and non-portable implementation.
However, some things simply don't justify the cost of a function call. One such thing is exactly the parity function on a '51 derivative.
If you are in C, you can get the parity in 2 machine instructions:
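```c
/* ACC and P (PSW.0) are declared in <reg51.h>; d is an unsigned char, par a bit. */
d = 0x54;                  // 0x54 = 01010100b: three bits set
if (ACC = d, P)            // if odd parity
{
    d = 0x55;
    par = (ACC = d, P);    // par is 1 for odd, 0 for even
}
```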
The expression (ACC=value, P) uses the comma operator to guarantee that the accumulator is loaded immediately before the PSW.P bit is tested, so nothing can disturb it in between. It is an ancient C trick for forcing the compiler to perform low-level operations in a specific sequence.
Although it is not portable, it is ANSI C, and that use of the comma operator is guaranteed not to be optimized away by the compiler.
It produces a parity evaluation in 2 machine instructions. I doubt that this can be done faster in any other implementation.
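The ordering guarantee the trick relies on can be illustrated with a desktop compiler. The sketch below is a host-side emulation, not C51 code: `acc_reg`, `psw_p` and `set_acc()` are hypothetical stand-ins for the accumulator and the PSW.P flag, with the flag recomputed on every write as the 8051 hardware does. The comma operator's sequence point ensures the write completes before the flag is read.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins for the 8051 accumulator and PSW.P flag. */
static uint8_t acc_reg;
static uint8_t psw_p;

/* Emulated write to the accumulator: like the 8051 hardware, it updates P,
   which is 1 when the accumulator holds an odd number of set bits. */
static void set_acc(uint8_t v)
{
    uint8_t t = v;
    t ^= (uint8_t)(t >> 4);
    t ^= (uint8_t)(t >> 2);
    t ^= (uint8_t)(t >> 1);
    psw_p = (uint8_t)(t & 1u);
    acc_reg = v;
}

/* The comma operator fully evaluates its left operand (a sequence point)
   before the right one, so the flag is read only after the write. */
#define PARITY_OF(v) (set_acc(v), psw_p)
```

`PARITY_OF(0x54)` mirrors `(ACC = d, P)` with `d = 0x54` and yields 1.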
The comma operator is also perfect for another nice trick: updating a 16-bit timer in C by adding the 16-bit period constant to the timer registers:
```c
///////////////////////////////////////////////////////////////////////////
// The following macros reload the 1ms timer, compensating for the interrupt
// latency to achieve an average period of 1ms, corrected at every period.
//
// The code uses a few C tricks:
// - It accesses the PSW carry directly, to obtain the overflow from an
//   unsigned char addition. We use the comma operator for that, to perform a
//   side-effect addition right before a carry bit test, without allowing the
//   compiler to reorder the code and make the carry dirty.
// - The comma operator is also used to insert padding NOPs before the
//   addition when the lower byte of the adjust value would be {0,1,2},
//   because the C51 compiler then generates INC instead, or suppresses the
//   operation.
// - The high byte adds (TMR_1MS + ADJ_DELAY) >> 8, so a carry out of the
//   adjusted low-byte constant is not lost.
//
// This code was verified with optimization levels from 0 to 9, and the
// compiler doesn't try to break the code with optimizations.
//
// The whole procedure is wrapped in a function-like macro.
//
// This code was verified on a P89C668 @ 18.432MHz, and the measured frequency
// deviation was ±20ppm, essentially the CPU crystal variation, completely
// removing the timer interrupt latency.
///////////////////////////////////////////////////////////////////////////
#include <intrins.h>   // _nop_(); TR0, TL0, TH0 and CY come from the device header

#ifdef ADJ_DELAY
#undef ADJ_DELAY
#endif
#ifdef RELOAD_TIMER0
#undef RELOAD_TIMER0
#endif

#if (((TMR_1MS & 0xff) < (0xff-8)) || ((TMR_1MS & 0xff) > (0xff-6)))
#define ADJ_DELAY 9
#define RELOAD_TIMER0() do { \
    TR0 = 0; \
    if (TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
    TH0 += ((TMR_1MS + ADJ_DELAY) >> 8); \
    TR0 = 1; \
} while(0)
#elif ((TMR_1MS & 0xff) == (0xff-6))
#define ADJ_DELAY 10
#define RELOAD_TIMER0() do { \
    TR0 = 0; \
    if (_nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
    TH0 += ((TMR_1MS + ADJ_DELAY) >> 8); \
    TR0 = 1; \
} while(0)
#elif ((TMR_1MS & 0xff) == (0xff-7))
#define ADJ_DELAY 11
#define RELOAD_TIMER0() do { \
    TR0 = 0; \
    if (_nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
    TH0 += ((TMR_1MS + ADJ_DELAY) >> 8); \
    TR0 = 1; \
} while(0)
#elif ((TMR_1MS & 0xff) == (0xff-8))
#define ADJ_DELAY 12
#define RELOAD_TIMER0() do { \
    TR0 = 0; \
    if (_nop_(), _nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
    TH0 += ((TMR_1MS + ADJ_DELAY) >> 8); \
    TR0 = 1; \
} while(0)
#endif
```
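Each variant of the macro performs a 16-bit addition one byte at a time: it adds the low byte to TL0, turns the carry into `TH0++`, then adds the high byte to TH0. The host-side sketch below reproduces that split in portable C. `th0`, `tl0` and `reload_add()` are hypothetical stand-ins, and the carry is detected by unsigned wrap-around rather than by reading CY, which portable C cannot do.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins for the 8051 timer 0 registers. */
static uint8_t th0, tl0;

/* Add a 16-bit value to TH0:TL0 byte by byte, as RELOAD_TIMER0() does. */
static void reload_add(uint16_t addend)
{
    uint8_t old_tl = tl0;
    tl0 = (uint8_t)(old_tl + (addend & 0xFFu));
    if (tl0 < old_tl)                     /* wrap-around == carry out of bit 7 */
        th0++;
    th0 = (uint8_t)(th0 + (addend >> 8)); /* high byte, modulo 256 */
}
```

For example, starting from 0xFE80 and adding 0x0190 leaves TH0:TL0 at 0x0010, just as a single 16-bit addition would.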
In the timer interrupt handler, calling RELOAD_TIMER0() expands whichever variant of the macro the preprocessor selected for the configured reload value.
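The `#if` chain picks one variant per reload value at compile time. The sketch below mirrors that selection as a host-side C function so it can be checked for every possible low byte. `reload_variant`, `select_variant()` and `tl0_addend()` are hypothetical names. It reflects one reading of the design: each `_nop_()` lengthens the interval during which the timer is stopped (between `TR0 = 0` and `TR0 = 1`) by one machine cycle, which is consistent with ADJ_DELAY growing by one per NOP, and the low byte actually added to TL0 never falls in {0,1,2}.

```c
#include <stdint.h>

/* One variant of RELOAD_TIMER0(): its ADJ_DELAY and how many _nop_() calls
   it inserts before the TL0 addition (hypothetical host-side model). */
typedef struct {
    uint8_t adj_delay;
    uint8_t nops;
} reload_variant;

/* Mirrors the #if / #elif chain, keyed on the low byte of TMR_1MS. */
static reload_variant select_variant(uint16_t tmr_1ms)
{
    uint8_t lo = (uint8_t)(tmr_1ms & 0xFFu);
    reload_variant v = { 9, 0 };              /* default branch */
    if (lo == 0xFF - 6)      { v.adj_delay = 10; v.nops = 1; }
    else if (lo == 0xFF - 7) { v.adj_delay = 11; v.nops = 2; }
    else if (lo == 0xFF - 8) { v.adj_delay = 12; v.nops = 3; }
    return v;
}

/* Low byte of the constant the macro adds to TL0. */
static uint8_t tl0_addend(uint16_t tmr_1ms)
{
    return (uint8_t)((tmr_1ms & 0xFFu) + select_variant(tmr_1ms).adj_delay);
}
```

Without the NOP variants, low bytes 0xF7 to 0xF9 plus 9 would wrap to 0, 1 and 2, the cases the header comment says C51 turns into INC or drops.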
Another use of the comma operator: you can get the 16-bit result of multiplying two unsigned chars without calling the C library's multiplication routine:
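```c
/* B is the 8051 B register, declared in <reg51.h>. C51 stores ints
   MSB first: ((unsigned char*)&x)[0] is the MSB, [1] the LSB. */
unsigned char data a, b;
unsigned int data x;

a = 36;
b = 130;

// About 3.6x faster (11 vs. 40 cycles) than:  x = (unsigned int) a * b;
// MUL AB leaves the low byte of the product in A and the high byte in B.
*(unsigned char*)&x = ((((unsigned char*)&x)[1] = a * b), B);
```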
I was just lazy here: for readability, you can make the int variable a union to access its MSB and LSB parts, but the generated code is the same: 11 instruction cycles versus 40 cycles with the 16-bit cast.
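On the 8051, MUL AB leaves the low byte of the product in A and the high byte in B, and the comma trick picks up both halves. The host-side sketch below emulates that and assembles the 16-bit result through a union, as suggested. `mul_ab()`, `word16`, `msb_index()` and `mul8x8()` are hypothetical names. C51 stores an int most-significant byte first, whereas most desktop hosts are little-endian, so the sketch determines the MSB's index at run time instead of hard-coding it.

```c
#include <stdint.h>

/* Emulation of 8051 MUL AB: A receives the low byte, B the high byte. */
static void mul_ab(uint8_t *acc, uint8_t *b)
{
    uint16_t p = (uint16_t)(*acc * *b);
    *acc = (uint8_t)(p & 0xFFu);
    *b   = (uint8_t)(p >> 8);
}

/* Union giving byte access to a 16-bit value. */
typedef union {
    uint16_t w;
    uint8_t  byte[2];
} word16;

/* Index of the most significant byte: 0 on big-endian targets such as C51,
   1 on little-endian hosts. */
static int msb_index(void)
{
    word16 probe;
    probe.w = 0x0100u;
    return probe.byte[0] == 0x01u ? 0 : 1;
}

/* 8x8 -> 16-bit multiply assembled from the two MUL AB result bytes. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    word16 x;
    int hi = msb_index();
    mul_ab(&a, &b);
    x.byte[hi]     = b;   /* high byte, from B */
    x.byte[1 - hi] = a;   /* low byte, from A */
    return x.w;
}
```

With the values from the post, `mul8x8(36, 130)` gives 4680 (0x1248).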