Does anyone have a good hint on how to implement a parity calculation of a byte in the most efficient way on a C51 device?
"implement a parity-calculation of a byte in the most efficient way on an C51-device"
Convert that routine to assembler; in assembler you have a parity bit in the PSW. That bit is not accessible in C since C is not register specific.
Erik
"That bit is not accessible in C since C is not register specific"
The bit is accessible - the doubt is whether it relates to what you want at the time you want it...
Several people have suggested doing this in C51 - just do the Search to find it. You may be lucky, but I'd put it in an assembler function.
http://www.keil.com/support/docs/1619.htm
Great, thanks guys!
Hi,
I have just dealt with this problem recently.
May I share what I learned: One must calculate the parity right before using it. The ACC is a very busy fellow.
Ed.
"One must calculate the parity right before using it. The ACC is a very busy fellow"
Yes - that's why I urged caution in doing this in 'C':
"the doubt is whether it [the parity flag] relates to what you want at the time you want it..."
I still say assembler is preferable...
absolutely
Relying on a register in C is, at best, a kludge. Many will not use assembler because they believe that 'portability' is 'interesting' for code for small embedded - BALONEY. The only reason for using C with small embedded is coding and maintenance ease. Of course, a C function not using any I/O for small embedded is somewhat portable, but the C postulate that the whole shebang is portable is totally invalid for small embedded.
BTW, how would ACC = Ralph; george = P; ever be portable?
PS: if this is because you cannot write it in assembler, I pity you.
"I still say assembler is preferable..."
Agreed.
If using C and without relying on results being in particular registers, the expressions below yield a parity bit in bit 7 for OR'ing with 7-bit data.
#define EVEN_PARITY(b) ((((((b)^(((b)<<4)|((b)>>4)))+0x41)|0x7C)+2)&0x80)
#define ODD_PARITY(b)  (EVEN_PARITY(b)^0x80)
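For illustration, a minimal usage sketch (mine, not from the post above): tagging a 7-bit character with its even parity bit before transmission.

    unsigned char c = 'A' & 0x7F;                 /* 7-bit data                */
    unsigned char framed = c | EVEN_PARITY(c);    /* parity bit lands in bit 7 */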
An excellent example of the cost of portability. Many who have more concern for 'purity' (which, probably, is a great idea for PC code) end up with high hardware costs, simply because they do not realize that, in small embedded, efficiency is the name of the game.
"excellent example of the cost of portability"
Actually, it's a poor example. Some other example would demonstrate the cost much better. On the non-8051 architecture where this is used, the expression compiles to the same 6-instruction sequence as the hand-optimized assembly.
I am away from my C51 toolchain at the moment, so I can't include the compiler-generated assembly, but you will find that, with the function parameter and call/return overhead of http://www.keil.com/support/docs/1619.htm, the non-portable function and the portable expression versions compare quite closely.
That's why, if you want to shave off a few bytes and cycles, you should write it in assembly - but do it inline within a larger assembly module that uses the parity; otherwise you've still got the overhead of a function call without the benefit.
Dan, you are right; however, I see a possible misunderstanding of your statement and thus, without any malice, I correct it:
"but do it within a larger assembly routine that uses the parity"
On my first read, I took 'inline' to mean "inline assembly in a C module".
Once again I agree substantially with what is said above about mixed-mode development and non-portable implementation.
However, there are some things that simply don't justify the cost of a function call. One such thing is exactly the parity function on a '51 derivative.
If you are in C, you can get the parity in 2 machine instructions:
    d = 0x54;
    if (ACC = d, P)              // if odd parity
    {
        d = 0x55;
        par = (ACC = d, P);      // par is 1 for odd, 0 for even.
    }
The expression (ACC=value, P) uses the comma operator to guarantee that the accumulator will not be dirty before testing the PSW.P bit. It is an ancient C trick to force the compiler to perform low-level operations in a certain sequence.
Albeit not 'portable', it is ANSI, and that use of the comma operator is guaranteed not to be optimized by the compiler.
It produces a parity evaluation in 2 machine instructions. I doubt that this can be done faster in any other implementation.
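If it helps readability, the trick can be wrapped in a function-like macro. This is just a sketch of my own (the macro name is made up), assuming the ACC and P declarations from Keil's <reg51.h>:

    #include <reg51.h>   /* sfr ACC and sbit P (PSW.0) on Keil C51 */

    /* Evaluates to 1 if 'x' has odd parity; the comma operator loads ACC and
       then reads PSW.P before anything else can dirty the accumulator. */
    #define ODD_PARITY_OF(x)   ((ACC = (x)), P)

    /* usage: if (ODD_PARITY_OF(d)) { ... } */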
The comma operator is perfect for yet another nice trick: updating a 16-bit timer in C by adding the 16-bit period constant to the timer registers:
    ///////////////////////////////////////////////////////////////////////////
    // The following macros reload the 1ms timer, compensating for the interrupt
    // latency to achieve an average period of 1ms, corrected at every period.
    //
    // The code uses a few C tricks:
    // - Accesses the PSW carry directly, to obtain the overflow from an unsigned
    //   char addition. We use the comma operator for that, to perform a side-effect
    //   addition right before a carry bit test, not allowing the compiler to
    //   reorder the code and make the carry dirty.
    // - The comma operator is also used to insert padding NOPs before the addition,
    //   in the cases where the lower byte of the adjust value is {0,1,2}, because
    //   the C51 compiler generates INC instead, or suppresses the operation.
    //
    // This code was verified with optimization levels from 0 to 9, and the compiler
    // doesn't try to break the code with optimizations.
    //
    // The whole procedure is wrapped in a function-like macro.
    //
    // This code was verified in a P89C668 @ 18.432MHz, and the frequency deviation
    // measured was ±20ppm, essentially the CPU crystal variation, completely
    // removing the timer interrupt latency.
    ///////////////////////////////////////////////////////////////////////////
    #ifdef ADJ_DELAY
    #undef ADJ_DELAY
    #endif
    #ifdef RELOAD_TIMER0
    #undef RELOAD_TIMER0
    #endif

    #if (((TMR_1MS & 0xff) < (0xff-8)) || ((TMR_1MS & 0xff) > (0xff-6)))
    #define ADJ_DELAY 9
    #define RELOAD_TIMER0() do {                                                \
        TR0 = 0;                                                                \
        if (TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++;                   \
        TH0 += (TMR_1MS >> 8);                                                  \
        TR0 = 1;                                                                \
    } while(0)
    #elif ((TMR_1MS & 0xff) == (0xff-6))
    #define ADJ_DELAY 10
    #define RELOAD_TIMER0() do {                                                \
        TR0 = 0;                                                                \
        if (_nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++;          \
        TH0 += (TMR_1MS >> 8);                                                  \
        TR0 = 1;                                                                \
    } while(0)
    #elif ((TMR_1MS & 0xff) == (0xff-7))
    #define ADJ_DELAY 11
    #define RELOAD_TIMER0() do {                                                \
        TR0 = 0;                                                                \
        if (_nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
        TH0 += (TMR_1MS >> 8);                                                  \
        TR0 = 1;                                                                \
    } while(0)
    #elif ((TMR_1MS & 0xff) == (0xff-8))
    #define ADJ_DELAY 12
    #define RELOAD_TIMER0() do {                                                \
        TR0 = 0;                                                                \
        if (_nop_(), _nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
        TH0 += (TMR_1MS >> 8);                                                  \
        TR0 = 1;                                                                \
    } while(0)
    #endif
In the timer handler, a call to RELOAD_TIMER0() invokes the macro, which is synthesized for any reload value.
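Just to make that concrete, a minimal sketch of my own (assuming Keil C51 interrupt syntax and a Timer 0 overflow interrupt; the handler body is whatever the application needs):

    void timer0_isr(void) interrupt 1      /* Timer 0 overflow vector */
    {
        RELOAD_TIMER0();                   /* re-arm the 1ms period   */
        /* ... per-millisecond work goes here ... */
    }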
Another use of the comma operator: you can get the 16-bit result of multiplying two unsigned chars without any call to the math library:
    unsigned char data a, b;
    unsigned int  data x;

    a = 36;
    b = 130;

    // this is 400% faster than x = (unsigned int)a * b
    *(unsigned char*)&x = ((((unsigned char*)&x)[1] = a * b), B);
I am just too lazy: you can make the int variable a union to access the MSB and LSB parts, to improve readability, but the generated code is the same: 11 instruction cycles against 40 cycles using a 16-bit cast.
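Since the union version was mentioned but not shown, here is a sketch of how it might look (the names are mine), assuming Keil C51's big-endian int layout (MSB at the lower address) and the B register SFR from <reg51.h>:

    #include <reg51.h>                           /* declares the B register SFR */

    typedef union {
        unsigned int w;                          /* full 16-bit product  */
        struct { unsigned char msb, lsb; } byt;  /* MSB first on C51     */
    } word16;

    unsigned char data a, b;
    word16 data x;

    void mul_demo(void)
    {
        a = 36;
        b = 130;
        /* the comma operator again forces the read of B right after the MUL */
        x.byt.msb = ((x.byt.lsb = a * b), B);    /* x.w == 4680 */
    }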
Here's a simple one:
    typedef unsigned char uchar;

    static char const code nibble_parity[16] = {
        0, 1, 1, 0, 1, 0, 0, 1,
        1, 0, 0, 1, 0, 1, 1, 0
    };

    char calc_even_parity(uchar byte)
    {
        return (nibble_parity[byte & 0xf] ^ nibble_parity[(byte >> 4) & 0xf]);
    }

    char calc_odd_parity(uchar byte)
    {
        return (!calc_even_parity(byte));
    }
I've been using it for years.
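For what it's worth, a small usage sketch of my own (the function name is made up), assuming the routines above: framing a 7-bit value with its even parity bit.

    uchar frame_even(uchar d)                    /* d is 7-bit data */
    {
        d &= 0x7F;
        return (uchar)(d | (calc_even_parity(d) << 7));   /* parity into bit 7 */
    }

    /* On reception, calc_even_parity(framed_byte) == 0 when the frame is intact. */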