C51 - Calculating the parity

Chris s over 18 years ago

Does anyone have a good hint on how to implement a parity calculation for a byte in the most efficient way on a C51 device?

Dan Henry over 18 years ago in reply to erik malund

"excellent example of the cost of portability"

Actually, it's a poor example. Some other example would demonstrate the cost much better. On the non-8051 architecture where this is used, the expression compiles to the same 6-instruction sequence as the hand-optimized assembly.

I am away without access to my C51 toolchain at the moment, so I cannot include the compiler-generated assembly, but you will find that the function from http://www.keil.com/support/docs/1619.htm, with its parameter passing and call/return overhead, compares quite closely with the portable expression.

That's why, if you want to shave off a few bytes and cycles, you should write it in assembly, but do it inline within a larger assembly module that uses the parity. Otherwise you still have the overhead of a function call without the benefit.

erik malund over 18 years ago in reply to Dan Henry

"That's why if you want to shave off a few bytes and cycles, you should write it in assembly, but do it inline within a larger assembly module that uses the parity, otherwise you've still got the overhead of a function call without benefit."

Dan, you are right; however, I see a possible misunderstanding of your statement, so, without any malice, I will correct it:

"but do it within a larger assembly routine that uses the parity"

On my first read, I took 'inline' to mean "inline assembly in a C module".

Erik

Jonny Doin over 18 years ago in reply to erik malund

Once again, I agree substantially with what is said above about mixed-mode development and non-portable implementation.

However, there are some things that simply don't justify the cost of a function call. One such thing is precisely the parity function on a '51 derivative.

If you are in C, you can get the parity in 2 machine instructions:

d = 0x54;
if (ACC = d, P)             // if odd parity
{
    d = 0x55;
    par = (ACC = d, P);     // par is 1 for odd, 0 for even.
}

The expression (ACC=value, P) uses the comma operator to guarantee that the accumulator is not modified between the assignment and the test of the PSW.P bit. It is an ancient C trick to force the compiler to perform low-level operations in a certain sequence.

Although it is not 'portable', it is ANSI C, and this use of the comma operator is guaranteed not to be optimized away by the compiler.

It produces a parity evaluation in 2 machine instructions. I doubt that this can be done faster in any other implementation.
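For checking results on a host, where the ACC and P registers do not exist, the behaviour of the parity flag can be mimicked in portable C. The following is only a sketch for comparison; the name parity8 is made up for illustration, and it is not the portable expression discussed earlier in the thread. It returns 1 when the byte contains an odd number of 1 bits, matching the convention of par above.

```c
#include <stdint.h>

/* Host-side model of the 8051 P flag (hypothetical helper):
   returns 1 when v has an odd number of 1 bits, 0 when even. */
uint8_t parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);   /* fold the high nibble into the low nibble */
    v ^= (uint8_t)(v >> 2);   /* fold 4 bits down to 2 */
    v ^= (uint8_t)(v >> 1);   /* fold 2 bits down to 1 */
    return (uint8_t)(v & 1u); /* bit 0 now holds the XOR of all 8 bits */
}
```

With the values used above, parity8(0x54) gives 1 (three bits set) and parity8(0x55) gives 0 (four bits set). On the '51 itself this costs far more than the two instructions of the comma-operator version, so it is only useful as a reference.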

Jonny Doin over 18 years ago in reply to Jonny Doin

The comma operator is perfect for yet another nice trick: updating a 16-bit timer in C by adding the 16-bit value of the period constant to the timer registers:

///////////////////////////////////////////////////////////////////////////
// The following macros reload the 1 ms timer, compensating for the interrupt
// latency to achieve an average period of 1 ms, corrected at every period.
//
// The code uses a few C tricks:
// - Accesses the PSW carry directly, to obtain the overflow from an unsigned
// char addition. We use the comma operator for that, to perform a side-effect
// addition right before a carry bit test, not allowing the compiler to
// reorder the code and make the carry dirty.
// - The comma operator is also used to insert padding NOPs before the addition,
// in the cases where the lower byte of the adjust value is {0,1,2}, because the
// C51 compiler generates INC instead, or suppresses the operation.
//
// This code was verified with optimization levels from 0 to 9, and the compiler
// doesn't try to break the code with optimizations.
//
// The whole procedure is wrapped in a function-like macro.
//
// This code was verified in a P89C668 @18.432MHz, and the frequency deviation
// measured was ±20ppm, essentially the cpu crystal variation, completely
// removing the timer interrupt latency.
///////////////////////////////////////////////////////////////////////////

#ifdef ADJ_DELAY
#undef ADJ_DELAY
#endif

#ifdef RELOAD_TIMER0
#undef RELOAD_TIMER0
#endif

#if (((TMR_1MS & 0xff) < (0xff-8)) || ((TMR_1MS & 0xff) > (0xff-6)))
#define ADJ_DELAY 9
#define RELOAD_TIMER0()     do {                            \
                                TR0 = 0;                    \
                                if (TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++;   \
                                TH0 += (TMR_1MS >> 8);      \
                                TR0 = 1;                    \
                            } while(0)
#elif ((TMR_1MS & 0xff) == (0xff-6))
#define ADJ_DELAY 10
#define RELOAD_TIMER0()     do {                            \
                                TR0 = 0;                    \
                                if (_nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++;  \
                                TH0 += (TMR_1MS >> 8);      \
                                TR0 = 1;                    \
                            } while(0)
#elif ((TMR_1MS & 0xff) == (0xff-7))
#define ADJ_DELAY 11
#define RELOAD_TIMER0()     do {                            \
                                TR0 = 0;                    \
                                if (_nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++; \
                                TH0 += (TMR_1MS >> 8);      \
                                TR0 = 1;                    \
                            } while(0)
#elif ((TMR_1MS & 0xff) == (0xff-8))
#define ADJ_DELAY 12
#define RELOAD_TIMER0()     do {                            \
                                TR0 = 0;                    \
                                if (_nop_(), _nop_(), _nop_(), TL0 += ((TMR_1MS & 0xff) + ADJ_DELAY), CY) TH0++;    \
                                TH0 += (TMR_1MS >> 8);      \
                                TR0 = 1;                    \
                            } while(0)
#endif

In the timer handler, a call to RELOAD_TIMER0() invokes the macro, which is generated at compile time for any reload value.
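The arithmetic behind the macro, adding a 16-bit constant to two 8-bit registers and propagating the carry by hand, can be tried out on a host. This is a minimal sketch under stated assumptions: timer_regs_t and timer_add16 are names invented for illustration, the struct merely stands in for TH0/TL0, and the carry is detected by checking for wrap-around instead of reading the real CY bit. It models neither the NOP padding nor the cycle compensation.

```c
#include <stdint.h>

/* Hypothetical host-side model of the two 8-bit timer registers. */
typedef struct {
    uint8_t th;  /* stands in for TH0 */
    uint8_t tl;  /* stands in for TL0 */
} timer_regs_t;

/* Add a 16-bit reload constant to the running count, byte by byte,
   the way the macro does with TL0, CY and TH0. */
void timer_add16(timer_regs_t *t, uint16_t reload)
{
    uint8_t lo = (uint8_t)(reload & 0xffu);
    uint8_t hi = (uint8_t)(reload >> 8);
    uint8_t old_tl = t->tl;

    t->tl = (uint8_t)(t->tl + lo);
    if (t->tl < old_tl) {   /* the 8-bit sum wrapped: this is what CY reports */
        t->th++;
    }
    t->th = (uint8_t)(t->th + hi);
}
```

Because the constant is added to whatever the timer has already counted since the overflow, the counts that elapsed during the interrupt latency are preserved instead of being overwritten by a fixed reload.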

Jonny Doin over 18 years ago in reply to Jonny Doin

Another use of the comma operator: you can get the 16-bit result of multiplying two unsigned chars without any call to the math library:

unsigned char data a, b;
unsigned int data x;

a = 36;
b = 130;

// this is 400% faster than x = (unsigned int) a * b
*(unsigned char*)&x = ((((unsigned char*)&x)[1] = a * b), B);

I am just too lazy: you can make the int variable a union to access the MSB and LSB parts, which improves readability, but the generated code is the same: 11 instruction cycles against 40 cycles using a 16-bit cast.
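To check which byte ends up where, the result of the trick can be modelled on a host. This is only a sketch with made-up names (product16_t, mul8x8, product_value): the struct stands in for the B:A register pair that the MUL AB instruction leaves behind, with the high byte in B and the low byte in A. It uses an ordinary 16-bit multiply, so it illustrates the byte layout, not the speed-up.

```c
#include <stdint.h>

/* Hypothetical stand-in for the register pair left by MUL AB:
   msb plays the role of B, lsb the role of A. */
typedef struct {
    uint8_t msb;
    uint8_t lsb;
} product16_t;

/* Multiply two bytes and split the 16-bit product into its halves. */
product16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t p = (uint16_t)((uint16_t)a * b);
    product16_t r;
    r.msb = (uint8_t)(p >> 8);     /* what the trick reads back from B */
    r.lsb = (uint8_t)(p & 0xffu);  /* what a * b stores as an unsigned char */
    return r;
}

/* Reassemble the halves, MSB first as in the C51 memory layout of x. */
uint16_t product_value(product16_t r)
{
    return (uint16_t)(((uint16_t)r.msb << 8) | r.lsb);
}
```

For a = 36 and b = 130 the product is 4680 (0x1248), so the MSB is 0x12 and the LSB is 0x48, which is the value the comma-operator statement leaves in x.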