Elegant inversion.
/*------------------------------------------------------------------------------
8-BIT BYTE REVERSAL, LSB -> MSB
------------------------------------------------------------------------------*/
unsigned char mr;

unsigned char invertir_byte (mr)
{
    mr = (mr & 0x0F) << 4 | (mr & 0xF0) >> 4;   /* swap the two nibbles */
    mr = (mr & 0x33) << 2 | (mr & 0xCC) >> 2;   /* swap the bit pairs inside each nibble */
    mr = (mr & 0x55) << 1 | (mr & 0xAA) >> 1;   /* swap the adjacent bits inside each pair */
    return (mr);
}
#include <reg52.h>

unsigned char mr;

unsigned char invertir_byte (mr)
{
    mr = (mr & 0x0F) << 4 | (mr & 0xF0) >> 4;
    mr = (mr & 0x33) << 2 | (mr & 0xCC) >> 2;
    mr = (mr & 0x55) << 1 | (mr & 0xAA) >> 1;
    return (mr);
}

void main()
{
    while(1)
    {
        P1=invertir_byte(0x33);
    }
}
Program Size: data=10.0 xdata=0 code=123
It takes 121 clock cycles.
#include <reg52.h>

unsigned char mr;

unsigned char invertir_byte (mr)
{
    unsigned char temp = 0;   /* start from 0 so that only the tested bits get set */

    if(mr&0x80){temp=temp|0x01;}
    if(mr&0x40){temp=temp|0x02;}
    if(mr&0x20){temp=temp|0x04;}
    if(mr&0x10){temp=temp|0x08;}
    if(mr&0x08){temp=temp|0x10;}
    if(mr&0x04){temp=temp|0x20;}
    if(mr&0x02){temp=temp|0x40;}
    if(mr&0x01){temp=temp|0x80;}
    return (temp);
}

void main()
{
    while(1)
    {
        P1=invertir_byte(0x33);
    }
}
Program Size: data=10.0 xdata=0 code=85
It takes 42 clock cycles.
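One way to gain confidence that the two variants compute the same thing is to compare them for all 256 inputs on a desktop compiler. The following is only a host-side sketch in standard C, not 8051 code: the names reverse_swap, reverse_bittest and count_mismatches are made up for illustration, the parameters are explicitly typed as unsigned char, and the bit-test version starts from temp = 0, which is an assumption about the intended behavior.

```c
#include <assert.h>

/* Mask-and-shift variant: swap nibbles, then bit pairs, then single bits. */
static unsigned char reverse_swap(unsigned char b)
{
    b = (unsigned char)((b & 0x0F) << 4 | (b & 0xF0) >> 4);
    b = (unsigned char)((b & 0x33) << 2 | (b & 0xCC) >> 2);
    b = (unsigned char)((b & 0x55) << 1 | (b & 0xAA) >> 1);
    return b;
}

/* Bit-test variant: test each source bit and set the mirrored bit. */
static unsigned char reverse_bittest(unsigned char b)
{
    unsigned char temp = 0;   /* assumption: result starts out cleared */

    if (b & 0x80) temp |= 0x01;
    if (b & 0x40) temp |= 0x02;
    if (b & 0x20) temp |= 0x04;
    if (b & 0x10) temp |= 0x08;
    if (b & 0x08) temp |= 0x10;
    if (b & 0x04) temp |= 0x20;
    if (b & 0x02) temp |= 0x40;
    if (b & 0x01) temp |= 0x80;
    return temp;
}

/* Count the inputs (0..255) for which the two variants disagree. */
static int count_mismatches(void)
{
    int v;
    int bad = 0;

    for (v = 0; v < 256; v++) {
        if (reverse_swap((unsigned char)v) != reverse_bittest((unsigned char)v))
            bad++;
    }
    return bad;
}

/* Hypothetical self-check; call it from main() on the host. */
static void check_variants(void)
{
    assert(reverse_swap(0x33) == 0xCC);   /* 00110011 -> 11001100 */
    assert(count_mismatches() == 0);
}
```

This says nothing about code size or cycle counts on the 8051. It only checks that the comparison is between two functions with the same result.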
As noted earlier, the original "beautiful code" is really great on the correct platform.
It likes barrel shifters, where each shift operation takes a fixed number of clock cycles independent of the number of shift steps.
It likes multiple ALUs, allowing the operations to be performed concurrently before the final merge of the results.
It is free from conditional jumps, avoiding branch prediction failures in high-end processors.
It does not require the processor to have special bit instructions to operate on single bits, like the 8051 has.
A normal 8051 doesn't have a barrel shifter, and it doesn't have multiple concurrent ALUs. Nor does it have a pipeline with advanced branch prediction, where a failed prediction may cost many in-flight instructions. Even the fast one-clockers see only a limited loss from a branch prediction failure.
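To get a rough feeling for what the missing barrel shifter means, the mask-and-shift version can be rewritten so that every multi-bit shift is performed as a series of single-bit shifts, which are then counted. This is a deliberately simplified model in portable C, not a cycle-accurate picture of the 8051: the helpers shl and shr and the counter shift_steps are hypothetical, and a real compiler may do better, for instance by using the 8051 SWAP A instruction for the nibble exchange.

```c
#include <assert.h>

static unsigned shift_steps;   /* running count of single-bit shifts (illustration only) */

/* Shift left by n, one bit at a time, as a CPU without a barrel shifter would. */
static unsigned char shl(unsigned char v, unsigned n)
{
    while (n--) {
        v = (unsigned char)(v << 1);
        shift_steps++;
    }
    return v;
}

/* Shift right by n, one bit at a time. */
static unsigned char shr(unsigned char v, unsigned n)
{
    while (n--) {
        v = (unsigned char)(v >> 1);
        shift_steps++;
    }
    return v;
}

/* Same three swap stages as the original code, built from the counting helpers. */
static unsigned char reverse_counting(unsigned char b)
{
    b = (unsigned char)(shl(b & 0x0F, 4) | shr(b & 0xF0, 4));
    b = (unsigned char)(shl(b & 0x33, 2) | shr(b & 0xCC, 2));
    b = (unsigned char)(shl(b & 0x55, 1) | shr(b & 0xAA, 1));
    return b;
}
```

In this model one call performs 4+4+2+2+1+1 = 14 single-bit shifts, where a barrel shifter would need only six shift operations, one per term, regardless of the shift distance.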