First, I am using 7.5 so it is an old compiler, but that is the way it is.
I have a number of bit variables that I am trying to pack into a byte. they are scattered all across the bit area
    volatile bit var1;
    volatile bit var2;
    volatile bit var3;
    char result;
The compiler won't let me do
    result = (var1 << 1) | (var2 << 2) | (var3 << 4);

The only way I have been able to get this to work is:

    result = var1;
    result = ((((result << 1) | var2) << 1) | var3) << 1;

This results in very inefficient code:

    mov  a,result
    mov  r7,a
    mov  c,v2
    clr  a
    rlc  a
    orl  a,r7
    add  a,acc
    mov  a,r7
    mov  c,var3
    rlc  a
    orl  a,r7
    add  a,acc
    mov  a,r7
    ; etc.

as compared to:

    mov  a,result
    mov  c,v2
    rlc  a
    mov  c,v3
    rlc  a

which would be much better. Does anyone know how this might be forced in C, without resorting to assembly? I'm wondering whether, because the ACC and PSW addresses are evenly divisible by 8 (and therefore bit-addressable), you could use those and explicitly use the _crol_ intrinsic to accomplish this. I have not had any luck so far. In this case memory efficiency is more important than speed.
What happens if you rewrite:
result= (var1 <<1) | (var2 <<2) | (var3 <<4);
into:
result = ((char)var1 <<1) | ((char)var2 <<2) | ((char)var3 <<4);
or:
    result = 0;
    if (var1) result |= 2;
    if (var2) result |= 4;
    if (var3) result |= 8;

or:

    result = (var1 ? 2 : 0) | (var2 ? 4 : 0) | (var3 ? 8 : 0);
Since the variables are actually bit flags in the bit-mapped data area, I think converting them to chars will be messy and cost a lot of code space.
However, this resulted in a reduction of 17 bytes.
WELL.... different code, but because it used a jnb jump chain, it compiled to exactly the same size of code as the original. There is an OR of 0 on each test.
I have several of these and need to conserve code space.
Basically, it is a state dump of the medical device up to a secondary controller, so since I am doing both ends, I can do it pretty much however I want, but I just have about 1K of free flash at this point, so the more memory I can save the better.
The existing 8051 is a SiLabs F040 with 64K flash (except that they burn pages 0xFD00-0xFFFF for manufacturing test, protection, etc.).
Thanks once again for the excellent suggestions.
Type casting to char might be messy. But ((char)bitx << 4) doesn't mean the compiler actually has to convert bitx to a char; it means the compiler must produce a result as if it had. The language standard isn't about expected processor instructions but about expected end results.
A compiler that happens to have matching optimization rules could still perform single-bit operations on a char-sized temporary variable that also happens to be bit-addressable. So, something similar to the following, where the program has a byte-addressable variable in the bit-addressable region to allow bit operations to set individual bits: http://www.keil.com/support/docs/1877.htm
    bdata unsigned char sample_result = 0;
    sbit var1 = sample_result ^ 0;
    sbit var2 = sample_result ^ 1;
    sbit var3 = sample_result ^ 2;

    main()
    {
        var1 = 0;
        var2 = 1;
        var3 = 1;
        sample_result = 0xFFu;
    }
Generated code:
    79: var1 = 0;
        C:0x070D    C200      CLR      var1(0x20.0)
    80: var2 = 1;
        C:0x070F    D201      SETB     var2(0x20.1)
    81: var3 = 1;
    82:
        C:0x0711    D202      SETB     var3(0x20.2)
    83: sample_result = 0xFFu;
    84:
        C:0x0713    7520FF    MOV      sample_result(0x20),#0xFF
    bdata unsigned char sample_result = 0;
    sbit var1 = sample_result ^ 0;
    sbit var2 = sample_result ^ 1;
    sbit var3 = sample_result ^ 2;
    volatile bit source_bit;

    main()
    {
        var1 = 0;
        var2 = 1;
        var3 = 1;
        sample_result = 0xFF;
This is not what I am trying to accomplish.
The desired result is to PACK the 3 bits into positions in the result, which will be a bit-mapped representation of the various bits scattered throughout bit (bdata) memory. So if V1 = 1, V2 = 0, V3 = 0, then result would be 0b00000100, or 0x04. Assigning sample_result = 0xFF would defeat the purpose.
I have about 17 bits free, so I tried it and it....
WORKS LIKE A BOSS! VERY nice, thank you. Each Var1 = source_bit; assignment generates:

    mov  c,source_bit
    mov  (sample_result.0),c

It cut the generated code down from 0x1B6 to 0x153 bytes: a very significant saving.
"That's what bdata was invented for"
Yes, bdata is nice. But the question is whether any C sequence will make the compiler itself use the concept with an intermediate variable, instead of forcing the developer to explicitly create a bdata temp variable plus 8 overlapping sbit variables.
For example, whether any code sequence could get the compiler to make use of the accumulator, which just happens to be both bit- and byte-addressable, and which wouldn't require the consumption of 8 dummy bits.
But maybe the Keil compiler has no such optimization rule, in which case it isn't possible to find any "magic" C sequence that will make the compiler, behind the scenes, do a byte clear, multiple bit assigns, and then a byte store to the final target address. Such an optimization rule would allow efficient bit packing even for bytes that don't fit in bdata, without #define macros to hide all the sub-steps of the computation and assignment.
By the way, it's rather brilliant how newer ARM Cortex chips emulate the bdata concept outside of the core in a language-neutral way.
"cut generated code down from x1B6 to x153 bytes. very significant savings."
How much of that sentence would be appreciated by one of the multitude of C# fraternity?
Luv it :)
Well there was a wrinkle.
    bdata unsigned char zz;
    sbit Bit7 = zz ^ 7;
    ...

The compiler optimized things away. The source was:

    zz = (state << 5);  // take the state variable and put it in the upper 3 bits of the byte
    Bit4 = 1;
    Bit5 = 0;
    sputchar(zz);

The compiler started by setting R7 to (state << 5), which is fine. Then it issued the correct bit-set instructions. THEN it called sputchar passing R7 to it, so only the state was transmitted. So I changed the declaration to volatile bdata unsigned char zz; and got a syntax error near unsigned, and Bit4 = 1; now generates an "invalid base address" error. WTF? Trying to find a way to force the compiler to reload zz into R7 before calling sputchar. In another spot everything worked fine: zz = 0; followed by 7 Bitx = y; statements. There it did reload zz into R7 before calling sputchar. So something in the compiler is a bit too smart; I've got to figure out how to force the load. I may have to try zz += 0; and see if that makes it reload.
No, the compiler isn't too smart. It just fails to understand the aliasing between bit and byte: that there is an overlapping union of byte and bits. So the optimizer doesn't realize that something has changed the byte variable, thereby invalidating the value already in the register.
Will the compiler accept the volatile keyword if you move it after the bdata keyword?
I'll try that. What I did in the meantime is to follow the last Bitx = line with one that does nothing and consumes 2 or 3 bytes:

    sfrsave = SFRPAGE;

Basically a data move from an SFR. That forced the compiler to do explicit loads and stores. I'll try bdata volatile, but I've always put volatile first...
Well, it swallowed that just fine, and I was able to take out the sfrsave = SFRPAGE; line.
So why does volatile bit variable; work, while volatile bdata unsigned char variable; does not work, yet bdata volatile unsigned char variable; DOES work?
That looks like a bug in the compiler to me.
Pretty easy to demonstrate:
    volatile bdata unsigned char variable;
    sbit Bit7 = variable ^ 7;
    sbit Bit6 = variable ^ 6;
Hope someone at Keil will take note.
I'd actually like someone to verify whether or not this works on the current version of the compiler.
Might well be worth considering a small bit of assembler instead of trying to persuade the compiler to do something like this.
The order of attributes shouldn't matter, as long as we don't bring pointers into it, where an attribute can bind to either the pointer or the pointed-at value.
So volatile bdata xx and bdata volatile xx should both work.
But the optimizer really should also know about the aliasing between sbit and bdata, since there is a hard-coded relation between them. The whole idea of bdata is to get the aliasing effect, allowing the processor-specific one-bit instructions to be used. volatile isn't intended to solve aliasing issues, but to tell the compiler about asynchronous updates, i.e. that a task, ISR, or actual hardware may change the variable state between two accesses.
I don't know if Keil has addressed this in later compilers, but
Per has it right.
The reason I tagged it volatile was to force the compiler to do the load after the bit manipulations.
It worked in one order, but not in a different order. I think this is a bug.
For it to be a bug, it has to be doing something unintended. Unless you can find a document stating exactly what is intended, you might find it very difficult to convince anyone else that it is one.