Hello,
We came across a mystery/bug yesterday. This was tested with C51 v9.53 (Simplicity Studio) and 9.56 (uVision).
unsigned char data c;
unsigned int data d;

// The values 0x7B and 0xFF7B are wrong
c = (200 - 50) * (255 - 0) / (255 - 50);
    0000738d: MOV 65H, #7BH
d = (200 - 50) * (255 - 0) / (255 - 50);
    00007390: MOV 66H, #0FFH
    00007393: MOV 67H, #7BH

// These are correct
c = (200 - 50) * (255u - 0) / (255 - 50);
    0000738d: MOV 65H, #0BAH
d = (200 - 50) * (255u - 0) / (255 - 50);
    00007390: MOV 66H, #00H
    00007393: MOV 67H, #0BAH
The uVision docs say that numeric constants default to 16 bits. Is this not true? Or is this an issue with the "Constant Folding" optimizing step?
Any insights appreciated,
Darren
unsigned char data c;
unsigned int data d;
// The values 0x7B and 0xFF7B are wrong
Not exactly.
c = (200 - 50) * (255 - 0) / (255 - 50);
All the constants are "upgraded" to 16-bit values, but nothing tells the compiler to go past signed integers to unsigned integers. Some of these "positive" numbers become "negative" numbers during the calculation, which is what produces the seemingly invalid values. In the version you say is correct, you instructed the compiler to upgrade to unsigned 16-bit integers and got what you were expecting (but not what you had originally asked for).
(200 - 50) is 150, and (255 - 0) is 255. 150 * 255 is 38,250, which does not fit in a signed 16-bit integer: it becomes a "negative" number (0x956A, or -27286).
For division on this processor, if you want to divide a negative number (or divide by one), we take the 2's complement of the number(s) and make any sign adjustments when we are done.
0x956A is converted to 0x6A96, or +27286.
27286 / 205 is 133.1024.
Since we divided a negative number by a positive number, we know we need to take the 2's complement of the result to get the "correct" answer.
133.1024 is truncated to 133, and this becomes -133. -133 as a 16-bit integer is 0xFF7B.
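If you want to watch the same arithmetic outside the 8051, here is a minimal host-side sketch; the int16_t/uint16_t casts are my stand-ins for C51's 16-bit int (the wrap on the signed cast is implementation-defined in standard C, but behaves this way on common desktop compilers):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* 150 * 255 = 38250 does not fit in a signed 16-bit int;
       the cast wraps it to -27286 (0x956A), as on the '51 */
    int16_t product  = (int16_t)((200 - 50) * (255 - 0));
    int16_t quotient = (int16_t)(product / (255 - 50));  /* -27286 / 205 = -133 */

    printf("product  = %d (0x%04X)\n", product,  (unsigned)(uint16_t)product);
    printf("quotient = %d (0x%04X)\n", quotient, (unsigned)(uint16_t)quotient);
    /* prints 0x956A and 0xFF7B; the low byte of 0xFF7B is the 0x7B above */
    return 0;
}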
Thanks, Robert, for this great explanation. I suppose I was expecting that, at compile time, it would catch the signed size issue and upgrade to a 32-bit evaluation. I say this because, if you do this:
d = 38250 / 205;
    0000738b: MOV 66H, #00H
    0000738e: MOV 67H, #0BAH
it does what I expected, but I didn't explicitly ask it to use unsigned values. So is the compiler able to see that 38250 is too big for a signed 16-bit int and so upgrades to 32-bit?
That would explain why this:
ADC0CF = (((SYSCLK/2)/3000000)-1)<<3;
works, but my original example didn't.
I just want to make sure I understand its rules so I don't get tripped up again.
Darren
I suppose I was expecting that, at compile time, it would catch the signed size issue and upgrade to a 32bit evaluation.
That would be an illegal optimization. The pre-computed result at compile time is required to be the same as if the operation had been performed on the target CPU; that's known as the "as-if rule".
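To see the as-if rule at work on a desktop compiler, here is a small sketch (mine, not C51 output): volatile forces a genuine run-time evaluation, and the folded constant expression must come out the same. (On the host there is no overflow; on C51 the 16-bit signed overflow is strictly undefined behaviour, but in practice the compile-time folder and the generated code wrap the same way, giving the -133 seen above.)

#include <stdio.h>

int main(void)
{
    volatile int a = 200, b = 50, c = 255, z = 0;  /* defeats folding */
    int at_runtime = (a - b) * (c - z) / (c - b);
    int folded     = (200 - 50) * (255 - 0) / (255 - 50);
    printf("%d %d\n", at_runtime, folded);  /* 186 186 with a 32-bit int */
    return 0;
}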
So is the compiler able to see that 38250 is too big for a signed 16-bit int and so upgrades to 32-bit?
Not just this compiler: all C compilers are required to see that by the language's definition, the ISO C Standard.
I just want to make sure I understand its rules so I don't get tripped up again.
So look up the rules in your C textbook (or the Standard itself).
d = 38250 / 205;
In the above example, the constant 38250 was upgraded to unsigned int. My guess is unsigned int is 16 bits on your machine. The compiler decides the size of the calculations based on the individual constants being used. It is happy to upgrade to size unsigned int, but will not go higher than that unless something is specifically a "higher" type, so no, it was not upgraded to a 32-bit value.
Above, my guess is that SYSCLK is defined to be larger than unsigned int, so the compiler is also happy to upgrade the constant 3000000 to the size of SYSCLK. If SYSCLK were of type unsigned int, the compiler would probably have complained. (Try replacing SYSCLK with 38250 and see if it complains about 3000000 being too large / truncation happening.)
d = (200 - 50) * (255 - 0) / (255 - 50);
All of the constants fit very nicely in a 16-bit signed integer, so the compiler does not have a valid reason to "upgrade" to a 16-bit unsigned integer.
d = (200 - 50) * (255u - 0) / (255 - 50);
In this case you specify to the compiler a type "higher" than integer (unsigned integer), so all are converted to this.
d = (200 - 50) * (255u - 0) / (255 - 50 - (-1));
In this case, the compiler would expect to upgrade all constants to unsigned int, but the -1 will cause a "problem" as it does not fit within the range of an unsigned int.
In the above example, the constant 38250 was upgraded to unsigned int.
No. Or rather: if it had been, that would have constituted a compiler bug. For a 16-bit target, the constant 38250 is of type (signed) long int, which is at least 32 bits wide but could be wider.
My guess is unsigned int is 16-bits on your machine.
No need for a guess: it's C51, so yes, int is 16 bits wide.
No. Only one of the constants in that expression will be converted to unsigned: the 0 in (255u - 0). The other 4 stay as signed ints. Then the results of their subtractions get converted to unsigned. I.e. what actually happens is equivalent to:
((unsigned)(200 - 50) * (255u - (unsigned)0)) / (unsigned)(255 - 50)
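A host-side counterpart (uint16_t standing in for C51's 16-bit unsigned int; a sketch, not compiler output) that mirrors those conversions:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* each subtraction result is converted to unsigned before
       the multiply and the divide, as in the expression above */
    uint16_t product  = (uint16_t)((uint16_t)(200 - 50) * (255u - (unsigned)0));
    uint16_t quotient = (uint16_t)(product / (uint16_t)(255 - 50));
    printf("quotient = %u (0x%02X)\n", (unsigned)quotient, (unsigned)quotient);
    /* prints 186 (0xBA) - the value Darren expected */
    return 0;
}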
but the -1 will cause a "problem" as it does not fit within the range of an unsigned int.
No it won't, because that -1 is never promoted to unsigned.
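You can check that on a host, too (again with uint16_t as a stand-in for the '51's unsigned int): the -1 lives entirely inside a signed-int subexpression, and only that subexpression's result is converted.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* (255 - 50 - (-1)) is evaluated purely as signed int: 206 */
    uint16_t d = (uint16_t)((uint16_t)((200 - 50) * (255u - 0))
                            / (uint16_t)(255 - 50 - (-1)));
    printf("d = %u\n", (unsigned)d);  /* 38250 / 206 = 185, no "problem" */
    return 0;
}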
For OP -
You were correct that 3000000 would be considered a long int (a 32-bit signed value). My guess about SYSCLK makes no difference to this; any comments based on SYSCLK "helping" to upconvert the 3000000 should be ignored. Hans-Bernhard Broker's specific example of when the conversions to unsigned int happen is correct.
// working - all 16-bit calculations: signed on 2 of the inner
// calculations, unsigned on the other inner calculation, and
// unsigned on all 3 of the outermost calculations
((unsigned)(200 - 50) * (255u - (unsigned)0)) / (unsigned)(255 - 50)

// original - all 16-bit signed calculations
(unsigned)((200 - 50) * (255 - 0) / (255 - 50))
While a signed constant such as -1 will always be signed, calculations that use this constant can perform conversions on it.
unsigned x = 255u * -1;              // 65281 - but you already know this can happen
unsigned x = 255u * ((unsigned) -1);
So look up the rules in your C textbook (or the Standard itself).
I pulled out my 1988 K&R.
page 198, section A2.5.1
------------------------------------------
"An integer constant may be suffixed by the letter u or U, to specify that it is unsigned. It may also be suffixed by the letter l or L to specify that it is long.
The type of an integer constant depends on its form, value, and suffix. If it is unsuffixed and decimal, it has the first of these types which its value can be represented: int, long int, unsigned long int.
If it is unsuffixed octal or hexadecimal, it has the first possible of these types: int, unsigned int, long int, unsigned long int.
If it is suffixed by u or U, then unsigned int, unsigned long int.
If it is suffixed by l or L, then long int, unsigned long int."
------------------------------------------
Does that contradict anybody else's observations/understanding?
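For what it's worth, if your desktop compiler speaks C11, _Generic gives a quick way to ask it what type it assigned to a constant. A sketch (on a typical desktop compiler int is 32 bits, so 38250 comes out as int there; under the same rules on C51, where int is 16 bits, it would be long):

#include <stdio.h>

#define TYPE_OF(x) _Generic((x),        \
    int:           "int",               \
    unsigned int:  "unsigned int",      \
    long:          "long int",          \
    unsigned long: "unsigned long int", \
    default:       "something else")

int main(void)
{
    printf("255    -> %s\n", TYPE_OF(255));    /* int everywhere */
    printf("255u   -> %s\n", TYPE_OF(255u));   /* unsigned int */
    printf("38250  -> %s\n", TYPE_OF(38250));  /* int here; long int where int is 16 bits */
    printf("38250L -> %s\n", TYPE_OF(38250L)); /* long int */
    printf("0xFFFF -> %s\n", TYPE_OF(0xFFFF)); /* int here; unsigned int where int is 16 bits */
    return 0;
}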
Make that page 193, and there's a typo in this sentence; it should read:
"If it is unsuffixed and decimal, it has the first of these types in which its value can be represented: int, long int, unsigned long int."
That's a little more readable.
For the '51, an int is 16 bits. A 32-bit int would be ridiculous for the '51.
Yes, it does. K&R is not really used as a spec by compilers anymore.
Hexadecimal constants are always unsigned (without suffix). So it is unsigned int then unsigned long int.
Decimal constants are always signed integers (without suffix). They will not be upgraded to unsigned constants.
Ridiculous or not, a long int / unsigned long int needs to be at least 32 bits on an 8051, and decimal constants will be upgraded to long int if need be.
Hans-Bernhard Broker correctly pointed this out earlier, referencing a more recent spec than K&R.
(200 - 50) * (255 - 0) / (255 - 50)
These are all int constants and all calculations will be performed as int, hence the original unexpected, but correct, result from the compiler.
38250 / (255 - 50)
This would produce the OP's expected value. The 38250 is a long int constant; 255 and 50 are int constants.
Notice here that the constant 38250 and the value of the compile-time constant expression (200 - 50) * (255 - 0) are not the same. 38250 is a long int (32 bits) and the result fits. (200 - 50) * (255 - 0) is an int (16 bits) and the result does not fit. The compiler can easily know this, but as Hans-Bernhard Broker pointed out, the compiler has 4 int values and is not allowed to do anything but int calculations with these numbers. So 38250 != (200 - 50) * (255 - 0) when the right-hand side is folded as int.
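To make that difference concrete on a host machine, here is a sketch with int32_t/int16_t standing in for the '51's long and int:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* 38250 has long type on a 16-bit-int target, so the whole
       division is done in (at least) 32 bits */
    int32_t via_long = (int32_t)38250 / (255 - 50);  /* 186 */

    /* the folded int product wraps first, then gets divided */
    int16_t via_int  = (int16_t)((int16_t)((200 - 50) * (255 - 0)) / (255 - 50));  /* -133 */

    printf("long constant : %ld\n", (long)via_long);
    printf("int  constants: %d\n",  (int)via_int);
    return 0;
}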
ADC0CF = (((SYSCLK/2)/3000000)-1)<<3;
As the OP stated, this calculation is proper, and it should be, because it will be done at the 32-bit level. It may be a signed or unsigned calculation based on the type of SYSCLK, but SYSCLK is likely a decimal constant, which will be a signed long int just as 3000000 is a signed long int.
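For example, with a hypothetical SYSCLK of 24500000 (the real value comes from the project's header, so treat this number, and the plain variable standing in for the ADC0CF SFR, as assumptions):

#include <stdio.h>

#define SYSCLK 24500000  /* hypothetical; substitute your project's value */

int main(void)
{
    /* 3000000 is a long constant, so the whole expression is
       evaluated in (at least) 32 bits even where int is 16 bits */
    unsigned char adc0cf = (((SYSCLK/2)/3000000)-1)<<3;
    printf("ADC0CF = 0x%02X\n", (unsigned)adc0cf);  /* (4-1)<<3 = 0x18 */
    return 0;
}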
(200 - 50) * (255u - 0) / (255 - 50)
Here we have 5 int constants and 1 unsigned int constant. Even though the 0 is an int constant, when the calculation is done between an int and an unsigned int, the int is converted to an unsigned int for the calculation and the calculation is done at that level, producing an unsigned int result. 38250 is a long int constant, so any calculations using it will be done at at least the long int level.
K&R is not really used as a spec by compilers anymore.
While correct, that is irrelevant for the case at hand, because this aspect hasn't changed since K&R 2nd edition, 1988, which still accurately reflects the current international standard in these aspects.
Hexadecimal constants are always unsigned (without suffix).
No, they're not. It used to be like that back when K&R 1st edition was the only de-facto standard, but it has been wrong for 27 years now. The type of 0xff is signed int. Yes, that may be surprising.
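One way to convince yourself on any modern compiler (a host-side sketch):

#include <stdio.h>

int main(void)
{
    /* if 0xff were unsigned, -0xff would wrap to a huge positive
       value; because it is a signed int, this prints -255 */
    printf("%d\n", -0xff);

    /* the value does matter, though: 0xFFFF fits in int only where
       int is wider than 16 bits; on C51 it would be unsigned int */
    printf("%d\n", 0xFFFF > -1);  /* 1 on a 32-bit-int host */
    return 0;
}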
Decimal constants are always signed integers (without suffix). They will not be upgraded to unsigned constants.
Depending on what kind of "being upgraded" you refer to here, that's wrong in one of two ways.
Decimal, unsuffixed constants have signed type, correct. But there is no "upgrading" of constants involved in getting that type. They get a type big enough for their value, and that's that.
But constants, like any other integer operand, will be upgraded (the correct term is: "implicitly converted") to unsigned if they appear in expression contexts like the ones described in this thread.
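The classic demonstration of that implicit conversion:

#include <stdio.h>

int main(void)
{
    /* -1 is implicitly converted to unsigned int for the comparison,
       becoming UINT_MAX, so this "obviously false" test is true */
    if (-1 > 0u)
        printf("-1 > 0u holds\n");
    return 0;
}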
Hans-Bernhard Broker correctly pointed this out earlier, referencing a more recent spec than K&R.
That statement is misleading, because this discussion hasn't touched any difference between K&R1 and the ANSI/ISO C standard, except your mention of the type of hex constants.