unsigned char buf[100];
. . .
unsigned int val;
. . .
val = *((unsigned int *)(&buf[1]));
. . .
comments?
Been there, done that, and, even worse:
Some compilers for 'multibyte word' processors that require alignment will not raise a fault, but will access the previous byte and the pointed-to byte instead of the pointed-to byte and the next one (i.e. a 16-bit processor that ignores the LSB of the address for word fetches). When I complained about it to one compiler maker, I was informed "there is nothing in ANSI C about this". I never checked; what good does it do to state "there is something in ANSI C about this" if your program does not work because of such?
Then, of course, there is endianness, which, if the array is accessed both as char and as int, will make a real pow-wow.
Erik (sorry about "multibyte word"; what else would you call it?)
Misalignment detection is not really up to the compiler. Some hardware -- the ARM7/9 core, for example -- doesn't even detect alignment faults, and will just silently do interesting things to your byte lanes.
There are some cases where compilers might be able to suspect or even prove alignment problems, but I'm not sure that's possible in general, especially since the actual positioning of the data is theoretically the job of the linker. Without inserting run-time checks on the actual value of a pointer just before it is used, which would have a huge effect on the code, I don't think a compiler can solve that problem in general. (It might make for an interesting debug option, much like an optional null-pointer check, which no compiler vendor offers as a debug option either.)
sorry about "multibyte word"
Sounds good to me.
When I complained about it to one compiler maker, I was informed "there is nothing in ANSI C about this".
That information is correct. Code like the OP's is as wrong as C code can possibly be while still apparently working sometimes: it causes undefined behaviour. Literally anything can happen, because the language makes no promises whatsoever about what such a program may do. "Anything" of course includes "what the coder expected", which is what makes this kind of error so nasty --- it'll just work for quite a while, but unexpectedly break when you use the same code on a slightly different platform.
I never checked, what good does it do to state "there is something in ANSI C about this", if your program does not work because of such.
The "because of such" part is incorrect. The program fails for a much simpler reason: it's wrong. The code makes assumptions about misaligned access via a mistreated pointer that aren't backed up by any applicable rule. The code will only work if those assumptions happen to be true.
What I meant by "because of such" was: if the tool (is supposed to) behave in such a way under such circumstances, then the thing to do is to avoid "such circumstances"; there is no other way to get the product out.
BTW, the 'error' was not mine. It occurred when I was the contact person to the compiler manufacturer: we were using a beta of a 16-bit compiler, and one of my coworkers (one of those 'coders' who make it a point of pride to be ignorant of the hardware) asked "why does it not work?". I realized rather quickly what was going on and, in the hope that a compiler 'catch' existed, reported the problem. I would claim that ANY multibyte-word compiler should, at least, warn when a memory location is "typecast up".
Erik
Thanks for your comments.
My view is that it is obviously non-portable code; but since the coder wrote it explicitly, it is intentional and is not wrong.
It may rely on assumptions concerning compiler and platform, but if those assumptions are constant for said compiler and platform then the assumptions are relatively safe.
When I get out of bed in the morning, I do not put on my wellington boots. I assume that there has been no flood while I've been asleep. It's not wrong to assume that; it's a fairly safe bet.
If I were to go to (say) the Pacific island of Tuvalu, I may well change that assumption; just as I would if I were to change compiler and/or platform.
"since the coder wrote it explicitly, it is intentional and is not wrong."
No, that does not follow at all! You may happen to be right in this case, but you can't generally make that assumption!
"It may rely on assumptions concerning compiler and platform, but if those assumptions are constant for said compiler and platform then the assumptions are relatively safe."
Unfortunately, it may rely upon false assumptions - and it might only work by pure luck!
Also, there is no guarantee that the assumptions will not become false with a compiler update - or possibly even if some options are changed...
That's why all such assumptions should be clearly and fully documented!
And, if the coder didn't bother to provide such documentation, you at least have to wonder if that's because she/he didn't understand the issues...
No, a program using implementation-specific behaviours in C is not considered invalid. Just less portable or in some cases buggy. It's all a question about what assumptions the developer made, and if these assumptions are valid for the intended platform(s).
C did intentionally leave a number of specifics to the architecture or the language implementation. If such code were invalid, the language standard would instead have specified that the compiler should (or must) flag the code as invalid (an error) whenever it is able to detect the problem.
If I know that the processor is little-endian, know the size of an integer, and know that the data is aligned (or that the target architecture can work with unaligned data), then I am completely allowed to typecast that position of the received byte array into an int pointer, allowing me to read the value in a single instruction, instead of reading it as two (or more) bytes and shifting the individual bytes into the correct position.
It isn't wrong to do it. It is just a question of calculated risk versus potential gain. Yes, people can get their foot shot off, but people who don't think about the possibility of division by zero, or stack size, or numeric overflow can also get their feet shot off.
The compiler isn't allowed to issue an error (unless explicitly enabled by the user) for non-portable usage, because the compiler is required to produce a runnable program. It is then up to the specific hardware whether the application will generate an exception or strange results.
To give an explicit example: 6.1.3.4 in the language standard notes that it is undefined behaviour to make a conversion from an integer data type to a different data type that can't handle the full numeric range -- assigning an int to a char, for example. We may now and then see warnings like "Conversion might lose significant digits", but the programs are still valid. Some long definitions are twice the size of int; some are not. Even if I add a typecast, intvar = (int)longvar, it still represents a conversion from long to int. If this represented an invalid program, then most applications larger than "hello world" (and quite a number of those, too) would be invalid.
Most source lines we write are based on assumptions. For example, the assumption that other code somewhere has verified the input range of all parameters read -- if not, every single + or * could overflow. And since the behaviour then can be undefined (note that not all machines are two's complement), every + or * would require code to establish that it cannot fail. But even that code would probably contain code that, depending on the situation, may need to make assumptions. How about machines that can have +0 and -0? They exist, and we just have to make assumptions -- some math function code is allowed to treat +0 and -0 differently depending on the architecture. How many have specifically tried to tell the compiler that their intention is +0?
A different example: how many significant characters do you use in externally linked symbols? Different linkers support different lengths of external symbols. Should all programs whose symbol names aren't unique within the first 6 characters be invalid?
Anyone using memset() to clear a large struct or array? But what is the internal representation of 0 on the hardware?
Is a program invalid if it writes a 100-character text string without any newline characters in it? It is considered undefined behaviour to write past the end of the terminal width -- but since there exist handhelds with puny displays, the assumption would then be that anything writing more than a single character before a newline may trigger 5.2.2.
That comes as no surprise.
When I complained about it to one compiler maker, I was informed "there is nothing in ANSI C about this". I never checked,
Neither does that.
what good does it do to state "there is something in ANSI C about this", if your program does not work because of such.
Well, you see, the standard defines the 'C' language. If you write code that makes assumptions that are not guaranteed by the standard then you cannot reasonably expect your program to work.
I know this will fall on deaf ears, but I'll say it again anyway: if you want to become proficient with a tool you really do need to read and understand the manual.
"It may rely on assumptions"
some of the longest debugging sessions ... have been a result of relying on assumptions.
Mr smoked sardine,
The references to ANSI C had absolutely nothing to do with the point.
Let me try to translate my statement to something a smoked sardine can understand: "it makes no sense to discontinue development because of a bug (perceived or real makes no difference) in the tools if there is a workaround"
Word alignment is NOT a C issue but an architecture issue.
Yes, but life isn't expected to be simple. Anything non-trivial has to be based on a number of assumptions.
We can't avoid assumptions. Just try to make good ones, and to qualify them. We can make risk assessments for a project: what if our assumptions about the hardware used, the tools used, the available time, the stability of customer requirements etc. are wrong? We can document our code, specifying what assumptions we have made (or at least realize that we have made). We can, if the hardware permits, perform checked builds that contain extra integrity-testing code. We can make use of the preprocessor. We can use regression testing...
While our job is to produce working - and economical - solutions, we can't ignore assumptions. If we think that there are no assumptions involved, then we have just made a very big, and very wrong assumption.
In short: it is almost impossible to write any non-trivial applications that are guaranteed to work on any existing platform that has an ANSI/ISO-conformant compiler.
It is nice that you see that I have a lot of experience.
If, on the other hand, you were trying to belittle me, you must have missed "BTW the 'error' was not mine, it occurred ..." in my post.
Erik
True - But someone who builds up experience of such things can learn to more reliably determine the risk.
Just to follow on from my previous - Through experience I have determined that I don't have to put on my wellington boots before getting out of bed.
I would prefer to consider it a calculated risk.
I would not consider it wrong - I might, possibly, change my mind if I were to get my feet wet one morning ;)
True - But someone who builds up experience of such things can learn to more reliably determine the risk.
There is nothing wrong with experience; if there was, I would be up the creek re the '51 :)
Of course, were I to 'verify' my assumption that a char is 8 bits every time I type char, I would never get anywhere.
I can state my point in another way, which may be better: "when you see a bug, before anything else, verify the correctness of your assumptions"
The references to ANSI C had absolutely everything to do with my point, however.
No, that isn't a translation of what you said. It may be what you wish you had said, but it certainly doesn't reflect what you did say.
Word alignment is NOT a C issue
So, why did you complain to the compiler vendor about your problem?
Sadly you do seem to have a lot of experience of bugs which you should never have introduced into the code in the first place. Your assumption when something doesn't work as you expect is that the problem lies with the tools - this is quite typical of those who (as you admit openly, in fact you seem proud of it) haven't read the appropriate documentation.
Note that in this case the 'tool' is the 'C' language, and the 'appropriate documentation' is the definition of the language.
you must have missed "BTW the 'error' was not mine, it occurred ..." in my post.
Yes, I noticed you try to salvage some credibility in a followup post.