unsigned char buf[100];
...
unsigned int val;
...
val = *((unsigned int *)(&buf[1]));
...
Comments?
No, a program using implementation-specific behaviours in C is not considered invalid. It is just less portable or, in some cases, buggy. It's all a question of what assumptions the developer made, and whether those assumptions hold on the intended platform(s).
C did intentionally leave a number of specifics to the architecture or the language implementation. If such code were invalid, the language standard would instead have specified that the compiler should (or must) flag it as an error whenever it is able to detect the problem.
If I know that the processor is little-endian, know the size of an integer, and know that the data is aligned (or that the target architecture can handle unaligned access), then I am completely allowed to cast that position in the received byte array to an int pointer. That lets me read the value in a single instruction, instead of reading two (or more) bytes and shifting each into the correct position.
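A sketch of the two approaches (the function names and the fixed 32-bit width are my own illustration, assuming a C99 compiler with <stdint.h>):

#include <stdint.h>

/* Non-portable: relies on little-endian byte order, on the target
   tolerating unaligned loads (or the data being aligned), and on the
   compiler not exploiting strict aliasing. */
static uint32_t read_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}

/* Portable: assembles the little-endian value byte by byte,
   independent of host byte order and alignment. */
static uint32_t read_u32_shift(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}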
It isn't wrong to do it. It is just a question of calculated risk versus potential gain. Yes, people can get their foot shot off, but people who don't think about the possibility of division by zero, stack size, or numeric overflow can also get their feet shot off.
The compiler isn't allowed to issue an error for non-portable usage (unless the user explicitly asks for one), because the compiler is required to produce a runnable program. It is then up to the specific hardware whether the application generates an exception or strange results.
To give an explicit example: 6.3.1.4 in the language standard notes that it is undefined behaviour to convert from an integer data type to a different data type that can't handle the full numeric range. Assigning an int to a char, for example. We may now and then see warnings like "Conversion might lose significant digits", but the programs are still valid. On some platforms long is twice the size of int; on others it is not. Even if I add a typecast, intvar = (int)logvar, it still represents a conversion from long to int. If that made a program invalid, then most applications larger than "hello world" (and quite a number of them too) would be invalid.
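For instance (my sketch, assuming a platform where long is 64 bits and int is 32):

#include <stdio.h>

int main(void)
{
    long logvar = 0x123456789AL; /* needs more than 32 bits */
    int  intvar = (int)logvar;   /* the cast doesn't help: if int is
                                    32 bits, the value can't be
                                    represented, and on typical
                                    implementations it is silently
                                    truncated */
    printf("%d\n", intvar);
    return 0;
}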
Most source lines we write are based on assumptions. For example the assumption that other code somewhere has verified the input range of all parameters read - if not, every single + or * could overflow. And since the behaviour can then be undefined (note that not all machines are two's complement), every + or * would require code to establish that it cannot fail. But even that code would probably contain code that - depending on the situation - may need to make assumptions of its own. What about machines that have both +0 and -0? They exist, and we just have to make assumptions - some math-function code is allowed to treat +0 and -0 differently depending on the architecture. How many people have specifically told the compiler that their intention is +0?
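To make even a single addition provably safe, you would need something like this (a sketch; the helper name is mine):

#include <limits.h>

/* Returns 1 and stores a + b in *sum if the result is representable,
   0 otherwise, so the addition itself can never overflow. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}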
A different example: how many significant characters do you use in externally linked symbols? Different linkers support different lengths of external symbols. Should every program whose external symbols aren't distinguished within the first six characters be invalid?
Anyone using memset() to clear a large struct or array? But what is the internal representation of zero on the hardware?
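The idiom only works under the assumption that all-bits-zero is a sensible "zero" for every member, which the standard does not guarantee for pointers or floating-point types. A sketch:

#include <string.h>

struct node {
    int          count;
    double       ratio; /* all-bits-zero is 0.0 on IEEE 754 machines,
                           but the standard doesn't promise that */
    struct node *next;  /* all-bits-zero need not be a null pointer */
};

void clear(struct node *n)
{
    memset(n, 0, sizeof *n); /* assumes all-bits-zero means "zero"
                                for every member */
}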
Is a program invalid if it writes a 100-character text string without any newline characters in it? It is considered undefined behaviour to write past the end of the terminal width - but since handhelds with tiny displays exist, the assumption would then be that anything writing more than a single character before a newline may trigger 5.2.2.
No, a program using implementation-specific behaviours in C is not considered invalid.
That's quite beside the point --- the code we're talking about is considerably worse than that. It doesn't just rely on implementation-defined behaviour, but rather causes undefined behaviour.
To give an explicit example: 6.3.1.4 in the language standard notes that it is undefined behaviour to convert from an integer data type to a different data type that can't handle the full numeric range. Assigning an int to a char, for example.
That example is seriously flawed --- conversion from int to char is not covered by 6.3.1.4. Nor does it cause undefined behaviour. It's covered by 6.3.1.3, and the behaviour is at worst implementation-defined.
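For illustration (my own sketch, assuming plain char is signed and 8 bits wide):

#include <stdio.h>

int main(void)
{
    int  i = 1000;
    char c = (char)i; /* 1000 can't be represented in an 8-bit char:
                         the result is implementation-defined (C99 also
                         allows an implementation-defined signal), but
                         the behaviour is not undefined */
    printf("%d\n", (int)c);
    return 0;
}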