
What does the C standard say about portability?

Hello,

See here:

www.open-std.org/.../C99RationaleV5.10.pdf

Is it possible that "Jack Sprat", the staunch defender of the C standard as the ultimate reference when writing programs, missed the following statement?

"C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a "high-level assembler": the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program (§4)."

This is precisely what Per Westermark has been saying.
Exactly what Erik Malund has been saying.
Remember: Jack Sprat often claims that writing a program that complies with the C standard is a GUARANTEE of its correct functioning.

  • "this is why we need a standard: it assures the operations / behaviors will be the same regardless of which hardware the code runs on."

    But this isn't part of the scope of the C standard, or of the scope of C.

    The actual behaviour of a program must take into account lots of extras, such as (a very incomplete list):
    - the size of the heap - will the program manage to allocate the required amount of memory on the target hardware in the expected target machine state? How will the specific implementation of the heap behave with regard to fragmentation?
    - allocation sizes - what is the largest contiguous block that can be allocated from the heap?
    - array indexing - what is the largest memory block that can be indexed as an array? What is the largest index allowed?
    - execution speed - will the program manage to react fast enough to a signal? Will it be able to push enough characters per second through "standard out"? Will it manage to perform a computation and emit the answer before a new input arrives?
    - numeric range of data types. Not all processors have the same word size. Not all processors even make the same small/int/large decisions, even with the same native accumulator and/or register sizes. A machine doesn't need to have two's-complement integers in 2^n-sized data types: 6-bit and 9-bit characters and 36-bit ints exist. How do you write an embedded program if the compiler defines the char type as 16 bits wide, but the processor has 8-bit SFRs side-by-side with no padding? (A sketch of pinning down such assumptions follows after this list.)
    - stack capacity. A perfectly written recursive expression evaluator may not work on all platforms just because some expressions result in too deep a recursion.
    - self-modifiability. Function pointers are pointers. Some targets can memcpy() a function to a new address and allow that new address to be used for a function call, so C code can duplicate a function into RAM before a code flash is erased. Some architectures can't run code from any R/W memory area. Some compilers/linkers have helper tools to link a function into code space for duplication into RAM space.
    - ability to handle or produce unaligned or packed data.
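
    A minimal sketch (C11, assuming the toolchain supports static_assert) of turning such type-range assumptions into compile-time checks, so a port to a different target fails loudly at build time; the particular assumptions below are examples, not requirements of any target:

    #include <assert.h>
    #include <limits.h>

    static_assert(CHAR_BIT == 8, "code assumes 8-bit chars");
    static_assert(sizeof(int) >= 4, "code assumes ints of at least 32 bits");
    static_assert((-1 & 3) == 3, "code assumes two's-complement integers");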

    The language standard does not assure the same behaviour for all C programs, or even for all valid C programs. It just tells the developer that within a given envelope, the program will behave with strict compatibility. Outside that envelope, the developer will either be totally on his own, or will have to rely on the compiler vendor's notes about target-specific limitations/design choices.

    In embedded development, the target hardware often has limited resources. So the developer will have to write lots of comments and/or documents about the assumptions made and about the tests required if the code is moved to other hardware. Expressions must be tested with worst-case parameters to verify there is no overflow/underflow/division by zero; time-critical code must be verified with a profiler or an oscilloscope.

    Design-by-contract can be a nice development strategy. It's too bad that embedded targets often don't have the code and/or RAM space available for running debug builds with contract-validation code included (a minimal sketch of such checks follows below).
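
    One way such contract checks could look - a sketch assuming a project-specific REQUIRE macro and a hypothetical CONTRACTS_ENABLED build flag, so the validation code can be compiled out of space-constrained release builds:

    #include <stdlib.h>

    /* Hypothetical contract macro: active in debug builds, compiled
       away when CONTRACTS_ENABLED is not defined. */
    #ifdef CONTRACTS_ENABLED
    #define REQUIRE(cond) do { if (!(cond)) abort(); } while (0)
    #else
    #define REQUIRE(cond) ((void)0)
    #endif

    int divide(int num, int den)
    {
        REQUIRE(den != 0);   /* precondition: caller must not pass 0 */
        return num / den;
    }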

  • "In embedded development, the target hardware often have a limited size. So the developer will have to write lots of comments and/or documents about made assumptions and about required tests if the code is moved to other hardware."

    agreed.

    that's why writing portable code is hard, and difficult for many to understand.

  • "In embedded development, the target hardware often have a limited size. So the developer will have to write lots of comments and/or documents about made assumptions and about required tests if the code is to be moved to other hardware."

    I inserted two words in the quote to make the point.
    what is stated is a thing that must be done at the conception of the code for the original processor.

    anyone ever been told at conception that the code would be moved?

    Erik

  • 'what is stated is a thing that must be done at the conception of the code for the original processor.'

    it doesn't have to be done at the conception; but it is best done at the conception.

    there are different kinds of portability:

    1) cross-platform portability: many tasks are not hardware-specific, like doing an fft, for example (without using hardware). a pid library would be another good example here.

    2) cross-family portability: some tasks are hardware-specific to a family of chips. their portability is likely limited to the interface - you always call i2c_send() to send a byte over i2c, but different platforms may have their own ways of performing that task. those things that are specific to that family / hardware obviously aren't portable and have to be recreated on new hardware. But portability insulates the higher layers from being (materially) rewritten. (a sketch follows below.)

    then there are pieces of code that are not going to be portable regardless of what you do. part of our job is to minimize that portion.
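
    A minimal sketch of that kind of interface split, assuming a hypothetical i2c_send() API - one shared header, one implementation file per chip family (the register addresses below are made up for illustration):

    /* i2c.h - the shared interface; higher layers only ever see this. */
    #ifndef I2C_H
    #define I2C_H
    void i2c_send(unsigned char byte);
    #endif

    /* i2c_targetA.c - one implementation per chip family. */
    #include "i2c.h"

    #define I2C_STATUS (*(volatile unsigned char *)0x40000000u)  /* placeholder address */
    #define I2C_DATA   (*(volatile unsigned char *)0x40000004u)  /* placeholder address */
    #define I2C_BUSY   0x01u

    void i2c_send(unsigned char byte)
    {
        while (I2C_STATUS & I2C_BUSY) { /* wait for transmitter */ }
        I2C_DATA = byte;
    }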

  • 'anyone ever been told at conception that the code would be moved?'

    many times over.

    That documentation is always needed - it really doesn't matter whether you plan for a processor change or not.

    It is quite common that a project starts with a processor running at 8MHz and a couple of years later is moved to an "identical" processor in the same family that runs at 16MHz while consuming half the power. The change was made just because the newer processor cost less.

    Assumptions must always be documented as well as can be done, whatever the expectations about processor changes at a later time. It might instead be an external chip that needs to be replaced because the original chip can't be bought. Or maybe there is a need to step up the baud rate on an interface, because the transfer time represents extra cost at the factory, or because the device is intended to be used with another device that has started to support a higher baud rate. At low baud rates, it may be enough to round to the closest divisor for the baud rate generator, so an assumption may have been made that no fractional baud rate compensation is needed. This obviously must also be documented in case there is a need to introduce new baud rates, since a smaller divisor value means larger granularity between achievable rates (a worked example follows below).
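
    A worked example with assumed numbers (8 MHz clock, 16 UART ticks per bit - not any particular chip) showing where the round-to-closest assumption breaks down:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long f = 8000000ul;    /* assumed base clock      */
        const unsigned long ticks = 16ul;     /* assumed UART ticks/bit  */
        const unsigned long bauds[] = { 9600ul, 115200ul };

        for (int i = 0; i < 2; i++) {
            unsigned long baud = bauds[i];
            /* round to the closest integer divisor */
            unsigned long div = (f + ticks * baud / 2) / (ticks * baud);
            double actual = (double)f / (double)(ticks * div);
            printf("baud %6lu: divisor %2lu -> actual %.0f (%.2f%% error)\n",
                   baud, div, actual, 100.0 * (actual - baud) / baud);
        }
        return 0;
    }

    At 9600 baud the divisor is 52, giving about 0.16% error - fine. At 115200 the ideal divisor of 4.34 rounds to 4, giving 125000 baud and roughly 8.5% error, far more than a UART tolerates.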

    It's almost impossible to know what may happen to a product during its full life cycle. Only by having the developer think about the assumptions he makes directly when the code is written and originally validated, and having him document those assumptions/validations directly, can you be reasonably sure that you have reasonably well-documented source code.

    I'm reusing code today that I wrote 15 years ago or more. Besides being debugged from a "C" standpoint, I also know that there is good documentation of the code's boundaries.

    When looking at code on the net, comments are often missing. And if the code has comments, it's quite often just a more or less redundant description of what the code statements do. Sometimes there is a description of each input parameter and what results a module produces. But code is almost never documented with assumptions and boundaries. What numeric range is safe (mathematically or explicitly tested) for input parameters? What resources does the code assume it may consume? What assumptions about data types? What assumptions about reentrancy? (A sketch of such documentation follows below.)
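
    A minimal sketch of what that kind of documentation could look like, on a hypothetical function:

    #include <stdint.h>

    /*
     * scale_adc() - convert a raw ADC reading to millivolts.
     *
     * Documented assumptions and boundaries:
     * - raw is a 12-bit ADC value, 0..4095; larger values give a
     *   mathematically wrong (but overflow-free) result.
     * - vref_mv <= 5000, so raw * vref_mv fits comfortably in 32 bits
     *   (4095 * 5000 = 20475000).
     * - Reentrant: no static state, safe to call from an interrupt.
     */
    uint32_t scale_adc(uint32_t raw, uint32_t vref_mv)
    {
        return (raw * vref_mv) / 4095u;
    }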

    As our microcontrollers get larger and larger, we can start solving bigger and bigger problems. For many designs, that means a larger and larger percentage of the processor capacity is spent on business logic instead of on I/O, especially since new processors have better and better hardware acceleration for peripherals. This allows a larger percentage of products to separate the code into a hardware layer and a business-logic layer without wasting a significant amount of time on extra function calls.

    In the end, we get more and more embedded devices that have 10k, 100k or 1M source lines, or maybe even more. The time invested in the source code gets larger and larger. And the code just has to be moved to newer platforms as technology improves, since the invested costs are so high because of the complexity of the problems being solved.

    A tiny lamp timer can have 100% of the code rewritten if you need to move to a different processor for improved cost efficiency or if the original processor is no longer available. Larger projects may live for 10 years or more, and may have to move to new hardware every 6-12 months to be able to constantly increase production volumes while at the same time dropping production costs.

    This means that we embedded developers must constantly try to improve how we work, because the scope of our projects is getting larger and larger. At the same time, the cost of salaries is constantly increasing, making it harder to develop new products in a competitive way. Low-cost countries have cheaper labour, but often developers without the know-how about a specific market niche. But if development is outsourced, they will acquire the know-how after a couple of years, while the company ordering the development will lose theirs (and soon their market).

    So - a long rant, but the end result is that we don't know about the future so we must document our assumptions and required validation tests (and do our best to see how source code can be layered), being prepared for changes if we want to stay in business.

  • agreed.

    we see very few 8-bit jobs now and we offer 8-bit capabilities primarily for marketing / one-stop purposes. the 8-bit market isn't important to us because of low demand and an abundance of low-cost overseas capabilities (hardware + software), so we aren't competitive there.

    most of our business comes from the 32-bit market. the overseas programmers may be cheaper but they have not mastered the right way of doing large / complex projects. language, as in communication / documentation, is another big hurdle that in my view renders the overseas shops non-competitive in this segment.

    "being prepared for changes if we want to stay in business."

    this is where structuring your code to be portable has a huge advantage. As the hardware changes, you are able to take out the relevant part and plug in the new hardware-specific portion of your code and you are ready to go with a project from day 1.

    that would be a huge competitive advantage and would help you offset any cost advantage your overseas competitors may have.


    "the overseas programmers may be cheaper but they have not mastered the right way of doing large / complex projects. language, as in communication / documentation, is another big hurdle that in my view renders the overseas shops non-competitive in this segment."

    While the local competitors are OOXX Level n qualified, the company I am currently working for is OOXX Level n+1 qualified (supposedly better). (I come from Taiwan.)

    People here are talking about modularizing; the modularizing is for code reusability and portability. The reusability and portability are aimed at 8-bit/16-bit MCUs like PIC microcontrollers, but 32-bit MCUs may also be included.

    To achieve the modularizing, someone proposed an idea/rule for C programming where fileA.c is not allowed to include fileA.h; he claims this is for reducing the cohesion and data coupling. I have no choice but to try to stop him.

    I also encountered a lot of other amazing stuff here.

  • The first official case of modularizing is to produce a reusable and portable module for GPIO and key-input debouncing.

    For traditional PIC MCUs, they still believe that a multi-layered structure of software design will work well. To me, multi-layered structure of software design leads to more depth of function calls and more RAM consumption (and traditional PICs have a very shallow hardware call stack).

    Multi-layered, modularized code can work very well. But it isn't always the best concept for a lamp timer or other projects with extremely little logic. It is a concept that assumes the program has a bit of business logic - if it hasn't, then the size is probably so small that even a badly modularized program will be easy to maintain.

    Having GPIO in a module sounds like an incorrect slicing of the cake, but I might have misunderstood how it was planned. I normally create inline getter/setter functions instead of trying to have a generic GPIO read or GPIO write.

    So a project may look like:

    activate_siren();
    activate_relay1();
    if (tamper_switch_active()) {
        ...
    }
    

    The above makes it easy to change which pins drive different things (and how the pins are driven) or how input stimuli are retrieved. Often, the same source code is compiled for multiple hardware platforms but the header file with all the inline functions is different.

    Since the inline functions may look like:

    __inline void activate_siren(void) {
        FIO1SET = 1u << P1_SIREN;   /* write 1 to the set register: pin high, siren on */
    }
    __inline void deactivate_siren(void) {
        FIO1CLR = 1u << P1_SIREN;   /* write 1 to the clear register: pin low, siren off */
    }
    


    the efficiency is excellent.

    If trying to make the full GPIO into a generic module, that module must take parameters for the requested action and decide what to actually do. That takes both extra code and extra clock cycles, without any gain. It also makes it more likely that the business logic gets intermixed with the decisions about which actual pins are used for different things.

    Code like set_gpio_pin(GPIO_RELAY, GPIO_ON) requires the set_gpio_pin() function to figure out what port and pin to modify, and whether the pin should be high or low (not trivial, since some pins may require inverted logic depending on the external electronics).

    And code like set_gpio0_pin(GPIO0_RELAY, GPIO_ON) saves the generic function from knowing what port is involved - but instead requires that the business logic be modified if the signal is moved to a pin on a different port. (A sketch of the generic variant follows below.)
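
    A minimal sketch of what such a generic function tends to look like (the descriptor layout and names are made up for illustration), showing where the extra code and cycles go compared to the inline version above:

    #include <stdint.h>

    /* Hypothetical descriptor the generic function must consult at run
       time - table data and a lookup the inline version doesn't need. */
    struct pin_desc {
        volatile uint32_t *set_reg;  /* register that drives the pin high */
        volatile uint32_t *clr_reg;  /* register that drives the pin low  */
        uint32_t mask;               /* 1u << pin number                  */
        uint8_t  inverted;           /* 1 if external electronics invert  */
    };

    extern const struct pin_desc pin_table[];   /* one entry per logical pin */

    void set_gpio_pin(unsigned pin, unsigned on)    /* on: 0 or 1 */
    {
        const struct pin_desc *p = &pin_table[pin];
        if (on ^ p->inverted)          /* apply per-pin polarity */
            *p->set_reg = p->mask;
        else
            *p->clr_reg = p->mask;
    }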

    Having C files that don't include a corresponding header file with the exported symbols sounds like a big mental accident. C++ can catch some of the problems thanks to its type-safe linkage. But for C, the problems will quickly be catastrophic unless a code analyzer with global processing capabilities is used.

    I see the header file as a form of "contract". It contains a list of services that the C module promises to deliver. Obviously, the module itself should also be allowed to know what services it promises to deliver (a sketch of why this matters follows below).
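
    A minimal sketch (file and function names hypothetical) of why the module should include its own header: when fileA.c includes fileA.h, the compiler checks the definition against the promised prototype, so a mismatch is caught at compile time instead of producing silently wrong calls at link time:

    /* fileA.h - the "contract": services fileA.c promises to deliver. */
    #ifndef FILEA_H
    #define FILEA_H
    long get_timeout(void);
    #endif

    /* fileA.c */
    #include "fileA.h"      /* the module sees its own contract */

    int get_timeout(void)   /* ERROR: conflicting types - caught here.  */
    {                       /* Without the #include, every caller would */
        return 100;         /* quietly use the wrong return type.      */
    }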

    I'm not so sure about the suitability of having a generic keyboard debounce to plug into all projects. A big question is where the debounce code would get its timing information. Another is that some projects may have single buttons (ENTER, BACK, LEFT, RIGHT) connected to individual processor pins, while other projects may have a matrix keyboard where the user may hold more than one button pressed. Having fully generic code for a 4x4 matrix keyboard would also be interesting, since it would basically have to read and set pins one-by-one using the GPIO layer - and the GPIO layer would have to be extremely advanced to support simultaneous sampling or control of multiple pins.

    In many situations, you perform modularization by calling a standard function name to get something done, but then have multiple implementations depending on the project. This is the normal way to implement serial communication - each target processor has one source file for each supported serial port, and the program just uses com0_sendchar() or com0_printf().

    But trying to code something hardware-specific into a generic function leads either to lots of conditional compilation or to lots of really meaningless glue functions being created and called. How much fun would a generic UART driver be that contains code like:

    f = get_base_frequency();           /* glue call just to learn the clock     */
    idiv = get_uart_ticks_per_bit();    /* glue call just to learn ticks per bit */
    divisor = f / idiv / baud;          /* integer divisor, rounded down         */
    error = f - divisor * idiv * baud;  /* residual - might need fractional baudrate compensation */
    set_baud_divisor(divisor, error);   /* glue call to finally touch hardware   */
    

    Going too generic will quickly explode into unmaintainable, large and inefficient solutions.

  • "... reducing the cohesion and data coupling."

    Or:

    "... increasing the cohesion and data coupling."

    Either way, "someone" needs to get a clue about "proposing an idea/rule".

    The C-unfriendly architecture of the traditional PIC MCU makes things worse.

    Although I am not a competent developer, and I am not good at explaining and illustrating, it is quite easy to see that people here will fail.

    I am not able to do much to help the people here, because they are numerous, and in higher positions.

    Maybe Bill Gates or Steve Jobs can convince them, but Dennis Ritchie and Ken Thompson can NOT.

  • "multi-layered structure of software design leads to more depth of function calls and more RAM consumption"

    Yes: like use of the programming language, use of a "multi-layered structure" can be done well, or done badly - it is not, of itself, a magic bullet.

    Yes: adding an "abstraction" layer to help make code portable does usually add some overhead. As always, there is a tradeoff of the gain in terms of developer/maintainer performance against any overheads on the target.

    The criteria for "optimisation" need to include not only the "costs" of code size, RAM size, and execution speed - but also the developer/maintainer costs...

    It is not uncommon that different parts of the code will need different balances in this tradeoff...

    Writing modular code is something that is done to help reduce complexity and/or improve code reuse.

    That obviously requires a very light hand. Priority 1 is to analyze the situation; then a suitable solution can be suggested.

    That means that a company can't just produce a document telling people how to modularize a program. The process must involve the project team, based on the needs of the project and the available resources.

    Having iron-hard rules doesn't help the project team - except that they are allowed to turn off their brains and hack lousy "by-the-book" code without having to care about how well it works. The goal should be to write economical and well-working code, not a 25-layer precursor to Eddie, the shipboard computer from The Hitchhiker's Guide to the Galaxy. It may just end up more like HAL from 2001.

  • Is that the latest Arduino spin-off...?!

    ;-)

  • "multi-layered structure of software design leads to more depth of function calls and more RAM consumption."

    absolutely true.

    that's why we live, unfortunately, in a world where people are paid big $$$$$ to make the right compromise.

    engineering a non-compromised design is simple - because you will never get it done.

    it is engineering a compromised design that is incredibly hard.