
Code portability

Hello,
I was browsing through older posts that deal with the painful issue of portability (http://www.keil.com/forum/docs/thread8109.asp). I was (and still am) a big advocate of programming as close to the C standard as possible, and of having a layered structure that allows "plugging in" other hardware. But I have recently come to change my mind. I am reading the "ARM System Developer's Guide" (an excellent book, by the way; I'm reading it because I want to port some C167 code to an ARM9 environment), in which chapter 5 discusses writing efficient C code for an ARM. The point, which is well demonstrated, is that even common, innocent-looking C code can be either efficient or very inefficient on an ARM depending on the specific choices made, let alone on another processor! So, if we are talking about squeezing every clock cycle out of a microcontroller, I do not believe that portability is possible without ultimately littering the code.
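
For instance, here is a minimal sketch in the spirit of what chapter 5 demonstrates (the function names are mine, not the book's). Both loops compute the same sum, but counting down to zero lets an ARM compiler fold the loop test into a flag-setting subtract (SUBS) and drop the separate compare against n:

int checksum_up(const int *data, unsigned int n)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < n; i++)   /* needs an explicit compare against n each pass */
        sum += data[i];
    return sum;
}

int checksum_down(const int *data, unsigned int n)
{
    int sum = 0;
    for (; n != 0; n--)       /* SUBS sets the flags; no separate compare */
        sum += *data++;
    return sum;
}

Two innocent-looking variants of the same function, yet the second typically saves cycles on every iteration of an ARM7/ARM9 loop - exactly the kind of choice that resists blind porting.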

Parents
  • If every last clock cycle counts then changing processor or increasing clock speed would seem a better option

    So efficient programming does not count in your school? Knowing your tools and hardware, as you so often preach, is the key!

    I do code with an eye to efficiency but try to avoid getting into situations where micro-optimisations are necessary. Knowing your tools and hardware is essential if you want to write reliable, maintainable and portable code. Knowing your tools and hardware is essential if you want to properly design and implement a system rather than papering over the cracks with software.

Children
  • I understand you. But please also refer to my comments above regarding division in the ARM core.

  • I was "lucky" enough to grow up with in an environment where no processors had multiply or divide instructions, or where the instructions did exist, but consumed about the same number of clock cycles as if you did the job with add and shift instructions.

    Because of this, I very often design my code around power-of-two multiplications/divisions, where a bit-and can be used instead of modulo and shifts can be used instead of multiplication/division. Most of the time, this can be done without significant loss of readability or maintainability.
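
    As a sketch (assuming the buffer size is a power of two; the names are made up for illustration):

    #define BUF_SIZE 64u                 /* must be a power of two */

    unsigned int wrap(unsigned int i)
    {
        return i & (BUF_SIZE - 1u);      /* same result as i % BUF_SIZE */
    }

    unsigned int mul8(unsigned int x)
    {
        return x << 3;                   /* same result as x * 8 */
    }

    unsigned int div8(unsigned int x)
    {
        return x >> 3;                   /* same as x / 8 for unsigned x */
    }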

    More often than not, I see memory, not speed, as the primary limiting factor. Most managers I have worked with are unable to say no to new features/requests/wishes/suggestions/dreams, which results in products that get new features one by one until the code memory is full. Only "hard" limits seem to make managers decide that a new feature can wait until the next-generation product. To keep down production and supply-chain costs, most managers prefer "one-size-fits-all" Swiss-army-knife jack-of-all-trades products where all these features should always be available in the same application, just in case a customer calls and wonders if he/she can use the product for <insert strange request here>.

    The more features you put into a product, the more important it is to focus on KISS and make sure that the code is testable and maintainable. A tried-and-true hand-optimized assembler routine may be wonderful, but if the customer requires the function to be slightly different, then that "tested" doesn't mean anything anymore, and the "hand-optimized" part will bite you, since it is based on assumptions that are no longer true.

    Hunting clock cycles (or bytes) is a good way to extend the life of an existing product, but it is seldom a good idea for a new product - unless it is a 100k+ unit product with a trivial application that is only likely to be sent out in one (1) revision and then never changed.

    So, aren't division instructions important? Of course they are - but if I know that I am going to implement a high-speed signal-processing pipeline, I will probably spend quite a lot of time considering which processor to use, instead of just picking one and then starting to fight the clock cycles.

  • "I have no doubt what is the most 'processor pleasing C' and no optimization will change which is the most efficient."

    Don't bet too much. High-end processors may consume the % just as efficiently as the &, and with speculative multi-branch execution, the conditional clear can also be swallowed.

    The modulo alternative is a more general optimization, but there are no "always best" constructs that will stay true for any compiler on any processor.
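
    To make the alternatives concrete, here is an illustrative sketch (N and the function names are made up): all three routines advance a ring-buffer index over N entries, and which one wins depends on the compiler and the core.

    #define N 32u   /* power of two, so all three forms are legal here */

    unsigned int next_mod(unsigned int i)
    {
        return (i + 1u) % N;            /* general; may cost a divide */
    }

    unsigned int next_and(unsigned int i)
    {
        return (i + 1u) & (N - 1u);     /* power-of-two sizes only */
    }

    unsigned int next_cond(unsigned int i)
    {
        i++;
        if (i >= N)                     /* the "conditional clear" */
            i = 0u;
        return i;
    }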

  • Don't bet too much. High-end processors may consume the % just as efficiently as the &, and with speculative multi-branch execution, the conditional clear can also be swallowed.

    The modulo alternative is a more general optimization, but there are no "always best" constructs that will stay true for any compiler on any processor.

    Is this a forum for small embedded or for supercomputers???

    Erik

  • "Is this a forum for small embedded or for supercomputers???"

    Supercomputers? Sorry, but that remark could have been applicable ten years ago.

    Look around you. Embedded is more than a 230V programmable timer, or a bar code reader.

    Faster processors may be found in backbone switches, media centers, gaming consoles, portable DVD players, signal analyzers etc.

    There are ordinary TVs out there with very powerful processors to allow direct display of photos, DVD movies, DivX etc. from CF memory or USB-connected hard drives.

    The old Pentium was the first x86 to be superscalar, i.e. executing more than one instruction per clock cycle. But that was very many years ago, and quite a number of processor manufacturers now have processor cores with that technology. Have you looked at the newest embedded processors from Freescale, for example?

    With constantly shrinking geometries, the best embedded processors move ever closer to the better PC processors. Together with Li-Po batteries, they allow very powerful equipment to be battery operated for several hours.

  • "Is this a forum for small embedded or for supercomputers???"

    Supercomputers? Sorry, but that remark could have been applicable ten years ago.

    Regarding your statement about division:

    I know of no processor supported by Keil tools where a divide is not more "cumbersome for the processor" than an and.

    Erik

  • I'm a little confused. Must a discussion about portable code be limited to the embedded processors that Keil supports?

    I don't have the time to scan through all ARM architectures to figure out which processors may match my criteria, but a quick look shows good potential at least.

    The Cortex-A8 core, for example, is superscalar. It isn't supported by the RealView MDK, but it is by the RealView Development Suite.

    And if you look at older cores, you'll notice that the LPC3180 has a vector FPU, for example, even though the core isn't superscalar.

    So both superscalar and vectorized operation have reached embedded processors supported by Keil. And with the speed at which embedded processors are broadening the range between the slowest and the fastest, it will not take long until you can walk around with a tiny battery-operated device with basically "supercomputer" performance. When I went to school, the Cray X-MP was the coolest of cool. By today's standards, the 400 megaflops of the X-MP seem quite modest.

    The C51 line will not die in the near future, but experience from it cannot be generalized to apply over the full embedded range. It is hard enough to make claims about algorithm efficiency when jumping between 8- and 64-bit processors (yes, Keil may not support any 64-bit processor yet, but that is irrelevant for this discussion).

    The important thing in this discussion is that, with a correctly selected processor, the majority of the source lines will not be time-critical and can be written in "normal" C (or C++), allowing simple porting to another processor. Whether a non-critical line of code runs at its optimum on the new processor normally isn't important, since it should normally not be time-critical on the new processor either.
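
    As a sketch of such a split (the file and function names are invented for illustration), keep the bulk of the logic behind a small portable interface and confine any cycle-critical work to one routine that may be rewritten per target:

    /* filter.h - portable interface; callers never see the target. */
    void filter_block(const short *in, short *out, unsigned int n);

    /* filter_generic.c - plain C reference version, ports anywhere. */
    void filter_block(const short *in, short *out, unsigned int n)
    {
        while (n--)
            *out++ = (short)((*in++ * 3) >> 2);  /* scale by ~0.75 (illustrative) */
    }

    /* filter_arm9.c - optional hand-tuned replacement; only this one
       file needs to know about the target core's pipeline. */

    Porting then means recompiling the generic file, and re-tuning (or simply dropping) the target-specific one.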

  • Jack Sprat wrote: "Given that new posts are highlighted I don't see how that would offer any real benefit, but I'm willing to give it a try."

    The highlight only lives for a short time. If you get a call after you have started to read updated threads, the highlights may disappear. And the highlights only show unread posts; they don't show them in chronological order.

    Dan Henry wrote: "I am too late for the party, so as an aside, how does one 'post at the bottom'?"

    I normally press the 'Reply' button on the bottom-most post in the thread. It doesn't always work, since I can't see which other posts are in the process of being written.

    It is a bad thing that this forum doesn't indent discussion branches. Opening a heavily branched thread a day or two late makes it very, very confusing to try to understand who responded to what.

    Quite a number of times I have had to hunt timestamps just to locate new posts that have lost their highlighting, or to figure out which post was a reaction to which. It really is a big pest.

    Keil: Either make this forum linear, or style the threads to show tree-view branches. Or at least number the posts, and add a little reference: "In response to post #x from yyyy-mm-dd hh:mm".

  • I'm a little confused. Must a discussion about portable code be limited to the embedded processors that Keil supports?
    Not necessarily; however, two points:
    Portability is a joke for the small processors and a valid concern for the biggies.
    Many participants here discuss the "Keil supported embedded processors" in this forum and other processors elsewhere, thus I would suggest that when a comment goes outside "small embedded", the comment should be so qualified.

    We can have a discussion that goes nowhere if someone makes points from the view of "small embedded" and the other from the view of a Cray vs. 516 Pentiums in parallel.

    This being a Keil forum, I would state that of the above the first does not need qualifying, but the second should.

    Erik

  • I have never made any claim based on any Cray machine - I just noted that the high-end features of a 25-year-old supercomputer now exist in standard (not even the highest-end) embedded chips. And Keil does claim support for chips with said features.

    "many participants here discuss the 'Keil supported embedded processors' in this forum and other processors elsewhere, thus I would suggest that when a comment goes outside 'small embedded' the comment should be so qualified."

    So some architectures should not be discussed here even if Keil claims support? Where, then, is the limit? Nothing fancier than the ARM7TDMI core? What if it is a bigger/faster processor? Does Keil have a "big embedded" forum that should be used instead?

    Keil does claim support for at least one dual-issue processor (the Cortex-A8 core), and it does claim support for processors with vector instructions. Now YOU make the definition that this is high-end PC machinery or supercomputing and not related to this forum ("small embedded"). Please be more specific about why you feel that the full range of Keil-supported chips does not relate to this debate. And exactly where is the limit beyond which I must specifically mention the exact chip in question?

    By the way - the world isn't standing still. If a Freescale or AMD or Intel or IBM chip has a feature that isn't available in any of the currently Keil-supported chips, it is quite likely that NXP, Texas, ST, ... will look into new processor generations with that feature.

    Look at the C51 chips. When originally released, they consumed a huge number of clock cycles per instruction. If I remember correctly, the individual clock steps even had names, since every instruction followed the same sequence even if some clock cycles were NOP cycles. With a bit of improvement to the pipeline, C51 chips can now issue an instruction per clock cycle.

    If you build a surveillance camera today, you more or less expect it to create MPEG video, JPEG or MJPEG stills, perform motion detection and possibly count the number of people passing in the corridor.

    Everyone is constantly expecting more. A programmable timer is nice, but a timer controllable by Bluetooth, ZigBee or WLAN is nicer.

    Because of this constant technological race, any investment in embedded products requires the developers to look a bit into the future and think about expansion potential.

    Some hardware will completely disappear. Earlier, people built solutions with a PDA and maybe a data collector connected to a phone. Now the phone has a built-in GPS, camera, bar-code reader... so the previous hardware product could have been transformed into an application to install on the phone. Such a migration is very much a question of code portability. The application running on "low-end" dedicated hardware suddenly got moved onto a generic platform.

    I very much think such moves do affect a number of readers of this forum.

    Product requirements have a tendency to move at quite a high speed. "So, you measure the supply voltages 20 times per second. But since you have more AD channels available, can't the product be updated to log 2k measurements per second and store no more than two hours? But the transfer cost of sending in these samples is too high - can't it do Fourier analysis locally in real time and just report important changes?"

    Or maybe: "But if you can multiplex a couple of LEDs, what does it take to drive this 1k LED panel? But we also have this 30k multi-color LED panel. By the way - is it possible to add an interface to connect TFT panels directly?"

    More than once, a product may spin off on a totally unexpected tangent. There is no way I can anticipate such moves, but I can anticipate that there will be moves, and try to make sure that the software investment is as valuable as possible by making sure that as many code blocks as possible can be reused in new projects and/or on new hardware.

    All this sidetrack because you posted a bit of buggy pseudo-code with the assumption that bit-and must always win over modulo, or maybe your claim was that bit-and must always win over a compare and optional assign - an assumption that isn't true on PC-class hardware, and need not be true on middle-class embedded processors either.

  • By the way - the world isn't standing still. If a Freescale or AMD or Intel or IBM chip has a feature that isn't available in any of the currently Keil-supported chips, it is quite likely that NXP, Texas, ST, ... will look into new processor generations with that feature.

    are we discussing the present or the future?

    More than once, a product may spin off on a totally unexpected tangent. There is no way I can anticipate such moves, but I can anticipate that there will be moves, and try to make sure that the software investment is as valuable as possible by making sure that as many code blocks as possible can be reused in new projects and/or on new hardware.
    Again, if we are discussing "large" (embedded) the statement is valid; for small embedded it is a joke.

    All this sidetrack because you posted a bit of buggy pseudo-code with the assumption that bit-and must always win over modulo
    I qualify my statements (which you seem to have a problem with), as in "no processor that Keil has tools for will divide as fast as an and".

    re buggy pseudo-code who cares, it did show what I meant.

    Per, let this not turn into a 'war' - just realize that there is a world of difference (today) between "small embedded" and large (embedded).

    Just as an example, this thread started with "squeezing every bit of performance out of it", which most often is valid in small embedded and rarely (we can argue whether it should be) is applied to large (embedded).

    Erik

  • Again, if we are discussing "large" (embedded) the statement is valid; for small embedded it is a joke.

    I work primarily with small embedded and I can assure you that portability is not a joke.

    re buggy pseudo-code who cares

    Says it all, really.

  • Again, if we are discussing "large" (embedded) the statement is valid; for small embedded it is a joke.

    I work primarily with small embedded and I can assure you that portability is not a joke.
    Portability between WHAT? Processors? Compilers? Houses?
    A lot of code is "automatically portable", e.g. a mathematical function, and for small embedded, anything beyond a "computing function" will take more effort to make portable than to port as non-portable.
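
    As a sketch of such a "computing function" (an invented example), this routine depends only on standard C - no register, peripheral or memory map - so it ports untouched between a C167, an ARM9 and a PC:

    #include <stddef.h>

    unsigned char checksum8(const unsigned char *p, size_t len)
    {
        unsigned char sum = 0;
        while (len--)
            sum += *p++;
        return sum;
    }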

    re buggy pseudo-code who cares
    Says it all, really.

    When quoting me, please quote fully:

    "re buggy pseudo-code who cares, it did show what I meant."

    Erik

  • mista mikal,

    you is being proudly of tamer is you being yes?????

    you be started warrring of erac and jak agin!!!!!!

    you be hanging marow on his string for the mans to atack the fight yes????

  • Kalib,
    The only thing I see is a well-argued, intelligent discussion from which there is a lot to learn. War? I don't think so. Disagreement, maybe, but that is the foundation of any progress!