Hello, I was browsing through older posts that deal with the painful issue of portability (http://www.keil.com/forum/docs/thread8109.asp). I was (and still am) a big advocate of programming as much as possible conforming to the C standard, and of having a layered structure that allows "plugging in" other hardware. But I have come to change my mind recently. I am reading the "ARM System Developer's Guide" (excellent book, by the way; I'm reading it because I want to port some C167 code to an ARM9 environment), in which chapter 5 discusses writing efficient C code for an ARM. The point is, and it is fairly well demonstrated, that even common, innocent-looking C code can be either efficient or very inefficient on an ARM depending on specific choices made, let alone when another processor is used! So, if we are talking about squeezing every clock cycle out of a microcontroller - I do not believe portability is possible without ultimately littering the code!
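To give one concrete flavour of the kind of thing the book warns about (a minimal sketch of my own, not an example lifted from it): on a 32-bit ARM, a loop counter narrower than the register width can force the compiler to emit an extra zero-extension on every iteration, so two loops that look identical in C can cost different numbers of cycles.

    /* Hypothetical sketch: same loop, two counter types.
       On a 32-bit ARM, the unsigned char counter may force an extra
       AND (zero-extension) instruction on every iteration, while the
       unsigned int counter maps straight onto a 32-bit register. */
    unsigned int sum_narrow(const unsigned char *buf)
    {
        unsigned int sum = 0;
        unsigned char i;              /* narrower than the ARM register */
        for (i = 0; i < 64; i++)
            sum += buf[i];
        return sum;
    }

    unsigned int sum_natural(const unsigned char *buf)
    {
        unsigned int sum = 0;
        unsigned int i;               /* natural register width */
        for (i = 0; i < 64; i++)
            sum += buf[i];
        return sum;
    }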
Efficient programming is important. But when "every single cycle counts" you run a risk that the development cost will explode.
You may also fail to get the product to market early.
And you do not have any safety margin in case you get a late "gotcha", where the requirements suddenly get updated, or you find a little processor erratum, or a compliance test shows a failure to fulfil some regulation.
There is an old saying that the last 5% of the application can take 95% of the development time. The difference between writing a couple of critical loops in optimized assembler and having to consider almost all of the application as timing-critical is a huge cost difference.
Another thing is the cost of developing the next-generation product. With a bit of safety margin, you may be able to use the same hardware platform. On the other hand - with most of the software written for maintainability instead of maximum optimization, you may make quite drastic changes to the hardware and still be able to reuse a large amount of code.
The ability to select the optimum chip is very important to the final price of the product. Too heavily optimized code may mean that a significant amount of code has to be thrown away and recreated.
We have one product where a processor with 8kB flash and 1kB RAM is used and everything is written in assembler. We have another product with 256kB flash and 32kB RAM, where just about everything is written in C. The cost of the two processors is almost the same. The big difference: the larger processor was selected 24 months later and could not use a single line of code from the other product. A number of lines of code from the newer product have since migrated to even more products, since the C code is more portable.
The cost of a product is not directly related to the clock frequency or the number of kB of flash/RAM, so portability, maintainability and development time must be taken into consideration.
One nice thing with a not too heavily optimized C program is that a project may start with two alternative processors: a brand-new, very inexpensive chip with a high risk factor because of the possibility of delivery delays, and an older chip with a higher cost but similar functionality. The project can then strive for the new and dirt-cheap chip, but still have a backup plan where most of the code will be usable on the older chip, in case that is the only way to get a product on the market within the required time frame.
If every last clock cycle counts then changing processor or increasing clock speed would seem a better option.
"then changing processor or increasing clock speed would seem a costlier option"
Not necessarily. Prices do change with time and popularity of components.
If every last clock cycle counts then changing processor or increasing clock speed would seem a better option. If neither of these is feasible then accept that the design is inadequate
Inadequate??? If it is doable, working, and there are cost constraints, where do you get 'inadequate' from?
Well, the system clearly hasn't been designed so that it can be programmed in 'C', but you're trying to program it in 'C'. I'd describe that as an inadequate design. Perhaps you'd prefer something like 'not up to the job'? Also, code that is written in a 'do-able' situation is likely to be an unmaintainable, non-portable mess. This in turn means increased development and maintenance costs, increased chance of unnoticed or 'marginal' bugs, and of course longer time to market.
If every last clock cycle counts then changing processor or increasing clock speed would seem a better option. If neither of these is feasible then accept that the design is inadequate, code the necessary bits in assembly
A reasonable approach, but if "processor pleasing C" will do, then why not use that?
Because you can't guarantee that a 'C' construct will compile to the same object code each time. Any change in compiler version or optimisation level may affect things, and as you will no doubt be using the highest optimisation level given that 'every clock cycle counts' any change to an unrelated area of code may alter timing in the critical parts.
then start work on the MK2 replacement system as quickly as possible.
and go bankrupt because your product is more expensive than the competitors'.
By your logic you will have no competitors, as your product will be cheapest therefore they will be bankrupt.
If the world rotated about "standard C" and "programmers convenience" instead of business realities then ...
Business realities dictate that one should think about the total cost over the product lifecycle. This includes such things as future development, time to market, maintenance and upgrade of existing systems and so on. If you've thrown together some nightmarish hodge podge of hand 'optimised' code on some hardware running at the bleeding edge you've had it.
Please go to the nearest pharmacy and buy a dose of reality
Now I understand where you're going wrong.
If every last clock cycle counts then changing processor or increasing clock speed would seem a better option
So efficient programming does not count in your school? Knowing your tool and hardware, as you so often preach, is the key!
I do code with an eye to efficiency but try to avoid getting into situations where micro-optimisations are necessary. Knowing your tools and hardware is essential if you want to write reliable, maintainable and portable code. Knowing your tools and hardware is essential if you want to properly design and implement a system rather than papering over the cracks with software.
Jack: This forum is not threaded, even if it for some reason allows people to place answers in the middle.
Please avoid that attempt and instead post at the bottom - since you use a lot of quotes, it really doesn't matter if there will be a number of posts in between your post and the one you are responding to.
Per, Thanks for your insight. By the way, I have another one for Jack:
Did you know that an ARM does not have divide instructions in hardware? If you try to port code that heavily relies on divisions to an ARM (without modifications, such as converting divides into multiplies), you are destined to be forced to make many calls to the compiler's C library. And that is going to hurt, no?
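To illustrate (a hedged sketch of my own, not taken from any manual): on an ARM core without a hardware divider, a division by a run-time value compiles into a call to a compiler helper routine (for example __aeabi_uidiv in the ARM EABI runtime), costing tens of cycles, while a power-of-two division is a single shift.

    /* Division by a run-time value: on ARM cores without a hardware
       divider this becomes a call into the compiler's runtime library. */
    unsigned int per_item_cost(unsigned int total, unsigned int items)
    {
        return total / items;
    }

    /* Division by a power of two: a single shift instruction. */
    unsigned int half(unsigned int total)
    {
        return total >> 1;    /* same result as total / 2 for unsigned */
    }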
I understand you. But please also do refer to my comments above regarding divisions in the ARM core.
I was "lucky" enough to grow up with in an environment where no processors had multiply or divide instructions, or where the instructions did exist, but consumed about the same number of clock cycles as if you did the job with add and shift instructions.
Because of this, I very often design my code based on order-of-two multiplications/divisions, where bit-and can be used instead of modulo and shifts can be used instead of multiplications/divisions. Most of the time, this can be done without significant loss of readability or maintability.
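A minimal sketch of the technique, assuming a buffer size that is a power of two (the names here are mine, just for illustration):

    #define SLOTS 32u                       /* power of two by design */

    /* Wrap an index into the buffer: a bit-and replaces the modulo. */
    unsigned int wrap(unsigned int index)
    {
        return index & (SLOTS - 1u);        /* instead of index % SLOTS */
    }

    /* Scale an index to a byte offset: a shift replaces the multiply. */
    unsigned int byte_offset(unsigned int index)
    {
        return index << 2;                  /* instead of index * 4 */
    }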
More often than not, I see memory and not speed as the primary limiting factor. Most managers I have worked with are unable to say no to new features/requests/wishes/suggestions/dreams, which results in products that will get new features one by one until the code memory is full. Only "hard" limits seem to get the managers to decide that a new feature can wait until the next-generation product. To keep down the production and supply chain costs, most managers prefer "one-size-fits-all" Swiss-army-knife jack-of-all-trades products where all these features should always be available in the same application, just in case a customer calls and wonders if he/she can use the product for <insert strange request here>.
The more features you put into a product, the more important it will be to focus on KISS, and to make sure that the code is testable and maintainable. A tested and true hand-optimized assembler routine may be wonderful, but if the customer requires that the function be slightly different, then that "tested" doesn't mean anything anymore, and the "hand-optimized" part will bite you, since it is based on assumptions that are no longer true.
Hunting clock cycles (or bytes) is a good way to extend the life of an existing product but it is seldom a good idea for a new product, unless it is a 100k+ product with a trivial application that is only likely to be sent out in one (1) revision and then never changed.
So, aren't division instructions important? Of course they are - but if I know that I am going to implement a high-speed signal-processing pipeline, I will probably spend quite a lot of time considering which processor to use, instead of just picking one and then starting to fight with the clock cycles.
Because you can't guarantee that a 'C' construct will compile to the same object code each time.
OH???? I would like you to justify your statement regarding the following pseudo code:
    unsigned char ring_buffer[30];
    ring_index++;
    if (ring_index >= 30) ring_index = 0;   /* wrap at 30 */

as opposed to:

    unsigned char ring_buffer[32];
    ring_index++;
    ring_index &= 0x1F;   /* mask to 32 entries (not 0xc0, which masks the wrong bits) */
I have no doubt which is the most "processor pleasing C", and no optimization will change which is the most efficient.
Erik
"I have no doubt what is the most 'processor pleasing C' and no optimization will change which is the most efficient."
Don't bet too much. High-end processors may consume the % just as efficiently as the &, and with speculative multi-branch execution, the conditional clear can also be swallowed.
The modulo alternative is a more general optimization, but there are no "always best" constructs that will stay true for any compiler on any processor.
Is this a forum for small embedded or for supercomputers???
"Is this a forum for small embedded or for supercomputers???"
Supercomputers? Sorry, but that remark could have been applicable ten years ago.
Look around you. Embedded is more than a 230V programmable timer, or a bar code reader.
Faster processors may exist in backbone switches, media centers, gaming consoles, portable DVD players, signal analyzers etc.
There are ordinary TVs out there with very powerful processors to allow direct display of photos, DVD movies, Divx etc from CF memory or USB-connected hard-drives.
The old Pentium was the first x86 to be superscalar, i.e. executing more than one instruction per clock cycle. But that was very many years ago, and quite a number of processor manufacturers now have processor cores with that technology. Have you looked at the newest embedded processors from Freescale, for example?
With the constantly shrinking geometries, the best embedded processors constantly move closer to the better PC processors. Together with Li-Po batteries, they allow very powerful equipment to be battery-operated for several hours.
Did you know that an ARM does not have divide instructions in hardware?
No, I haven't read the manual.
If you try to port code that heavily relies on divisions to an ARM (without modifications, such as converting divides into multiplies), you are destined to be forced to make many calls to the compiler's C library. And that is going to hurt, no?
That depends whether you design 'by experience' or by research.
The intent was good but the feasibility study was inadequate.
Given that new posts are highlighted I don't see how that would offer any real benefit, but I'm willing to give it a try.
Jack, you wrote: "That depends whether you design 'by experience' or by research."
I was talking about software that was designed on a platform with a certain design philosophy, then ported to another platform that behaves very differently. I don't fully understand how the above comment reconciles this difficulty: as was correctly mentioned already, "redesigning" the code would amount to rewriting it (I personally agree with this statement - it is strictly my opinion, of course, and I don't claim it to be an ultimate truth), so why not begin (almost) from scratch? Please clarify if you like.
.. your statement about division.
I know of no processor supported by the Keil tools where a divide is not more "cumbersome for the processor" than an AND.