I've been trying to figure out why the below math benchmark code appears to run about twice as fast on the same eval board, depending on whether I use KEIL or IAR tools to build the project. The pulse on LED1 is about 6 usec's with the KEIL tools, while it's less than 3 usec's with the IAR tools. Basically, my code temporarily disables interrupts, drives an I/O pin high, does a math operation, and then drives the I/O pin low again. The function that does this is called repeatedly so that triggering on the pulse with an oscilloscope gives a pretty good indication of the chip+tools math performance. EX:

    float f1, f2, f3;
    f1 = (float)rand() / ((float)rand() + 1.0);
    f2 = (float)rand() / ((float)rand() + 1.0);
    AIC_DCR = 0x00000001;
    PIOA_SODR = LED1;
    f3 = f1 / f2;
    PIOA_CODR = LED1;
    AIC_DCR = 0x00000003;

Can anyone tell me whether they've looked into which toolset does floating point math faster, and why the code generated with the KEIL tools seems to run only about half as fast as the same code generated with the IAR tools? Can anyone give me a suggestion for what I could do (software changes only) to speed up the math in the KEIL-generated code?
Thanks for all the responses. The project I'm working on is the most math intensive one I've ever undertaken... it depends very heavily on the speed of doing division. I know that KEIL's simulators are rated very highly, however I'm much more comfortable with results from running real code on real hardware and doing my measurements with external test equipment.

I see that KEIL's benchmarks used a Philips device to do the comparison. It doesn't seem like similarly configured chips from different vendors, based on the same ARM core, should have different results when it comes to something as basic as doing math, but I don't have a Philips device to do my tests with. The version of the IAR compiler I've been using is V4.30a, and your benchmarks were done with IAR 4.11A and a BETA version of your tools. Perhaps KEIL's and/or IAR's math performance has changed since you did your benchmarks. I originally set both compilers to their lowest level of optimization, to ensure that the code would be generated the way the tests were constructed. I'll rework the tests to make sure that the code will run with full optimization, and see if that makes a difference.

Before selecting this ATMEL ARM part I got several manufacturers to run the same math benchmark for me, which included several tests like the one I originally posted. The results were pretty interesting, and not at all what I expected. For starters, the time to divide two random floating point numbers was constant, or almost constant (two discrete, very similar times), for most combinations of chips & tools, but not all of them. I double checked my results, and the time to divide two floats is constant (a little more than 8 usec's, with an MCK of ~ 54.9 MHz) with the KEIL tools I'm using. It's less than 3 usec's with the IAR tools. The time to divide doesn't seem to depend on the values of the operands at all.
This may be because of how I'm using the rand function to generate floating point numbers, but I thought that dividing two random numbers was a pretty good way to get operands for the math tests. I'll look into that, and pick some values by hand that should be at the extremes of the valid min/max range of floats, and see if that changes anything. About the KWIP's results, that benchmark appears to take many more things into account than I'm currently interested in (such as doing trig functions, etc.), so it may not be the best indicator for just doing floating point division.

As expected, the reported actual time required to do the division (from the manufacturers) varied quite a bit. The ARM part with IAR tools performed much better than a competitive (non-ARM) part running at about the same speed that had a 32-bit hardware divide engine in silicon too. That chip vendor pointed to the compiler as the problem, and is attempting to fix that deficiency.

I've done my best to make sure that the speed the ARM core is running at is identical, the number of wait states is identical, etc. while I'm looking at the differences in the speed of division between these two toolsets. I'm using the same eval board, and have the clock generator's MUL & DIV parameters set the same, etc.

I'm using the evaluation version of the KEIL tools that I recently downloaded from the web, V3.10a with compiler V2.00d. I'm too new with this toolset to know if that's the 'best in class' that's available from KEIL. Spending ~ $4000 for S/W development tools is not an easy thing for me to do (tight budgets, red tape, too much paperwork), and I want to make sure I get the best math performance that's possible. But I want the debugger to work too. Unfortunately, the IAR debugger currently appears to be unable to debug ISR's, and their IDE has been very difficult to get up to speed with and be productive in. Those are some of the reasons why I've been looking at the KEIL tools too.
The template startup code from the two tool vendors is fairly different, and I'm focusing on that right now, trying to determine if it may be the cause of this measured difference in math performance. I do appreciate the feedback. Eventually, I may post the two projects here to see if anyone from KEIL is willing to take a stab at determining the cause of the speed difference.
"The time to divide doesn't seem to depend on the values of the operands at all." Does the ARM do FP divide in hardware?
I'm not sure about how much the ARM uses H/W to do division, but I seem to recall a chip mfr FAE telling me about tables in the chip that were used in performing division. Continuing the prior thread... I changed my benchmark to use the compiler set to its highest level of optimization, and the time to divide did decrease. Unfortunately, it decreased by only a negligible amount... a fraction of a microsecond. Still looking for a way to speed up the division with the KEIL tools to something similar to the IAR tools...
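For what it's worth, the ARM7 cores on these parts have no floating-point hardware, so a float divide is a compiler library routine, and near-constant timing is what you'd expect from a fixed-iteration algorithm. One common technique is a seed approximation (from a small table or a linear fit) refined by Newton-Raphson. Here is a minimal host-runnable sketch of the idea; this is my own illustration of the technique, NOT KEIL's or IAR's actual library code:

```c
#include <math.h>

/* Illustration only -- not either vendor's library code.
   Software float division via a Newton-Raphson reciprocal,
   the kind of fixed-iteration routine an FP runtime might use.
   Assumes d > 0 for simplicity. */
static float soft_divide(float n, float d)
{
    int e;
    float m = frexpf(d, &e);        /* d = m * 2^e, with m in [0.5, 1) */

    /* Linear seed for 1/m, max relative error about 1/17; a real
       library might use a small lookup table here instead. */
    float x = 48.0f / 17.0f - (32.0f / 17.0f) * m;

    /* Each Newton step roughly squares the error, so three fixed
       iterations suffice for single precision -- and the run time
       is the same no matter what the operand values are. */
    for (int i = 0; i < 3; ++i)
        x = x * (2.0f - m * x);

    return n * ldexpf(x, -e);       /* n * (1/m) * 2^-e  ==  n / d */
}
```

Because the iteration count is fixed, the execution time barely depends on the operands, which would match the constant pulse widths you are measuring.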
"but I thought that dividing two random numbers was a pretty good way to get operands for the math tests" Sounds likely, if the time to generate the pseudorandom number is not included. A pseudorandom generator rand() can be "better and slower", all depending on the implementation. If you want to compare divides, let rand() generate an array, then measure the time it takes to divide the members of the array. Erik
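A sketch of Erik's suggestion might look like the following. N = 256 is an arbitrary choice of mine, and the AT91 register writes from the original post are shown only as comments because they are target-specific:

```c
#include <stdlib.h>

#define N 256   /* arbitrary batch size */

/* Operands are generated up front, outside the timed region, so the
   oscilloscope pulse measures only the divides. */
static float num[N], den[N], quo[N];

static void fill_operands(void)
{
    for (int i = 0; i < N; ++i) {
        num[i] = (float)rand() / ((float)rand() + 1.0f);
        den[i] = (float)rand() / ((float)rand() + 1.0f);
        if (den[i] == 0.0f)         /* guard: rand() can return 0 */
            den[i] = 1.0f;
    }
}

static void run_divides(void)
{
    /* PIOA_SODR = LED1;  -- drive the pin high here on the target */
    for (int i = 0; i < N; ++i)
        quo[i] = num[i] / den[i];
    /* PIOA_CODR = LED1;  -- drive the pin low here */
}
```

The per-divide time is then the pulse width divided by N, which also averages out any call-to-call jitter.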
I've already taken the time to use the rand function out of the equation. I generate the random floating point operands before I set an I/O pin to one state. I then do the division and set the I/O pin to the opposite state. I use an oscilloscope to measure the width of the pulse and that's the time to do the divide (minus, of course, the time to change the state of the I/O pin, which has previously been measured and is subtracted from the positive pulse width).
"I generate the random floating point operands before I set an I/O pin to one state." I don't really understand why you don't just use fixed values so you compare like with like? You will need to disguise them a bit so that the preprocessor doesn't 'constant fold' them away (or whatever it's called). You might also take a look at the assembler output from both compilers to see what's going on.
I don't want to get into writing my own floating point libraries; that's not what I get paid for. Besides, I have no real expertise there. I rely on tool providers for things like that. So... I'll pass on the suggestion to try to decipher the associated assembly code. Besides, I'm a newbie to ARM and the tools for them. I can't afford the time to do that level of investigation. I've got products to develop.

It probably wouldn't be too hard to create a rand function of my own to ensure that the min/max values returned by it would be the same for each toolset. However, as I've mentioned before, I can't find any values for operands that demonstrate that the time to divide them is different than it is for any other set of operands. For whatever it's worth, I have been trying hard coded values for the operands for a while this morning. So far, there's no difference in the time to divide them (for each particular toolset), no matter what the operands have been. I initially chose to use random numbers for the operands because I wanted to easily try very many combinations of operands, to see if the time to divide them ever changed, which would indicate that the values of the operands do indeed affect the time to divide them.

Does anyone have a suggestion for best case and worst case sets of operands that should highlight that the values of the operands do indeed affect the time to divide them?
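On the question of best/worst-case operands: for an IEEE-754 single-precision software divider, the cases that most plausibly take different code paths are power-of-two divisors, full-precision mantissas, subnormal results, and overflowing quotients. These are educated guesses at stress cases, not a guarantee that any particular library's timing varies with them:

```c
#include <float.h>

/* Candidate stress operands for a single-precision software divider.
   Whether a given FP library's timing actually varies with these is
   exactly what the oscilloscope pulse measurement would show. */
struct op_pair { float n, d; };

static const struct op_pair cases[] = {
    { 1.0f,       2.0f       },  /* power-of-two divisor (possible fast path) */
    { 3.1415927f, 2.7182818f },  /* full-precision mantissas */
    { FLT_MIN,    3.0f       },  /* subnormal (denormalized) result */
    { FLT_MAX,    0.5f       },  /* quotient overflows to infinity */
    { 1.0f,       3.0f       },  /* non-terminating binary fraction */
};
```

If the pulse width is the same across all of these pairs, the library almost certainly uses a fixed-iteration (or fully unrolled) algorithm, which would explain the constant divide time you're seeing.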
Suggest that you get a later version of the Keil ARM tools. I'm running V2.40A. I believe there are later versions. Also, attempting to compare eval tools is not wise. I'm not sure that you get the full FP libs with the ARM eval tools. Unless your benchmarks can invoke a large number of run time libs from the tools, they are absolutely useless. From past threads started by me, you will see that I searched for a 'standard' set of benchmarks. Found nothing. Adapted a number of programs and wrote additional programs to evaluate both software and hardware. Found that most so-called benchmarks created more confusion than information. But I can safely say that IAR is NOT twice as fast as Keil. No, I will not share my results. I will use them as one input to my evals, but I wouldn't bet a penny on them. Also, in some past post, I remember that someone said that you could purchase a full set of Keil tools with some type of money back option. Maybe you should call your local Keil distributor for info.
Thanks for the comments Al. Are you a KEIL employee? I'm not sure how to interpret your comments, especially the one about the types of libraries that S/W developers may get with the eval versions of tools from a particular vendor. Seems odd to me that a tool vendor would purposefully ship inferior product to folks to use to evaluate them. But, I don't work for a tool company, so I'm not used to trying to think from that perspective.

I think it's important to determine what features are most important to my app, and to stress the tools I'm evaluating accordingly. That's why I'm so focused on doing division quickly. My app doesn't need string functions, printf, or hardly any (tool provider) library functions like those at all. It really is a self-contained embedded controller, that happens to have an array of quite varied analog sensors that need to be used with each other to control several other analog things. Tons of division (and addition, subtraction, multiplication too, but they don't seem to consume much time), in real time, quickly, with feedback to control the systems.

I am pressing on with my eval, as best I can. As I've said before, even with an offer for a full refund if I'm not completely satisfied, it's not easy for me to get folks at my company to give me $4000 to try something out. It's actually fairly difficult. I know that if I make 'the wrong decision' (about which tools to buy), and my boss wants to give me grief about the process I used to make the decision, I'd rather have hard facts to show him than someone's claims, someone who I don't know, and who won't substantiate them.

Relative to the version of the eval I'm using, I will check to see if I can download a later version from KEIL's website, and try that one too. That was a good suggestion. Do you know if they ship with a crippled version of the FP library?
"Seems odd to me that a tool vendor would purposefully ship inferior product to folks to use to evaluate them. But, I don't work for a tool company, so I'm not used to trying to think from that perspective." I know of no compiler that you can actually evaluate using the eval package. The "eval" is a veiled attempt to give something away that will, eventually, make you buy. The "eval" is what amateurs that do not want to invest in software use. I even heard of one that used the eval to generate assembler from C that he then converted to Metalink assembler. When I was tasked with evaluation of toolsets, I asked the vendors for a full package and got it. Whether that is possible when you do not state that you are about to buy 8 sets, I do not know. Erik
Hmmm... interesting comments. I recall you making similar comments to me about getting a 'real' version to evaluate in a prior post. Unfortunately, I can't lie to KEIL and tell them that I'm going to buy more than one seat of these tools for this project, so I don't have the same leverage that you had :-( Like I said then, for whatever it's worth, my company has purchased several copies of different KEIL tools over the years, and we've been fairly happy with them. We've bought tools from an array of other vendors too. Sadly, KEIL doesn't support all micro's from all manufacturers ;-)

I'm floored by your comment about not being able to evaluate anyone's compiler when using an eval version of their tools. Don't know if that's true or not... I certainly hope it's not! I hope KEIL comments on your assertion.

I just downloaded and installed the current version of the eval (V2.40) and it made no difference at all on my floating point division benchmark. Not being an expert with the IDE and all the configurable options, I'll play around with them more, seeing if I can figure out how to get better math performance from these tools. I've already set the compiler optimization to full with emphasis on code speed, and I'm using the same MCK, the same wait states, the same target memory (flash ROM), etc. between the projects on the two toolsets, on the same eval board. The option to select ARM or THUMB mode doesn't seem to have any effect on the execution speed of my division test.

QUOTE by Al Bradford: "Found that most so called benchmarks created more confusion than information. But I can safely say that IAR is NOT twice as fast as Keil. No, I will not share my results. I will use them as one input to my evals but I wouldn't bet a penny on them."

QUESTION: Why would you use your results if you wouldn't bet a penny on them?
QUESTION: Are you (Al) responsible for the benchmark results KEIL's posted on their website?
This is what I know --- the same code I build with IAR versus KEIL tools (other than the startup code, which I've tried to ensure is functionally equivalent where it matters) sure runs differently on the two different evals. Less than 3 usec's per divide for IAR, more than 8 usec's per divide for KEIL. I need floating point division performance more than just about anything else on this project, otherwise I wouldn't be so focused on it. I currently refuse to believe that the tool vendors ship functionally crippled math libraries with their evals, or purposefully release evals that by design can't compile good, efficient code. What's an eval for, anyway?
As to the replies to my comments, the rest refer to someone else...

"getting a 'real' version to evaluate in a prior post. Unfortunately, I can't lie to KEIL"

1) You should never lie. It is bad and will eventually come back and bite you in a large muscle.
2) Not lying is not "unfortunate" in my book.
3) If you work for a company "known" not to use illegal copies, which "my company has purchased several copies of different KEIL tools over the years" should show, there should be no problem.

"Sadly, KEIL doesn't support all micro's from all manufacturers ;-)"

That's where your brain gets exercised by switching between toolsets. I use several different brands of emulators and "always" hit the wrong keys for the various functions for that reason. I once did the '51 using HiTech tools just to not switch back and forth, as the project also involved an XA.

"I'm floored by your comment about not being able to evaluate anyone's compiler when using an eval version of their tools. Don't know if that's true or not..."

Well, my opinion is "ignore the benchmarks, they are at best not telling much, at worst worthless". Compile, link and measure your actual job. Erik

PS re benchmarks: have a look at competing manufacturers' "benchmarks". Amazingly enough, they all are "the best".
Oh, one added note: When I had the opportunity to do a "real benchmark", the difference between the $1000+ toolsets was not big enough to choose one over the other, for either code compactness (+-15%) or execution speed (+-20%). One exception: Tasking stated "you do not need to try our tools, they are the best", so they did not get evaluated, just dropped. None of the sub-$1000 toolsets came even close. So, the choice should, in my opinion, be based on 3 things: 1) and most important, is the support any good? 2) do you like it? and 3) does it support all uCs in the thingy. I think that all toolsets have strengths and weaknesses and, for that reason, a benchmark says nothing (it does not average these out); creating the actual job does. Erik
Oh... Erik, I see... you mean ignore other people's benchmarks... not your own. That's exactly what I'm doing... writing my own tests and measuring the results. I thought you were implying that the benchmarks I wrote myself were not valid because the eval version of the tools were somehow crippled or otherwise compromised, so that the actual code that was generated was by design, flawed (i.e., slow, etc.). And, you're 100% right about not lying. Can't say that I've never done it, but I sure try not to. Honesty is always the best policy. The topic of getting a full version of the tools from KEIL to eval is a dead horse as far as I'm concerned. I've asked, they've declined, except for the offer to buy it to try it and we'll give you a refund if you don't like it. I'm not complaining about that, I just can't do it that way (at least not yet). Do you really believe that the eval tools are somehow crippled relative to the floating point library?
"Oh... Erik, I see... you mean ignore other people's benchmarks... not your own." Not really, I do not believe in "benchmarks". I believe in how my code, which is not written as a benchmark, performs. "Do you really believe that the eval tools are somehow crippled relative to the floating point library?" Well, it frequently comes up that the '51 eval does not include FP. Erik
Dave; I am not an employee of Keil. No, the benchmarks published by Keil on their website are not my benchmarks. I use my benchmarks for a specific purpose, but not to select a compiler by running a very selected program set. Yes, I doubt some of my results, just as you appear to doubt some of your tests. If you are just interested in your float divide effort and some other compiler does the job for you, go for it. I was just trying to pass along some info on benchmarks in general, and trying to raise a red flag about being too myopic in your tool selection. No matter what you say about not using other run time libs of the tools, you will be using many built-in run time functions. I'm not talking about standard lib functions like printf, etc. I suggest that you google 'benchmarks' and look at the many comments and cautions about benchmarks. One problem is that most talk about PC or workstation environments.