I've been trying to figure out why the math benchmark code below appears to run about twice as fast on the same eval board, depending on whether I use KEIL or IAR tools to build the project. The pulse on LED1 is about 6 usec with the KEIL tools, while it's less than 3 usec with the IAR tools. Basically, my code temporarily disables interrupts, drives an I/O pin high, does a math operation, and then drives the I/O pin low again. The function that does this is called repeatedly, so triggering on the pulse with an oscilloscope gives a pretty good indication of the chip+tools math performance. Example:

    float f1, f2, f3;

    f1 = (float)rand() / ((float)rand() + 1.0);
    f2 = (float)rand() / ((float)rand() + 1.0);

    AIC_DCR = 0x00000001;
    PIOA_SODR = LED1;
    f3 = f1 / f2;
    PIOA_CODR = LED1;
    AIC_DCR = 0x00000003;

Can anyone tell me whether they've looked into which toolset does floating point math faster, and why the code generated with the KEIL tools seems to run at only about half the speed of the same code generated with the IAR tools? Can anyone give me a suggestion for what I could do (software changes only) to speed up the math in the KEIL-generated code?
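For reference, a self-contained version of that snippet. The register and pin names are the ones from the post above, stubbed out here as placeholders so the sketch compiles standalone; the volatile sink matters once optimization is turned up, otherwise the compiler is free to drop an unused divide:

    #include <stdlib.h>

    /* Placeholders so the sketch compiles standalone; in the real project,
       AIC_DCR, PIOA_SODR, PIOA_CODR and LED1 come from the AT91SAM7 device
       header / board files (memory-mapped registers and a pin mask). */
    static volatile unsigned int AIC_DCR, PIOA_SODR, PIOA_CODR;
    #define LED1 (1u << 0)              /* hypothetical pin mask */

    static volatile float f3;           /* volatile sink: keeps the divide from
                                           being removed as a dead store        */

    void divide_benchmark(void)
    {
        float f1 = (float)rand() / ((float)rand() + 1.0f);
        float f2 = (float)rand() / ((float)rand() + 1.0f);

        AIC_DCR = 0x00000001;           /* AIC writes as in the original post */
        PIOA_SODR = LED1;               /* pin high: start of measured window */
        f3 = f1 / f2;                   /* the single FP divide being timed   */
        PIOA_CODR = LED1;               /* pin low: end of measured window    */
        AIC_DCR = 0x00000003;
    }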
Unless Keil and IAR are using exactly the same implementation of rand(), this test is meaningless. If you take a look at the published FP operation timings for C51 (I don't know whether they exist for ARM), you'll notice that there is a maximum and a minimum for each operation. That's because the time taken depends on the actual values used.
Keil's benchmarks are here: http://www.keil.com/benchmks/carm_v0code.htm

You haven't given any information at all about what compiler options you've used; this could easily account for the difference. You haven't even said whether you're using Keil's own compiler or GNU, and which version. Also, are you sure that the chip configurations are identical, e.g. wait states, clock multipliers/dividers, etc.?

"Can anyone give me a suggestion for what I could do (software changes only) to speed up the math on the KEIL generated code?"

Don't use floating point! Use integer (fixed-point) maths instead.
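To illustrate the fixed-point suggestion, a minimal sketch of a Q16.16 divide; the format and helper names are just an example, not anything shipped with either toolchain:

    #include <stdint.h>

    /* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
    typedef int32_t q16_16;

    #define Q16_ONE (1L << 16)

    /* Conversions for setup/debug only; the hot path stays in integers. */
    static q16_16 q16_from_float(float f) { return (q16_16)(f * Q16_ONE); }
    static float  q16_to_float(q16_16 q)  { return (float)q / Q16_ONE; }

    /* a / b in Q16.16: widen to 64 bits before shifting so precision and
       range are preserved. */
    static q16_16 q16_div(q16_16 a, q16_16 b)
    {
        return (q16_16)(((int64_t)a << 16) / b);
    }

Whether this actually beats the library float divide on an ARM7 depends on the 64-bit integer divide routine the compiler pulls in, so it is still worth timing with the same pin-toggle method.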
"Keil's benchmarks are here: http://www.keil.com/benchmks/carm_v0code.htm" Unfortunately the information given doesn't highlight the fact that the speed of floating point operations vary with the data used. The C51 ones do: http://www.keil.com/benchmks/tm_c51_v7_small.asp It does seem, however, that Keil ARM should run at about 3.5 times the speed of IAR's implementation on average, if a 'KWIP' is in fact a meaningful measurement.
Took a peek, and it clearly shows that "you get what you pay for". The "paid for" compilers are, for all practical purposes, equal, but have a look at the abysmal performance of the GNU. Erik
Thanks for all the responses. The project I'm working on is the most math-intensive one I've ever undertaken... it depends very heavily on the speed of doing division. I know that KEIL's simulators are rated very highly; however, I'm much more comfortable with results from running real code on real hardware and doing my measurements with external test equipment.

I see that KEIL's benchmarks used a Philips device to do the comparison. It doesn't seem like similarly configured chips based on the same ARM core from different vendors should give different results for something as basic as doing math, but I don't have a Philips device to test with. The version of the IAR compiler I've been using is V4.30a, and your benchmarks were done with IAR 4.11A and a BETA version of your tools. Perhaps KEIL's and/or IAR's math performance has changed since the benchmarks were done. I originally set both compilers to their lowest level of optimization, to ensure that the code would be generated the way the tests were constructed. I'll rework the tests to make sure the code will run with full optimization, and see if that makes a difference.

Before selecting this ATMEL ARM part I got several manufacturers to run the same math benchmark for me, which included several tests like the one I originally posted. The results were pretty interesting, and not at all what I expected. For starters, the time to divide two random floating point numbers was constant, or almost constant (two discrete, very similar times), for most combinations of chips and tools, but not all of them. I double-checked my results, and the time to divide two floats is constant (a little more than 8 usec, with an MCK of ~54.9 MHz) with the KEIL tools I'm using. It's less than 3 usec with the IAR tools. The time to divide doesn't seem to depend on the values of the operands at all. This may be because of how I'm using the rand function to generate floating point numbers, but I thought that dividing two random numbers was a pretty good way to get operands for the math tests. I'll look into that, pick some values by hand that should be at the extremes of the valid min/max range of floats, and see if that changes anything.

About the KWIP results: that benchmark appears to take many more things into account than I'm currently interested in (such as doing trig functions, etc.), so it may not be the best indicator for just doing floating point division. As expected, the actual division times reported by the manufacturers varied quite a bit. The ARM part with IAR tools performed much better than a competitive (non-ARM) part running at about the same speed that even has a 32-bit hardware divide engine in silicon. That chip vendor pointed to the compiler as the problem, and is attempting to fix that deficiency.

I've done my best to make sure that the speed the ARM core is running at is identical, the number of wait states is identical, etc. while I'm looking at the differences in division speed between these two toolsets. I'm using the same eval board, and have the clock generator's MUL & DIV parameters set the same, etc. I'm using the evaluation version of the KEIL tools that I recently downloaded from the web, V3.10a with compiler V2.00d. I'm too new with this toolset to know if that's the 'best in class' available from KEIL.
Spending ~$4000 on S/W development tools is not an easy thing for me to do (tight budgets, red tape, too much paperwork), and I want to make sure I get the best math performance that's possible. But I want the debugger to work too. Unfortunately, the IAR debugger currently appears unable to debug ISRs, and their IDE has been very difficult to get up to speed with and be productive in. Those are some of the reasons why I've been looking at the KEIL tools too.

The template startup code from the two tool vendors is fairly different, and I'm focusing on that right now, trying to determine if that may be the cause of this measured difference in math performance. I do appreciate the feedback. Eventually, I may post the two projects here to see if anyone from KEIL is willing to take a stab at determining the cause of the speed difference.
"The time to divide doesn't seem to depend on the values of the operands at all." Does the ARM do FP divide in hardware?
I'm not sure about how much the ARM uses H/W to do division, but I seem to recall a chip mfr FAE telling me about tables in the chip that were used in performing division.

Continuing the prior thread... I changed my benchmark to use the compiler set to its highest level of optimization, and the time to divide did decrease. Unfortunately, it decreased by only a negligible amount... a fraction of a microsecond. Still looking for a way to speed up the results with the KEIL tools to something similar to the IAR tools...
"but I thought that dividing two random numbers was a pretty good way to get operands for the math tests"

Sounds likely, if the time to generate the pseudorandom number is not included. A pseudorandom generator rand() can be "better and slower", all depending on the implementation. If you want to compare divide, let rand() generate an array, then measure the time it takes to divide the members of the array. Erik
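A minimal sketch of that approach, reusing the placeholder register/pin definitions (and stdlib include) from the sketch near the top of the thread. rand() runs entirely outside the measured window, and the pulse width divided by N gives the per-divide time, plus a little loop overhead:

    #define N 64

    static float num[N], den[N];
    static volatile float out[N];       /* volatile sink so the divides stay */

    void fill_operands(void)            /* called outside the measured window */
    {
        int i;
        for (i = 0; i < N; i++) {
            num[i] = (float)rand() / ((float)rand() + 1.0f);
            den[i] = (float)rand() / ((float)rand() + 1.0f);
        }
    }

    void timed_divides(void)
    {
        int i;
        PIOA_SODR = LED1;               /* pin high: start of measured window    */
        for (i = 0; i < N; i++)
            out[i] = num[i] / den[i];   /* only the divides (plus loop overhead) */
        PIOA_CODR = LED1;               /* pin low: per-divide time ~ pulse / N  */
    }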
I've already taken the time to take the rand function out of the equation. I generate the random floating point operands before I set an I/O pin to one state. I then do the division and set the I/O pin to the opposite state. I use an oscilloscope to measure the width of the pulse, and that's the time to do the divide (minus, of course, the time to change the state of the I/O pin, which has previously been measured and is subtracted from the positive pulse width).
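As an aside, the baseline being subtracted can be re-checked with a null measurement: a back-to-back set/clear with nothing in between (same placeholder names as the earlier sketch), so the resulting pulse is the toggle overhead alone.

    /* Null measurement: with nothing between set and clear, the pulse width
       is just the pin-toggle overhead, which is subtracted from the divide
       measurement. */
    void toggle_overhead(void)
    {
        PIOA_SODR = LED1;   /* pin high */
        PIOA_CODR = LED1;   /* pin low right away */
    }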
"I generate the random floating point operands before I set an I/O pin to one state." I don't really understand why you don't just use fixed values so you compare like with like? You will need to disguise them a bit so that the preprocessor doesn't 'constant fold' them away (or whatever it's called). You might also take a look at the assembler output from both compilers to see what's going on.
I don't want to get into writing my own floating point libraries; that's not what I get paid for. Besides, I have no real expertise there. I rely on tool providers for things like that. So... I'll pass on the suggestion to try to decipher the associated assembly code. Besides, I'm a newbie to ARM and the tools for them. I can't afford the time to do that level of investigation. I've got products to develop.

It probably wouldn't be too hard to create a rand function of my own to ensure that the min/max values returned by it would be the same for each toolset. However, as I've mentioned before, I can't find any values for operands that demonstrate that the time to divide them is different than it is for any other set of operands. For whatever it's worth, I have been trying hard-coded values for the operands for a while this morning. So far, there's no difference in the time to divide them (for each particular toolset), no matter what the operands have been.

I initially chose to use random numbers for the operands because I wanted to easily try very many combinations of operands, to see if the time to divide them ever changed, which would indicate that the values of the operands do indeed affect the time to divide them. Does anyone have a suggestion for best case and worst case sets of operands that should highlight that the values of the operands do indeed affect the time to divide them?
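For what it's worth, a few candidate pairs that sometimes expose data-dependent timing in software IEEE-754 single-precision divide routines; whether either library here actually cares is exactly what the pin-toggle test would show. These are illustrative guesses, not documented worst cases for either toolchain:

    /* Candidate operand pairs (dividend, divisor) for the divide test. */
    static const float test_pairs[][2] = {
        { 1.0f,        1.0f        },   /* trivial quotient                  */
        { 1.0f,        3.0f        },   /* repeating binary fraction         */
        { 16777215.0f, 16777213.0f },   /* both 24-bit significands full     */
        { 3.4e38f,     1.2e-38f    },   /* quotient overflows to infinity    */
        { 1.2e-38f,    3.4e38f     },   /* quotient underflows to zero       */
        { 1.4e-45f,    3.0f        },   /* denormal dividend                 */
        { 3.0f,        1.4e-45f    },   /* denormal divisor                  */
    };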
Suggest that you get a later version of the Keil ARM tools. I'm running V2.40A; I believe there are later versions. Also, attempting to compare eval tools is not wise. I'm not sure that you get the full FP libs with the ARM eval tools. Unless your benchmarks can invoke a large number of run-time libs from the tools, they are absolutely useless. From past threads started by me, you will see that I searched for a 'standard' set of benchmarks. Found nothing. Adapted a number of programs and wrote additional programs to evaluate both software and hardware. Found that most so-called benchmarks created more confusion than information. But I can safely say that IAR is NOT twice as fast as Keil. No, I will not share my results. I will use them as one input to my evals, but I wouldn't bet a penny on them. Also, in some past post, I remember that someone said that you could purchase a full set of Keil tools with some type of money-back option. Maybe you should call your local Keil distributor for info.
Thanks for the comments, Al. Are you a KEIL employee? I'm not sure how to interpret your comments, especially the one about the types of libraries that S/W developers may get with the eval versions of tools from a particular vendor. Seems odd to me that a tool vendor would purposefully ship an inferior product for folks to use to evaluate them. But I don't work for a tool company, so I'm not used to trying to think from that perspective.

I think it's important to determine what features are most important to my app, and to stress the tools I'm evaluating accordingly. That's why I'm so focused on doing division quickly. My app doesn't need string functions, printf, or hardly any (tool provider) library functions like those at all. It really is a self-contained embedded controller that happens to have an array of quite varied analog sensors that need to be used with each other to control several other analog things. Tons of division (and addition, subtraction, and multiplication too, but those don't seem to consume much time), in real time, quickly, with feedback to control the systems.

I am pressing on with my eval as best I can. As I've said before, even with an offer for a full refund if I'm not completely satisfied, it's not easy for me to get folks at my company to give me $4000 to try something out. It's actually fairly difficult. I know that if I make 'the wrong decision' (about which tools to buy), and my boss wants to give me grief about the process I used to make the decision, I'd rather have hard facts to show him than the claims of someone I don't know who won't substantiate them.

Regarding the version of the eval I'm using, I will check to see if I can download a later version from KEIL's website, and try that one too. That was a good suggestion. Do you know if they ship with a crippled version of the FP library?
"Seems odd to me that a tool vendor would purposefully ship inferior product to folks to use to evaluate them. But, I don't work for a tool company, so I'm not used to trying to think from that perspective."

I know of no compiler that you can actually evaluate using the eval package. The "eval" is a veiled attempt to give something away that will, eventually, make you buy. The "eval" is what amateurs that do not want to invest in software use. I even heard of one that used the eval to generate assembler from C that he then converted to Metalink assembler. When I was tasked with evaluating toolsets, I asked the vendors for a full package and got it. Whether that is possible when you do not state that you are about to buy 8 sets, I do not know. Erik
Hmmm... interesting comments. I recall you making similar comments to me about getting a 'real' version to evaluate in a prior post. Unfortunately, I can't lie to KEIL and tell them that I'm going to buy more than one seat of these tools for this project, so I don't have the same leverage that you had :-( Like I said then, for whatever it's worth, my company has purchased several copies of different KEIL tools over the years, and we've been fairly happy with them. We've bought tools from an array of other vendors too. Sadly, KEIL doesn't support all micros from all manufacturers ;-)

I'm floored by your comment about not being able to evaluate anyone's compiler when using an eval version of their tools. I don't know if that's true or not... I certainly hope it's not! I hope KEIL comments on your assertion.

I just downloaded and installed the current version of the eval (V2.40) and it made no difference at all on my floating point division benchmark. Not being an expert with the IDE and all the configurable options, I'll play around with them more to see if I can figure out how to get better math performance from these tools. I've already set the compiler optimization to full with emphasis on code speed, and I'm using the same MCK, the same wait states, the same target memory (flash ROM), etc. between the projects on the two toolsets, on the same eval board. The option to select ARM or THUMB mode doesn't seem to have any effect on the execution speed of my division test.

QUOTE by Al Bradford: "Found that most so-called benchmarks created more confusion than information. But I can safely say that IAR is NOT twice as fast as Keil. No, I will not share my results. I will use them as one input to my evals but I wouldn't bet a penny on them."

QUESTION: Why would you use your results if you wouldn't bet a penny on them?

QUESTION: Are you (Al) responsible for the benchmark results KEIL has posted on their website?

This is what I know: the same code I build with IAR versus KEIL tools (other than the startup code, which I've tried to ensure is functionally equivalent where it matters) sure runs differently on the two different evals. Less than 3 usec per divide for IAR, more than 8 usec per divide for KEIL. I need floating point division performance more than just about anything else on this project, otherwise I wouldn't be so focused on it. I currently refuse to believe that the tool vendors ship functionally crippled math libraries with their evals, or purposefully release evals that by design can't compile good, efficient code. What's an eval for, anyway?