Benchmark Questions

This is a general question, so I did not list a toolset.
Using Keil tools, are there any specific benchmark programs that you would or would not recommend?
Al Bradford

  • Using Keil tools, are there any specific benchmark programs that you would or would not recommend?

    How could anyone possibly recommend a specific benchmark in answer to such a spectacularly unspecific question?

    As-is, that question makes about as much sense as "how long is a road?"

  • I think the question was more than specific enough for the Keil tools. We all know the strengths of these tools for embedded control. I don't think any of us would use these tools to program the Navier-Stokes equations for galactic jets. Also, we would not be interested in Fortran or Ada benchmarks. We will not attempt to use these tools to program the AMD Opteron 64-bit CPU.
    These tools are used to program MCUs for bit pushing and Boolean logic with some simple math routines.
    There are a number of synthetic benchmarks on this website, such as Whetstone, Sieve, etc.
    These are not real-world tests of compiler/MCU operation. Somewhere there must exist simple benchmarks for the small-MCU world.
    A Google search will get thousands of hits for SPEC95, EEMBC, and big-CPU tests, but I have found no info on small-MCU tests.
    Now, do you have any specific info, or just another critique?
    Bradford

  • What is a benchmark? I think it is a measure of the performance of a certain unit.

    Now, the use of a '51 is application-specific, so the question is: how do you benchmark an application-specific process?

    Simple:
    try it in your application.

    Erik

  • Are you trying to benchmark the tools themselves? That is, compare the size or speed of the object code, or the efficiency of the libraries? The Keil site posts benchmarks for some common algorithms.

    http://www.keil.com/benchmks/

    Of course, this is all very generic stuff. There's not much substitute for seeing what happens with your own application code.

    Or do you want to benchmark different 8051 processors? The architecture is pretty simple and well-understood. If you know the clock rate and the number of clocks per instruction (usually 12, or 4; I've seen parts that do 6 and 2 as well) then you know how fast the processor is going to be. No complicated cache and memory architecture to confuse the issue with an 8051.
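
    For a rough feel of the numbers, here is a throwaway desk-check in plain C (nothing Keil-specific; the 24 MHz crystal and the clocks-per-cycle figures are only examples):

    /* Back-of-the-envelope sketch: machine-cycle throughput is simply
       oscillator frequency divided by clocks per machine cycle.        */
    #include <stdio.h>

    int main(void)
    {
        const double fosc = 24e6;        /* example: 24 MHz crystal      */
        const double clocks_per_cycle[] = { 12.0, 6.0, 4.0, 2.0, 1.0 };
        unsigned i;

        for (i = 0; i < sizeof clocks_per_cycle / sizeof clocks_per_cycle[0]; i++)
            printf("%2.0f clocks/cycle -> %5.2f million machine cycles/s\n",
                   clocks_per_cycle[i], fosc / clocks_per_cycle[i] / 1e6);
        return 0;
    }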

  • Or do you want to benchmark different 8051 processors? The architecture is pretty simple and well-understood. If you know the clock rate and the number of clocks per instruction (usually 12, or 4; I've seen parts that do 6 and 2 as well) then you know how fast the processor is going to be. No complicated cache and memory architecture to confuse the issue with an 8051.

    It ain't necessarily so. If you take something like a SiLabs F12x you have 1) a one-clocker, 2) cache, 3) possibly different-speed internal and external (data) RAM, and 4) not all instructions are true to the "instruction cycle" count of a basic '51.

    Erik

  • Gentlemen;
    Thank you for your replies.
    The answer to your questions is yes! I would like to evaluate both the compiler and the device operation with a simple suite of apps that is not hand-optimized the way most vendor samples might be.
    Yes Erik, the application operation is the real proof but we should be able to measure both software and hardware features.
    Most of the time we just want to know if the compiler makes good, efficient code. But what do good and efficient mean?
    For example, the ARM chips are pipelined devices. The Keil tools have many optimizations, and some can break the pipeline more often than others. Breaking the pipeline to branch to a small common subroutine can add execution time but reduce code size.
    Another compiler vendor may have a different set of optimizations that break the pipeline more or less often.
    What I think I might have to write is a set of apps that exercise different functions within the chip, wrapped in a single program in an effort to link in as many vendor-specific libs as possible.
    A small list:
    1. A timer interrupt routine to measure interrupt latency.
    2. Some simple routines that would exercise lib calls such as rand, compares and mult/divide.
    3. Some port move routines for bit, byte and int compares.

    I'm sure I will add to the list and would welcome any inputs. My intent would be to write the apps such that they can run on different devices with minimum header files for device definition.
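
    For item 2, something along these lines is what I have in mind. It is only a rough sketch: it assumes a classic 8051 with Timer 0 counting machine cycles and the C51 reg51.h header, and bench_muldiv() with its loop constants is a name and numbers I made up for illustration.

    #include <reg51.h>

    /* Time a small multiply/divide kernel with Timer 0 (16-bit mode).
       The return value is elapsed machine cycles, readable over the
       UART or in the debugger.                                         */
    unsigned int bench_muldiv(void)
    {
        unsigned char i;
        unsigned int  acc = 1;

        TMOD = (TMOD & 0xF0) | 0x01;     /* Timer 0, mode 1 (16-bit)     */
        TH0 = 0; TL0 = 0;                /* count from zero              */
        TR0 = 1;                         /* start                        */

        for (i = 0; i < 100; i++)        /* the kernel under test        */
            acc = (acc * 13u + 7u) / 3u;

        TR0 = 0;                         /* stop                         */
        P1  = (unsigned char)acc;        /* keep the result live so the
                                            optimizer can't delete it    */
        return ((unsigned int)TH0 << 8) | TL0;
    }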

    Of course, I would prefer to locate some "standard benchmarks" that have been in the MCU world for a while.

    If you look at vendor-specific benchmark routines, you will see that they seldom show the source code, just the optimized results. Have they been hand-optimized for the device/compiler? How can you evaluate them if you can't see the code?
    Thanks again for any ideas or further inputs.
    Bradford

  • Yes Erik, the application operation is the real proof but we should be able to measure both software and hardware features.
    This is where the defecation hit the rotary oscillator.
    If you are doing quite a bit of math, an x86 will shine; if you are doing a lot of bit manipulation, the '51 will shine.

    The fact is that while it is easy to make a "somewhat universal" benchmark for a microprocessor, it is impossible to do so for a microcontroller.



    What I think I might have to write is a set of apps that exercise different functions within the chip, wrapped in a single program in an effort to link in as many vendor-specific libs as possible.
    A small list:
    1. A timer interrupt routine to measure interrupt latency.
    2. Some simple routines that would exercise lib calls such as rand, compares and mult/divide.
    3. Some port move routines for bit, byte and int compares.

    1) Interrupt latency is tough to measure.
    2) If a lib call for compares and multiply/divide is of interest, you are trying to run a processor app on a controller. In a controller app all of those should be single-byte and thus need no lib call. As far as rand() goes, I can beat any benchmark just by using an algorithm that is fast while being even more pseudo than the other guy's pseudo.
    3) No benchmark needed, just look at the instruction set.

    As far as benchmarking "optimization" goes, there is no way any "optimizer" can beat well-thought-out code. With an 8-bit processor you lose more by using an int where a char would do than ANY optimizer can recover.
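
    a trivial sketch of what I mean (the names are made up, any 8-bit compiler will show it):

    unsigned char buf[32];

    void clear_char_index(void)
    {
        unsigned char i;              /* 8-bit index: one register to
                                         increment and compare           */
        for (i = 0; i < sizeof buf; i++)
            buf[i] = 0;
    }

    void clear_int_index(void)
    {
        unsigned int i;               /* 16-bit index: two bytes to
                                         increment and compare on every
                                         pass, and no optimizer is
                                         obliged to shrink it back       */
        for (i = 0; i < sizeof buf; i++)
            buf[i] = 0;
    }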

    It is, of course, possible to use a microcontroller where a microprocessor would be the right choice, but why benchmark the wrong choice.

    Erik

  • In reverse order of your reply.
    It is, of course, possible to use a microcontroller where a microprocessor would be the right choice, but why benchmark the wrong choice.
    This is exactly the point of the exercise. Yes, if you are bit diddling, the 8051 has few equals. If you need more data-handling power, move to the ARM. But in between is a large gray area. With similar-size devices, the cost difference is nothing. So make some measurements to help decide which device to select.

    As far as benchmarking "optimization" goes, there is no way any "optimizer" can beat well-thought-out code. With an 8-bit processor you lose more by using an int where a char would do than ANY optimizer can recover.
    I agree again. On an eight-bit device, unsigned char is the most efficient data type. But you can't always limit yourself to byte compares. A 10-bit or 12-bit A/D just doesn't fit. Even an 8-bit A/D that can output a negative sign bit will pull in a vendor-specific lib routine to do a signed compare. So few programs can be limited to just unsigned char manipulation.
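
    A made-up illustration of the problem (read_adc() is only a stub, not a real driver):

    /* A 10- or 12-bit conversion result does not fit in a char, so the
       compare ends up as a 16-bit signed operation, not a single-byte one. */
    static int read_adc(void)
    {
        return -512;                  /* pretend conversion result          */
    }

    unsigned char over_threshold(void)
    {
        int sample = read_adc();      /* must be 16-bit and signed          */
        return (sample > 100) ? 1 : 0;
    }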
    Also, on larger programs, I have a real hard time believing you can optimize better than some Keil compilers/linkers. On smaller programs, yep: probably all you will get is some constant folding and peephole optimization, which good coding can do without anyway.
    As far as rand() goes, I can beat any benchmark just by using an algorithm that is fast while being even more pseudo than the other guy's pseudo.
    Yes, the LCM was first introduced by Lehmer in 1951, and the ACM was the favorite rand function CS profs would assign their students in the 1980s. But the vendors have written good general-purpose rand functions for us. We don't need to waste time optimizing a rand function when most of the time we simply mean some arbitrary number rather than a random number.
    How well/fast does the vendor-specific rand function perform? Again, we can do a simple measurement.
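
    For reference, the kind of generator we are talking about is tiny. A rough sketch, using the widely published "quick and dirty" constants rather than anything from the C51 library:

    /* Minimal linear congruential generator. One multiply, one add -
       which is exactly why a "fast rand" benchmark is easy to game.    */
    static unsigned long lcg_state = 1UL;

    unsigned int lcg_rand(void)
    {
        lcg_state = lcg_state * 1664525UL + 1013904223UL;
        return (unsigned int)(lcg_state >> 16);   /* use the high bits  */
    }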
    interrupt latency is tough to measure.
    Yes, very difficult to measure accurately. But we can compare compilers/chips with a simple standard. Most data manuals will give the hardware latency if we need to calculate down to the last nanosecond. Frankly, I don't want to design that close to a margin, but I would like to compare compiler/device operation.
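
    One crude approach I may try, assuming a classic 8051, Timer 0 free-running in 16-bit mode, and the Keil C51 interrupt syntax: let the timer overflow raise the interrupt and read the still-running count at the top of the ISR. That count approximates the machine cycles from the overflow to the first ISR instruction, i.e. hardware latency plus the compiler's prologue. A sketch only:

    #include <reg51.h>

    volatile unsigned int latency_cycles;

    void timer0_isr(void) interrupt 1      /* Timer 0 overflow vector    */
    {
        unsigned char lo = TL0;            /* sample the running timer   */
        unsigned char hi = TH0;
        latency_cycles = ((unsigned int)hi << 8) | lo;
    }

    void latency_setup(void)
    {
        TMOD = (TMOD & 0xF0) | 0x01;       /* Timer 0, mode 1 (16-bit)   */
        TH0 = 0; TL0 = 0;
        ET0 = 1;                           /* enable Timer 0 interrupt   */
        EA  = 1;                           /* global interrupt enable    */
        TR0 = 1;                           /* run                        */
    }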
    Thanks again for the discussion, Erik. Whether we agree or not, a discussion makes us review our options AND some of our old textbooks.
    Bradford

  • With similar-size devices, the cost difference is nothing. So make some measurements to help decide which device to select.
    That is disappearing. You can get an ARM for less than 7 bucks qty 1, and looking at DigiKey the cheapest "full" '51 is 6 bucks qty 1. No, I did not compare features, but the price range is the same.

    Erik