
How do I calculate the execution time for a program?

Hello,

I use an LPC2148 (ARM7) with the Keil uVision MDK, and I get a disassembly listing when I compile any code. If I want to calculate the execution time of the code, one way is to find the number of cycles each instruction takes, sum the cycles over all the instructions, and divide by the clock frequency. But I don't want to do it manually. I want to develop an app where, if I enter the disassembly listing and the clock frequency used, it gives me the execution time. The disassembly listing generated for the code is readily available from the IDE. But how do I proceed with that? I hope my question is clear.

Regards
Ram Prasadh
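A naive version of such a tool can be sketched as below. This is only a starting point under loud assumptions: the per-mnemonic cycle counts in the table are illustrative placeholders (real ARM7TDMI timings depend on flash wait states and bus activity), and it ignores branches, loops, and data-dependent instruction timing, which the replies below discuss.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative per-mnemonic cycle counts; real ARM7TDMI timings
 * depend on wait states and bus activity, so treat these numbers
 * as placeholders, not a reference. */
struct cycle_entry { const char *mnemonic; long cycles; };

static const struct cycle_entry cycle_table[] = {
    { "MOV", 1 }, { "ADD", 1 }, { "SUB", 1 }, { "CMP", 1 },
    { "LDR", 3 }, { "STR", 2 }, { "B",   3 }, { "BL",  3 },
};

long cycles_for(const char *mnemonic)
{
    size_t i;
    for (i = 0; i < sizeof cycle_table / sizeof cycle_table[0]; i++)
        if (strcmp(cycle_table[i].mnemonic, mnemonic) == 0)
            return cycle_table[i].cycles;
    return 1; /* unknown mnemonic: crude single-cycle guess */
}

/* Sum cycles over the mnemonic column of a disassembly listing. */
long total_cycles(const char *const listing[], size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += cycles_for(listing[i]);
    return total;
}

/* Execution time in seconds: cycles divided by the clock frequency. */
double exec_time_s(long cycles, double f_clk_hz)
{
    return (double)cycles / f_clk_hz;
}
```

Extracting the mnemonic column from the actual Keil listing, and dealing with branches and loops, is the hard part that this sketch does not attempt.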

  • Oh! I will have a look at what that is...

    "Do you really want to do it?"

    This is my scenario. Suppose 10 people each write code to drive an LCD module on an ARM7 microprocessor. I want to find the execution time of each implementation, and obviously one will have the lowest execution time and one the highest. Based on those times, I am going to build a scheduling algorithm for the task (here, the LCD code is the task), taking the task's computation time to be the lowest execution time obtained, and its deadline to be the highest execution time obtained. I hope it's clear what I am going to do. So, to make this possible, I need a program that finds execution times from the disassembly listing...
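The mapping described above is just a min/max over the measured times. A trivial sketch, assuming the times are already available in seconds (the struct and function names are made up for illustration):

```c
#include <stddef.h>

/* Given measured execution times of several implementations of the
 * same task, take the fastest as the assumed computation time and
 * the slowest as the deadline, per the scheme described above. */
struct task_budget { double computation_time; double deadline; };

struct task_budget budget_from_samples(const double *times, size_t n)
{
    struct task_budget b = { times[0], times[0] };
    size_t i;
    for (i = 1; i < n; i++) {
        if (times[i] < b.computation_time) b.computation_time = times[i];
        if (times[i] > b.deadline)         b.deadline = times[i];
    }
    return b;
}
```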

    The thing you mentioned just now, a full-blown cycle-accurate simulator: will it suit my need?

    Regards
    RamPrasadh

  • Why have 10 people doing the same thing?

    Are you assessing student assignments, or something like that?

    Why not just run the code, and time it?!

  • By the way, there is a simulator in Keil uVision, so you can just use that. I don't know how accurate it is timing-wise, but I'm sure you'd have to work hard to match it if you decided to write your own simulator.
    But normally it is much easier to measure execution time on the actual target hardware, and you get real numbers, as opposed to simulations, which can be inaccurate. I suggest you do just that.

  • By the way, there is a simulator in Keil uVision, so you can just use that. I don't know how accurate it is timing-wise,
    no simulator (that I know of) considers cache misses, and thus, by definition, no simulator can assess execution time accurately.

    anyhow

    why bother with a simulator? Set a pin at the start, reset it at the end, put a scope on it, and you have, by simple means, a precise run-time measurement.

    Erik
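The pin-and-scope method above is a couple of register writes on the LPC2148. The sketch below assumes an LPC214x-style GPIO block with separate set/clear registers (IO0SET/IO0CLR); the registers are passed in through a struct so the logic can be exercised off-target, whereas on hardware you would point it at the real GPIO base (0xE0028000 on the LPC214x; verify the offsets against the user manual or the LPC214X.H header shipped with Keil).

```c
#include <stdint.h>

/* Register layout matching the LPC214x GPIO port 0 block:
 * IO0PIN, IO0SET, IO0DIR, IO0CLR, in that order. */
typedef struct {
    volatile uint32_t pin;  /* IO0PIN: current pin state        */
    volatile uint32_t set;  /* IO0SET: write 1 to drive high    */
    volatile uint32_t dir;  /* IO0DIR: 1 = pin is an output     */
    volatile uint32_t clr;  /* IO0CLR: write 1 to drive low     */
} lpc_gpio_t;

#define PROBE_MASK (1u << 16) /* P0.16; any free pin will do */

/* Raise the probe pin just before the code under test... */
void probe_start(lpc_gpio_t *g)
{
    g->dir |= PROBE_MASK; /* make the probe pin an output */
    g->set  = PROBE_MASK; /* rising edge marks the start  */
}

/* ...and drop it immediately after: the pulse width on the
 * scope is the execution time. */
void probe_stop(lpc_gpio_t *g)
{
    g->clr = PROBE_MASK;  /* falling edge marks the end */
}
```

On target, with `run_task` standing in for the code under test: `lpc_gpio_t *g = (lpc_gpio_t *)0xE0028000; probe_start(g); run_task(); probe_stop(g);`.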

  • Surely, the ARM7 instruction execution time is going to be (almost) negligible relative to the timings required of the LCD...?!

  • set a pin at start and reset it at end, put a scope on and you have, by simple means, a precise run time measurement

    This will work only when I use a particular module in polling mode. E.g., a module runs, an interrupt gets serviced, and then the previous module continues. Your method will give me the time to execute the module together with the interrupt that was serviced. When two or three modules or functions run separately, or with interrupts, I need to get the execution time of each module separately. Is it possible to do that?

  • Use multiple LEDs: turn one on/off when a task calls an RTX function that may block or task-switch, and when it returns from that function.

    The LED for the highest-priority thread will show the correct time, minus losses from interrupts.
    The LED for the next-highest priority can potentially remain lit while the higher-priority task has stepped in, but subtracting the time of the task that stepped in (simple boolean logic on the LED signals) indicates the amount of time spent in the less-prioritized task.
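Given sampled LED traces (one boolean per sample period per LED, captured with a logic analyzer, say), the subtraction described above is a per-sample AND-NOT: the low-priority task is really running only where its LED is on and the higher-priority LED is off. A minimal sketch, with made-up names:

```c
#include <stddef.h>

/* Count samples where the low-priority task's LED is lit and the
 * high-priority task's LED is not: those are the samples where the
 * low-priority task actually had the CPU. Multiply the count by the
 * sample period to get time. */
size_t low_prio_samples(const unsigned char *led_hi,
                        const unsigned char *led_lo, size_t n)
{
    size_t count = 0, i;
    for (i = 0; i < n; i++)
        if (led_lo[i] && !led_hi[i])
            count++;
    return count;
}
```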

    In practice, it is more common to measure the time of individual threads with the other threads locked out, and then set up a model for worst-case behaviour based on the worst-case stimuli that may activate higher-priority threads to steal CPU capacity.

    In some situations, you'll have to settle for statistical models that run the system under a big load while measuring the individual response times, and then produce confidence values - i.e. that task C will be able to respond in less than 5 ms with a confidence of 0.99, and in less than 10 ms with a confidence of 0.997. You then compare your confidence intervals with the danger resulting from a missed deadline.
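The confidence figures mentioned above are just empirical fractions over a large number of measured response times; a minimal sketch of that computation (function name invented for illustration):

```c
#include <stddef.h>

/* Fraction of measured response times that met a given deadline;
 * with enough samples under realistic load, this is the empirical
 * confidence that the task meets that deadline. */
double deadline_confidence(const double *response_s, size_t n,
                           double deadline_s)
{
    size_t met = 0, i;
    for (i = 0; i < n; i++)
        if (response_s[i] < deadline_s)
            met++;
    return (double)met / (double)n;
}
```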

    In the end, few of the more complex systems will manage 100% confidence unless the processor is so powerful that it can always run every single thread every time and still meet all deadlines. For bigger systems, that would mean a processor that might normally run at much less than 1% load, just to have the burst capacity to do everything at once if someone really managed to create a full set of stimuli all at once.

  • So why don't you just write a script to parse the listing? I've written such static-analysis tools; if you knew the first thing about what you were doing, you wouldn't need to be asking questions here. How are you going to deal with loop iterations, or dynamic code flow?

    Just admit you're way out of your depth, use a practical method like Erik suggests, and time multiple passes if you want to quantify interruptions or caching.

    There is nothing wrong in admitting the truth. Yes, I am way out of my depth. I have a long way to go. Thank you for your suggestions. I will catch up when I come across these things later.

    Regards
    RamPrasadh

  • The main thing you missed here is that, for an individual instruction, you could try to look up the cycle count.

    But a real program isn't a single linear sequence of instructions. It branches and loops. And some instructions take a different amount of time depending on the input data.

    Not only that. Think about a trivial memory read. It matters whether you read from flash (and, in that case, whether the flash line happens to be held in a cache line) or from RAM. It matters whether the address corresponds to some I/O functionality. And when doing loads/stores, you may need to keep track of how many outstanding stores the memory controller or the I/O controller hardware can queue before being forced to stall the processor core to avoid an overflow.

    That is one reason that high-end processors often have internal performance counters, allowing a test program to run and then read out the counters to figure out
    - number of clock cycles consumed
    - number of read stalls
    - number of write stalls
    - ...
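When reading such a free-running cycle counter before and after the code under test, unsigned arithmetic conveniently handles one wraparound of the counter between the two reads; a small sketch (the counter register itself is hardware-specific and not shown):

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit cycle
 * counter. Unsigned subtraction gives the right answer even if the
 * counter wrapped once between the two reads. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```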

    It is almost impossible to simulate this perfectly, because you basically have to simulate the gate-level logic inside the processor. Keil's staff have neither the capacity nor the information needed to do this kind of simulation perfectly. So a single end user, who hasn't spent years jumping in and out of processor cores, simulators, and so on, will not have an easier time performing this task.
