Execution speed

Hello friends

I have encountered a very strange problem. I am using Keil MDK version 3.08a. When I compile my program and load it into the target, it executes considerably slower (not in debug mode). I compared the speed with a previous version of Keil (I don't remember exactly, maybe 3.20).

Furthermore, when I change the order of the C files in the workspace, the execution speed changes. When I compared the output hex files of the same project with the C files in different positions in the workspace, I found that they were different.

Please help me out with this.

HansBernhard Broeker over 16 years ago in reply to devendra gupta

"But the for loop is not only used for delays; I may use it for other purposes too. Won't those loops also be affected by the slower speed?"

Let's delay worrying about that until you've demonstrated that it actually is a problem.

devendra gupta over 16 years ago in reply to HansBernhard Broeker

Hello friends, I wrote the following function:

void os_dly_wait(unsigned int dly)
{
    unsigned int i;
    for (i = 0; i <= (dly * 100); i++);   /* empty body: busy-wait delay loop */
}

In the previous version of Keil, its assembly was:
0x00000344 E3A01000 MOV R1,#0x00000000
0x00000348 EA000000 B 0x00000350
0x0000034C E2811001 ADD R1,R1,#0x00000001
0x00000350 E3A02019 MOV R2,#0x00000019
0x00000354 E0020290 MUL R2,R0,R2
0x00000358 E1510102 CMP R1,R2,LSL #2
0x0000035C 9AFFFFFA BLS 0x0000034C

In the newer version of Keil, it is:
0x000003FC E3A01000 MOV R1,#0x00000000
0x00000400 EA000000 B 0x00000408
0x00000404 E2811001 ADD R1,R1,#0x00000001
0x00000408 E0802180 ADD R2,R0,R0,LSL #3
0x0000040C E0822200 ADD R2,R2,R0,LSL #4
0x00000410 E1510102 CMP R1,R2,LSL #2
0x00000414 9AFFFFFA BLS 0x00000404

When I tried to write it in inline assembly as follows:
void non_interrupt_delay(unsigned int dly)
{
    unsigned int i, R2;
    __asm
    {
        MOV i, #0x00000000
        B   loop1
    loop2:
        ADD i, i, #0x00000001
    loop1:
        ADD R2, dly, dly, LSL #3
        ADD R2, R2, dly, LSL #4
        CMP i, R2, LSL #2
        BLS loop2
    }
}

Its assembly becomes:
0x000002D0 E3A01000 MOV R1,#0x00000000
0x000002D4 EA000002 B 0x000002E4
0x000002D8 E1A00000 NOP
0x000002DC E2811001 ADD R1,R1,#0x00000001
0x000002E0 E1A00000 NOP
0x000002E4 E0802180 ADD R2,R0,R0,LSL #3
0x000002E8 E0822200 ADD R2,R2,R0,LSL #4
0x000002EC E1510102 CMP R1,R2,LSL #2
0x000002F0 8A000000 BHI 0x000002F8
0x000002F4 EAFFFFF7 B 0x000002D8

Due to the unnecessary NOPs, my delay becomes slow again. Please share your ideas on this.

Andy Neil over 16 years ago in reply to devendra gupta

Please read the instructions on how to post source code - they are really quite clearly stated:
www.danlhenry.com/.../keil_code.png

ImPer Westermark over 16 years ago in reply to devendra gupta

You seem to have only compared the old and new compilers.

You should expect big differences when switching compilers or compiler settings. That is why a delay written as a C loop should not rely on counting a loop variable, but should instead repeat until a hardware timer has ticked far enough.
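A minimal sketch of the timer-based delay described above, assuming a free-running, up-counting 32-bit hardware timer. All names here (timer_read_fn, timer_elapsed, timer_delay, sim_counter, sim_read_timer) are hypothetical; on a real board the read function would return the device's timer counter register, whose name depends on the chip. The simulated timer at the end is only a stand-in so the sketch can be tried on a PC.

```c
#include <stdint.h>

/* Hypothetical hook: returns the current value of a free-running,
   up-counting 32-bit hardware timer. */
typedef uint32_t (*timer_read_fn)(void);

/* Ticks elapsed since `start`. Unsigned subtraction stays correct even if
   the counter wraps from 0xFFFFFFFF back to 0 between the two reads. */
uint32_t timer_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Busy-wait until at least `ticks` timer ticks have passed. The duration
   is set by the timer's clock, not by how many instructions the compiler
   emitted for the loop; optimization settings and code placement only
   change how quickly the end of the wait is noticed. */
void timer_delay(timer_read_fn read_timer, uint32_t ticks)
{
    const uint32_t start = read_timer();
    while (timer_elapsed(start, read_timer()) < ticks) {
        /* spin */
    }
}

/* Hypothetical host-side stand-in for a hardware timer: each read
   advances the simulated counter by 3 ticks. */
static uint32_t sim_counter;
static uint32_t sim_read_timer(void)
{
    sim_counter += 3u;
    return sim_counter;
}
```

On a target, `ticks` would be derived from the timer's input clock (for example, ticks per microsecond times the desired delay), so the delay stays the same whatever code the compiler generates for the loop itself.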

But your claim seems to be that you get a 10x speed difference with the same compiler just by changing the order of the source files.

Have you also produced a disassembly of the loop after switching the file order? Have you checked whether the processor has multiple code regions, or an execution cache that only covers a limited range of the flash? Or are you running from RAM, where one of the loops may end up in a RAM region that is also the target of heavy DMA transfers?

Andy Neil over 16 years ago in reply to ImPer Westermark

Again, the whole point of any High-Level Language (HLL) is that you do not have control of the generated machine code - you delegate that task to the compiler.

Since you do not have control of the generated machine code, you do not have control of its execution speed!

Since you do not have control of the execution speed, you must not rely upon the execution timing in any way!

If you really do need to rely on the execution timing, then you must write it in assembler, or use some means that does not depend on execution timing, such as a hardware timer.

Mike Kleshov over 16 years ago in reply to devendra gupta

In the previous version of Keil, its assembly was:
0x00000344 E3A01000 MOV R1,#0x00000000
0x00000348 EA000000 B 0x00000350
0x0000034C E2811001 ADD R1,R1,#0x00000001
0x00000350 E3A02019 MOV R2,#0x00000019
0x00000354 E0020290 MUL R2,R0,R2
0x00000358 E1510102 CMP R1,R2,LSL #2
0x0000035C 9AFFFFFA BLS 0x0000034C

It appears that you have not enabled speed optimization in the compiler: the compiler has not applied loop-invariant code motion optimization to move the multiplication operation (dly*100) out of the loop.
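To make "loop-invariant code motion" concrete for os_dly_wait(), here is a sketch that performs the hoisting by hand. The two functions are hypothetical helpers, not part of the original code: both count the iterations of the original loop, so their results match, but the second evaluates dly * 100 only once instead of in every loop test (which is what the MUL inside the loop in the old listing does).

```c
/* Same loop as os_dly_wait(), but counting iterations. The limit
   dly * 100 sits in the loop condition, so an unoptimized build may
   recompute it on every pass. */
unsigned int iterations_as_written(unsigned int dly)
{
    unsigned int i, n = 0;
    for (i = 0; i <= (dly * 100); i++)
        n++;
    return n;
}

/* Loop-invariant code motion done by hand: dly does not change inside
   the loop, so the limit is computed once before the loop starts. */
unsigned int iterations_hoisted(unsigned int dly)
{
    const unsigned int limit = dly * 100;
    unsigned int i, n = 0;
    for (i = 0; i <= limit; i++)
        n++;
    return n;
}
```

Note that `<=` makes the loop run dly * 100 + 1 times. Either form only fixes how many iterations run, not how long each one takes; that still depends on the compiler and its settings, as discussed in this thread.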

In the newer version of Keil, it is:
0x000003FC E3A01000 MOV R1,#0x00000000
0x00000400 EA000000 B 0x00000408
0x00000404 E2811001 ADD R1,R1,#0x00000001
0x00000408 E0802180 ADD R2,R0,R0,LSL #3
0x0000040C E0822200 ADD R2,R2,R0,LSL #4
0x00000410 E1510102 CMP R1,R2,LSL #2
0x00000414 9AFFFFFA BLS 0x00000404

It seems the compiler has become smarter: it now applies strength reduction and replaces the multiplication with shifts and adds. The code size hasn't changed, but the execution speed has likely improved. There is no loop-invariant code motion here either; it looks like you set optimization to 'generate smaller code'.
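The arithmetic behind the two listings can be checked in C. The sketch below uses hypothetical helper names and mirrors each instruction sequence: the old code loads 25 (0x19) and multiplies, the new code builds 25 × dly from shifts and adds (dly + 8·dly = 9·dly, then 9·dly + 16·dly = 25·dly), and in both cases the LSL #2 on the CMP operand supplies the final factor of 4, giving 100 × dly.

```c
#include <stdint.h>

/* Old listing: MOV R2,#0x19 ; MUL R2,R0,R2 ; CMP R1,R2,LSL #2 */
uint32_t times100_mul(uint32_t dly)
{
    uint32_t r2 = dly * 25u;          /* MUL by 0x19           */
    return r2 << 2;                   /* LSL #2 in the CMP     */
}

/* New listing: no multiply, only shifts and adds */
uint32_t times100_shift_add(uint32_t dly)
{
    uint32_t r2 = dly + (dly << 3);   /* ADD R2,R0,R0,LSL #3 -> 9 * dly  */
    r2 = r2 + (dly << 4);             /* ADD R2,R2,R0,LSL #4 -> 25 * dly */
    return r2 << 2;                   /* LSL #2 in the CMP   -> 100 * dly */
}
```

Because both sequences give identical results for every 32-bit input, including wrap-around, the compiler is free to pick whichever one it considers cheaper.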

devendra gupta over 16 years ago in reply to Mike Kleshov

But Andy, my whole program depends on execution speed. For example, in an I2C driver you have to generate a clock on the order of microseconds, so how can you say not to rely on it? Moreover, even if there is a difference in execution speed, it should not be on the order of 10x.

Non Keil Related over 16 years ago in reply to devendra gupta

"...whole program depends upon execution speed, for example I2C driver in which you have to generate clock of the order of microseconds..."

If you require timing with that order of accuracy, you should seriously consider writing it in assembler.

Andy Neil over 16 years ago in reply to devendra gupta

" for example I2C driver in which you have to generate clock of the order of microseconds"

You should certainly not be using 'C' loops to time an I2C interface - for precisely the reasons stated!!

devendra gupta over 16 years ago in reply to Andy Neil

Andy, do you mean that only loops are affected? If so, what about other loops that are not associated with delays?

Andy Neil over 16 years ago in reply to devendra gupta

Absolutely not!

Did you actually read what I wrote earlier:

"you do not have control of the generated machine code "

That applies to any & all HLL source code!

(unless you are using some specific compiler extension that gives you some specific guarantee)

ImPer Westermark over 16 years ago in reply to Andy Neil

Split this into multiple problems.

1) Figure out why you get a speed difference of 10x by moving a function from one memory address to another.

2) Make sure that hardware communication uses the hardware peripherals where possible. With I2C done in hardware, the code will not need to busy-loop. You will instead have an interrupt handler - or maybe a status bit to poll for completion.

3) When doing things that are not suitable for hardware peripherals, you should split the delays into short and long delays.

- long delays are best handled using the timers. Either waiting for an interrupt, or busy-looping while polling an interrupt flag or comparing the current timer value with the expected timeout value.

- short delays can be handled by sequences of NOP instructions. But you can't have a design where you rely on the majority of the delay coming from the compiler output of C code. You could possibly have a loop where the NOP instructions give at least the required delay, and any C overhead adds an unknown extra delay. But you would still have to make sure that the compiler sees some form of side effect that stops it from throwing away the loop in the first place. Good compilers normally treat their NOP intrinsic as a magic operation with side effects. The question you have to ask is whether you want to rely on the compiler doing that, and on it continuing to do so when you upgrade to a newer version, or if you move the code to a different compiler.
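A minimal sketch of the "NOPs give at least the required delay" idea from the list above, under these assumptions: each NOP takes at least one core clock cycle, and the core clock frequency is known and passed in. The names DELAY_NOP, nops_for_ns and short_delay_nops are hypothetical. The armcc branch assumes the RealView `__nop()` intrinsic; the GCC/Clang branch uses a volatile asm statement so the sketch also builds on a PC. Whether each compiler keeps these NOPs across versions is exactly the question raised above, so treat this as a starting point rather than a guarantee.

```c
#include <stdint.h>

#if defined(__CC_ARM)
  /* RealView/armcc intrinsic: emits one NOP instruction. */
  #define DELAY_NOP() __nop()
#elif defined(__GNUC__)
  /* GCC/Clang: a volatile asm statement is kept by the optimizer. */
  #define DELAY_NOP() __asm__ volatile ("nop")
#else
  #error "Define DELAY_NOP() for this compiler"
#endif

/* Number of NOPs needed so that, if each NOP takes at least one cycle of
   a core running at cpu_hz, the NOPs alone last at least `ns` nanoseconds.
   Rounds up so the delay is never shorter than requested. Intended for
   short delays only (the result must fit in 32 bits). */
uint32_t nops_for_ns(uint32_t ns, uint32_t cpu_hz)
{
    uint64_t cycles = ((uint64_t)ns * cpu_hz + 999999999u) / 1000000000u;
    return (uint32_t)cycles;
}

/* Short delay: the NOPs supply the guaranteed minimum; the loop counter,
   compare and branch add an unknown extra amount that depends on the
   compiler, optimization level and code placement. */
void short_delay_nops(uint32_t count)
{
    while (count-- > 0u) {
        DELAY_NOP();
    }
}
```

For example, at an assumed 60 MHz core clock, nops_for_ns(5000, 60000000) asks for 300 NOPs for a delay of at least 5 µs; the real delay will be longer by the loop overhead and any flash wait states.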