Hello friends
I have encountered a very strange problem. I am using Keil MDK version 3.08a. When I compile my program and load it onto the target, it executes considerably slower (not in debug mode). I compared the speed against a previous version of Keil (I don't remember exactly which, maybe 3.20).
Furthermore, when I change the position of a C file in the workspace, it affects the execution speed. When I compared the output hex files of the same project with the C files in a different order in the workspace, I found that they were different.
Please help me out with this.
But a for loop is not only used for delays. I may use one for another purpose as well; won't that for loop also be affected by the slower speed?
Let's delay worrying about that until you've demonstrated that it actually is a problem.
Hello friends, I wrote the following function:

    void os_dly_wait(unsigned int dly)
    {
        unsigned int i;
        for(i=0; i<=(dly *100); i++);
    }
In the previous version of Keil its assembly was:

    0x00000344 E3A01000 MOV R1,#0x00000000
    0x00000348 EA000000 B 0x00000350
    0x0000034C E2811001 ADD R1,R1,#0x00000001
    0x00000350 E3A02019 MOV R2,#0x00000019
    0x00000354 E0020290 MUL R2,R0,R2
    0x00000358 E1510102 CMP R1,R2,LSL #2
    0x0000035C 9AFFFFFA BLS 0x0000034C
while in the newer version of Keil it is:

    0x000003FC E3A01000 MOV R1,#0x00000000
    0x00000400 EA000000 B 0x00000408
    0x00000404 E2811001 ADD R1,R1,#0x00000001
    0x00000408 E0802180 ADD R2,R0,R0,LSL #3
    0x0000040C E0822200 ADD R2,R2,R0,LSL #4
    0x00000410 E1510102 CMP R1,R2,LSL #2
    0x00000414 9AFFFFFA BLS 0x00000404
When I tried to write it in inline assembly as follows:

    void non_interrupt_delay(unsigned int dly)
    {
        unsigned int i,R2;
        __asm
        {
            MOV i,#0x00000000
            B loop1
        loop2:
            ADD i,i,#0x00000001
        loop1:
            ADD R2,dly,dly,LSL #3
            ADD R2,R2,dly,LSL #4
            CMP i,R2,LSL #2
            BLS loop2
        }
    }

its assembly becomes:

    0x000002D0 E3A01000 MOV R1,#0x00000000
    0x000002D4 EA000002 B 0x000002E4
    0x000002D8 E1A00000 NOP
    0x000002DC E2811001 ADD R1,R1,#0x00000001
    0x000002E0 E1A00000 NOP
    0x000002E4 E0802180 ADD R2,R0,R0,LSL #3
    0x000002E8 E0822200 ADD R2,R2,R0,LSL #4
    0x000002EC E1510102 CMP R1,R2,LSL #2
    0x000002F0 8A000000 BHI 0x000002F8
    0x000002F4 EAFFFFF7 B 0x000002D8
Due to the unnecessary NOPs, my delay becomes slow again. Please share your ideas on this.
Please read the instructions on how to post source code - they are really quite clearly stated: www.danlhenry.com/.../keil_code.png
You seem to have compared only the old and new compilers.
It is expected that you will get big differences when switching compilers or compiler settings. That is why a delay using a C loop should not make use of a loop variable, but should repeat until a hw timer has ticked far enough.
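As a rough illustration of "repeat until a hw timer has ticked far enough", here is a minimal sketch in C. The names read_timer_ticks, fake_timer_count and delay_ticks are made up for this example, and the "timer" is a simulated counter so the code runs on a PC. On a real target, read_timer_ticks would read a free-running timer register, which is an assumption about your hardware.

```c
#include <stdint.h>

/* Hypothetical stand-in for a free-running 32-bit hardware timer.
   Simulation only: it advances by one tick on every read. */
uint32_t fake_timer_count;

uint32_t read_timer_ticks(void)
{
    /* On a real target this would be a read of the timer count register. */
    return fake_timer_count++;
}

/* Busy-wait until at least `ticks` timer ticks have elapsed.
   The duration is set by the timer, not by how fast the loop body runs.
   The unsigned subtraction stays correct when the counter wraps around. */
void delay_ticks(uint32_t ticks)
{
    uint32_t start = read_timer_ticks();

    while ((uint32_t)(read_timer_ticks() - start) < ticks) {
        /* spin */
    }
}
```

With a loop like this, a different compiler version or optimization setting changes only how often the timer is polled, not how long the delay lasts.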
But your claim seems to be that you can have a 10x speed difference for the same compiler by just changing the order of the source files.
Have you produced any disassembly of the loop when switching location too? And have you checked if the processor has multiple code regions or if it has any execution cache that is only available for a limited range of the flash? Or are you running in RAM, and may get one of the loops in a RAM region where you also run heavy DMA transfers?
Again, the whole point of any High-Level Language (HLL) is that you do not have control of the generated machine code - you delegate that task to the compiler.
Since you do not have control of the generated machine code, you do not have control of its execution speed!
Since you do not have control of the execution speed, you must not rely upon the execution timing in any way!
If you really do need to rely upon the execution timing, then you really must write it in assembler; or use some means that does not rely upon the execution timing - such as a hardware timer.
It appears that you have not enabled speed optimization in the compiler: the compiler has not applied loop-invariant code motion optimization to move the multiplication operation (dly*100) out of the loop.
It seems the compiler has become smarter: now it applied strength reduction optimization and replaced the multiplication with shifts and adds. The code size hasn't changed, but the execution speed has likely become faster. There is no loop-invariant code motion here either: looks like you set optimization to 'generate smaller code.'
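To make the two listings easier to compare, here is a small C sketch of what each one computes for the loop limit, plus a hand-hoisted version of the loop. The function names are invented for illustration. This only mirrors the arithmetic in the posted disassembly and says nothing about what any particular compiler setting will actually emit.

```c
#include <stdint.h>

/* Limit as in the older listing: multiply by 25 (0x19), then shift left by 2. */
uint32_t limit_mul(uint32_t dly)
{
    return (dly * 25u) << 2;            /* MOV R2,#0x19 / MUL / CMP ...,LSL #2 */
}

/* Limit as in the newer listing: shifts and adds instead of MUL. */
uint32_t limit_shift_add(uint32_t dly)
{
    uint32_t r2 = dly + (dly << 3);     /* ADD R2,R0,R0,LSL #3 ->   9 * dly */
    r2 = r2 + (dly << 4);               /* ADD R2,R2,R0,LSL #4 ->  25 * dly */
    return r2 << 2;                     /* CMP R1,R2,LSL #2    -> 100 * dly */
}

/* Loop-invariant code motion done by hand: dly * 100 is computed once,
   before the loop, instead of on every iteration. Returns the number of
   iterations so the behaviour can be checked. Only meant for small dly
   values (a limit of 0xFFFFFFFF would never terminate). */
uint32_t count_iterations(uint32_t dly)
{
    const uint32_t limit = dly * 100u;
    uint32_t i;
    uint32_t n = 0;

    for (i = 0; i <= limit; i++) {
        n++;
    }
    return n;
}
```

Both limit functions give the same result for every input, because 9 + 16 = 25 and 25 * 4 = 100. The difference between the two compiler versions is therefore in how the limit is calculated, not in what the loop does.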
But Andy, the whole program depends upon execution speed, for example I2C driver in which you have to generate clock of the order of microseconds, so how can you say not to rely on that? Moreover, if there is a difference in execution speed, it should not be of the order of 10x.
"...whole program depends upon execution speed, for example I2C driver in which you have to generate clock of the order of microseconds..."
if you're requiring timing with that order of accuracy, you should seriously consider writing it in assembler.
" for example I2C driver in which you have to generate clock of the order of microseconds"
You should certainly not be using 'C' loops to time an I2C interface - for precisely the reasons stated!!
Andy, do you mean to say that only loops are affected? And if yes, please tell me what about other loops which are not associated with delays.
Absolutely no!
Did you actually read what I wrote earlier:
"you do not have control of the generated machine code "
That applies to any & all HLL source code!
(unless you are using some specific compiler extension that gives you some specific guarantee)
Split this into multiple problems.
1) Figure out why you get a speed difference of 10x by moving a function from one memory address to another.
2) Make sure that hw communication is using hardware acceleration where possible. With I2C in hardware, the code will not need to busy-loop. You will instead have an interrupt handler - or maybe a status bit to poll for completion.
3) When doing things not suitable for hardware peripherals, you should split the delays into short and long ones.
- long delays are best handled using the timers. Either waiting for an interrupt, or busy-looping while polling an interrupt flag or comparing the current timer value with the expected timeout value.
- short delays can be handled by sequences of NOP instructions. But you can't have a design where you rely on the majority of the delay coming from the compiler output of C code. You could possibly have a loop where the NOP instructions give - at least - the required delay, and where any C overhead gives an unknown extra delay. But you would still have to make sure that the compiler sees some form of side effect that stops it from throwing away the loop in the first place. Good compilers normally treat their NOP intrinsic as a magic operation with side effects. The question you have to ask is whether you want to rely on the compiler doing that - and on the compiler continuing to do that even if you upgrade to a newer version - or maybe if you move the code to a different compiler.
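As a sketch of the short-delay idea, the loop below gives each iteration a side effect the compiler has to keep. DELAY_NOP, nop_sink and short_delay are hypothetical names. The macro uses a volatile write purely so that the example is portable and runs on a PC. On the target you would presumably replace it with whatever NOP intrinsic your toolchain documents, and whether that is treated as having side effects is something to check in its manual.

```c
#include <stdint.h>

/* Hypothetical placeholder for a NOP intrinsic. An access to a volatile
   object is a side effect the compiler must keep, so the loop below
   cannot be optimized away. */
volatile uint32_t nop_sink;
#define DELAY_NOP() do { nop_sink++; } while (0)

/* Executes DELAY_NOP() exactly `nops` times. The time taken is at least
   `nops` times the cost of one DELAY_NOP(); the loop overhead added by
   the compiler is an unknown extra on top of that. */
void short_delay(uint32_t nops)
{
    while (nops--) {
        DELAY_NOP();
    }
}
```

This only guarantees a minimum delay. If an upper bound matters as well, a timer is the safer choice.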