Please, I need help shortening the execution time of this code or reducing its memory usage to improve performance. The memory usage currently reported for the code is shown below; I want to reduce it and also change some instructions to shorten execution. What can I do?
Total Read Only Size (Code + RO Data): 36 (0.04 kilobytes)
Total Read/Write Size (Read/Write Data + Zero Initialized Data): 0 (0.00 kilobytes)
Total ROM Size (code +RO+RW)
; Calculation of a factorial value using a simple loop
; set up the exception addresses
        THUMB
        AREA RESET, CODE, READONLY
        EXPORT __Vectors
        EXPORT Reset_Handler
__Vectors
        DCD 0x00180000      ; top of the stack
        DCD Reset_Handler   ; reset vector - where the program starts

        AREA Task2a_Code, CODE, READONLY
Reset_Handler
        ENTRY
start
        MOV r1,#0           ; count the number of multiplications performed
        MOV r2,#3           ; the final value in the factorial calculation
        MOV r3,#1           ; the factorial result will be stored here

; loop r2 times forming the product
fact
        ADD r1,r1,#1        ; find the next multiplicand
        MUL r3,r1,r3        ; form the next product - note that MUL r3,r3,r1 gives unpredictable output
        CMP r1,r2           ; check if the final value has been reached
        BMI fact            ; continue if all products have not been formed
exit                        ; stay in an endless loop
        B exit
        END
I will not provide you with a direct answer, but I'll give you some helpful pointers; all of them will increase your knowledge and skills regarding optimization. :)
I'd like you to take a look at the instruction timings for Cortex-M3.
In the Cortex-M3 Devices Generic User Guide, you can find detailed information about each instruction.
Hint 1: It's very common to count down instead of counting up.
Hint 2: For the highest possible speed, I recommend unrolling the loop using conditional execution.
Hint 3: Make sure all 32-bit ("wide") instructions are aligned on a 32-bit boundary; in my experience, this avoids stalls.
Hint 4: If your code runs from SRAM, it will run faster on some microcontrollers (for instance, LPC).
Hint 5: In your case you do not use any load instructions; if you did, I would suggest grouping as many load instructions as you can into one block so that they can be pipelined.
Hint 6: How many loop iterations do you need before the result overflows a 32-bit integer?
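For Hint 6, a quick check on a PC in C is one way to find the bound. This is only a sketch: the function name `max_factorial_arg_u32` is made up for illustration, and it assumes the result is treated as an unsigned 32-bit value, like r3 in the question.

```c
#include <stdint.h>

/* Hypothetical helper (name is an assumption, not from the thread):
 * returns the largest n for which n! still fits in an unsigned
 * 32-bit register such as r3. */
static unsigned max_factorial_arg_u32(void)
{
    uint64_t product = 1;   /* 64-bit so the overflow test itself cannot wrap */
    unsigned n = 0;

    /* Keep going while the next product would still fit in 32 bits. */
    while (product * (uint64_t)(n + 1) <= UINT32_MAX) {
        n++;
        product *= n;
    }
    return n;
}
```

This returns 12, since 12! = 479001600 fits in 32 bits while 13! = 6227020800 does not. The loop therefore never needs more than a dozen iterations, which is what makes full unrolling (Hint 2) realistic.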
Note: If you're using STM microcontrollers, I do not recommend running the code from SRAM; code runs just as fast from Flash memory on these devices, because they have a very effective flash accelerator.
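Going back to Hint 1, here is the idea in C rather than assembly, so the translation to Thumb-2 is still yours to do. It is a sketch under assumptions: `fact_up` and `fact_down` are names I made up, `fact_up` is my reading of the loop in the question, and both assume n is small enough that the product does not overflow 32 bits.

```c
#include <stdint.h>

/* Count up, modelled on the loop in the question: the counter has to
 * be compared against the limit on every pass. */
static uint32_t fact_up(uint32_t n)
{
    uint32_t count = 0;      /* plays the role of r1 */
    uint32_t result = 1;     /* plays the role of r3 */
    do {
        count++;             /* ADD */
        result *= count;     /* MUL */
    } while (count < n);     /* CMP + BMI */
    return result;
}

/* Count down (hypothetical rewrite): the loop ends when the counter
 * reaches zero, so no separate limit has to be kept or compared. */
static uint32_t fact_down(uint32_t n)
{
    uint32_t result = 1;
    while (n != 0) {
        result *= n;         /* multiply by n, n-1, ..., 1 */
        n--;
    }
    return result;
}
```

In Thumb-2, the decrement and the test against zero can come from a single flag-setting SUBS followed by BNE, which removes the CMP from the loop body and frees the register that held the counter.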
Further hints on optimizing code: search the Technical Reference Manual, the User Guide, and the community site for: optimization, pipelining, stall, latency.
All of these terms relate to timing.
I also wrote a few documents, which might be helpful on the subject.