
How to shorten execution time and reduce memory usage of code on ARM Cortex-M3

Please, I need help shortening the execution time of this code or reducing its memory usage to improve performance. The build currently reports the memory usage shown below, but I want to reduce it and also change some of the instructions to shorten execution. What can I do?

Total Read Only Size (Code + RO Data): 36 bytes (0.04 kB)
Total Read/Write Size (Read/Write Data + Zero Initialized Data): 0 bytes (0.00 kB)
Total ROM Size (Code + RO Data + RW Data): 36 bytes (0.04 kB)

; Calculation of a factorial value using a simple loop

; set up the exception addresses
        THUMB
        AREA    RESET, CODE, READONLY
        EXPORT  __Vectors
        EXPORT  Reset_Handler
__Vectors
        DCD     0x00180000      ; top of the stack
        DCD     Reset_Handler   ; reset vector - where the program starts

        AREA    Task2a_Code, CODE, READONLY
Reset_Handler
        ENTRY
start
        MOV     r1,#0           ; count the number of multiplications performed
        MOV     r2,#3           ; the final value in the factorial calculation
        MOV     r3,#1           ; the factorial result will be stored here

; loop r2 times forming the product
fact
        ADD     r1,r1,#1        ; find the next multiplicand
        MUL     r3,r1,r3        ; form the next product - note that MUL r3,r3,r1 gives unpredictable output
        CMP     r1,r2           ; check if the final value has been reached
        BMI     fact            ; continue if all products have not been formed

exit                            ; stay in an endless loop
        B       exit
        END

  • I will not provide you with a direct answer, but I'll give you some helpful pointers; all of them will increase your knowledge and skills regarding optimization. :)

    I'd like you to take a look at the instruction timings for Cortex-M3.

    In the Cortex-M3 Devices Generic User Guide, you can find detailed information about each instruction.
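
    To make that concrete, here is the program from the question with each instruction annotated with the cycle count as I read it from the instruction timing table in the Cortex-M3 Technical Reference Manual. Treat the numbers as assumptions to check against the manual for your core revision; they also assume zero-wait-state memory. P is the pipeline refill penalty (1 to 3 cycles) that is paid whenever a branch is taken.

```asm
        THUMB
        AREA    RESET, CODE, READONLY
        EXPORT  __Vectors
        EXPORT  Reset_Handler
__Vectors
        DCD     0x00180000      ; top of the stack (value from the question)
        DCD     Reset_Handler   ; reset vector

        AREA    Task2a_Code, CODE, READONLY
Reset_Handler
        ENTRY
start
        MOV     r1,#0           ; 1 cycle
        MOV     r2,#3           ; 1 cycle
        MOV     r3,#1           ; 1 cycle

fact
        ADD     r1,r1,#1        ; 1 cycle
        MUL     r3,r1,r3        ; 1 cycle (multiply with 32-bit result)
        CMP     r1,r2           ; 1 cycle
        BMI     fact            ; 1 cycle if not taken, 1 + P cycles if taken

exit
        B       exit            ; 1 + P cycles (always taken)
        END
```

    Under these assumptions, every iteration that loops back costs 4 + P cycles and the final iteration costs 4, so the taken branch is the most expensive part of the loop.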

     

    Hint 1: It's very common to count down instead of counting up.

    Hint 2: For highest possible speed, I recommend unrolling the loop using conditional execution.

    Hint 3: Make sure all 32-bit ("wide") instructions are aligned on a 32-bit boundary; in my experience, this avoids stalls.

    Hint 4: If your code runs from SRAM, it will run faster on some microcontrollers (for instance, LPC).

    Hint 5: In your case you do not use any load instructions; if you did, I would suggest grouping as many load instructions as you can back to back, so that they can be pipelined.

    Hint 6: How many loop iterations will you need to saturate a 32-bit integer?
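
    Without giving the whole answer away, Hint 1 can be sketched roughly as follows. This is only an illustration, not the one right solution: it assumes n is at least 1, it drops the multiplication counter that the original kept in r1, and it uses the flag-setting forms (MOVS, MULS, SUBS), which have 16-bit Thumb encodings for low registers. Counting down lets the decrement itself set the Z flag, so the separate CMP disappears.

```asm
        THUMB
        AREA    RESET, CODE, READONLY
        EXPORT  __Vectors
        EXPORT  Reset_Handler
__Vectors
        DCD     0x00180000      ; top of the stack (value from the question)
        DCD     Reset_Handler   ; reset vector

        AREA    Task2a_Code, CODE, READONLY
Reset_Handler
        ENTRY
start
        MOVS    r2,#3           ; n: counts down from n to 1 (assumes n >= 1)
        MOVS    r3,#1           ; running product, ends up holding n!

fact
        MULS    r3,r2,r3        ; r3 = r2 * r3 (16-bit form requires Rd == Rm)
        SUBS    r2,r2,#1        ; decrement the counter and update the flags
        BNE     fact            ; loop again until the counter reaches zero

exit
        B       exit            ; stay in an endless loop
        END
```

    With r2 starting at 3, the loop multiplies by 3, 2 and 1 and leaves 6 in r3. Rebuild and compare the reported sizes with your original figures rather than taking my word for the savings.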

     

    Note: If you're using STM microcontrollers, I do not recommend running the code from SRAM; on these devices the code runs just as fast from flash memory, because they have a very effective flash accelerator that kicks in when needed.

    Further hints on optimizing code: Search the Technical Reference Manual, the User Guide, and the community site for these terms: optimization, pipelining, stall, latency.

    All of the above words have to do with timing.

    I also wrote a few documents, which might be helpful on the subject.