What changes can I make to this ARM Cortex-M3 source code to shorten its execution time?

Here are the 3 programs whose execution time I need to shorten.

Any help will be greatly appreciated.

5734.zip
Parents
  • Ignore the current code, and design the solution from a clean sheet; the best optimizations are those which solve the problem a different way rather than trying to move instructions about.

    If someone told you you needed to double four numbers as quickly as possible how would you do it?

    Unless you like making your code really convoluted you would probably end up with something simple like:

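    LDMIA src, {r0-r3}    ; src = register holding the source address
    ADD r0, r0, r0        ; double each value
    ADD r1, r1, r1
    ADD r2, r2, r2
    ADD r3, r3, r3
    STMIA dst, {r0-r3}    ; dst = register holding the destination address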

    No moves, no branches. Can you apply the same principle to your code?

    I'm not quite sure what you mean by "branch overhead".

    Overhead = anything not helping compute the final value you want. Moves, branches, stack loads and stores, etc. are just overhead added by the "framework" needed to run the algorithm; they do not help generate the actual value the algorithm emits.

    HTH,
    Pete

Children
  • Thanks Pete, let me have a go at it and then I will let you know how far I get.

  • Hi Pete, is this what you mean?

    I have removed all the branching instructions.

    ; Perform block copying of data words from one memory location to another
      ; Before copying, the values are divided by 2 and then saturated to a maximum
      ; value of 5.
      ; It can be assumed that the data values are non-negative

      ; set up the exception addresses
    ;  THUMB
      AREA RESET, CODE, READONLY
      EXPORT  __Vectors
      EXPORT Reset_Handler
    __Vectors
      DCD 0x00180000     ; top of the stack
      DCD Reset_Handler  ; reset vector - where the program starts

      AREA Task2b_Code, CODE, READONLY
    Reset_Handler
      ENTRY
     
    num_words EQU (end_source-source)/4  ; number of words to copy

    start
      LDR r0,=source     ; point to the start of the area of memory to copy from
      LDR r1,=dest       ; point to the start of the area of memory to copy to
      MOV r2,#num_words  ; get the number of words to copy
     
      ; find out how many blocks of 8 words need to be copied - it is assumed
      ; that it is faster to load 8 data items at a time, rather than load
      ; them individually
    block
      MOVS r3,r2,LSR #3  ; find the number of blocks of 8 words
      BEQ individ        ; if no blocks to copy, just copy individual words
     
      ; copy and process blocks of 8 words
    block_loop
      LDMIA r0!,{r5-r12}  ; get 8 words to copy as a block
     
      CMP r5,#10           ; check whether saturation is needed
      MOVLT r5,r5,LSR #1     ; perform scaling
      MOVLE r5,#5            ; saturate to 5
     
      CMP r6,#10           ; check whether saturation is needed
      MOVLT r6,r6,LSR #1     ; perform scaling
      MOVLE r6,#5            ; saturate to 5
     
      CMP r7,#10           ; check whether saturation is needed
      MOVLT r7,r7,LSR #1     ; perform scaling
      MOVLE r7,#5            ; saturate to 5
     
      CMP r8,#10           ; check whether saturation is needed
      MOVLT r8,r8,LSR #1     ; perform scaling
      MOVLE r8,#5            ; saturate to 5
     
      CMP r9,#10           ; check whether saturation is needed
      MOVLT r9,r9,LSR #1     ; perform scaling
      MOVLE r9,#5            ; saturate to 5
      
      CMP r10,#10           ; check whether saturation is needed
      MOVLT r10,r10,LSR #1     ; perform scaling
      MOVLE r10,#5            ; saturate to 5
     
      CMP r11,#10           ; check whether saturation is needed
      MOVLT r11,r11,LSR #1     ; perform scaling
      MOVLE r11,#5            ; saturate to 5
     
      CMP r12,#10           ; check whether saturation is needed
      MOVLT r12,r12,LSR #1     ; perform scaling
      MOVLE r12,#5            ; saturate to 5
     
      STMIA r1!,{r5-r12}  ; copy the 8 words
      SUBS r3,r3,#1       ; move on to the next block
      BNE block_loop      ; continue until last block reached

      ; there may now be some data items available (fewer than 8)
      ; find out how many of these individual words need to be copied
    individ
      ANDS r3,r2,#7   ; find the number of words that remain to copy individually
      BEQ exit        ; skip individual copying if none remains

      ; copy the excess of words
    individ_loop
      LDR r4,[r0],#4      ; get next word to copy
     
      CMP r4,#10           ; check whether saturation is needed
      MOVLT r4,r4,LSR #1     ; perform scaling
      MOV r4,#5            ; saturate to 5

      STR r4,[r1],#4      ; copy the word
      SUBS r3,r3,#1       ; move on to the next word
      BNE individ_loop    ; continue until the last word reached

      ; languish in an endless loop once all is done
    exit   
      B exit

      ; subroutine to scale a value by 0.5 and then saturate values to a maximum of 5

      AREA Task2b_ROData, DATA, READONLY
    source  ; some data to copy
      DCD 1,2,3,4,5,6,7,8,9,10,11,0,4,6,12,15,13,8,5,4,3,2,1,6,23,11,9,10
    end_source

      AREA Task2b_RWData, DATA, READWRITE
    dest  ; copy to this area of memory
      SPACE end_source-source
    end_dest
      END

  • Hey peterharris,

    I'm getting the wrong answers with the above changes.

    Any idea why?

  • Hi Ali, can you please explain further why it's wrong? I don't fully understand.

  • Hi,

    In individ_loop, change:

    MOV r4,#5     ---------------->     MOVLE r4,#5

  • Hi,

    This is wrong:

        CMP rx,#10
        MOVLT rx,rx,LSR #1
        MOV rx,#5              ; unconditional, so rx = 5 (always)
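
To make the difference between these sequences concrete, here is a small host-side C model. It is only a sketch and not part of the original posts: the function names are hypothetical, and the intended behaviour (halve, then cap at 5) is taken from the comments in the posted code. It relies on the fact that a MOV without the S suffix leaves the flags from the CMP untouched, so both conditional moves test the original value. The last function shows one possible correction (MOVGE), which is an assumption about the intent rather than something the thread confirms.

```c
#include <stdint.h>

/* Intended result per the code comments: divide by 2, then saturate to 5. */
uint32_t scale_saturate(uint32_t x) {
    uint32_t half = x >> 1;
    return half > 5u ? 5u : half;
}

/* CMP x,#10 ; MOVLT x,x,LSR #1 ; MOV x,#5
   The final MOV is unconditional, so the result is always 5. */
uint32_t posted_individ(uint32_t x) {
    if ((int32_t)x < 10) x >>= 1;
    x = 5u;
    return x;
}

/* CMP x,#10 ; MOVLT x,x,LSR #1 ; MOVLE x,#5
   Both conditions use the flags from CMP, i.e. the original value.
   Values <= 10 become 5; values > 10 pass through unchanged. */
uint32_t posted_block(uint32_t x) {
    int32_t orig = (int32_t)x;
    if (orig < 10) x >>= 1;
    if (orig <= 10) x = 5u;
    return x;
}

/* CMP x,#10 ; MOVLT x,x,LSR #1 ; MOVGE x,#5   (hypothetical fix)
   Values below 10 are halved; values of 10 or more are set to 5. */
uint32_t suggested_fix(uint32_t x) {
    int32_t orig = (int32_t)x;
    if (orig < 10) x >>= 1;
    if (orig >= 10) x = 5u;
    return x;
}
```

For the non-negative inputs the program assumes, `suggested_fix` agrees with `scale_saturate`, while the two posted sequences do not: an input of 4 gives 5 instead of 2 in both, and `posted_block` leaves an input of 12 unchanged.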

  • Sorry, what do MOVLT and MOVLE stand for?
  • It means read the manual, but perhaps someone can help by pointing you towards the manual so you can learn :)
  • I didn't see it in the instruction set manual. Please could someone help?
  • *hmm* Tried Google?
    Anyway: ARM DDI 0100I -> chapter A3.2
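
As a rough illustration of what that chapter covers: MOVLT and MOVLE are the MOV instruction with a condition code suffix. LT means "signed less than" and LE means "signed less than or equal", and the instruction only executes if the flags left by an earlier flag-setting instruction (here the CMP) satisfy that condition. The C sketch below models this under the assumption of 32-bit registers; the type and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags: Negative, Zero, Carry, oVerflow. */
typedef struct { bool n, z, c, v; } Flags;

/* Flags produced by CMP a, b, which computes a - b and discards the result. */
Flags cmp_flags(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = ((r >> 31) & 1u) != 0;                      /* result is negative */
    f.z = (r == 0);                                   /* result is zero */
    f.c = (a >= b);                                   /* no borrow occurred */
    f.v = ((((a ^ b) & (a ^ r)) >> 31) & 1u) != 0;    /* signed overflow */
    return f;
}

/* Condition codes as tested by a conditional instruction. */
bool cond_lt(Flags f) { return f.n != f.v; }          /* signed <  */
bool cond_le(Flags f) { return f.z || (f.n != f.v); } /* signed <= */
bool cond_ge(Flags f) { return f.n == f.v; }          /* signed >= */
```

After `CMP rx,#10` with rx = 4, both LT and LE are true, so a MOVLT followed by a MOVLE would both execute. With rx = 10 only LE is true, and with rx = 12 neither is.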