Here are the three pieces of code I need to change to shorten execution time.
Any help will be greatly appreciated.
Given the names of the files, are you giving us your homework to do?
What have you tried so far, and how did it go?
I am a newbie to assembly language programming.
I need to make a few changes to the code to reduce execution time.
I have managed to replace the LDRs with a single LDMIA.
Can anyone point me in the right direction, please?
Here is a sample of the code, peterharris:
; copy and process blocks of 8 words
block_loop
    LDMIA r0!,{r5-r12} ; get 8 words to copy as a block
    MOV r4,r5 ; get first item
    BL data_processing ; process first item
    MOV r5,r4 ; keep first item
    MOV r4,r6 ; get second item
    BL data_processing ; process second item
    MOV r6,r4 ; keep second item
    MOV r4,r7 ; get third item
    BL data_processing ; process third item
    MOV r7,r4 ; keep third item
    MOV r4,r8 ; get fourth item
    BL data_processing ; process fourth item
    MOV r8,r4 ; keep fourth item
    MOV r4,r9 ; get fifth item
    BL data_processing ; process fifth item
    MOV r9,r4 ; keep fifth item
    MOV r4,r10 ; get sixth item
    BL data_processing ; process sixth item
    MOV r10,r4 ; keep sixth item
    MOV r4,r11 ; get seventh item
    BL data_processing ; process seventh item
    MOV r11,r4 ; keep seventh item
    MOV r4,r12 ; get eighth item
    BL data_processing ; process eighth item
    MOV r12,r4 ; keep eighth item
    STMIA r1!,{r5-r12} ; copy the 8 words
    SUBS r3,r3,#1 ; move on to the next block
    BNE block_loop ; continue until last block reached
PLEASE HELP!
The moves and branches don't add any value to the processing - they are just overhead because of the algorithm you are using, and the use of the data_processing function adds branch overhead.
How would you consider removing it?
If you have a short function in C what is the common means to remove the overhead of the function call, and can you apply this technique here?
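In C, the usual answer to that question is to inline the function, so its body is expanded at each call site and no call or return is executed per item. A minimal sketch, assuming the processing step is the "halve, then saturate at 5" rule described in the full listing further down the thread; the names `scale_saturate` and `process_block` are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for data_processing: halve, then cap at 5.
   'static inline' invites the compiler to expand the body at each call site. */
static inline uint32_t scale_saturate(uint32_t x)
{
    uint32_t half = x >> 1;
    return half > 5u ? 5u : half;
}

/* Process one block of 8 words. Once the helper is inlined there is no
   call or return per item, only the arithmetic itself. */
void process_block(const uint32_t *src, uint32_t *dst)
{
    for (int i = 0; i < 8; ++i) {
        dst[i] = scale_saturate(src[i]);
    }
}
```

The assembly equivalent of inlining is to paste the body of the subroutine in place of each BL, working directly on the register that already holds the item.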
Thanks Peter, I'm not quite sure what you mean by "branch overhead".
I'm thinking along the lines of doing the data processing in one single loop, so it doesn't keep getting called.
I'm really confused, to be honest.
Ignore the current code, and design the solution from a clean sheet; the best optimizations are those which solve the problem a different way rather than trying to move instructions about.
If someone told you you needed to double four numbers as quickly as possible how would you do it?
Unless you like making your code really convoluted you would probably end up with something simple like:
    LDMIA r4,{r0-r3} ; load four values (r4 assumed to hold the src address)
    ADD r0,r0,r0
    ADD r1,r1,r1
    ADD r2,r2,r2
    ADD r3,r3,r3
    STMIA r5,{r0-r3} ; store four values (r5 assumed to hold the dst address)
No moves, no branches. Can you apply the same principle to your code?
"I'm not quite sure what you mean by "branch overhead"."
Overhead = anything not helping compute the final value you want. Moves, branches, stack loads and stores, etc. are just overhead added by the "framework" needed to run the algorithm; they do not help generate the actual value the algorithm emits.
HTH, Pete
Thanks Pete, let me have a go at it and then I will let you know how far I get.
Hi Pete, is this what you mean?
I have removed all the branching instructions.
; Perform block copying of data words from one memory location to another
; Before copying, the values are divided by 2 and then saturated to a maximum
; value of 5.
; It can be assumed that the data values are non-negative
; set up the exception addresses
    THUMB
    AREA RESET, CODE, READONLY
    EXPORT __Vectors
    EXPORT Reset_Handler
__Vectors DCD 0x00180000 ; top of the stack
    DCD Reset_Handler ; reset vector - where the program starts
    AREA Task2b_Code, CODE, READONLY
Reset_Handler
    ENTRY
num_words EQU (end_source-source)/4 ; number of words to copy
start
    LDR r0,=source ; point to the start of the area of memory to copy from
    LDR r1,=dest ; point to the start of the area of memory to copy to
    MOV r2,#num_words ; get the number of words to copy
; find out how many blocks of 8 words need to be copied - it is assumed
; that it is faster to load 8 data items at a time, rather than load
; individually
block
    MOVS r3,r2,LSR #3 ; find the number of blocks of 8 words
    BEQ individ ; if no blocks to copy, just copy individual words
; copy and process blocks of 8 words
block_loop
    LDMIA r0!,{r5-r12} ; get 8 words to copy as a block
    CMP r5,#10 ; check whether saturation is needed
    MOVLT r5,r5,LSR #1 ; perform scaling
    MOVLE r5,#5 ; saturate to 5
    CMP r6,#10 ; check whether saturation is needed
    MOVLT r6,r6,LSR #1 ; perform scaling
    MOVLE r6,#5 ; saturate to 5
    CMP r7,#10 ; check whether saturation is needed
    MOVLT r7,r7,LSR #1 ; perform scaling
    MOVLE r7,#5 ; saturate to 5
    CMP r8,#10 ; check whether saturation is needed
    MOVLT r8,r8,LSR #1 ; perform scaling
    MOVLE r8,#5 ; saturate to 5
    CMP r9,#10 ; check whether saturation is needed
    MOVLT r9,r9,LSR #1 ; perform scaling
    MOVLE r9,#5 ; saturate to 5
    CMP r10,#10 ; check whether saturation is needed
    MOVLT r10,r10,LSR #1 ; perform scaling
    MOVLE r10,#5 ; saturate to 5
    CMP r11,#10 ; check whether saturation is needed
    MOVLT r11,r11,LSR #1 ; perform scaling
    MOVLE r11,#5 ; saturate to 5
    CMP r12,#10 ; check whether saturation is needed
    MOVLT r12,r12,LSR #1 ; perform scaling
    MOVLE r12,#5 ; saturate to 5
    STMIA r1!,{r5-r12} ; copy the 8 words
    SUBS r3,r3,#1 ; move on to the next block
    BNE block_loop ; continue until last block reached
; there may now be some data items available (fewer than 8)
; find out how many of these individual words need to be copied
individ
    ANDS r3,r2,#7 ; find the number of words that remain to copy individually
    BEQ exit ; skip individual copying if none remains
; copy the excess of words
individ_loop
    LDR r4,[r0],#4 ; get next word to copy
    CMP r4,#10 ; check whether saturation is needed
    MOVLT r4,r4,LSR #1 ; perform scaling
    MOV r4,#5 ; saturate to 5
    STR r4,[r1],#4 ; copy the word
    SUBS r3,r3,#1 ; move on to the next word
    BNE individ_loop ; continue until the last word reached
; languish in an endless loop once all is done
exit
    B exit
; subroutine to scale a value by 0.5 and then saturate values to a maximum of 5
    AREA Task2b_ROData, DATA, READONLY
source ; some data to copy
    DCD 1,2,3,4,5,6,7,8,9,10,11,0,4,6,12,15,13,8,5,4,3,2,1,6,23,11,9,10
end_source
    AREA Task2b_RWData, DATA, READWRITE
dest ; copy to this area of memory
    SPACE end_source-source
end_dest
    END
Hey peterharris,
I'm getting the wrong answers with the above changes.
Any idea why?
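When checking answers like these, it can help to have a reference to compare the destination memory against. The following is a rough C model of what the listing's comments say it should do (divide by 2, saturate to a maximum of 5, whole blocks of 8 first, then the leftover words). It is only a sketch: the function names are invented here, and it assumes non-negative 32-bit words as stated in the listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Halve a non-negative value, then saturate the result to a maximum of 5. */
static uint32_t scale_saturate(uint32_t x)
{
    uint32_t half = x >> 1;
    return half > 5u ? 5u : half;
}

/* Reference model of the copy: whole blocks of 8 words first, then the rest.
   Mirrors 'num_words LSR #3' blocks and 'num_words AND 7' individual words. */
void copy_scaled(const uint32_t *src, uint32_t *dst, size_t num_words)
{
    size_t blocks = num_words >> 3;
    size_t rest = num_words & 7u;

    while (blocks--) {
        for (int i = 0; i < 8; ++i) {
            *dst++ = scale_saturate(*src++);
        }
    }
    while (rest--) {
        *dst++ = scale_saturate(*src++);
    }
}
```

For the 28 source words in the listing, this model gives 0,1,1,2,2,3,3,4,4,5,5,0,2,3,5,5,5,4,2,2,1,1,0,3,5,5,4,5, which is what the dest area would be expected to hold after the program runs.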
Hi Ali, can you please explain further why it's wrong? I don't fully understand.
Hi,
In individ_loop, change:
    MOV r4,#5 ----------------> MOVLE r4,#5
This sequence is wrong:
    CMP rx,#10
    MOVLT rx,rx,LSR #1
    MOV rx,#5 --------------> rx=5 (always), because this MOV is unconditional
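One way to see what each variant does is to model the instruction sequences in C. The sketch below assumes the flags set by CMP are not changed by the following MOVs (none of them has an S suffix) and that values are non-negative; the function names are invented for illustration. It suggests that MOVLE on its own may not be enough either, since LE is also true for every value that passed LT, whereas GE is the exact complement of LT.

```c
#include <stdint.h>

/* CMP rx,#10 ; MOVLT rx,rx,LSR #1 ; MOV rx,#5 */
uint32_t seq_mov(uint32_t rx)
{
    int lt = (int32_t)rx < 10;   /* flags are set once, by CMP */
    if (lt) rx >>= 1;            /* MOVLT: scale */
    rx = 5;                      /* MOV: always executes, overwrites the result */
    return rx;
}

/* CMP rx,#10 ; MOVLT rx,rx,LSR #1 ; MOVLE rx,#5 */
uint32_t seq_movle(uint32_t rx)
{
    int lt = (int32_t)rx < 10;
    int le = (int32_t)rx <= 10;  /* true whenever lt is true */
    if (lt) rx >>= 1;            /* MOVLT: scale */
    if (le) rx = 5;              /* MOVLE: also fires for every scaled value */
    return rx;                   /* values above 10 pass through unchanged */
}

/* CMP rx,#10 ; MOVLT rx,rx,LSR #1 ; MOVGE rx,#5 */
uint32_t seq_movge(uint32_t rx)
{
    int lt = (int32_t)rx < 10;
    if (lt) rx >>= 1;            /* MOVLT: scale values below 10 */
    else    rx = 5;              /* MOVGE: saturate everything else */
    return rx;
}
```

In this model, an input of 4 gives 5 with `seq_mov` and `seq_movle` but 2 with `seq_movge`, and an input of 23 is left at 23 by `seq_movle` but becomes 5 with `seq_movge`.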