I'm a bit curious as to why this bit of code wasn't as optimized as it would normally be. I have written compilers, so I am not clueless; it's just strange that the optimizer didn't optimize out some of this code. This code:

ADCON1 = chan->constants.location_info.con1 & 0x70;
ADMUX  = chan->constants.location_info.mux;
ADCON0 = chan->constants.type_info.con0;
OCH = chan->a2d.och;
OCM = chan->a2d.ocm;
OCL = chan->a2d.ocl;
GCH = chan->a2d.gch;
GCM = chan->a2d.gcm;
GCL = chan->a2d.gcl;

becomes this code:

0010         L?0038:
0010         L?0039:
0010 F582        MOV   DPL,A
0012 E4          CLR   A
0013 3E          ADDC  A,R6
0014 F583        MOV   DPH,A
0016 E0          MOVX  A,@DPTR
0017 22          RET
0000 8F82        MOV   DPL,R7
0002 8E83        MOV   DPH,R6
0004 A3          INC   DPTR
0005 A3          INC   DPTR
0006 E0          MOVX  A,@DPTR
0007 5470        ANL   A,#070H
0009 F5DD        MOV   ADCON1,A
                           ; SOURCE LINE # 279
000B 8F82        MOV   DPL,R7
000D 8E83        MOV   DPH,R6
000F E0          MOVX  A,@DPTR
0010 F5D7        MOV   ADMUX,A
                           ; SOURCE LINE # 282
0012 EF          MOV   A,R7
0013 2404        ADD   A,#04H
0015 120000  R   LCALL L?0038
0018 F5DC        MOV   ADCON0,A
                           ; SOURCE LINE # 286
001A EF          MOV   A,R7
001B 2417        ADD   A,#017H
001D 120000  R   LCALL L?0038
0020 F5D3        MOV   OCH,A
                           ; SOURCE LINE # 287
0022 EF          MOV   A,R7
0023 2418        ADD   A,#018H
0025 120000  R   LCALL L?0039
0028 F5D2        MOV   OCM,A
                           ; SOURCE LINE # 288
002A EF          MOV   A,R7
002B 2419        ADD   A,#019H
002D 120000  R   LCALL L?0039
0030 F5D1        MOV   OCL,A
                           ; SOURCE LINE # 290
0032 EF          MOV   A,R7
0033 241A        ADD   A,#01AH
0035 120000  R   LCALL L?0039
0038 F5D6        MOV   GCH,A
                           ; SOURCE LINE # 291
003A EF          MOV   A,R7
003B 241B        ADD   A,#01BH
003D 120000  R   LCALL L?0039
0040 F5D5        MOV   GCM,A
                           ; SOURCE LINE # 292
0042 EF          MOV   A,R7
0043 241C        ADD   A,#01CH
0045 120000  R   LCALL L?0039
0048 F5D4        MOV   GCL,A
First, all the information is referenced from a pointer. All variable data access is sequential from that pointer. Why isn't it optimizing the ADD A,#XXX into INC DPTR? It has done this in a number of other places in the code. Why not here?
Do I have the settings wrong or something? Do I need to arrange the code differently?
This is embedded in an ISR; could that be the reason? (That wouldn't make sense, however...)
If anyone can let me know whether I need to do something different, please do.
Stephen
First, all the information is referenced from a pointer.
What kind of pointer? Did you read the chapters about how the C51 compiler handles pointers in the manual?
All variable data access is sequential from that pointer. Why isn't it optimizing the ADD A,#XXX into INC DPTR?
Possibly because you are using a generic pointer instead of a memory-specific one (this cannot be determined from your code snippet). Incrementing a generic pointer isn't as simple as adding something to it; it requires calls to library functions (hence all the LCALLs in your code).
a generic pointer isn't as simple as adding something to it; it requires calls to library functions (hence all the LCALLs in your code).
Hm, scratch that. The library function calls in your code are for accessing the target of the pointer, not for the increment.
Still, it's not clear from your code what kind of variable the pointer is: is it local, static, or global? What happens to it after the code snippet?
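For reference, this is the distinction being asked about, as I understand the Keil C51 documentation (a sketch, worth checking against the manual; `CHAN` is the structure type from the original post):

```c
CHAN * chan_generic;      /* generic pointer: 3 bytes (memory-type tag plus
                             16-bit address); dereferencing goes through
                             library helper routines                        */
CHAN xdata * chan_xdata;  /* xdata-specific pointer: 2 bytes; the compiler
                             can load it into DPTR and access the target
                             with MOVX directly                             */
```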
The pointer is a reference to a structure in XDATA space (typed xdata, thus it is not a generic pointer). Something like CHAN xdata * chan.
The variable itself is in xdata RAM and is global.
As I said, I'm baffled why it doesn't just increment the DPTR and load the data sequentially. I'm not sure what is being generated underneath. My guess is it's something like:

LOAD CHANPTR
REFERENCE RECORD N
MOV  A, (N)
MOV  SFR, A
LOAD CHANPTR
REFERENCE RECORD M
MOV  A, (M)
MOV  SFR, A

Since the CHANPTR load is redundant, it gets optimized out, whereas REFERENCE RECORD N gets rolled into a function. The problem, I suspect, is that the optimizer is not recognizing these as sequential address offsets to CHANPTR (i.e. CHANPTR is incremented by a byte offset each time), which would optimize to INC DPTR instead of the verbose code it's emitting now.
The problem, I suspect, is that the optimizer is not recognizing these as sequential address offsets to CHANPTR (i.e. CHANPTR is incremented by a byte offset each time), which would optimize to INC DPTR instead of the verbose code it's emitting now.
That could be. It should be possible to generate "optimized" code by playing with pointers and casts in C.
Is chan used in any way after the code snippet? And which optimization levels have you tried?
There is an almost infinite number of instruction combinations possible, and they are more or less affected by the actual parameters to the instructions.
Each optimization must be manually identified. Then it's time for a feasibility study, to check how hard the sequence is to detect, the probability that generated code will have matching sequences, and a guesstimate of how much faster/smaller the code will be.
Given the large number of combinations possible, it should be expected that you now and then see code sequences that could have been optimized (better).
No 'optimizer' can optimize as well as a competent programmer.
If you need optimized, then optimize, do not rely on some mechanical function to do your work for you.
Yes, the optimizer will optimize, but if what you write is not optimum, what is the point?
Someone posted about you using generic pointers in this thread and, if something is NOT optimum, that takes the prize.
I have, in many cases, two 'identical' functions, the only difference being that one takes a pointer to 'code' and one takes a pointer to 'xdata'.
Now, the smoked sardine is going to barge in and say something stupid, just ignore it.
Erik
Yes; having looked at the back-end optimization of GCC, I know it can be an eyeful. Generally the optimization starts with twiddling redundant stuff out. I suppose if it's done in a single pass that could lead to reductions (this is more toward algebraic-simplification type analysis). I was just curious whether it would recognize that it was adding sequential offsets, i.e. that P (the instance resulting from REFERENCE MEMBER (N)) was indeed the prior computation P' + 1; that seems to be the challenge. The optimization doesn't make sense to human eyes because a human tends to use different logic than a programmatic approach would. I suppose changing the symbolic representation by twiddling the code would work (heh).

Unfortunately, the stumbling block for the optimizer, I suspect, is the fact that I am using members of a structure instead of a pointer to a series of equally sized variables (i.e. byte-sized entities). The optimizer has to pass over several layers of symbol sets and optimizations before it can emit code, and deciding which optimization to apply each time is not as easy.
I thought I would see if there was some way to get the optimizer to recognize a sequence of incremental addresses like I have here. It would make the ISR time shorter and the code smaller. However, I don't want to end up with code that doesn't perform the same function.
Thanks for your comments; they give me a little more confidence that I recognize the limitation. Unfortunately, to keep code clarity, recasting a pointer is a bad idea (that would be the easiest way to make the optimizer recognize sequential addresses).
I am surprised that it doesn't reduce the offset-reference set more, but at the same time I am not. If it were a global variable and not a pointer, it would be optimized, I suspect. It's the pointer, and referencing through a pointer, that makes it difficult for the optimizer.
I thought I would see if there was some way to get the optimizer to recognize a sequence of incremental addresses like I have here. Would make the ISR time shorter and less code. However I don't want to end up with code that doesn't perform the same function.

I have a place where I have to do a MAJOR sequential-to-sequential move. My derivative does not have two DPTRs, which Keil AFAIK do use when available, so I wrote it in assembler (the ultimate optimizer [if you know what you are doing]) and gained more than a tenfold performance improvement, and the code "does perform the same function".
Unfortunately to keep code clarity recasting a pointer is a bad idea.

Please elaborate; I do not see why.
A good optimizer can optimize better than a good developer can manage, unless the developer spends a huge amount of time, and writes down a large number of permutations with paper and pen.
It's just a question of how hard the processor is to optimize for. The C51 processor is quite easy to optimize for (for a human). It has few registers, and very limited instructions so it is easy to keep all required state information in the head.
Let's just look back at the Intel Pentium processor. It had an exchange instruction that could swap two FP registers without consuming any clock cycles. Suddenly, it became very hard to match the Watcom compiler on FP code, because a normal human does not like to constantly swap the locations of the FP registers. You can't even document the register contents in a good way, since the FP register stack doesn't behave like a stack anymore.
A number of processors have extra bits in the instruction to inform the processor about future instructions, for example whether a jump is likely to happen within a short while, or whether the processor should be lazy or aggressive about writing back changed memory values. Some processors always process the first instruction after a jump, just to make sure they have something to do while the pipeline and memory subsystem are busy retrieving the first instruction from the jump destination.
High-end processors are superscalar, so they process multiple instructions per clock cycle. Even if all execution blocks are maximally fast and produce their result in a single clock cycle, the code must be rewritten so that the pipeline will always be able to issue the full set of integer or FP instructions, remembering that more advanced memory addressing modes may lock up one ALU for evaluation of scaled multiple-offset addresses.
Even if you are very good at writing in assembler, these non-regular hardware tricks make it very, very hard to keep track of everything and to always produce a correct program.
In theory, a human should always win over the optimizer in a compiler, but the optimizer has the advantage that it will _always_ remember all the tricks it knows about. It will not, now and then, forget to check if dirty trick "x" is applicable. It will not decide that it is too much work to perform a huge reorder/rewrite just to gain a single clock cycle - a normal developer will reach a stage where he feels: enough is enough.
This is no different from chess. A chess master does not have it easy against the best chess computers, because associative memory has to fight against the deep and brutally fast search of the computer. A chess master (or a human developer) has to prune the alternatives much earlier than a computer does. It is only experience combined with luck that keeps the chess master/developer from incorrectly pruning a winning tree.
The pattern of code generated indicates that the compiler is keeping the pointer in R6/R7, and passing the pointer's low byte plus the field offset in A to L?0039. The complaint is the repeated add to generate the offset for each field. The optimizer can't really peek inside the L?0039 function and INC DPTR; that would be using DPTR as a global variable secretly passed to this linker-generated library function.
What happens if you write the code this way:
StructA2D* data a2d = &(chan->a2d);

OCH = a2d->och;
OCM = a2d->ocm;
OCL = a2d->ocl;
GCH = a2d->gch;
GCM = a2d->gcm;
GCL = a2d->gcl;
This might cache the pointer to the a2d struct in DPTR. I haven't tried it, but I'd expect to see a series of calls without the add, just passing in the offsets of the fields inside the a2d struct, rather than going back to the base pointer chan each time. It might even do away with L?0039 entirely.
L?0039 looks like a space optimization: it's eliminating repeated code and turning it into a tiny function. Another way to sidestep the problem is perhaps to favor speed instead of space, at least for this region of code. (You can use a #pragma to change the optimizer level.)
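If I remember the C51 controls right (a sketch; check the exact spelling and placement rules against the Keil manual), the per-region change would use the OPTIMIZE pragma, placed outside a function body:

```c
#pragma OT (9, SPEED)  /* favor speed over size for the code that follows;
                          should discourage factoring repeated sequences
                          into helper routines like L?0039                 */
```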
A good optimizer can optimize better than a good developer can manage, unless the developer spends a huge amount of time, and writes down a large number of permutations with paper and pen.

Divide and conquer. My estimate is that less than 5% of most programs will gain anything noticeable (to the user) by optimization. So instead of "spends a huge amount of time", spend a bit of time superoptimizing the critical function and leave the rest alone.
The C51 processor is quite easy to optimize for (for a human). It has few registers, and very limited instructions so it is easy to keep all required state information in the head.

And that is the processor we discuss (see above MCU: Cx51. 8051 or MCS51).

Let's just look back at the Intel Pentium

And that is not the processor we discuss (see above MCU: Cx51. 8051 or MCS51).

Your insights into other processors, while valuable, do not apply to the origin of this thread.
More interesting code checking. This really shows more of what is going on. The code really is very simple and straightforward, and yet...
Compiler settings:
OPTIMIZE (9,SIZE) BROWSE NOINTPROMOTE INCDIR(.\headers) DEBUG OBJECTEXTEND CODE LISTINCLUDE SYMBOLS PRINT(.\object\list_map\*.lst) TABS (3)
C Code
void print_hex4(uint8_t xdata * dat)
{
   print_hex(*dat++);   //dat++;
   print_hex(*dat++);   //dat++;
   print_hex(*dat++);   //dat++;
   print_hex(*dat);
}
Output Listing
            ; FUNCTION _print_hex4 (BEGIN)
                           ; SOURCE LINE # 175
0000 8E00    R   MOV   dat,R6
0002 8F00    R   MOV   dat+01H,R7
                           ; SOURCE LINE # 176
                           ; SOURCE LINE # 177
0004 0500    R   INC   dat+01H
0006 E500    R   MOV   A,dat+01H
0008 7002        JNZ   ?C0051
000A 0500    R   INC   dat
000C         ?C0051:
000C 120000  R   LCALL L?0058
                           ; SOURCE LINE # 179
000F 0500    R   INC   dat+01H
0011 E500    R   MOV   A,dat+01H
0013 AE00    R   MOV   R6,dat
0015 7002        JNZ   ?C0052
0017 0500    R   INC   dat
0019         ?C0052:
0019 120000  R   LCALL L?0058
                           ; SOURCE LINE # 181
001C 0500    R   INC   dat+01H
001E E500    R   MOV   A,dat+01H
0020 AE00    R   MOV   R6,dat
0022 7002        JNZ   ?C0053
0024 0500    R   INC   dat
0026         ?C0053:
0026 14          DEC   A
0027 F582        MOV   DPL,A
0029 8E83        MOV   DPH,R6
002B E0          MOVX  A,@DPTR
002C FF          MOV   R7,A
002D 120000  R   LCALL _print_hex
                           ; SOURCE LINE # 183
0030 850082  R   MOV   DPL,dat+01H
0033 850083  R   MOV   DPH,dat
0036 E0          MOVX  A,@DPTR
0037 FF          MOV   R7,A
0038 020000  R   LJMP  _print_hex
                           ; SOURCE LINE # 184
003B         L?0058:
003B 14          DEC   A
003C F582        MOV   DPL,A
003E 8E83        MOV   DPH,R6
0040 E0          MOVX  A,@DPTR
0041 FF          MOV   R7,A
0042 120000  R   LCALL _print_hex
0045 22          RET
            ; FUNCTION _print_hex4 (END)
This is rather... interesting. I suspect it may be a difference in how the optimizer behaves between static analysis and pointer analysis. The pointer analysis appears to be where things aren't as optimized (although it might not be that easy to transform the code to 'optimal' in the simplification passes).
I'm curious, what sort of a stupid thing do you think I might say?
crawl back in the can