This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

unnecessary code generation

Hello,

I'm relatively new in using Keil C166 development system. First tests are successfully so far using the Keil MCB167NET board. As a long time C programmer I'm interested in the quality of generated code. I noticed that the generated code sometimes has unnecessary code lines that I thought an optimizer should be able to find. Here is a example:

#include <C167CS.H>
#include "comtype.h"
#include "timer.h"
void timer_init()
{  struct T01_CON {
      uint  T0L :3;
      uint  T0M :1;
      uint  un00:2;
      uint  T0R :1;
      uint  un01:1;
      uint  T1L :3;
      uint  T1M :1;
      uint  un10:2;
      uint  T1R :1;
      uint  un11:1;
   };
   union {
      struct T01_CON tcon;
      uint   reg;
   } u;
   u.reg = T01CON;
   u.tcon.T0L = 2;    // FCPU / 2^(3+T0L) = 2 µs at 16MHz
   u.tcon.T0M = 0;    // Timer Mode
   u.tcon.T0R = 1;    // run Timer
   T01CON = u.reg;
}//timer_init
The genearated code looks for speed optimizer, Level 7 is:
   ; FUNCTION timer_init (BEGIN  RMASK = @0x4030)
0000 2802          SUB       R0,#02H
0002 F2F550FF      MOV       R5,T01CON
0006 B850          MOV       [R0],R5       ; u
0008 A840          MOV       R4,[R0]       ; u
000A 0AF54F42      BFLDL     R5,#04FH,#042H
000E B850          MOV       [R0],R5       ; u
0010 A840          MOV       R4,[R0]       ; u
0012 F6F550FF      MOV       T01CON,R5
0016 0802          ADD       R0,#02H
0018 CB00          RET
   ; FUNCTION timer_init (END    RMASK = @0x4030)

At first I see that at Offset 0008 and 0010 the mov to R4 is unnecessary because R4 is never read again. Removing this moves a second look noticed that also the moves at offset 0006 and 000E are unnecessary.

Should this kind of optimization be doable by a peephole optimizer or by data flow analysis?

NB: This is not critical, just as a hint for future improvements.
The most important thing with optimizers is still the correctness of generated code.

- Heinz (from Delmenhorst, Germany)

  • It is typical to directly modify the bits in an SFR. For example, the following:

    void timer_init (void)
    {
    T01CON &= ~(0x0007 | 0x0008 | 0x0040);
    T01CON |=  (0x0002 |          0x0040);
    }

    Initializes the timer exactly as your example does but without the overhead of reading the SFR into a variable, changing the bits, and writing the SFR back.

    The code generated for the above is:

    0000 66A8B0FF      AND       T01CON,#0FFB0H
    0004 76A84200      OR        T01CON,#042H
    0008 CB00          RET
    

    Another problem with your example is that if it is interrupted and if any bits in T01CON are changed, then when the interrupt exits, your example undoes those changes.

    Jon

  • Hi Heinz,

    I made a similar observation. Consider this example:

    int a, b, c, d;
    
    void main(void)
    {
        a = c / d;
        b = c % d;
    }
    
    0000 F2F70000 R    MOV       R7,d
    0004 F2F60200 R    MOV       R6,c
    0008 F6F60EFE      MOV       MDL,R6
    000C 4B77          DIV       R7
    000E F6070600 R    MOV       a,MDL
    0012 F6F60EFE      MOV       MDL,R6
    0016 4B77          DIV       R7
    0018 F2F40CFE      MOV       R4,MDH
    001C F2F50EFE      MOV       R5,MDL
    0020 F6F40400 R    MOV       b,R4
    0024 CB00          RET
    
    Whereas the more optimal code would be
    MOV     R6,d
    MOV     MDL,c
    DIV     R6
    MOV     a,MDL
    MOV     b,MDH
    RET
    
    Quite a difference, isn't it? Oh well, maybe we are spoilt by modern compilers like gcc or msvc?

    - mike

  • maybe we are spoilt by modern compilers like gcc

    GCC for the C167 generated the following code for the example you provided.

    .globl	_main
    _main:
    	mov [-r0],r1
    	mov r1,r0
    	mov [-r0],r8
    	mov r4,_c
    	mov r5,_d
    	mov MDL,r4
    	div r5
    	mov r4,MDL
    	mov _a,r4
    	mov r4,_c
    	mov r5,_d
    	mov MDL,r4
    	div r5
    	mov r4,MDH
    	mov _b,r4
    L1:
    	mov r8,[r0+]
    	mov r1,[r0+]
    	jmpi cc_UC,[r8]

    Hmmmm. I don't think I don't want to be spoiled like that! :-)

    Jon

  • Well, it tells something about those who did the porting. We all know that the C166 port of GCC was not merged with the official GCC code tree so it didn't see as much development as it could have. When I mentioned GCC, I was mostly referring to GCC for x86.
    I agree, Keil's C compiler for the C166 must be the best in the market. But it can be better.

  • Well, it tells something about those who did the porting. We all know that the C166 port of GCC was not merged with the official GCC code tree so it didn't see as much development as it could have. When I mentioned GCC, I was mostly referring to GCC for x86.

    Actually, the port I used is a commercial GCC implementation that is supposedly better than the "official" (is there such a thing) GCC 166 port.

    The problem that you pose is interesting in that you generate a dividend and remainder (in adjacent statements) for the same quotient and divisor. So, if the CPU's DIV instruction generates a dividend and a remainder, only 1 DIV instruction is required. However, I'm not sure what kind of general optimization that would be (dividene/remainder coloring???).

    I contacted one of the compiler developer's at Microsoft to see if they performed this optimization and it appears that they do. After a lengthy discussion, I think that the added this optimization to improve some kind of Windows or PC benchmarks.

    A question that needs to be answered is, how often do developer's need a dividend and remainder from the same quotient and divisor? If the answer is often, then perhaps we need to consider this optimization (we focus our efforts on improvements that benefit the greatest number of users).

    Jon

  • A question that needs to be answered is, how often do developers need a dividend and remainder from the same quotient and divisor?

    My two cents: pretty often!

    For one example, it's very common for me to run into a situation where I have some sort of table in a series of registers, where the individual items are less than the full register width. Let's say you have 16 items, two bits wide, 4 items per byte, thus taking up 4 bytes of address space in total, thus:

    
    addr  item num
    ----  --------
    0000: 3 2 1 0
    0001: 7 6 5 4
    0002: b a 9 8
    0003: f d e c
    
    


    To access item n, you need code along the lines of:

    Item GetItem (int n)
    {
    index = n / ItemsPerWord;
    offset = n % ItemsPerWord;
    
    return (*(base + index) >> offset) & mask;
    }
    

    With most designs, the div and mod should strength-reduce to shifts and masks because they're powers of two.

    I also find it moderately common any time I need a division to also need the remainder for any sort of fraction where I want to avoid floating point.

  • Hello Jon,


    It is typical to directly modify the bits in an SFR. For example, the following:

    void timer_init (void)
    {
    T01CON &= ~(0x0007 | 0x0008 | 0x0040);
    T01CON |=  (0x0002 |          0x0040);
    }
    

    Yes, I changed it in a similar way now. Your hint with the interrupt in between read and write wouldn't be a problem as this initialization only happens ones at startup. It can be a problem in other situations, however.
    My original idea was to have field names in the sfr as found in the device data sheet.

    - Heinz

  • I agree, Keil's C compiler for the C166 must be the best in the market. But it can be better.
    That's what I wanted to say ;-)

    - Heinz

  • The problem that you pose is interesting in that you generate a dividend and remainder (in adjacent statements) for the same quotient and divisor. So, if the CPU's DIV instruction generates a dividend and a remainder, only 1 DIV instruction is required. However, I'm not sure what kind of general optimization that would be (dividene/remainder coloring???).
    Maybe as a extension of common subexpression optimization? Or, as I remember from the compiler writing course, a peephole optimization on the generated code. Such a optimizer should be able to analyze code sequences with no jumps in/out. So the flow graph must still be used.

    - Heinz