Hello,
1. When I use printf with no variable arguments, the compiler should call puts instead of printf.
ex1:
#include <REGX51.H>
#include <stdio.h>

void main(void)
{
    printf("This must call puts instead of printf");
}
Program Size: data=30.1 xdata=0 code=1103
ex2:
#include <REGX51.H> #include <stdio.h> void main(void) { puts("This must call puts instead of printf"); }
Program Size: data=9.0 xdata=0 code=168
The above code links the printf function from the library, which is huge (1103 bytes of code). But the compiler could substitute puts when no variable arguments are given, which is much smaller than printf (168 bytes of code).
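(One detail such a substitution would have to respect, assuming the library's puts follows the standard: puts appends a newline of its own, so only a call with no %-conversions and a format string ending in '\n' can be swapped mechanically. Done by hand the replacement looks like this:)

#include <stdio.h>

void main(void)
{
    /* Hand replacement: puts() writes the trailing newline itself,
       so the '\n' is dropped from the literal. */
    puts("This must call puts instead of printf");
}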
2. The compiler should find and merge duplicate constant strings.
ex3:
#include <REGX51.H> #include <stdio.h> void main(void) { puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); puts("This string gets duplicated as many time as i use it"); }
Program Size: data=9.0 xdata=0 code=334
ex4:
#include <REGX51.H> #include <stdio.h> void main(void) { puts("This string gets duplicated as many time as i use it"); }
Program Size: data=9.0 xdata=0 code=183
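(Until the tools merge identical literals automatically, one possible workaround is to hoist the literal into a single constant and reuse the pointer. A minimal sketch, assuming Keil's 'code' memory qualifier for placing the array in code space; the name 'msg' is just illustrative:)

#include <stdio.h>

/* One copy of the literal in code memory, shared by every call. */
static const char code msg[] = "This string gets duplicated as many time as i use it";

void main(void)
{
    puts(msg);
    puts(msg);
    puts(msg);
}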
3. Bit test instructions are not used when I actually test for a bit.
ex5:
#include <REGX51.H>
#include <stdio.h>

void main(void)
{
    if(P0^1)
    {
        P1 = 10;
    }
}

ASSEMBLY LISTING OF GENERATED OBJECT CODE

; FUNCTION main (BEGIN)
                                           ; SOURCE LINE # 6
                                           ; SOURCE LINE # 7
                                           ; SOURCE LINE # 8
0000 E580      MOV     A,P0
0002 6401      XRL     A,#01H
0004 6003      JZ      ?C0002
                                           ; SOURCE LINE # 9
                                           ; SOURCE LINE # 10
0006 75900A    MOV     P1,#0AH
                                           ; SOURCE LINE # 11
                                           ; SOURCE LINE # 13
0009           ?C0002:
0009 22        RET
; FUNCTION main (END)
In the above assembly output, a single JNB instruction should have been used instead of the three instructions MOV, XRL and JZ. This is very basic; anybody would object to the assembly code produced.
I have not used the compiler much, but it needs a look by the programmers at Keil.
The above programs were all compiled with the compiler optimisation level set to 9 and favour speed.
About 5 years back I compiled a C51 source file using Keil. Now I recompiled the same source with the latest compiler from Keil and compared the two output .hex files. Unfortunately it produced exactly the same output. Here I was expecting some code and data size reduction, as the compiler should be capable of optimising more by now.
It seems there was no improvement on the compiler side.
This is not a complaint, but is offered in the interest of improving the compiler.
regards,
S.Sheik mohamed
Note that the compiler can be smart enough to notice that x & 1 can be seen as a bit operation. It would be a lousy compiler if it couldn't.
But P0 is a volatile 8-bit variable. And the language standard requires that the compiler performs the volatile access. So the compiler can't do a single-bit access to a bit variable that just _happens_ to be aliased to one bit of the 8-bit P0 special function register. The compiler _must_ do the full 8-bit read of P0. Then it can use its optimization abilities to convert (x & 1) into a bit operation instead of a full byte-wise AND followed by a check of the zero flag.
When you do declare a bit variable P0_0, the compiler isn't bound by the language standard into having to read the full P0. You have then explicitly told the compiler that you want an access to a single-bit variable, and the compiler may perform such a single-bit access without involving the accumulator.
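For instance (a minimal sketch, assuming Keil's sbit syntax; REGX51.H may already provide an equivalent name for this bit):

#include <REGX51.H>

sbit P0_BIT0 = P0 ^ 0;      /* declare bit 0 of port 0 as a single-bit variable */

void main(void)
{
    if(P0_BIT0)             /* the compiler can now emit a single JB/JNB on the bit address */
    {
        P1 = 10;
    }
}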
Remember that it isn't obvious what special actions may be triggered by accesses to an SFR. In some situations, a byte read may acknowledge an interrupt mechanism. In other situations, the access may control the latching of a 16-bit timer value into two 8-bit SFRs.
Keil does not want to add a lot of special code that intentionally violates the language standard just because they have considered it "safe" (an assumption, since the world is full of 8051 variants with varying "special" features added) to violate the standard in particular situations. And there is no need for such special code, since they have explicitly given you the means to declare bit variables and thereby tell the compiler to perform just a bit access.
"I've seen the Green Hills compiler for Coldfire generate BSET and BCLR instructions when I was using bitwise OR and bitwise AND operators to set or clear bits. Why wouldn't C51 do the same?"
Was that with SFR or volatile variables? Or was it with normal variables?
It really is important to separate SFR and volatile variables from standard char/int/... variables when discussing single-bit accesses.
But P0 is a volatile 8-bit variable. And the language standard requires that the compiler performs the volatile access.
For what it's worth:
On a '51, a bit operation internally requires a read of the complete byte. For example, a CLR bit instruction is actually documented as a read-modify-write of the byte. So JB, JNB, CLR and SETB would actually satisfy the volatile requirement.
I'd say that the volatile argument is a poor excuse for avoiding optimization here. One could argue that an SFR is not covered by the language standard: even the declaration of an SFR has a special non-standard syntax. Besides, it is known that the CPU will perform a byte read when you read a single bit.
In that case it must have converted P0 into an integer and done an 'AND' operation on that integer with the integer constant 0x0001.
That is, in strict ANSI C:
mov  a,P0             ; LOW(P0)  <- port value
anl  a,#LOW(0x0001)   ; integer constant 1
mov  r0,a             ; save the result of the low byte
mov  a,#0             ; HIGH(P0) <- port value promoted to integer as per ANSI C
anl  a,#HIGH(0x0001)  ; integer constant 1
orl  a,r0             ; is the integer result zero?
jz   xxx
We all know the above code is not needed, and the compiler knew it too. If it were to follow ANSI C strictly, it would not have chosen the JNB instruction but the big code above. But the compiler actually found the smart way to do the same thing with a bit test instruction; it only missed checking whether the port byte is bit-addressable or not.
The "volatile" does not come here as the port bit is being accessed directly every time.
The compiler should produce small, smart and fast code that gives the same output when executed.
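(For reference, the bit-addressability check itself would be simple: on a classic 8051 only SFRs whose byte address is a multiple of 8, such as P0 at 0x80 and P1 at 0x90, have bit addresses. A sketch of such a check in plain C; the function name is just illustrative:)

/* Classic 8051 rule: an SFR has bit addresses only if its byte address
   lies in the SFR area (0x80-0xFF) and is a multiple of 8. */
static int sfr_is_bit_addressable(unsigned char sfr_addr)
{
    return (sfr_addr >= 0x80) && ((sfr_addr & 0x07) == 0);
}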
Sheik mohamed
I have seen a compiler demote an int to a char.
ex:
void do_nothing(void);

void delay(void)
{
    int delay_val;

    for(delay_val = 0; delay_val < 250; delay_val++)
    {
        do_nothing();
    }
}
In the above code, one compiler used a single 8-bit register for 'delay_val'; that is, it demoted the int into a char. This may not be sanctioned by ANSI C, but we all know that when the code executes it will produce the same result. In fact a better result: lower code and data memory usage and faster execution.
It was nice & wise.Since the delay_val is never assigned a value above 255 so the compiler chose to demote the integer to a char.
ANSI C does not come into it here. It is just an optimization by the compiler.
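(If you want that effect without relying on the optimizer, you can of course make it explicit. A minimal sketch; do_nothing() is just the placeholder from the example above:)

void do_nothing(void);

void delay(void)
{
    unsigned char delay_val;    /* the bound fits in 8 bits, so a char counter is enough */

    for(delay_val = 0; delay_val < 250; delay_val++)
    {
        do_nothing();
    }
}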
It is irrelevant what a classic 8051 does. The classic 8051 implements the instructions based on a requirement to minimize the number of transistors. But a modern one-clocker or a soft core may introduce a lot of changes since they have a completely different transistor budget available. I'm not convinced that the read-modify-write argument is true for every 8051 implementation in existence now or next year.
There shouldn't be anything stopping a manufacturer from introducing pin-change interrupts, where a read of a single port bit acknowledges the interrupt for that pin while a read of the whole port acknowledges/clears potential interrupts for all eight pins.
I'm not convinced that ...
You dare to doubt the so-called "bible"? shame on you.
I don't doubt the bible, where it is applicable.
But I doubt that 25 year old documentation of the original 8051 is true for all 8051 variants in existence or in planning. It is enough that a single 8051 manufacturer (or programmer, if we think about soft cores) decides that bit operations can be done without a read/modify/write.
We are not debating any natural laws here. And even many of our natural laws are only applicable in their "normal" form for non-relativistic speeds.
But I doubt that 25 year old documentation of the original 8051 is true for all 8051 variants in existence or in planning.
But that shouldn't prevent the compiler writers from implementing optimizations for cases where they are applicable. That would be most if not all existing 8051 cores. A simple command line switch will take care of new/non-compliant cores. My understanding is that the OP is expressing frustration at the lack of progress on the optimization front, and I agree with that. It appears that C51 has been in bug-fix-only mode for many years.
Not if you were relying upon the loop execution time based on int arithmetic...!
And with a switch for selecting bit optimization of SFR on/off, a number of customers will explicitly claim the Keil compiler is buggy.
It isn't even likely that the chip manufacturers would document if they do a byte-wide read/modify/write or a hardware bit access, so how would Keil know whether such a flag would be needed or not?
It isn't even likely that the chip manufacturers would document ...
That is pure conjecture.
... would document if they do a byte-wide read/modify/write or a hardware bit access ...
If this unknown manufacturer were to deviate from something that people rely upon (and have done for the past 25 years) then they would have to make damn sure that it was documented as not being compatible.
Not at all.
Who is relying on a bit test operation being implemented as a byte read rather than just a single-bit read?
Who is relying on a bit set being implemented as a byte read followed by a byte write?
The Keil code is most definitely not relying on that. It is doing what you ask it - a byte access when the source specifies a byte access, and a bit access when the code specifies a bit access. It leaves it up to you to decide if you want a bit instruction or a byte instruction.
In the case I mentioned earlier - some chip manufacturer implementing a pin-change interrupt, with acknowledgement of the pin interrupts by reading specific bits - such a feature would obviously have to be documented.
But a chip manufacturer who "just" implements a 8051 core functionality would not have any real reason to document how the bit operations are implemented.
It's just when you combine the two things - a peripheral device that behaves differently for a bit read and a byte read, with a core that does not implement the bit read as a byte read - that things get interesting.
Without a peripheral that behaves differently, the bit operations are opaque black boxes. You can't know whether they do bit or byte accesses, and there is no reason to document what they do. But having a soft core and combining it with your own SFR devices has the potential for surprises if the compiler manufacturer decides to be "clever".
The job of the optimizer is to reduce code and data size and make the code run fast.
Even if we switch off the optimizer and compile with different compilers on the market, can you predict the execution time or the code that each compiler produces? Definitely not. If you want to rely on execution speed you will have to check the assembly output from that compiler. When you know your code never uses values above 255, you can be sure. But when you do not know, you do not switch on the optimizer.
Would anybody like to give up code and data size just for the sake of decreasing speed?