
Optimising C++

I've ported some code from C to C++ and I'm shocked at how much more slowly it runs. I've written a lot of C++ in the past, so I know I haven't done anything absurdly inefficient, but execution times have gone up by 150% (2.5 times slower!). Has anyone else seen anything like this? Are there any particular aspects of the Keil C++ implementation that I should look out for?

  • How about some sample code ? What exactly runs slower ?

  • a comment from the peanut gallery:

    how can you choose ++ when concerned with performance?

    Erik

  • C code compiled with a C++ compiler will compile into pretty much the same thing as with a C compiler.

    If you "port" by rewriting the code, then you have to understand C++, and you have to understand the consequences of the features you ask for. If you start slinging around dynamic_casts, RTTI, and virtual functions, then you'll pay for them.

    C++ has a "don't pay for what you don't use" philosophy, and with modern compilers it works pretty well. The code is not going to be less efficient just because you call the file ".cpp".

    For that matter, if you actually need a particular feature (say, dynamic dispatch at run time with virtual functions), the equivalent implementation in C (say, a table of function pointers or a bunch of switch statements) will typically cost you just as much after you get through coding it all by hand. Of course, if you compare apples and oranges (dynamic dispatch in C++ versus static compile-time linkage in C) you'll find that one feature is bigger or slower than the other. But that's not the language; that's the programmer.
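
    To make that comparison concrete, here is a minimal sketch (all names invented) of the two dispatch styles side by side. Both end up making an indirect call through a pointer, and neither is free:

```cpp
#include <cassert>

// C++ dynamic dispatch: a virtual call through the vtable
struct Shape {
    virtual ~Shape() {}
    virtual int area() const = 0;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const { return side * side; }
};

// The hand-coded C equivalent: a struct carrying a function pointer
struct CShape {
    int (*area)(const CShape*);
    int side;
};

static int c_square_area(const CShape* s) { return s->side * s->side; }
```

    In both cases the call site loads a function address at run time and jumps through it; the C++ version just has the compiler build and maintain the table for you.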

    The weakest link I know of used to be the standard I/O library. Many toolchains had trouble paring down the couple of hundred KB of I/O code pulled in by a single 'cout << "string"'. But that observation is pretty out of date. Major compilers probably do better with their libraries now.

    So, the first step is to look at particular differences in your code and ask what it's doing that you don't expect that makes it so big and slow.

  • If you "port" by rewriting the code, then you have to understand C++, and you have to understand the consequences of the features you ask for

    My post was with the thought in the background "why use ++ if you do not want to use the features"

    Erik

  • As far as I know, Keil's C++ compiler takes the C++ source and converts it to C code which in turn is fed into the C compiler. I've used Keil's C166 compiler a lot and my impression is that it is not very optimizing. Who knows what kind of code the C++ to C converter produces? So I wouldn't be surprised if the resulting code was inefficient.
    It's just my opinion, though.

    Regards,
    - mike

  • There's too much code to post, but here's an overview.

    The purpose of the function is to look for an address range (start address to end address) in a list of allocated address ranges. This involves doing a lot of address comparisons, where the addresses are 48 bits long. The C code uses a structure to hold the address which consists of an upper 16 bits and a lower 32 bits, and a function is called to compare structures.
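
    The C side, as described, would look something like this sketch (the field names unHigh/ulLow follow the ones shown in the operator source later in the thread; the function name is invented):

```cpp
#include <cassert>

/* Sketch of the C layout: upper 16 bits plus lower 32 bits
   of a 48-bit bus address */
typedef struct {
    unsigned short unHigh;  /* upper 16 bits */
    unsigned long  ulLow;   /* lower 32 bits */
} BusAddr;

/* C-style comparison through pointers: returns nonzero if *a >= *b */
static int bus_addr_ge(const BusAddr* a, const BusAddr* b)
{
    if (a->unHigh != b->unHigh)
        return a->unHigh > b->unHigh;
    return a->ulLow >= b->ulLow;
}
```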

    The C++ implementation defines a class with the same member variables as the old C structure. Operator methods are then defined for the >= and <= operators, which perform the same comparisons as the old C comparison functions. The operators take their arguments by reference.

    As the data and functions are fairly similar between C and C++, I wasn't expecting a huge difference in execution speed. I've had a quick look at the intermediate C produced by the C++ preprocessor, and it's virtually unreadable. I might have to give it more mental effort, but pointers in the right direction would be appreciated.

  • I'm working on some library code which is shared between several applications. The plan is for the library to contain some generic implementations for general use, and each application can (if necessary) subclass a derivative which uses application-specific optimisations.

    Optimisations based on the application can be an order of magnitude (or more) faster than the generic implementations because they can eliminate unnecessary flexibility and they can bring additional resources to bear.
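
    As a minimal sketch of that pattern (class names invented, and the "optimisation" here just stands in for whatever the application can really exploit):

```cpp
#include <cassert>
#include <cstring>

// Generic library implementation: copies byte by byte and makes
// no assumptions about alignment or overlap
class BlockCopier {
public:
    virtual ~BlockCopier() {}
    virtual void copy(char* dst, const char* src, unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            dst[i] = src[i];
    }
};

// Application-specific derivative: this application guarantees its
// buffers never overlap, so the optimised library memcpy can be used
class FastCopier : public BlockCopier {
public:
    virtual void copy(char* dst, const char* src, unsigned n) {
        std::memcpy(dst, src, n);
    }
};
```

    The generic version is always correct; the derivative trades away flexibility the application never needed.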

  • Can't you just keep it as a 'C' section in an otherwise C++ project?

    Borland lets me do that...

    Or build it separately as a Library?

  • Who knows what kind of code the C++ to C converter produces?
    The EC166 creates C Source files. As long as you are able to open a C file with an editor, you can read the kind of code, can't you?

  • As long as you are able to open a C file with an editor, you can read the kind of code, can't you?

    Of course. It's just that I never took the time to actually do that. And I don't have access to the compiler right now. So maybe someone else has done that and had a chance to estimate the quality of generated code.

    Regards,
    - mike

  • "As long as you are able to open a C file with an editor, you can read the kind of code, can't you?"

    Oliver has already done that, and he said:

    "I've had a quick look at the intermediate C produced by the C++ preprocessor, and it's virtually unreadable." (my emphasis)

  • Perhaps I'm being a little unkind, but here are a few translations.

    // C++ source
    // contains returns true if the address is in this range
    BOOL BusAddressRange :: contains(const BusAddress& other)
    {
            BOOL b1, b2;
    
            b1 = (start <= other);
            b2 = (end >= other);
            return b1 && b2;
    }
    


    The compiler produces this C

    BOOL contains__15BusAddressRangeFRC10BusAddress( struct BusAddressRange *const this,  const struct BusAddress *other)
    {
    auto BOOL b1; auto BOOL b2;
    
    b1 = (__le__10BusAddressFRC10BusAddress((&this->start), other));
    b2 = (__ge__10BusAddressFRC10BusAddress((&this->end), other));
    return (BOOL)((b1) && (b2));
    }
    


    The C++ source for the >= operator is:

    BOOL BusAddress ::  operator>= (const BusAddress& a)
    {
            BOOL result;
            if(unHigh >= a.unHigh)
            {
                    if(unHigh == a.unHigh)
                    {
                            result = (ulLow >= a.ulLow);
                    }
                    else
                    {
                            result = TRUE;
                    }
            }
            else
            {
                    result = FALSE;
            }
            return result;
    }
    


    This is translated to:

    BOOL __ge__10BusAddressFRC10BusAddress( struct BusAddress *const this,  const struct BusAddress *a)
    {
    auto BOOL result;
    if (((unsigned)((this->unHigh))) >= ((unsigned)((a->unHigh))))
    {
    if (((unsigned)((this->unHigh))) == ((unsigned)((a->unHigh))))
    {
    result = ((BOOL)(((this->ulLow)) >= ((a->ulLow))));
    }
    
    else  {
    result = 1U;
    }
    }
    
    else  {
    result = 0U;
    }
    return result;
    }
    

    The code isn't easy to read and it took me a while to match the source to the translation. The C doesn't look too inefficient, but the stopwatch says otherwise.

  • I guess ultimately you should look at the disassembly of the generated code. I, for one, would like to see it.

    - mike

  • Notice that the creation of the class means that you're now trying to pass around pointers and have added a level of indirection to your code which may not be necessary.

    I also noticed that the implementation of this function:

    BOOL BusAddressRange :: contains(const BusAddress& other)
    {
            BOOL b1, b2;
    
            b1 = (start <= other);
            b2 = (end >= other);
            return b1 && b2;
    }
    

    could be improved. Given that BusAddressRange is a useful application object, and that contains is a common and important operation, why not code up the answer directly?

    BOOL BusAddressRange :: contains(const BusAddress& other)
    {
        return (other >= this->start) &&
               (other <= this->end);
    }
    

    No need to layer it on top of the overloaded operators just because they're there. "contains" is a first-class operation for this object. It deserves a first-class implementation. I've assumed above that there's a BusAddress::operator unsigned long() converter, or whatever other promotion you prefer to make the comparison simple. The implementation shown for >= makes me think that your bus addresses are bigger than your ints, which makes the comparison more complex. But I'd still suggest trying to code contains() directly. This bit of code:

    b1 = (__le__10BusAddressFRC10BusAddress((&this->start), other));
    b2 = (__ge__10BusAddressFRC10BusAddress((&this->end), other));
    return (BOOL)((b1) && (b2));
    

    is more expensive than it looks. Consider that inside this routine "this" can be a pointer passed on the stack; as a result, this->start can't be calculated at compile time. And you have an extra layer of function call. So, you start with an integer, take its address, push that on the stack, dereference it again, push the original integer back onto the stack, and then finally start looking at it inside >=. Then you do it all over again for <=. A direct implementation of contains() would avoid all that thrashing about.

    You also might consider exposing the implementation (gasp!) of BusAddress, or making BusAddressRange a "friend" class so that it can directly access the High and Low fields of BusAddress to implement contains(). You also might want to try inlining these operations, providing the definition in the .h file (conventionally, in a ".i" file included from the .h; some people like to keep it mostly hidden for human purposes, or to turn inlining on and off) so that the optimizer has a chance to collapse the things that really are constant across the function call.
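
    Putting the friend-class and inlining suggestions together, contains() might be sketched like this, with the body in the header so the compiler can inline it, and the comparisons written out directly on the High/Low fields (a sketch, not tested against your toolchain):

```cpp
#include <cassert>

typedef int BOOL;

class BusAddress {
    friend class BusAddressRange;       // grant direct field access
    unsigned short unHigh;              // upper 16 bits of the 48-bit address
    unsigned long  ulLow;               // lower 32 bits
public:
    BusAddress(unsigned short h, unsigned long l) : unHigh(h), ulLow(l) {}
};

class BusAddressRange {
    BusAddress start, end;
public:
    BusAddressRange(const BusAddress& s, const BusAddress& e)
        : start(s), end(e) {}

    // Defined in the header, so the compiler is free to inline it;
    // no intermediate operator calls, no extra stack traffic
    BOOL contains(const BusAddress& a) const {
        BOOL geStart = (a.unHigh > start.unHigh) ||
                       (a.unHigh == start.unHigh && a.ulLow >= start.ulLow);
        BOOL leEnd   = (a.unHigh < end.unHigh) ||
                       (a.unHigh == end.unHigh && a.ulLow <= end.ulLow);
        return geStart && leEnd;
    }
};
```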

  • Drew,
    Thanks for the comments. As you surmised (and as I mentioned earlier), the BusAddress is more than 32 bits long, and is in fact 48 bits. You have pointed out that the operands of the comparison operators are passed as pointers, which then need to be de-referenced. I understand that this is an expensive operation, but it is exactly the same as the original C code, where comparisons were done in a function which took pointers to structures as its parameters.

    You also suggested that eliminating the intermediate BOOL variables and coding contains() directly would speed things up. I agree that this should help, but I have kept close to the original C code for the time being.

    I really hope I don't have to look at the assembler to see what's happening.

    I'm tempted to try changing the storage of the 48 bit address from a 16+32 bit structure to an array of three 16 bit unsigned numbers as the compiler may be able to apply more optimisations. I also think your suggestions for inlining the code may yield some useful results as code size is not a problem.
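
    As a sketch of that three-word layout (the word order here is an assumption, with the most significant 16 bits first), the comparison collapses into one short, uniform loop:

```cpp
#include <cassert>

// Hypothetical three-word layout: w[0] holds the most significant
// 16 bits of the 48-bit address
struct BusAddr48 {
    unsigned short w[3];
};

// Returns <0, 0 or >0, like memcmp, so every relational operator
// can be built from this one routine
static int bus_addr_cmp(const BusAddr48& a, const BusAddr48& b)
{
    for (int i = 0; i < 3; ++i) {
        if (a.w[i] != b.w[i])
            return (a.w[i] < b.w[i]) ? -1 : 1;
    }
    return 0;
}
```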