I've ported some code from C to C++ and I'm shocked at how much more slowly it runs. I've written a lot of C++ in the past, so I know I haven't done anything absurdly inefficient, but execution times have gone up by 150% (2.5 times slower!). Has anyone else seen anything like this? Are there any particular aspects of the Keil C++ implementation that I should look out for?
"As long as you are able to open a C file with an editor, you can read the kind of code, can't you?"
Oliver has already done that, and he said:
"I've had a quick look at the intermediate C produced by the C++ preprocessor, and it's virtually unreadable." (my emphasis)
Perhaps I'm being a little unkind, but here are a few translations.
// C++ source
// contains returns true if the address is in this range
BOOL BusAddressRange :: contains(const BusAddress& other)
{
    BOOL b1, b2;
    b1 = (start <= other);
    b2 = (end >= other);
    return b1 && b2;
}
The compiler produces this C:
BOOL contains__15BusAddressRangeFRC10BusAddress(
    struct BusAddressRange *const this,
    const struct BusAddress *other)
{
    auto BOOL b1;
    auto BOOL b2;
    b1 = (__le__10BusAddressFRC10BusAddress((&this->start), other));
    b2 = (__ge__10BusAddressFRC10BusAddress((&this->end), other));
    return (BOOL)((b1) && (b2));
}
The C++ source for the >= operator is:
BOOL BusAddress :: operator>= (const BusAddress& a)
{
    BOOL result;
    if(unHigh >= a.unHigh)
    {
        if(unHigh == a.unHigh)
        {
            result = (ulLow >= a.ulLow);
        }
        else
        {
            result = TRUE;
        }
    }
    else
    {
        result = FALSE;
    }
    return result;
}
This is translated to:
BOOL __ge__10BusAddressFRC10BusAddress(
    struct BusAddress *const this,
    const struct BusAddress *a)
{
    auto BOOL result;
    if (((unsigned)((this->unHigh))) >= ((unsigned)((a->unHigh))))
    {
        if (((unsigned)((this->unHigh))) == ((unsigned)((a->unHigh))))
        {
            result = ((BOOL)(((this->ulLow)) >= ((a->ulLow))));
        }
        else
        {
            result = 1U;
        }
    }
    else
    {
        result = 0U;
    }
    return result;
}
The code isn't easy to read, and it took me a while to match the source to the translation. The C doesn't look too inefficient, but the stopwatch says otherwise.
I guess ultimately you should look at the disassembly of the generated code. I, for one, would like to see it.
- mike
Notice that the creation of the class means that you're now trying to pass around pointers and have added a level of indirection to your code which may not be necessary.
I also noticed that the implementation of this function:
BOOL BusAddressRange :: contains(const BusAddress& other)
{
    BOOL b1, b2;
    b1 = (start <= other);
    b2 = (end >= other);
    return b1 && b2;
}
could be improved. Given that BusAddressRange is a useful application object, and that contains is a common and important operation, why not code up the answer directly?
BOOL BusAddressRange :: contains(const BusAddress& other)
{
    return (other >= this->start) && (other <= this->end);
}
No need to layer it on top of the overloaded operators just because they're there. "contains" is a first-class operation for this object, and it deserves a first-class implementation. I've assumed above that there's an operator unsigned long() converter in BusAddress, or whatever other promotion you prefer, to make the comparison simple. The implementation shown for >= makes me think that your bus addresses are bigger than your ints, which makes the comparison more complex, but I'd still suggest trying to code contains() directly. This bit of code:
b1 = (__le__10BusAddressFRC10BusAddress((&this->start), other));
b2 = (__ge__10BusAddressFRC10BusAddress((&this->end), other));
return (BOOL)((b1) && (b2));
is more expensive than it looks. Inside this routine, "this" may be a pointer passed on the stack, so this->start can't be calculated at compile time, and you have an extra layer of function call. You start with an integer, take its address, push that on the stack, dereference it again, push the original integer back onto the stack, and only then start looking at it inside >=. Then you do it all over again for <=. A direct implementation of contains() would avoid all that thrashing about.

You also might consider exposing the implementation (gasp!) of BusAddress, or making BusAddressRange a "friend" class, so that it can directly access the High and Low fields of BusAddress to implement contains(). You might also want to try inlining these operations, providing the definitions in the .h file* so that the optimizer has a chance to collapse the things that really are constant across the function call.

*Conventionally, in a ".i" file included from the .h. Some people like to keep the inline definitions mostly hidden for human purposes, or to be able to turn inlining on and off.
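A minimal sketch of the direct approach described above, assuming the 16+32-bit unHigh/ulLow layout from the original post and a friend grant so BusAddressRange can read the fields; the constructors and exact types are illustrative, not from the original code:

```cpp
#include <cassert>

typedef int BOOL;
#define TRUE  1
#define FALSE 0

class BusAddressRange;  // forward declaration for the friend grant

// 48-bit address: 16 high bits + 32 low bits, as in the original post
class BusAddress {
    friend class BusAddressRange;   // let the range compare fields directly
    unsigned int  unHigh;
    unsigned long ulLow;
public:
    BusAddress(unsigned int hi, unsigned long lo) : unHigh(hi), ulLow(lo) {}
};

class BusAddressRange {
    BusAddress start, end;
public:
    BusAddressRange(const BusAddress& s, const BusAddress& e)
        : start(s), end(e) {}

    // contains() coded directly against the fields: no nested operator
    // calls, no intermediate BOOLs, and an early exit on the high word
    BOOL contains(const BusAddress& other) const {
        if (other.unHigh < start.unHigh || other.unHigh > end.unHigh)
            return FALSE;
        if (other.unHigh == start.unHigh && other.ulLow < start.ulLow)
            return FALSE;
        if (other.unHigh == end.unHigh && other.ulLow > end.ulLow)
            return FALSE;
        return TRUE;
    }
};
```

Defining contains() this way keeps the two 48-bit comparisons in one function body, so the compiler sees the whole test at once instead of two opaque calls through pointers.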
Drew, Thanks for the comments. As you surmised (and as I mentioned earlier), the BusAddress is more than 32 bits long: it is in fact 48 bits. You have pointed out that the operands of the comparison operators are passed as pointers, which then need to be dereferenced. I understand that this is an expensive operation, but it is exactly the same as the original C code, where comparisons were done in a function which took pointers to structures as its parameters.
You also suggested that eliminating the intermediate BOOL variables would speed things up. I agree that this should work, but I have kept close to the original C code for the time being.
I really hope I don't have to look at the assembler to see what's happening.
I'm tempted to try changing the storage of the 48-bit address from a 16+32-bit structure to an array of three 16-bit unsigned numbers, as the compiler may be able to apply more optimisations. I also think your suggestion of inlining the code may yield some useful results, as code size is not a problem.
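One way the three-word layout might look; the names and the inline lexicographic compare below are illustrative sketches, not code from the original project:

```cpp
#include <cassert>

typedef int BOOL;

// 48-bit bus address stored as three 16-bit words, most significant first.
// A uniform layout like this can be easier for a 16-bit target compiler to
// optimise than a mixed 16+32-bit structure.
struct BusAddress {
    unsigned short word[3];   // word[0] = most significant 16 bits
};

// Inline lexicographic compare: returns nonzero when a >= b.
// Defined in the header so the optimiser can collapse the call.
inline BOOL operator>=(const BusAddress& a, const BusAddress& b) {
    for (int i = 0; i < 3; ++i) {
        if (a.word[i] != b.word[i])
            return a.word[i] > b.word[i];   // first differing word decides
    }
    return 1;   // all words equal, so a >= b
}
```

With the operator visible at the call site, a decent optimiser can unroll the three-word loop and drop the function-call overhead entirely.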
When program optimization comes to the point of trying to figure out what code the compiler likes the most, it all begins to look rather silly. Does this function need a dramatic speedup? Then it's a perfect candidate for implementation in assembly language. From my experience, I wouldn't expect wonders of optimization from the C166 compiler.
"When program optimization comes to the point of trying to figure out what code the compiler likes the most, it all begins to look rather silly."
Absolutely - the whole point of using a HLL is to make life easier (some say, "more productive") for the Programmer. Clearly the C++ is failing in this!
"It's a perfect candidate for implementation in assembly language."
Or, if the original 'C' performance was OK, why not leave it in plain 'C'?
Well, the original C code wasn't fast enough either! The 48-bit comparisons can probably be done really quickly in assembler, and wouldn't cause code bloat if inlined. I'm probably going to accelerate it in an FPGA at the end of the day.
As for re-coding for speed, I think it's a reasonable thing to do if you get enough benefit for the effort involved. It's not the worst solution in the world.