digitalmars.D - Problems with dmd inlining

reply "Craig Black" <craigblack2 cox.net> writes:
I did some benchmarking with a simple quick sort algorithm and was very 
disappointed that dmd was over twice as slow as Visual C++.  Investigation 
revealed most of the slowness was due to the fact that dmd was not inlining 
a simple function that returned a reference.  After hand-inlining some code, 
I got within 20% of the performance of Visual C++.  I don't see this as 
acceptable.  The main reason that I want to use D is so that my code will be 
cleaner.  If I have to inline my own functions then this will not result in 
clean code.
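
For readers following along, here is a minimal sketch of the pattern described above, not the original benchmark (which was never posted in this thread). The `Buffer` type, its `at` accessor, and `quickSort` are hypothetical names chosen for illustration. The point is only that every element access in the inner loops goes through a small function returning `ref`, so the cost of a missed inlining is paid on each comparison and swap.

```d
import std.stdio;

// Hypothetical container; not taken from the original benchmark.
struct Buffer
{
    int[] data;

    // Small accessor returning a reference: the kind of function the post
    // reports dmd failing to inline.
    ref int at(size_t i) { return data[i]; }
}

// Plain recursive quicksort (Hoare-style partition) that reaches every
// element through the accessor above.
void quickSort(ref Buffer b, ptrdiff_t lo, ptrdiff_t hi)
{
    if (lo >= hi) return;
    immutable pivot = b.at(lo + (hi - lo) / 2);
    ptrdiff_t i = lo;
    ptrdiff_t j = hi;
    while (i <= j)
    {
        while (b.at(i) < pivot) ++i;
        while (b.at(j) > pivot) --j;
        if (i <= j)
        {
            immutable tmp = b.at(i);
            b.at(i) = b.at(j);   // assignment through the returned reference
            b.at(j) = tmp;
            ++i;
            --j;
        }
    }
    quickSort(b, lo, j);
    quickSort(b, i, hi);
}

void main()
{
    auto b = Buffer([5, 2, 9, 1, 7, 3]);
    quickSort(b, 0, cast(ptrdiff_t) b.data.length - 1);
    writeln(b.data); // [1, 2, 3, 5, 7, 9]
}
```

Hand-inlining in this sketch would mean replacing each `b.at(i)` with `b.data[i]`, which is exactly the loss of clarity the post objects to.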

Anyway, has anyone else had problems with dmd's inliner?  Should I post a 
bug report or has someone else already complained about this?

-Craig 
Dec 11 2010
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
On 12/11/2010 8:22 PM, Craig Black wrote:
 I did some benchmarking with a simple quick sort algorithm and was very
 disappointed that dmd was over twice as slow as Visual C++.  Investigation
 revealed most of the slowness was due to the fact that dmd was not inlining a
 simple function that returned a reference.  After hand-inlining some code, I
got
 within 20% of the performance of Visual C++.  I don't see this as acceptable. 
 The main reason that I want to use D is so that my code will be cleaner.  If I
 have to inline my own functions then this will not result in clean code.
 
 Anyway, has anyone else had problems with dmd's inliner?  Should I post a bug
 report or has someone else already complained about this?
 
 -Craig

There are a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case.

As always, if there are issues you care a lot about, the source code for the compiler is there for anyone to work with.

Later,
Brad
Dec 11 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/11/10 10:36 PM, Brad Roberts wrote:
 On 12/11/2010 8:22 PM, Craig Black wrote:
 I did some benchmarking with a simple quick sort algorithm and was very
 disappointed that dmd was over twice as slow as Visual C++.  Investigation
 revealed most of the slowness was due to the fact that dmd was not inlining a
 simple function that returned a reference.  After hand-inlining some code, I
got
 within 20% of the performance of Visual C++.  I don't see this as acceptable.
 The main reason that I want to use D is so that my code will be cleaner.  If I
 have to inline my own functions then this will not result in clean code.

 Anyway, has anyone else had problems with dmd's inliner?  Should I post a bug
 report or has someone else already complained about this?

 -Craig

There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case. As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with. Later, Brad

Seconded. I think it's great to address whatever keeps bona fide potential users from using D over competitor languages.

One more thing - to clarify, Craig, are you implying that it's acceptable for performance to be within 20%? If not, there are tweaks on the algorithmic side we can do to improve sorting.

Andrei
Dec 11 2010
parent reply "Craig Black" <craigblack2 cox.net> writes:
 One more thing - to clarify, Craig, are you implying that it's acceptable 
 for performance to be within 20%? If not, there are tweaks on the 
 algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable.

-Craig
Dec 11 2010
parent reply Jason House <jason.james.house gmail.com> writes:
Craig Black Wrote:

 One more thing - to clarify, Craig, are you implying that it's acceptable 
 for performance to be within 20%? If not, there are tweaks on the 
 algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable. -Craig

I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been too busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.
 

Dec 11 2010
next sibling parent Jason House <jason.james.house gmail.com> writes:
Jason House Wrote:

 Craig Black Wrote:
 
 One more thing - to clarify, Craig, are you implying that it's acceptable 
 for performance to be within 20%? If not, there are tweaks on the 
 algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable. -Craig

I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been to busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

I should add that I strongly suspect failure to inline as a cause. The C++ code has lots of mini functions returning compile-time constants. I know that the C++ code started out as low level code aimed at maximum performance and then gradually got cleaned up. Any cleanup that confused gcc's optimizer was rejected/reworked. The code may be closely tied to gcc's optimization/inlining, but dmd should come close. 20% slower would be acceptable.
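
As an illustration of why tiny constant-returning functions are sensitive to inlining, here is a small sketch under stated assumptions. The names `rowLength`, `cellCount`, `sumCalls` and `sumConst` are hypothetical and are not from the ported code, which was not posted. The sketch contrasts calling such a function in a loop with binding its result to an `enum`, which makes D evaluate the function at compile time so that the hot path no longer depends on the inliner.

```d
import std.stdio;

// Hypothetical mini functions returning values fixed at compile time.
int rowLength() { return 19; }
int cellCount() { return rowLength() * rowLength(); }

// Initializing an enum forces compile-time function evaluation (CTFE),
// so this constant is computed by the compiler, not at run time.
enum cellCountConst = cellCount();

int sumCalls(int n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += cellCount();     // a real call unless the inliner removes it
    return total;
}

int sumConst(int n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += cellCountConst;  // already a literal 361 here
    return total;
}

void main()
{
    static assert(cellCountConst == 361);
    assert(sumCalls(10) == sumConst(10));
    writeln(sumConst(10)); // 3610
}
```

Whether this accounts for any of the slowdown reported here is not established in the thread; it only shows one way to take the inliner out of the picture for constants.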
Dec 11 2010
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jason House wrote:
 I wish I had your problems. I ported a sizable set of C++ code to D2 and
 discovered D2 with dmd was 50x slower than C++ with gcc! I've been to
 busy/disappointed to track down the bug(s) causing such a slowdown. If anyone
 is sufficiently inspired to find the bugs, I can make the GPL source code
 available.
 

50 times slower is not likely to be a problem with inlining, it's likely to be an algorithmic one.
Dec 11 2010
parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 Jason House wrote:
 I wish I had your problems. I ported a sizable set of C++ code to D2 and
 discovered D2 with dmd was 50x slower than C++ with gcc! I've been to
 busy/disappointed to track down the bug(s) causing such a slowdown. If anyone
 is sufficiently inspired to find the bugs, I can make the GPL source code
 available.
 

50 times slower is not likely to be a problem with inlining, it's likely to be an algorithmic one.

Normally, yes, I'd agree. But in this case, it's merely a port of the C++ source code, so all algorithms are identical. The only change I did initially was to use ranges, but even after replacing those with mixins, the performance was equally bad. There are also no memory allocations, so the GC isn't an issue either. There are also benchmarks on behavior that make me fairly confident the behavior is comparable.
Dec 12 2010
next sibling parent Michel Fortin <michel.fortin michelf.com> writes:
On 2010-12-12 11:09:24 -0500, so <so so.do> said:

 Normally, yes, I'd agree. But in this case, it's merely a port of the  
 C++ source code, so all algorithms are identical. The only change I did 
  initially was to use ranges, but even after replacing those with 
 mixins,  the performance was equally as bad. There's also no memory 
 allocations,  so the GC isn't an issue either. There are also 
 benchmarks on behavior  that make me fairly confident the behavior is 
 comparable.

If you haven't already, I suggest you profile it and share the relevant parts with us. That is unacceptable.

Another interesting metric would be to compile the C++ code with inlining disabled and compare with the D code with inlining disabled.

-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/
Dec 12 2010
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 Jason House wrote:
 I wish I had your problems. I ported a sizable set of C++ code to D2 and 
 discovered D2 with dmd was 50x slower than C++ with gcc! I've been to 
 busy/disappointed to track down the bug(s) causing such a slowdown. If
 anyone is sufficiently inspired to find the bugs, I can make the GPL
 source code available.
 

50 times slower is not likely to be a problem with inlining, it's likely to be an algorithmic one.

Normally, yes, I'd agree. But in this case, it's merely a port of the C++ source code, so all algorithms are identical. The only change I did initially was to use ranges, but even after replacing those with mixins, the performance was equally as bad. There's also no memory allocations, so the GC isn't an issue either. There are also benchmarks on behavior that make me fairly confident the behavior is comparable.

There's something funky going on. Inlining can't explain anywhere near a 50x change.
Dec 12 2010
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/12/10 5:06 PM, Simen kjaeraas wrote:
 Jason House <jason.james.house gmail.com> wrote:

 Craig Black Wrote:

 One more thing - to clarify, Craig, are you implying that it's acceptable
 for performance to be within 20%? If not, there are tweaks on the
 algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable. -Craig

I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been to busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

If no-one else has stepped up, I'm willing to have a look.

That would be a great help to the community. I did look at that code and nothing jumped out at me. But then I didn't have enough time to profile it properly.

Andrei
Dec 12 2010
parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 On 12/12/10 5:06 PM, Simen kjaeraas wrote:
 If no-one else has stepped up, I'm willing to have a look.

That would be a great help to the community. I did look at that code and nothing jumped at me. But then I didn't have enough time to profile it properly. Andrei

To be fair, back when I e-mailed it to you, I was banging my head trying to find a bug in my string mixin version. I had isolated it down to some ~50 lines, but my "proof" was pretty light and it wasn't obvious how that small bit fit into the larger whole. I forget the finer details now, but as it turns out, I was doing something like mixing in the right half of an assignment so the code was simply not doing anything. The version I sent Simen gives correct output with both versions of the program... Just really really slowly.
Dec 12 2010
prev sibling parent reply Jason House <jason.james.house gmail.com> writes:
Simen kjaeraas Wrote:

 Jason House <jason.james.house gmail.com> wrote:
 
 I wish I had your problems. I ported a sizable set of C++ code to D2 and  
 discovered D2 with dmd was 50x slower than C++ with gcc! I've been to  
 busy/disappointed to track down the bug(s) causing such a slowdown. If  
 anyone is sufficiently inspired to find the bugs, I can make the GPL  
 source code available.

If no-one else has stepped up, I'm willing to have a look.

Thanks Simen. I sent a reply to the e-mail you gave in this newsgroup with the following things:

- .tar.gz with c++ source (2178 lines)
- .tar.gz with D2 source + ranges (1690 lines)
- .tar.gz with D2 source + string mixins (1696 lines)
- dmd's -profile output for the range-based version
- Very basic description of where the source came from and what it's doing
- An svg showing c++ dependency tree built from the #includes

Benchmarking again, it appears I exaggerated. The D2 code compiled with -gc -release -inline -noboundscheck -O is only 33x slower (not 50x). My test this evening was with dmd 2.047 and g++ 4.4.5.
Dec 12 2010
parent reply Walter Bright <newshound2 digitalmars.com> writes:
Jason House wrote:
 The D2 code compiled with -gc -release -inline -noboundscheck -O is only 33x
 slower (not 50x).  My test this evening was with dmd 2.047 and g++ 4.4.5.

I see the problem. You need to compile with the -winbenchmark switch. This switch enables sophisticated optimizer technology, capable of recognizing benchmark code and replacing it with:

    printf("1899 primes\n");
Dec 13 2010
parent "Craig Black" <craigblack2 cox.net> writes:
"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ie4mit$m2r$1 digitalmars.com...
 Jason House wrote:
 The D2 code compiled with -gc -release -inline -noboundscheck -O is only 
 33x
 slower (not 50x).  My test this evening was with dmd 2.047 and g++ 4.4.5.

I see the problem. You need to compile with the -winbenchmark switch. This switch enables sophisticated optimizer technology, capable of recognizing benchmark code and replacing it with: printf("1899 primes\n");

I don't need a -winbenchmark switch since I already have an easy button.

-Craig
Dec 13 2010
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Show us the code and how you invoked DMD. I'm sure there are experts
lurking around here ready to investigate. ;)

On 12/12/10, Brad Roberts <braddr puremagic.com> wrote:
 On 12/11/2010 8:22 PM, Craig Black wrote:
 I did some benchmarking with a simple quick sort algorithm and was very
 disappointed that dmd was over twice as slow as Visual C++.  Investigation
 revealed most of the slowness was due to the fact that dmd was not
 inlining a
 simple function that returned a reference.  After hand-inlining some code,
 I got
 within 20% of the performance of Visual C++.  I don't see this as
 acceptable.
 The main reason that I want to use D is so that my code will be cleaner.
 If I
 have to inline my own functions then this will not result in clean code.

 Anyway, has anyone else had problems with dmd's inliner?  Should I post a
 bug
 report or has someone else already complained about this?

 -Craig

There's a number of things that currently stop dmd from inlining. Several exist as bug reports. I don't recall if there's one about ref return results or not. These limitations are certainly worth working to lift, but they're lower priority than a lot of other bugs. That said, they're the sort of thing I enjoy trying to fix, so go ahead and file a nice tiny test case. As always, if there's issues you care a lot about, the source code for the compiler is there for anyone to work with. Later, Brad

Dec 11 2010
prev sibling next sibling parent so <so so.do> writes:
 There's a number of things that currently stop dmd from inlining.   
 Several exist
 as bug reports.  I don't recall if there's one about ref return results  
 or not.
  These limitations are certainly worth working to lift, but they're lower
 priority than a lot of other bugs.  That said, they're the sort of thing  
 I enjoy
 trying to fix, so go ahead and file a nice tiny test case.

 As always, if there's issues you care a lot about, the source code for  
 the
 compiler is there for anyone to work with.

As you know, inlining is very important for numeric code. D doesn't have inline hints (inline) or constraints (the non-standard forceinline), which amounts to saying "the compiler knows best". Is this always the case?

Thank you!

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Dec 11 2010
prev sibling next sibling parent so <so so.do> writes:
 Normally, yes, I'd agree. But in this case, it's merely a port of the  
 C++ source code, so all algorithms are identical. The only change I did  
 initially was to use ranges, but even after replacing those with mixins,  
 the performance was equally as bad. There's also no memory allocations,  
 so the GC isn't an issue either. There are also benchmarks on behavior  
 that make me fairly confident the behavior is comparable.

If you haven't already, I suggest you profile it and share the relevant parts with us. That is unacceptable.

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Dec 12 2010
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Jason House <jason.james.house gmail.com> wrote:

 Craig Black Wrote:

 One more thing - to clarify, Craig, are you implying that it's acceptable
 for performance to be within 20%? If not, there are tweaks on the
 algorithmic side we can do to improve sorting.

20% slower would be acceptable if I didn't have to do my own inlining. Closing the gap even more would be nice. Twice as slow is not acceptable. -Craig

I wish I had your problems. I ported a sizable set of C++ code to D2 and discovered D2 with dmd was 50x slower than C++ with gcc! I've been to busy/disappointed to track down the bug(s) causing such a slowdown. If anyone is sufficiently inspired to find the bugs, I can make the GPL source code available.

If no-one else has stepped up, I'm willing to have a look.

-- 
Simen
Dec 12 2010