digitalmars.D - On inlining in D libraries

reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
While investigating std.regex performance in Phobos I've found that a 
lot of stuff never gets inlined (contrary to my expectations).

Namely the 3 critical ones were declared like this:
struct Bytecode{
     uint raw;
     //bit twiddling helpers
     @property uint data() const { return raw & 0x003f_ffff; }

     //ditto
     @property uint sequence() const { return 2 + (raw >> 22 & 0x3); }

     //ditto
     @property IR code() const { return cast(IR)(raw>>24); }
     ...
}

And my quick hack to get them inlined - 0-arg templates:
https://github.com/D-Programming-Language/phobos/pull/1553
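
A minimal sketch of what the 0-arg template form of these accessors could look like. The IR enum below is a placeholder (its members and values are assumptions for illustration, not the real std.regex definitions), and the snippet only shows the syntax; it does not prove that any particular compiler inlines the calls.

```d
// Hypothetical stand-in for the real IR enum in std.regex; the member
// names and values are placeholders chosen for this example only.
enum IR : uint { Char = 0x80, Any = 0x84 }

struct Bytecode
{
    uint raw;

    // The extra empty () turns each accessor into a zero-argument template,
    // so its body is instantiated in whichever module calls it.
    @property uint data()() const { return raw & 0x003f_ffff; }
    @property uint sequence()() const { return 2 + (raw >> 22 & 0x3); }
    @property IR code()() const { return cast(IR)(raw >> 24); }
}

void main()
{
    // code = 0x80 in the top byte, 1 in bits 22..23, 42 in the low 22 bits
    auto b = Bytecode((0x80u << 24) | (1u << 22) | 42);
    assert(b.data == 42);
    assert(b.sequence == 3);
    assert(b.code == IR.Char);
}
```

Call sites stay unchanged (b.data, b.code), which is why the hack is cheap to apply to code under your control.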

The "stuff" in question turns out to be anything that is not a template 
and (consequently) is compiled into the library. At first I thought it was a 
horrible bug somewhere in DMD's inliner, but this behavior is the same 
regardless of compiler. (It could be a bug in the front-end in general.)

A few days after filing the bug report with a minimal test case:
http://d.puremagic.com/issues/show_bug.cgi?id=10985

I'm not so sure that it isn't an issue of separate compilation to begin 
with. I *thought* that the intended behavior is
a) Have source - compile from source
b) Don't have source (*.di files) - link in objects

But I don't have much to go on here. Somebody from the compiler team could 
probably shed some light on this. If I'm wrong, then 0-arg templates are 
actually the only way to get the 'explicitly inline' of C++.

In C++ that would look like this:
//header
struct A{
	int foo();
};
//source
int A::foo(){ ... }

C++ explicitly inlined:
//header
struct A{
	int foo(){ ... }
};

In D we don't have this distinction.
It has to be decided then whether we adopt 0-arg templates as the intended 
solution, or tweak the front-end to always peruse accessible source when inlining.

Anyhow as it stands you have one of the following:
a) Do nothing. Then using e.g. isAlpha from std.ascii (or pick your 
favorite one-liner) is useless as it would never outperform a 
hand-rolled version (that could be 1:1 the same) because the latter will 
be inlined.
b) Pass all of the interesting files from Phobos on the command line to 
get them fully scanned for inlining (and get compiled anew each time I 
guess).
c) For code under your control - add an empty pair of brackets to 
anything that has to be inlined.

None of the above options is nice.
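
To make option (a) concrete, here is a minimal sketch that puts std.ascii.isAlpha next to a hand-rolled equivalent. The names myIsAlpha and countWith are hypothetical, introduced only for this illustration. The snippet checks that both predicates agree; it does not measure speed, and whether either call gets inlined depends on the compiler and flags used.

```d
import std.ascii : isAlpha;

// Hand-rolled equivalent living in the user's own module, so its body is
// always available to the inliner when this file is compiled.
bool myIsAlpha(dchar c) pure nothrow @safe
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Counts the characters of text that satisfy pred.
size_t countWith(alias pred)(const(char)[] text)
{
    size_t n;
    foreach (char c; text)
        if (pred(c))
            ++n;
    return n;
}

void main()
{
    auto text = "On inlining in D libraries, 2013";
    // Same result either way; only the generated code may differ.
    assert(countWith!isAlpha(text) == countWith!myIsAlpha(text));
}
```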

-- 
Dmitry Olshansky
Sep 09 2013
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky 
wrote:
 b) Pass all of the interesting files from Phobos on the command 
 line to get them fully scanned for inlining (and get compiled 
 anew each time I guess).

They more or less get compiled anew anyway since there's so many templates it has to run through, as well as the web of dependencies meaning it reads those files thanks to imports too. Listing the files could be made easy with the dmd -r people have talked about (taking what rdmd does and putting it in the compiler). Then it does it automatically. I doubt you'll see much impact on compile speed. Importing a phobos module is dog slow already, so it can't get much worse in any case.
Sep 09 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Sep-2013 17:05, Adam D. Ruppe wrote:
 On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky wrote:
 b) Pass all of the interesting files from Phobos on the command line
 to get them fully scanned for inlining (and get compiled anew each
 time I guess).

They more or less get compiled anew anyway since there's so many templates it has to run through, as well as the web of dependencies meaning it reads those files thanks to imports too.

This was my intuition, but currently it won't go beyond templates code-gen wise. It however seems to analyze the whole code.
 Listing the files could be made easy with the dmd -r people have talked
 about (taking what rdmd does and putting it in the compiler). Then it
 does it automatically.

It would still be a hack.. while I'm looking for a fix (or a clarification that we need a hack). If it was my personal problem I'd "solve" it with: dmd ~/dmd2/phobos/std/*.d <blah> maybe even alias it like this. Hm this way I could even inline some of druntime...
 I doubt you'll see much impact on compile speed.

Agreed.
 Importing a phobos
 module is dog slow already, so it can't get much worse in any case.

And that could be improved.. once it starts going into finer-grained imports/packages. The general feeling is that it'd be *soon*. -- Dmitry Olshansky
Sep 09 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 9/9/13, Adam D. Ruppe <destructionator gmail.com> wrote:
 Listing the files could be made easy with the dmd -r people have
 talked about (taking what rdmd does and putting it in the
 compiler). Then it does it automatically.

 I doubt you'll see much impact on compile speed. Importing a
 phobos module is dog slow already, so it can't get much worse in
 any case.

W.r.t. -r (recursive build), it gives you a performance boost since the compiler doesn't have to be invoked multiple times and do the same work over and over again (compared to using it from RDMD). But I've run into a bug with that pull request, and I haven't reduced the test case of the failure yet.
Sep 09 2013
prev sibling next sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 09/09/13 15:01, Dmitry Olshansky wrote:
 While investigating std.regex performance in Phobos I've found that a lot of
 stuff never gets inlined (contrary to my expectations).

Is that just with dmd, or with gdc and ldc as well?
Sep 09 2013
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Sep-2013 18:26, Joseph Rushton Wakeling wrote:
 On 09/09/13 15:01, Dmitry Olshansky wrote:
 While investigating std.regex performance in Phobos I've found that a
 lot of
 stuff never gets inlined (contrary to my expectations).

Is that just with dmd, or with gdc and ldc as well?

For DMD and LDC confirmed. Would be interesting to test GDC but I bet it's the same (does LTO work here btw?). On the bright side of things std.regex is real fast on LDC *when hacked* to inline the critical bits :) -- Dmitry Olshansky
Sep 09 2013
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
 On 09/09/13 16:34, Dmitry Olshansky wrote:
 On the bright side of things std.regex is real fast on LDC *when
 hacked* to
 inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

When I put an extra () to indicate that said functions are templates. Then the compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos. Which is the whole reason for the topic - is THAT the way to go? Shouldn't the compiler look into source for inlinable stuff (when source is available)?
 My experience is that LDC is starting to pull ahead in the speed stakes
 these days [*], although it does seem to depend a bit on exactly what
 kind of code you're writing.

 [* Caveat: that might be due to me switching to an LLVM 3.3 backend,
 although I was starting to observe this even when I was still working
 with 3.2.]

I'm using LLVM 3.3 and fresh git clone of LDC. -- Dmitry Olshansky
Sep 09 2013
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Sep-2013 21:42, Johannes Pfau wrote:
 On Monday, 9 September 2013 at 14:58:56 UTC, Dmitry Olshansky wrote:
 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
 On 09/09/13 16:34, Dmitry Olshansky wrote:
 On the bright side of things std.regex is real fast on LDC *when
 hacked* to
 inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

When I put an extra () to indicate that said functions are templates. Then the compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos. Which is the whole reason for the topic - is THAT the way to go? Shouldn't the compiler look into source for inlinable stuff (when source is available)?

I only know about GDC and GDC doesn't implement cross-module inlining right now. If the modules are compiled in a single run it might work but if the modules are compiled separately then only LTO (not tested with GDC though!) can help. AFAIK the problem is this: There's no high-level way to tell the backend "hey, I have the source code for this function. if you consider inlining call me back and I'll compile it for you". The only hack which could work is _always_ compiling _all_ functions from all modules. But compile times will explode.

Precisely the problem we have and the current state of things. Compiling everything would be option B). The solution sought after is not how to hack this around but how to make everything work nicely out of the box (for everybody).
 Another issue is that whether a function will be inlined depends on
 details like the number of compiled instructions. Those details are only
 available once the function is compiled, the source code is not enough.

 Maybe a reasonable compromise could be made with some help from the
 frontend. The frontend could give us some hints ("Likely inlineable").
 Then we could compile all "likely inlineable" functions and let the
 backend decide if it really wants to inline those.

 (Another option is inlining in the frontend. DMD does that right now
 but IIRC it causes problems with the GCC backend and is disabled in
 GDC). Iain can probably give a better answer here.

DMD's AST re-writing inliner is rather lame currently, hence just not worth the trouble I suspect.
 (Note: there's a low-level way to do this: LTO actually adds
 intermediate code to the object files. If the linker wants to inline a
 function, it calls the compiler to compile that intermediate code:
 http://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html . In the end
 working LTO is probably the best solution.)

LTO would be the best solution but at the moment it's a rather rarely used optimization with obscure issues of its own. It makes me think that generating generic (& sensible) IR instead of object code and doing inlining of that is a cute idea.. but wait, that's what the LLVM analog of LTO should do. -- Dmitry Olshansky
Sep 09 2013
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2013-09-10 06:02, Jonathan M Davis wrote:

 The compiler should definitely be able to look at non-templated functions and
 inline them where appropriate. I expect that it will really hurt performance
 in general if it doesn't - especially with stuff like getters or setters. I
 don't know what the best way would be for the compiler to go about doing that
 and have no idea how the inliner currently works, but I don't think that
 there's any question that it needs to.

I agree. -- /Jacob Carlborg
Sep 10 2013
prev sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 09/10/13 12:12, Joseph Rushton Wakeling wrote:
 On 10/09/13 11:57, Artur Skawina wrote:
 It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless
 things changed in the last couple of months).

What happened to break it?

The changes to gcc lto post-4.6.

http://forum.dlang.org/thread/5139CF92.4070408 gmail.com
http://bugzilla.gdcproject.org/show_bug.cgi?id=61

artur
Sep 10 2013
prev sibling next sibling parent Artur Skawina <art.08.09 gmail.com> writes:
On 09/09/13 16:34, Dmitry Olshansky wrote:
 09-Sep-2013 18:26, Joseph Rushton Wakeling wrote:
 On 09/09/13 15:01, Dmitry Olshansky wrote:
 While investigating std.regex performance in Phobos I've found that a
 lot of
 stuff never gets inlined (contrary to my expectations).

Is that just with dmd, or with gdc and ldc as well?

For DMD and LDC confirmed. Would be interesting to test GDC but I bet it's the same (does LTO work here btw?).

It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless things changed in the last couple of months). So you have the choice of using an old frontend with LTO or a reasonably recent one without (with no cross-module inlining). The fact that it is effectively impossible to use both gdc versions (i.e. the old LTO-enabled one just for release builds) makes the situation even worse (the language accepted by gdc was changed in a backward-incompatible way; pragma-gcc-attributes became errors).

artur
Sep 10 2013
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 10/09/13 11:57, Artur Skawina wrote:
 It used to, back in the gcc4.6 days. Right now gdc LTO is broken (unless
 things changed in the last couple of months).

What happened to break it?
Sep 10 2013
prev sibling next sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On 09/09/13 16:34, Dmitry Olshansky wrote:
 On the bright side of things std.regex is real fast on LDC *when hacked* to
 inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining? My experience is that LDC is starting to pull ahead in the speed stakes these days [*], although it does seem to depend a bit on exactly what kind of code you're writing. [* Caveat: that might be due to me switching to an LLVM 3.3 backend, although I was starting to observe this even when I was still working with 3.2.]
Sep 09 2013
prev sibling next sibling parent "Johannes Pfau" <nospam example.com> writes:
On Monday, 9 September 2013 at 14:58:56 UTC, Dmitry Olshansky 
wrote:
 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
 On 09/09/13 16:34, Dmitry Olshansky wrote:
 On the bright side of things std.regex is real fast on LDC 
 *when
 hacked* to
 inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

When I put an extra () to indicate that said functions are templates. Then the compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos. Which is the whole reason for the topic - is THAT the way to go? Shouldn't the compiler look into source for inlinable stuff (when source is available)?

I only know about GDC and GDC doesn't implement cross-module inlining right now. If the modules are compiled in a single run it might work but if the modules are compiled separately then only LTO (not tested with GDC though!) can help.

AFAIK the problem is this: There's no high-level way to tell the backend "hey, I have the source code for this function. If you consider inlining, call me back and I'll compile it for you". The only hack which could work is _always_ compiling _all_ functions from all modules. But compile times will explode.

Another issue is that whether a function will be inlined depends on details like the number of compiled instructions. Those details are only available once the function is compiled; the source code is not enough.

Maybe a reasonable compromise could be made with some help from the frontend. The frontend could give us some hints ("Likely inlineable"). Then we could compile all "likely inlineable" functions and let the backend decide if it really wants to inline those.

(Another option is inlining in the frontend. DMD does that right now but IIRC it causes problems with the GCC backend and is disabled in GDC). Iain can probably give a better answer here.

(Note: there's a low-level way to do this: LTO actually adds intermediate code to the object files. If the linker wants to inline a function, it calls the compiler to compile that intermediate code: http://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html . In the end working LTO is probably the best solution.)
Sep 09 2013
prev sibling next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Monday, 9 September 2013 at 17:42:04 UTC, Johannes Pfau wrote:
 But compile times will explode.

Are you sure?

$ time dmd hello.d # import std.stdio; void main() { writeln("hello"); }
real 0m0.665s

$ time dmd hello.d d/dmd2/src/phobos/std/*.d
std.md5 is scheduled for deprecation. Please use std.digest.md instead
real 0m2.367s

That's slow for hello world, but not a dealbreaker to me since larger projects can easily exceed that anyway (especially with optimizations turned on). And that's making no attempt to only compile the files actually imported. If we try to be smarter about it:

$ time dmd hello.d d/dmd2/src/phobos/std/{stdio,conv,format,string,traits,typetuple,typecons,bitmanip,system,functional,utf,uni,container,random,numeric,complex,regex,stdiobase}.d
real 0m1.119s

It's adding about 1/2 second to the compile time - note, not really doubling it since more complex user code would take a bigger fraction of the total than phobos as the app grows - which really isn't awful. At least when specifically asking for -inline, I think this is worth the extra compile time.
Sep 09 2013
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, September 09, 2013 18:58:47 Dmitry Olshansky wrote:
 09-Sep-2013 18:39, Joseph Rushton Wakeling wrote:
 On 09/09/13 16:34, Dmitry Olshansky wrote:
 On the bright side of things std.regex is real fast on LDC *when
 hacked* to
 inline the critical bits :)

Do you mean when manually inlined, or when the design is tweaked to facilitate inlining?

When I put an extra () to indicate that said functions are templates. Then the compiler gets its grip on them and finally inlines. Otherwise it generates calls and links in object code from libphobos. Which is the whole reason for the topic - is THAT the way to go? Shouldn't the compiler look into source for inlinable stuff (when source is available)?

The compiler should definitely be able to look at non-templated functions and inline them where appropriate. I expect that it will really hurt performance in general if it doesn't - especially with stuff like getters or setters. I don't know what the best way would be for the compiler to go about doing that and have no idea how the inliner currently works, but I don't think that there's any question that it needs to. - Jonathan M Davis
Sep 09 2013
prev sibling next sibling parent reply "monarch_dodra" <monarchdodra gmail.com> writes:
On Monday, 9 September 2013 at 13:01:51 UTC, Dmitry Olshansky 
wrote:
 [...]

I'm "resurrecting" this thread, because I also noticed that "std.ascii" is a "victim" of this. Almost all of the functions in there are trivially inlinable, but because they are not templates, they aren't inlined. By simply making them templates, I can improve the performance of functions such as "split on ascii white" by a factor of 2 to 3 (!). This is a damn shame.
Sep 15 2013
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
15-Sep-2013 23:05, Andrej Mitrovic wrote:
 On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
 By simply making them templates, I can improve the performance of
 functions such as "split on ascii white" by 2 to 3 (!).


rooted there. I'm of the opinion that the user must not suffer because of an undecided situation with inlining in the toolchain (all of them).
 Speaking of which, I think the following special case should be allowed:

 -----
 void foo()() { }

 void main()
 {
      auto x = &foo;  // NG
 }
 -----

 Then maybe we won't even break anyone's code.

Providing this special case for empty-argument templates seems to be a small price to pay to help this ugly situation. That is, unless the compiler devs agree with the following observation and see a way to get there in the short term:
 I *thought* that the intended behavior is:
 a) Have source - compile from source
 b) Don't have source (*.di files) - link in objects

Which is something nobody clarified yet. Well Johannes spoke for GDC by noting that there is no notion to support that in the current frontend-backend dialog. -- Dmitry Olshansky
Sep 15 2013
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
16-Sep-2013 01:51, Dmitry Olshansky wrote:
 15-Sep-2013 23:05, Andrej Mitrovic wrote:
 On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
 By simply making them templates, I can improve the performance of
 functions such as "split on ascii white" by 2 to 3 (!).


rooted there.

For the benefit of those who have followed this thread... All is not lost - we have Kenji! Relevant Pull: https://github.com/D-Programming-Language/dmd/pull/2561 -- Dmitry Olshansky
Sep 18 2013
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 9/15/13, monarch_dodra <monarchdodra gmail.com> wrote:
 By simply making them templates, I can improve the performance of
 functions such as "split on ascii white" by 2 to 3 (!).

Speaking of which, I think the following special case should be allowed:

-----
void foo()() { }

void main()
{
     auto x = &foo;  // NG
}
-----

Then maybe we won't even break anyone's code.
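
For context, a minimal sketch of what already works without the proposed special case: explicitly instantiating the 0-arg template before taking its address. The rejected line is left commented out as reported in the post; exact diagnostics vary by compiler version and are not reproduced here.

```d
void foo()() { }

void main()
{
    // auto x = &foo;  // NG per the post: foo names a template, not a function
    auto x = &foo!();  // explicit instantiation yields an ordinary function pointer

    // The pointer converts to a plain function pointer type and is callable.
    static assert(is(typeof(x) : void function()));
    x();
}
```

Existing code that takes the address of such a function would need this extra !() once the function becomes a template, which is the breakage the special case aims to avoid.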
Sep 15 2013
prev sibling next sibling parent "monarch_dodra" <monarchdodra gmail.com> writes:
On Wednesday, 18 September 2013 at 17:23:04 UTC, Dmitry Olshansky 
wrote:
 All is not lost - we have Kenji!

Lol, makes me think of Captain Planet. "With the five powers combined they summon D's greatest champion - Kenji Hara!"
- Compilers!
- Assembly!
- Bug fixes!
- Standard Libraries!
- Linkers!
- By your powers combined, I am Kenji Hara!
- (All) Go Kenji!
Sep 18 2013
prev sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 18 September 2013 19:06, monarch_dodra <monarchdodra gmail.com> wrote:
 On Wednesday, 18 September 2013 at 17:23:04 UTC, Dmitry Olshansky wrote:
 All is not lost - we have Kenji!

Lol, makes me think of Captain Planet. "With the five powers combined they summon D's greatest champion - Kenji Hara!" - Compilers! - Assembly! - Bug fixes! - Standard Libraries! - Linkers! - By your powers combined, I am Kenji Hara! - (All) Go Kenji!

Go! err... I meant D! -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Sep 18 2013