digitalmars.D.announce - Silicon Valley D Meetup - December 14, 2017


Ali Çehreli <acehreli yahoo.com> writes:

Meetup page: https://www.meetup.com/D-Lang-Silicon-Valley/events/245288287/

LDC[1], the LLVM-based D compiler, has been adding Link Time 
Optimization capabilities over the last several releases. [...]

This talk will look at the results of applying LTO to one set of 
applications, eBay's TSV utilities[2]. [...]

Jon Degenhardt is a member of eBay's Search Science team.
[...] D quickly became his favorite programming language, one he uses 
whenever he can.

Ali

[1] https://github.com/ldc-developers/ldc#ldc--the-llvm-based-d-compiler

[2] https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

Nov 21 2017

Ali Çehreli <acehreli yahoo.com> writes:

Reminder...


Dec 10 2017

Ali Çehreli <acehreli yahoo.com> writes:

This should be live now:

   http://youtu.be/e05QvoKy_8k

Ali


Dec 14 2017

Jon Degenhardt <jond noreply.com> writes:

On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote:
 This should be live now:

   http://youtu.be/e05QvoKy_8k

 Ali


Slides from the talk: 
https://github.com/eBay/tsv-utils-dlang/blob/master/docs/dlang-meetup-14dec2017.pdf

Dec 15 2017

Johan Engelen <j j.nl> writes:

On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote:
 This should be live now:

   http://youtu.be/e05QvoKy_8k

Great! I've added some comments there, pasted here:

Jon, thanks for the extensive talk and testing on LTO!
And thanks for recording / broadcasting :-)

(times are approximate)

7:45  Full vs Thin LTO further clarification: Full LTO is single 
threaded optimization and codegen (comparable with putting all 
source in one module). Thin LTO loads each module separately and 
imports the functions it needs from other modules; after that, 
optimization and codegen happen in parallel for each module (and 
normal linking happens afterwards). LTO's capabilities stem from 
having access to functions' source code of other modules, and 
knowing which functions are internal to the program (so that they 
can be removed, non-ABI-conformant calling convention, etc., also 
discussed around 41:30); the importing+optim that happens at the 
start of Thin LTO gives you that, with the added advantage of 
parallel optim+codegen afterwards.
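
To make the Full vs Thin distinction above concrete, here is a minimal toy model in Python. It is only a sketch of the structure described (a serial import-planning step followed by parallel per-module work), not how LLVM or LDC implement it. The module names, function names, and the helper `optimize_and_codegen` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical program: module -> {function: [callees]}.
MODULES = {
    "app":  {"main": ["parse", "sum_column"]},
    "io":   {"parse": ["split_line"], "split_line": []},
    "math": {"sum_column": [], "unused_helper": []},
}

def owner_of(function):
    """Return the module that defines `function`."""
    for module, functions in MODULES.items():
        if function in functions:
            return module
    raise KeyError(function)

def imports_for(module):
    """Functions defined in other modules that `module` calls directly."""
    callees = {c for body in MODULES[module].values() for c in body}
    return sorted(c for c in callees if owner_of(c) != module)

def optimize_and_codegen(module, imported):
    """Stand-in for the backend step; it only reports what the unit contains."""
    return module, sorted(MODULES[module]), imported

def thin_lto():
    # Step 1 (serial): decide which functions each module imports.
    plan = [(module, imports_for(module)) for module in MODULES]
    # Step 2 (parallel): every module is processed independently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda item: optimize_and_codegen(*item), plan))

def full_lto():
    # Everything is merged into a single unit and handled on one thread.
    merged = sorted(f for functions in MODULES.values() for f in functions)
    return [("merged", merged, [])]

if __name__ == "__main__":
    for unit in thin_lto():
        print(unit)
    print(full_lto())
```

In this model the Thin variant produces one work unit per module, each carrying only the functions it imports, while the Full variant produces a single merged unit.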

14:00  If the question was: do you need all libraries to be in 
IR: no. LTO works with mixed IR-object files and normal object 
files and libraries. Even if linking with non-IR libraries, it 
helps to know that no other object file references a symbol (so 
you can internalize it and generate better code). But indeed, for 
_much_ better optimization potential: the more source you have 
compiled with LTO enabled the better.

15:30  Whole source optimization at D-level has indeed higher 
potential; at the moment I don't think we do many optimizations 
that are only possible at D-level (and so they are done at IR 
level; or not at all... I'm working e.g. on devirtualization). 
Extra remark: the first step towards that is much deeper and 
well-defined spec of D semantics, in abstract machine terms.

15:45  Testing == contributing! And your testing has greatly 
improved LDC's LTO, thanks!

15:50  The ldc-build-runtime tool was made by Martin Kinkelin, 
and as you mention it is the enabler for most of your work.

16:15  LDC LTO Windows == integrating LLD into LDC (or using 
lld-link.exe), https://github.com/ldc-developers/ldc/issues/2028

~30:00 IIRC, the performance regression is due to cross-module 
inlining/optim (as you mention), which we get for free with LTO 
:-)   (that is not to say that we wouldn't like to do 
cross-module inlining without LTO)

33:20  Compilation time. LTO skips machine codegen during the 
normal compilation, as machine codegen is done in the LTO linking 
step. So the slowdown with Thin LTO may not be too much (Thin LTO 
being a parallel build). An extreme case where LTO may actually 
result in faster codegen: if you have 1 million template function 
instantiations in CTFE, but they are not called during runtime, 
LTO may easily discard them before they reach the optimization 
and machine codegen stage. In such a case, LTO may very well be 
faster (optimized machine codegen is time consuming); however, 
the IR does have to be created and written to disk, and then read 
from disk, which takes time too... Overall, Thin LTO is slower 
than a normal `-O3` build, though only by a small ratio, and it 
does more work (the added optimization). The compile speed 
difference between Full LTO and Thin LTO is very large (Full LTO 
is several times slower).
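
The "discard before codegen" point above can be illustrated with a small reachability sketch in Python. This is a toy model under stated assumptions: the call graph and every function name in it are hypothetical, and a real linker works on symbols and IR rather than a Python dict. The idea shown is only that functions unreachable from the program's entry points never need to reach the expensive optimization and machine codegen stage.

```python
from collections import deque

# Hypothetical whole-program call graph: function -> direct callees.
CALL_GRAPH = {
    "main": ["read_rows", "write_rows"],
    "read_rows": ["split_fields"],
    "split_fields": [],
    "write_rows": [],
    # Instantiations that were only needed at compile time (CTFE):
    "ctfe_table_0": [],
    "ctfe_table_1": [],
    "ctfe_helper": ["split_fields"],
}

def reachable(call_graph, roots):
    """Breadth-first search over the call graph starting at `roots`."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        function = queue.popleft()
        for callee in call_graph.get(function, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def split_live_dead(call_graph, roots):
    """Return (live, dead): dead functions can be dropped before codegen."""
    live = reachable(call_graph, roots)
    dead = set(call_graph) - live
    return live, dead

if __name__ == "__main__":
    live, dead = split_live_dead(CALL_GRAPH, ["main"])
    print("codegen needed for:", sorted(live))
    print("discarded early:   ", sorted(dead))
```

Note that `ctfe_helper` is discarded even though it calls a live function: what matters is whether anything reachable from the roots calls it, which is exactly the whole-program knowledge a per-module compile lacks.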

39:40  Indeed, D doesn't require codegen of templates if we can 
prove that it is already codegenned in the library itself: i.e. 
you _have_ to _link_ with a template-only library. In C++, 
codegen of templates is mandatory (afaik), and thus you do not 
have to link with a template-only library (e.g. header files 
only). In D, this culling of template codegen is done to increase 
compile speed; in that sense not a fair comparison with C++. For 
cross-module inlining / inlining of templated functions: in C++ 
all template code is available in each codegenned module, so LTO 
is not needed to improve things; in D, using LTO makes template 
code available that otherwise wouldn't ---> larger (potentially 
much larger) relative gains with LTO for D.  (this is somewhat 
particular to LDC currently; GDC does better cross-module 
inlining; try LDC's `-enable-cross-module-inlining`)

56:40 Fully share your thinking that cross-module inlining is the 
main source of performance gains

Can't wait to see the results of LTO on Weka.io's (LARGE) 
applications. Work in progress...!

Could you add the reference links in the comment section there 
too? (can't click on blue links in the video ;-)

Clearly very interested in what your PGO testing will show. :-)

Cheers,
   Johan

Dec 16 2017

Jon Degenhardt <jond noreply.com> writes:

On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen 
wrote:
 On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote:
 This should be live now:

   http://youtu.be/e05QvoKy_8k

 Great! I've added some comments there, pasted here:

Fantastic feedback! Fills in some really important details.

 Can't wait to see the results of LTO on Weka.io's (LARGE) 
 applications. Work in progress...!

Agreed. It'd be great to see the experience of a few more apps.

 Could you add the reference links in the comment section there 
 too? (can't click on blue links in the video ;-)

Done. Thanks for pointing this out. I also updated the posted 
slide deck so that the hyperlinks work after downloading it. 
(They still aren't clickable in the GitHub inline viewer.)

 Clearly very interested in what your PGO testing will show. :-)

Yes, should be interesting. Promising results in one benchmark. 
And sigh, I forgot to mention the opportunity you mentioned for 
someone to participate: Adding LLVM's IR-level PGO to the LDC 
compiler. Sounds pretty cool.

Dec 16 2017

Johan Engelen <j j.nl> writes:

On Saturday, 16 December 2017 at 19:40:14 UTC, Jon Degenhardt 
wrote:
 On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen 
 wrote:
 Can't wait to see the results of LTO on Weka.io's (LARGE) 
 applications. Work in progress...!

 Agreed. It'd be great to see the experience of a few more apps.

Don't have performance numbers yet. But the executable size of 
the release build with ThinLTO drops to a third of the non-LTO 
executable size...!!! 809 MB to 250 MB.
(That's without an LTO build of druntime/Phobos.)
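
As a quick check of the "a third" figure, the arithmetic on the two sizes quoted above works out as follows (a Python sketch; the variable names are only illustrative):

```python
# Executable sizes quoted in the post, in MB.
non_lto_mb = 809
thin_lto_mb = 250

# Size of the ThinLTO build relative to the non-LTO build.
ratio = thin_lto_mb / non_lto_mb
# Relative size reduction, as a percentage.
reduction_pct = (1 - ratio) * 100

print(f"ThinLTO build is {ratio:.1%} of the non-LTO size")
print(f"Size reduction: {reduction_pct:.1f}%")
```

So the ThinLTO executable is slightly under a third of the original size, a reduction of roughly 69%.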

-Johan

Dec 24 2017

Jon Degenhardt <jond noreply.com> writes:

On Saturday, 16 December 2017 at 11:52:37 UTC, Johan Engelen 
wrote:
 Clearly very interested in what your PGO testing will show. :-)

Early returns on adding PGO on top of LTO (first five benchmarks 
in the slide deck, tsv-join not tested):
* Two meaningful improvements:
   - csv2tsv: Linux: 8%; macOS: 33%
   - tsv-summarize: Linux: 6%; macOS: 11%
* Minor improvements on the other three benchmarks (< 5%)

Overall, for LDC 1.5, the improvements going from a normal 
optimized build to one combining LTO and PGO ranged from 8-45% 
on Linux and 6-57% on macOS. (First five benchmarks, excluding 
tsv-join). Impressive!

--Jon

Dec 20 2017
