
digitalmars.D - AST files instead of DI interface files for faster compilation and

reply "timotheecour" <thelastmammoth gmail.com> writes:
There's a current pull request to improve di file generation 
(https://github.com/D-Programming-Language/dmd/pull/945); I'd 
like to suggest further ideas.
As far as I understand, di interface files try to achieve these 
conflicting goals:

1) speed up compilation by avoiding having to reparse large files 
over and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and 
CTFE
4) be human readable

-Goals 2) and 3) are clearly contradictory, so that calls for a 
command line switch (eg -hidesource), off by default, which when 
set will indeed remove any implementation details where possible 
(ie for non-template and non-auto-return functions), but as a 
counterpart will also prevent any chance of inlining/CTFE for the 
corresponding exported API. That choice will be left to the user.

-Regarding point 1), it will not be unusual for a D interface 
file to be almost as large (and as slow to parse) as the original 
source file, even with the upcoming di file improvements 
(dmd/pull/945), as D encourages the use of templates/auto-return 
throughout (a large part of phobos would be left 
quasi-unchanged). In fact, D's fast compile times _do_ suffer 
when there is heavy use of templates, or when scaling up.

So to make interface files really useful in terms of speeding up 
compilation, why not directly store the AST (it could be 
text-based like JSON, but preferably a portable binary format for 
speed; call it a ".dib" file), with possibly some amount of 
analysis already done (eg: version(windows) could be 
pre-handled). This would be analogous to precompiled header files 
(http://en.wikipedia.org/wiki/Precompiled_header), which don't 
exist in D AFAIK. This could be done by extending the currently 
incomplete json file generation in dmd to include the AST of the 
implementation of each function we want to export (such as 
templates or functions to inline). During compilation of a 
module, "import myfun;" would look for 1) myfun.dib (binary or 
json precompiled interface file), 2) myfun.di (if still needed), 
3) myfun.d.
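For illustration, the lookup could be as simple as this sketch 
(the resolveImport helper is hypothetical - nothing like it 
exists in dmd - and a real implementation would also walk the -I 
include paths):

---
import std.file : exists;

// Hypothetical sketch of the proposed lookup order.  Returns the
// first matching file for a module name, preferring the precompiled
// AST over the interface file, and the interface file over source.
string resolveImport(string moduleName)
{
    foreach (ext; [".dib", ".di", ".d"])
    {
        auto candidate = moduleName ~ ext;
        if (exists(candidate))
            return candidate;
    }
    return null; // fall back to the regular import path search
}
---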



We could even go a step further, borrowing some ideas from the 
"framework" feature found in OSX to distribute components: a 
single D framework would combine the AST (~ precompiled .dib 
headers) of a set of D modules and a set of libraries.
The user would then use a framework as follows:

     dmd -L-framework mylib -L-Lpath/to/mylib main.d

or simply:

     dmd main.d

if main.d contains pragma(framework, "mylib") and framework mylib 
is in the search path.
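For example, main.d might then look like the sketch below 
(pragma(framework, ...) is proposed here, not implemented in any 
compiler, and myfunEntryPoint is a made-up function from the 
framework's modules):

---
// hypothetical: pragma(framework, ...) does not exist in any D compiler
pragma(framework, "mylib");

import myfun; // resolved against mylib's bundled ASTs at compile time

void main()
{
    myfunEntryPoint(); // linked against the libraries inside the framework
}
---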

As with OSX's frameworks, framework mylib is used both during 
compilation (resolving import statements in main.d) and during linking. 
Upon encountering an "import myfun;" declaration, the compiler 
would search the linked in frameworks for a symbol or file 
representing the corresponding AST of module myfun, and if not 
found, use the default import mechanism.
That will both speed up compilation times and make distribution 
and versioning of libraries a breeze: a single framework to 
download and to link against (this is different from what rdmd 
does). On OSX, frameworks appear as a single file in Finder but 
are actually directories; here we could have either a single file 
or a directory as well.

Finally, regarding point 4), a simple command line switch (eg dmd 
--pretty-print myfun.dib) would pretty-print the AST to stdout, 
omitting the implementation of templates and auto functions for 
brevity, so they appear as simple di files (and some options 
could filter out AST nodes for IDE use, etc).
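Hypothetically, the output for a made-up module could look like 
the sketch below. Note that the auto function is not compilable 
without its body (the return type cannot be inferred), which is 
exactly why the implementation has to stay available in the AST:

---
// hypothetical output of: dmd --pretty-print myfun.dib
module myfun;

int parse(string input);         // plain function: body hidden
T max(T)(T a, T b);              // template: body kept in the AST, omitted here
auto firstMatch(int[] haystack); // auto return: body kept in the AST, omitted here
---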

Thanks for your comments!
Jun 12 2012
next sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
Currently .di-files are compiler independent. If this should hold 
for dib-files, too, we'll need a standard ast structure, won't we?
Jun 12 2012
next sibling parent reply =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 12-06-2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?
Which is a Good Thing (TM). It would /require/ formalization of the language once and for all. -- Alex Rønne Petersen alex lycus.org http://lycus.org
Jun 12 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/12/2012 12:47 PM, Alex Rønne Petersen wrote:
 On 12-06-2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?
Which is a Good Thing (TM). It would /require/ formalization of the language once and for all.
I do not see how this conclusion could be reached.
Jun 12 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 12/06/2012 12:23, Tobias Pankrath a écrit :
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?
We need it anyway at some point. AST macro is another example. It would also greatly simplify compiler writing if the D interpreter could be provided as a lib (and so run on top of a dib file). I want to mention that LLVM IR + metadata can do a really good job here. In addition, LLVM people are working on a JIT backend, if you know what I mean ;)
Jun 12 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/12/2012 03:54 PM, deadalnix wrote:
 Le 12/06/2012 12:23, Tobias Pankrath a écrit :
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?
We need it anyway at some point.
Plain D code is already a perfectly fine standard AST structure.
  AST macro is another example.
AST macros may refer to AST structures by their representations as D code.
 It would also greatly simplify compiler writing if the D interpreter
 could be provided as lib (and so run on top of dib file).
I don't think so. Writing the interpreter is a rather straightforward part of the compiler implementation. Why would you want to run it on top of a '.dib' file anyway? Serializing/deserializing the AST is too much overhead.
 I want to mention that LLVM IR + metadata can do a really good job here.
 In addition, LLVM people are working on a JIT backend, if you know what
 I mean ;)
Interpreting manually is not harder than CTFE-compatible LLVM IR code generation, but the LLVM JIT could certainly be leveraged to improve compilation speeds.
Jun 12 2012
prev sibling next sibling parent reply Don Clugston <dac nospam.com> writes:
On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD 0.142. Personally I believe that .di files are *totally* the wrong approach for goal (1). I don't think goals (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending it's in any way related to goal (2).
Jun 12 2012
next sibling parent reply "foobar" <foo bar.com> writes:
On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd 
 like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large 
 files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. This is a solved problem since the 80's (E.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Another related question - AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in ldc to store LLVM bit code as D libs which still retain enough info for the compiler to replace header files?
Jun 12 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.
Absolutely. DDoc being built-in didn't sound right to me at first, BUT it essentially allows us to say that APIs are covered in the DDoc-generated files, not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).
Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.
Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?
Seconded. At least the lexed form could be very compact; I recall early compressors tried doing the Huffman thing on source code tokens with some success.
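As a toy illustration of the idea (splitting on spaces stands in 
for a real lexer here):

---
import std.algorithm.iteration : splitter;
import std.stdio : writefln;

// Keywords and punctuation dominate lexed D source, so entropy
// coding over whole tokens (rather than raw bytes) can shrink a
// lexed module considerably.  This just gathers the frequencies.
void main()
{
    auto source = "if ( x ) return x ; else return y ;";
    size_t[string] freq;
    foreach (tok; source.splitter(' '))
        ++freq[tok];                    // count each token occurrence
    foreach (tok, count; freq)
        writefln("%-6s %s", tok, count);
}
---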
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?
-- Dmitry Olshansky
Jun 12 2012
next sibling parent reply "Adam Wilson" <flyboynw gmail.com> writes:
On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.
Absolutely. DDoc being built-in didn't sound right to me at first, BUT it allows us to essentially being able to say that APIs are covered in the DDoc generated files. Not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).
Right, seeing yet another newbie hit it everyday is a clear indication of a simple fact: people would like to think & work in modules rather then seeing guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.
I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms?
  >Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?
Seconded. At least lexed form could be very compact, I recall early compressors tried doing the Huffman thing on source code tokens with a certain success.
I don't see the value of compression. Lexing would already reduce the size significantly and compression would only add to processing times. Disk is cheap. Beyond that though, this is absolutely the direction D must head in. In my mind the DI generation patch was mostly just a stop-gap to bring DI-gen up-to-date with the current system, thereby giving us enough time to tackle the (admittedly huge) task of building COFF into the backend, emitting the lexed source into a special section, and then giving the compiler *AND* linker the ability to read out the source. For example, giving the linker the ability to read out source code essentially requires a brand-new linker. Although, it is my personal opinion that the linker should be integrated with the compiler and done as one step; this way the linker could have intimate knowledge of the source and would enable some spectacular LTO options. If only DMD were written in D, then we could really open the compile speed throttles with an MT build model...
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?
-- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Jun 12 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 22:47, Adam Wilson wrote:
 On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.
Absolutely. DDoc being built-in didn't sound right to me at first, BUT it allows us to essentially being able to say that APIs are covered in the DDoc generated files. Not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).
Right, seeing yet another newbie hit it everyday is a clear indication of a simple fact: people would like to think & work in modules rather then seeing guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.
I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms?
Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?
Seconded. At least lexed form could be very compact, I recall early compressors tried doing the Huffman thing on source code tokens with a certain success.
I don't see the value of compression. Lexing would already reduce the size significantly and compression would only add to processing times. Disk is cheap.
I/O is not. (De)Compression on the fly is an increasingly interesting direction these days. The less you read/write, the faster you get. Knowing the relative frequency distribution of keywords beforehand is a boon. Yet I agree that it's premature at the moment.
 Beyond that though, this is absolutely the direction D must head in. In
 my mind the DI generation patch was mostly just a stop-gap to bring
 DI-gen up-to-date with the current system thereby giving us enough time
 to tackle the (admittedly huge) task of building COFF into the backend,
 emitting the lexed source into a special section and then giving the
 compiler *AND* linker the ability to read out the source. For example
 the giving the linker the ability to read out source code essentially
 requires a brand-new linker. Although, it is my personal opinion that
 the linker should be integrated with the compiler and done as one step,
 this way the linker could have intimate knowledge of the source and
 would enable some spectacular LTO options. If only DMD were written in
 D, then we could really open the compile speed throttles with an MT
 build model...
-- Dmitry Olshansky
Jun 12 2012
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 12 June 2012 at 12:23:21 UTC, Dmitry Olshansky wrote:
 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); 
 I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve 
 these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large 
 files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.
Absolutely. DDoc being built-in didn't sound right to me at first, BUT it allows us to essentially being able to say that APIs are covered in the DDoc generated files. Not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).
Right, seeing yet another newbie hit it everyday is a clear indication of a simple fact: people would like to think & work in modules rather then seeing guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.
Back in the 90's I only moved 100% away from Turbo Pascal into C land when I started using Linux at the University, and eventually spent some time doing C++ as well. It still baffles me that in 2012 we still need to rely on crappy C linker tooling, when in the 80's we already had languages with proper modules. Now we have many mainstream languages with proper modules, but many of them live in VM land. Oberon, Go and Delphi/Free Pascal seem to be the only languages with native code generation compilers that offer the binary-only modules solution, while many rely on some form of .di files.
Jun 13 2012
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2012-06-12 14:09, foobar wrote:

 This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?
Can't the same be done with OMF? I'm not saying I want to keep OMF. -- /Jacob Carlborg
Jun 12 2012
parent "Adam Wilson" <flyboynw gmail.com> writes:
On Tue, 12 Jun 2012 06:46:44 -0700, Jacob Carlborg <doob me.com> wrote:

 On 2012-06-12 14:09, foobar wrote:

 This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?
Can't the same be done with OMF? I'm not saying I want to keep OMF.
OMF doesn't support custom sections, and I think a custom section is the right way to handle this. I found the Borland OMF docs a while back to verify this. -- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Jun 12 2012
prev sibling parent reply "foobar" <foo bar.com> writes:
On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd 
 like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large 
 files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining
and CTFE
 4) be human readable
Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).
I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself is supposed to be format agnostic. This is a solved problem since the 80's (E.g. Pascal units). Per Adam's post, the issue is tied to DMD's use of OMF/optlink which we all would like to get rid of anyway. Once we're in proper COFF land, couldn't we just store the required metadata (binary AST?) in special sections in the object files themselves? Another related question - AFAIK the LLVM folks did/are doing work to make their implementation less platform-dependent. Could we leverage this in ldc to store LLVM bit code as D libs which still retain enough info for the compiler to replace header files?
Jun 12 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 12/06/2012 14:39, foobar a écrit :
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?
LLVM is definitely something I look at more and more. It is a great weapon for D, IMO.
Jun 12 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest
 further ideas.
 As far as I understand, di interface files try to achieve these conflicting
goals:

 1) speed up compilation by avoiding having to reparse large files over and
over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
Jun 12 2012
parent reply Don Clugston <dac nospam.com> writes:
On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling? (b) Wasn't one of the goals of D's module system supposed to be that you could import a symbol table? Why not just implement that? Seems like that would be much faster than .di files can ever be.
Jun 13 2012
next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 09:07, Don Clugston <dac nospam.com> wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
Lexing and parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago:
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
And here is the relevant report to go with it:
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

Example: std/xml.d
Module::parse      : 0.01 ( 0%)
Module::semantic   : 0.50 ( 9%)
Module::semantic2  : 0.02 ( 0%)
Module::semantic3  : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds), it spent almost 10% of its time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~)

Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
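The measurement pattern itself is nothing fancier than the sketch 
below (GDC's actual counters live in the C++ glue code; this uses 
Phobos' std.datetime.stopwatch, which postdates this thread, 
purely to show the idea):

---
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Illustrative only: wrap a compiler pass and report its wall time.
void timePhase(string name, void delegate() phase)
{
    auto sw = StopWatch(AutoStart.yes);
    phase();
    writefln("%-18s : %s msecs", name, sw.peek.total!"msecs");
}

void main()
{
    timePhase("Module::semantic", () { /* run the pass here */ });
}
---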
Jun 13 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.06.2012 13:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>  wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
Lexing and Parsing are miniscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relavent report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds) - it spent almost 10% of it's time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~)
Is time spent on I/O accounted for in the parse step? And where is the rest spent :) -- Dmitry Olshansky
Jun 13 2012
parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 10:45, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 On 13.06.2012 13:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com> wrote:
 On 12/06/12 18:46, Walter Bright wrote:

 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work
 in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
Lexing and Parsing are miniscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/
 And here is the relavent report to go with it.
 http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


 Example: std/xml.d
 Module::parse : 0.01 ( 0%)
 Module::semantic : 0.50 ( 9%)
 Module::semantic2 : 0.02 ( 0%)
 Module::semantic3 : 0.04 ( 1%)
 Module::genobjfile : 0.10 ( 2%)

 For the entire time it took to compile the one file (5.22 seconds) -
 it spent almost 10% of it's time running the first semantic analysis.


 But that was the D2 frontend / phobos as of September 2010. I should
 re-run a report on updated times and draw some comparisons. :~)
Is time spent on I/O accounted for in the parse step? And where is the rest
 spent :)
It would be, the counter starts before the files are even touched, and ends after they are closed. The rest of the time spent is in the GCC backend, going through some 60+ code passes and outputting the assembly to file. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 13 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.06.2012 14:16, Iain Buclaw wrote:
 On 13 June 2012 10:45, Dmitry Olshansky<dmitry.olsh gmail.com>  wrote:
 On 13.06.2012 13:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>    wrote:
 On 12/06/12 18:46, Walter Bright wrote:

 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
Lexing and Parsing are miniscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relavent report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds) - it spent almost 10% of it's time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~)
Is time spent on I/O accounted for in the parse step? And where is the rest spent :)
It would be, the counter starts before the files are even touched, and ends after they are closed.
Ok, then parsing is indistinguishable from I/O, and together they are only a tiny fraction of the whole. Great info, thanks.
 The rest of the time spent is in the GCC backend, going through the
 some 60+ code passes and outputting the assembly to file.
Damn, I like DMD :) -- Dmitry Olshansky
Jun 13 2012
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
Le 13/06/2012 11:37, Iain Buclaw a écrit :
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>  wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
Lexing and Parsing are miniscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relavent report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it took to compile the one file (5.22 seconds) - it spent almost 10% of it's time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~) Regards
Nice numbers! They also show that the slowest part is the backend. Can you get some numbers on a recent version of D? And for some different kinds of D code (ie, template-intensive or not, for instance, would be nice to compare)?
Jun 13 2012
prev sibling parent reply "Kagamin" <spam here.lot> writes:
The measurements should be done for modules being imported, not 
the module being compiled.
Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---
Jun 13 2012
parent reply "Kagamin" <spam here.lot> writes:
On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not 
 the module being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---
Oh and let it import .d files, not .di
Jun 13 2012
next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---
Oh and let it import .d files, not .di
std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I still maintain that the majority of the compile time in the frontend is spent in the first semantic pass, and not the read/parse stage. :~) -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 13 2012
next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 13 June 2012 at 11:47:31 UTC, Iain Buclaw wrote:
 std.datetime is one reason for me to run it again. I can 
 imagine that
 *that* module will have an impact on parse times.  But I'm still
 persistent that the majority of the compile time in the 
 frontend is
 done in the first semantic pass, and not the read/parser stage. 
 :~)
Probably. Also test with -fsyntax-only, if it works and runs the semantic passes.
Jun 13 2012
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-06-13 13:47, Iain Buclaw wrote:

 std.datetime is one reason for me to run it again. I can imagine that
 *that* module will have an impact on parse times.  But I'm still
 persistent that the majority of the compile time in the frontend is
 done in the first semantic pass, and not the read/parser stage. :~)
You should try the Objective-C/D bridge; that took quite a while to compile. Although it will probably not compile any more, as it hasn't been updated. I think it was only for D1 as well. I think that was mostly templates, so I guess that would mean some of the semantic passes. -- /Jacob Carlborg
Jun 13 2012
prev sibling next sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 12:47, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---
Oh and let it import .d files, not .di
std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~)
Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library.

http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf

Notes about it:
- GCC has 4 new time counters
  - phase setup (time spent loading the compile time environment)
  - phase parsing (time spent in the frontend)
  - phase generate (time spent in the backend)
  - phase finalize (time spent cleaning up and exiting)
- Of the phase parsing stage, it is broken down into 5 components
  - Module::parse
  - Module::semantic
  - Module::semantic2
  - Module::semantic3
  - Module::genobjfile
- Module::read, Module::parse and Module::importAll in the one I did 2 years ago are now counted as part of just the one parsing stage, rather than separate, just to make it a little bit more balanced. :-)

I'll post a tl;dr later on it. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 16 2012
parent deadalnix <deadalnix gmail.com> writes:
Le 16/06/2012 11:18, Iain Buclaw a écrit :
 On 13 June 2012 12:47, Iain Buclaw<ibuclaw ubuntu.com>  wrote:
 On 13 June 2012 12:33, Kagamin<spam here.lot>  wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---
Oh and let it import .d files, not .di
std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~)
Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library. http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf Notes about it: - GCC has 4 new time counters - phase setup (time spent loading the compile time environment) - phase parsing (time spent in the frontend) - phase generate (time spent in the backend) - phase finalize (time spent cleaning up and exiting) - Of the phase parsing stage, it is broken down into 5 components - Module::parse - Module::semantic - Module::semantic2 - Module::semantic3 - Module::genobjfile - Module::read, Module::parse and Module::importAll in the one I did 2 years ago are now counted as part of just the one parsing stage, rather than separate just to make it a little bit more balanced. :-) I'll post a tl;dr later on it.
Thank you very much for your work.
Jun 19 2012
prev sibling parent reply Iain Buclaw <ibuclaw ubuntu.com> writes:
On 16 June 2012 10:18, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:47, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---
Oh and let it import .d files, not .di
std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still persistent that the majority of the compile time in the frontend is done in the first semantic pass, and not the read/parser stage. :~)
Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library. http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf Notes about it: - GCC has 4 new time counters - phase setup (time spent loading the compile time environment) - phase parsing (time spent in the frontend) - phase generate (time spent in the backend) - phase finalize (time spent cleaning up and exiting) - Of the phase parsing stage, it is broken down into 5 components - Module::parse - Module::semantic - Module::semantic2 - Module::semantic3 - Module::genobjfile - Module::read, Module::parse and Module::importAll in the one I did 2 years ago are now counted as part of just the one parsing stage, rather than separate just to make it a little bit more balanced. :-) I'll post a tl;dr later on it.
tl;dr

Total number of source files compiled: 207
Total time to build druntime and phobos: 78.08 seconds
Time spent parsing: 17.15 seconds
Average time spent parsing: 0.08 seconds
Time spent running semantic passes: 10.04 seconds
Time spent generating backend AST: 2.15 seconds
Time spent in backend: 48.62 seconds

So parsing time has taken quite a hit since I last did any reports on compilation speed of building phobos. I suspect most of that comes from the loading of symbols from all imports, and that there have been some large additions to phobos recently which provide a constant bottleneck if one chooses to compile one source at a time, as the apparently large amount of time spent parsing sources does not show when compiling all at once.

Module::parse      : 0.58 seconds ( 1%)
Module::semantic   : 0.24 seconds ( 1%)
Module::semantic2  : 0.01 seconds ( 0%)
Module::semantic3  : 2.85 seconds ( 6%)
Module::genobjfile : 1.24 seconds ( 3%)
TOTAL              : 47.06 seconds

Considering that the entire phobos library is some 165K lines of code, I don't see why people aren't laughing about just how quick the frontend is at parsing. :~)

Regards -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 16 2012
parent reply Guillaume Chatelet <chatelet.guillaume gmail.com> writes:
 So parsing time has taken quite a hit since I last did any reports on
 compilation speed of building phobos.
So maybe my post about "keeping import clean" wasn't as irrelevant as I thought. http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 -- Guillaume
Jun 16 2012
parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 16 June 2012 22:17, Guillaume Chatelet <chatelet.guillaume gmail.com> wrote:
 So parsing time has taken quite a hit since I last did any reports on
 compilation speed of building phobos.
So maybe my post about "keeping import clean" wasn't as irrelevant as I thought. http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 -- Guillaume
I think its relevancy is only geared towards projects that are compiling one file at a time - ie: I'd expect all gdc users to be compiling in this way, as whole program compilation using gdc still needs some rigorous testing first. If there is a particularly large module, or set of large modules, that is persistently being imported, then you will see a notable constant slowdown on compilation of each file. -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 19 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
It is for debug builds.
 (b) Wasn't one of the goals of D's module system supposed to be that you could
 import a symbol table? Why not just implement that? Seems like that would be
 much faster than .di files can ever be.
Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement.
Jun 13 2012
parent reply Don Clugston <dac nospam.com> writes:
On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me that slow parsing is a C++ problem which D already solved.
 (b) Wasn't one of the goals of D's module system supposed to be that
 you could
 import a symbol table? Why not just implement that? Seems like that
 would be
 much faster than .di files can ever be.
Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement.
It has those nasty side-effects listed under (3) though.
Jun 14 2012
next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
 On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:
 
 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me, that slow parsing is a C++ problem which D already solved.
If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis
Jun 14 2012
next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 14 June 2012 at 08:11:02 UTC, Jonathan M Davis wrote:
 Or do we still end up paying the semantic
 cost for importing the .d files such that using .di files would 
 still help with
 compilation times?
Oh, right, the module can use mixins and CTFE, so it should be semantically checked, but the semantic check may be minimal just like in the case of a .di file.
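To illustrate with a made-up module (nothing from druntime): a declaration produced by a string mixin only exists after CTFE has run, so even an importer needs at least that much semantic analysis done:

    // mymod.d - hypothetical example: the declaration of answer()
    // only comes into existence once makeDecl has been evaluated by
    // CTFE and the resulting string has been mixed in.
    module mymod;

    string makeDecl(string name)
    {
        return "int " ~ name ~ "() { return 42; }";
    }

    mixin(makeDecl("answer")); // declares int answer()

No amount of pure parsing reveals that mymod.answer exists.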
Jun 14 2012
prev sibling parent reply Don Clugston <dac nospam.com> writes:
On 14/06/12 10:10, Jonathan M Davis wrote:
 On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
 On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable
(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enables it to be loaded with no additional implementation work in the compiler.
I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?
It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me, that slow parsing is a C++ problem which D already solved.
If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis
I don't think Phobos should use .di files at all. I don't think there are any cases where we want to conceal code.

The performance benefit you would get is completely negligible. It doesn't even reduce the number of files that need to be loaded, just the length of each one.

I think that, for example, improving the way that array literals are dealt with would have at least as much impact on compilation time. For the DMD backend, fixing up the treatment of comma expressions would have a much bigger impact than getting lexing and parsing time to zero.

And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time.

Using di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works but you only do it when you've got no other options.
Jun 14 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, June 15, 2012 08:58:55 Don Clugston wrote:
 I don't think Phobos should use .di files at all. I don't think there
 are any cases where we want to conceal code.
 
 The performance benefit you would get is completely negligible. It
 doesn't even reduce the number of files that need to be loaded, just the
 length of each one.
 
 I think that, for example, improving the way that array literals are
 dealt with would have at least as much impact on compilation time.
 For the DMD backend, fixing up the treatment of comma expressions would
 have a much bigger impact than getting lexing and parsing time to zero.
 
 And we're well set up for parallel compilation. There's no shortage of
 things we can do to improve compilation time.
 
 Using di files for speed seems a bit like jettisoning the cargo to keep
 the ship afloat. It works but you only do it when you've got no other
 options.
On several occasions, Walter has expressed the desire to make Phobos use .di files like druntime does; otherwise I probably would never have considered it. Personally, I don't want to bother with it unless there's a large benefit, so if we're sure that the gain is minimal, then I say we should just leave it all as .d files. Most of Phobos would have to have its implementation left in any .di files anyway so that inlining and CTFE could work. - Jonathan M Davis
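For example (a hypothetical module, just to show the split): a .di file can drop the body of an ordinary function, but a template's body has to stay, or the importing side can neither instantiate it nor run it in CTFE:

    // mymod.di - illustrative only
    module mymod;

    // Non-template, non-auto function: the body can be stripped;
    // callers only need the signature to generate a call.
    int plain(int x);

    // Template: the implementation must be kept, because each
    // instantiation is compiled (and possibly CTFE-evaluated)
    // at the point of use.
    T twice(T)(T x) { return x + x; }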
Jun 15 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/14/2012 11:58 PM, Don Clugston wrote:
 And we're well set up for parallel compilation. There's no shortage of things
we
 can do to improve compilation time.
The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome!
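The embarrassingly parallel part is easy to picture; here is a minimal sketch in user-level D with std.parallelism (not dmd's actual implementation, and the "lexer" below is a toy stand-in):

    import std.file : dirEntries, readText, SpanMode;
    import std.parallelism : parallel;
    import std.stdio : writefln;

    void main()
    {
        // Collect the project's source files.
        string[] sources;
        foreach (entry; dirEntries(".", "*.d", SpanMode.shallow))
            sources ~= entry.name;

        // Read and "lex" every module in parallel; the loop body
        // shares no mutable state, so no synchronization is needed.
        foreach (name; parallel(sources))
        {
            auto text = readText(name);
            size_t tokens; // toy stand-in for a real lexer
            foreach (dchar c; text)
                if (c == ' ' || c == '\n')
                    ++tokens;
            writefln("%s: ~%s tokens", name, tokens);
        }
    }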
 Using di files for speed seems a bit like jettisoning the cargo to keep the
ship
 afloat. It works but you only do it when you've got no other options.
.di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.
Jun 16 2012
parent deadalnix <deadalnix gmail.com> writes:
On 17/06/2012 00:41, Walter Bright wrote:
 On 6/14/2012 11:58 PM, Don Clugston wrote:
 And we're well set up for parallel compilation. There's no shortage of
 things we
 can do to improve compilation time.
The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome!
 Using di files for speed seems a bit like jettisoning the cargo to
 keep the ship
 afloat. It works but you only do it when you've got no other options.
.di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.
The key point here is project size. I wouldn't expect individual file sizes to grow all that much.
Jun 19 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
 Yes, it is designed so you could just import a symbol table. It is done
 as source code, however, because it's trivial to implement.
It has those nasty side-effects listed under (3) though.
I don't think they're nasty or are side effects.
Jun 16 2012
parent reply Don Clugston <dac nospam.com> writes:
On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
 Yes, it is designed so you could just import a symbol table. It is done
 as source code, however, because it's trivial to implement.
It has those nasty side-effects listed under (3) though.
I don't think they're nasty or are side effects.
They are new problems that people keep asking for solutions to. And they are far more difficult to solve than the original problem.
Jun 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
Jun 18 2012
next sibling parent reply "Daniel" <wyrlon gmx.net> writes:
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
Same here. I wish there were a standardized pre-lexed-token "binary" file format; it would benefit all text editors as well, since they need to lex the source anyway to perform syntax highlighting.
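No such format exists, but as a sketch of what one token record could look like (every name and field below is made up):

    // One pre-lexed token; a ".dtok" file (hypothetical name) would
    // be a header, a string table, then a flat array of these.
    enum TokenKind : ushort { identifier, keyword, operator, literal, comment }

    struct Token
    {
        TokenKind kind;     // lexical class - enough for highlighting
        ushort    flags;    // e.g. which keyword, literal subtype
        uint      line;     // 1-based source line
        uint      column;   // 1-based source column
        uint      strIndex; // offset into the shared string table
    }

An editor could mmap such a file and colour a buffer without ever running its own lexer.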
Jun 18 2012
parent reply "Chris Cain" <clcain uncg.edu> writes:
On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:
 Same here, I wish there were a standardized pre-lexed-token 
 "binary" file-format, would benefit all text editors also, as 
 they need to lex it anyway to perform color syntax highlighting.
If I were to make my own language, I'd forego a human-readable format and just have the "language" be defined as a big machine-readable AST. You'd have to have an IDE, but it could display the code in just about any way the person wants (syntax, style, etc). Syntax highlighting would be instantaneous and there would be fewer errors made by programmers (maybe ...). Plus it'd be unbelievably easy to implement things like auto-completion.
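A toy version of the idea (entirely made up, just to show the shape): the on-disk "program" is a flat array of nodes, and concrete syntax becomes a rendering choice of the IDE:

    // Hypothetical serialized AST - not a real format.
    enum NodeKind : ubyte { module_, decl, call, literal }

    struct Node
    {
        NodeKind kind;
        uint     firstChild;  // index into the node array, 0 = none
        uint     nextSibling; // index into the node array, 0 = none
        uint     data;        // e.g. string-table or constant index
    }

    // "Source code" is then just Node[] on disk; highlighting and
    // completion walk the tree instead of re-lexing text.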
Jun 18 2012
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 06/19/2012 02:47 AM, Chris Cain wrote:
 On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:
 Same here, I wish there were a standardized pre-lexed-token "binary"
 file-format, would benefit all text editors also, as they need to lex
 it anyway to perform color syntax highlighting.
If I were to make my own language, I'd forego a human-readable format and just have the "language" be defined as a big machine-readable AST.
http://de.wikipedia.org/wiki/Lisp ?
 You'd have to have an IDE, but it could display the code in just about
 any way the person wants (syntax, style, etc).
This could be done even if the language's source code storage format is human-readable.
 Syntax highlighting would be instantaneous and there would be fewer
 errors made by programmers (maybe ...). Plus it'd be unbelievably easy
 to implement things like auto-completion.
Parsing is not a huge issue. Depending on how powerful the language is, auto-completion may depend on full code analysis.
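A concrete case using ordinary Phobos code: completing after "r." below requires type inference and template instantiation, because r's type doesn't even have a user-visible name until then:

    import std.algorithm : map;
    import std.range : iota;

    void main()
    {
        // r's type is a private ("voldemort") struct produced by the
        // map instantiation; no amount of parsing reveals its members.
        auto r = iota(10).map!(a => a * 2);

        static assert(__traits(hasMember, typeof(r), "front"));
    }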
Jun 18 2012
parent "Kagamin" <spam here.lot> writes:
On Tuesday, 19 June 2012 at 01:47:27 UTC, Timon Gehr wrote:
 Parsing is not a huge issue. Depending on how powerful the 
 language is, auto-completion may depend on full code analysis.
Yep, pegged runs at compile time.
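For reference, Pegged's documented usage looks roughly like this (grammar abridged from its README, https://github.com/PhilippeSigaud/Pegged):

    import pegged.grammar;

    // The grammar string is turned into a parser at compile time.
    mixin(grammar(`
    Arithmetic:
        Term    < Factor (Add / Sub)*
        Add     < "+" Factor
        Sub     < "-" Factor
        Factor  < Primary (Mul / Div)*
        Mul     < "*" Primary
        Div     < "/" Primary
        Primary < :"(" Term :")" / Number
        Number  < ~([0-9]+)
    `));

    // And the parse itself can run during compilation, via CTFE:
    enum tree = Arithmetic("1 + 2*3");
    pragma(msg, tree.matches);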
Jun 19 2012
prev sibling next sibling parent reply "Kagamin" <spam here.lot> writes:
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 Yeah, but I can't escape that lingering feeling that lexing is 
 slow.

 I was fairly disappointed that asynchronously reading the 
 source files didn't have a measurable effect most of the time.
I don't even understand all this rage about asynchronicity: if the program has nothing to do until it reads the data, asynchronicity won't help you in the slightest. Anyway, everything is stuck while the device performs DMA.
Jun 19 2012
parent dennis luehring <dl.soluz gmx.net> writes:
On 19.06.2012 09:43, Kagamin wrote:
 On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 Yeah, but I can't escape that lingering feeling that lexing is
 slow.

 I was fairly disappointed that asynchronously reading the
 source files didn't have a measurable effect most of the time.
 I don't even understand all this rage about asynchronicity, if
 the program has nothing to do until it reads the data,
the lexing and parsing process can be asynchronous - it will be faster on multiple cores because there is no dependency between separate lexing/parsing threads - so why lex/parse in sequence?
 asynchronicity won't help you in the slightest. Anyway everything
 is stuck while the device performs DMA.
yeah, down to the hardware level - but there are caches etc. out there - it's not like multithreaded file reading is always as fast as synchronous reading, and asynchronous file reading isn't always faster either - it's somewhere in between :)
Jun 19 2012
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 18.06.2012 19:53, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
so you started your lexing and parsing in separate threads for each file - where was synchronization needed? Have you measured which parts of the code make it behave like synchronous reading - or is it the file reading itself?
Jun 19 2012
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
On 18/06/2012 19:53, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
It is kind of religious. We need data.
Jun 19 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 Jun 2012 13:53:43 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
I have found that my project, which has a huge number of symbols (and large ones), compiles much slower than I would expect. Perhaps you have forgotten about this issue: http://d.puremagic.com/issues/show_bug.cgi?id=4900

Maybe fixing this still doesn't help parsing, not sure.

-Steve
Jun 25 2012
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
On Mon, 18 Jun 2012 19:53:43 +0200, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.
Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise?
Nothing recent, it's mostly from my C++ compiler testing.
But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
Lexing is definitely taking a big part of debug compilation time. I haven't profiled the compiler for some time now, but here are some thoughts.

- speeding up the identifier hash table: there was always a profile spike at StringTable::lookup, though it has gotten smaller since you increased the bucket count
- memory mapping the source file: saves a copy for UTF-8 sources; this is by far the fastest way to read a source file
- parallel reading/parsing: doesn't help much if most of the source files are read during import semantic

That said, I'm regularly hitting other bottlenecks, so I don't think that lexing is the main one. When compiling std.range with unittests, for example, more than 50% of the compile time is spent checking for existing template instantiations, using O(N^2)/2 compares of template arguments. If we managed to fix http://d.puremagic.com/issues/show_bug.cgi?id=7469 we could efficiently use the mangled name as the key.
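As a sketch of that lookup (hypothetical names - dmd's real data structures look nothing like this): one hash probe on the mangled name replaces the pairwise comparison of template arguments against every prior instantiation:

    // Hypothetical instantiation cache keyed by mangled name.
    class TemplateInstance
    {
        string mangled;
        this(string m) { mangled = m; }
        // ... semantic state would live here ...
    }

    TemplateInstance[string] cache;

    TemplateInstance getInstance(string mangledName)
    {
        if (auto existing = mangledName in cache)
            return *existing;  // O(1) amortized; no argument walk
        auto inst = new TemplateInstance(mangledName);
        cache[mangledName] = inst;
        return inst;
    }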
Jun 25 2012