
digitalmars.D - AST files instead of DI interface files for faster compilation and

reply "timotheecour" <thelastmammoth gmail.com> writes:
There's a current pull request to improve di file generation 
(https://github.com/D-Programming-Language/dmd/pull/945); I'd 
like to suggest further ideas.
As far as I understand, di interface files try to achieve these 
conflicting goals:

1) speed up compilation by avoiding having to reparse large files 
over and over.
2) hide implementation details for proprietary reasons
3) still maintain source code in some form to allow inlining and 
CTFE
4) be human readable

-Goals 2) and 3) are clearly contradictory, so that calls for a 
command line switch (eg -hidesource), off by default, which when 
set would indeed remove any implementation details (where 
possible, ie for non-template and non-auto-return functions) but, 
as a counterpart, would also prevent any chance of inlining/CTFE 
for the corresponding exported API. That choice would be left to 
the user.

-Regarding point 1), it will not be unusual for a D interface 
file to be almost as large (and as slow to parse) as the original 
source file, even with the upcoming di file improvements 
(dmd/pull/945), as D encourages the use of templates/auto-return 
throughout (a large part of phobos would be left 
quasi-unchanged). In fact, D's fast compile times _do_ suffer 
under heavy use of templates, or when scaling up.

So to make interface files really useful in terms of speeding up 
compilation, why not directly store the AST (it could be 
text-based like JSON, but preferably a portable binary format for 
speed; call it a ".dib" file), with possibly some amount of 
analysis already done (eg version(Windows) could be pre-handled). 
This would be analogous to precompiled header files 
(http://en.wikipedia.org/wiki/Precompiled_header), which don't 
exist in D AFAIK. It could be done by extending the currently 
incomplete json file generation by dmd to include the AST of the 
implementation of each function we want to export (such as 
templates or functions to inline). During compilation of a 
module, "import myfun;" would look for 1) myfun.dib (binary or 
json precompiled interface file), 2) myfun.di (if still needed), 
3) myfun.d.



We could even go a step further, borrowing some ideas from the 
"framework" feature found in OSX to distribute components: a 
single D framework would combine the AST (~ precompiled .dib 
headers) of a set of D modules and a set of libraries.
The user would then use a framework as follows:

     dmd -L-framework mylib -L-Lpath/to/mylib main.d

or simply:

     dmd main.d

if main.d contains pragma(framework, "mylib") and framework 
mylib is in the search path.

As in OSX's frameworks, framework mylib would be used both during 
compilation (resolving import statements in main.d) and during 
linking. Upon encountering an "import myfun;" declaration, the 
compiler would search the linked-in frameworks for a symbol or 
file representing the corresponding AST of module myfun and, if 
not found, fall back to the default import mechanism.
That would both speed up compilation times and make distribution 
and versioning of libraries a breeze: a single framework to 
download and to link against (this is different from what rdmd 
does). On OSX, frameworks appear as a single file in Finder but 
are actually directories; here we could have either a single file 
or a directory as well.

Finally, regarding point 4), a simple command line switch (eg dmd 
--pretty-print myfun.di) would pretty-print the AST to stdout, 
omitting the implementation of templates and auto functions for 
brevity so that they appear as in simple di files (other options 
could filter out AST nodes for IDE use, etc).

Thanks for your comments!
Jun 12 2012
next sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
Currently .di-files are compiler independent. If this should hold 
for dib-files, too, we'll need a standard ast structure, won't we?
Jun 12 2012
next sibling parent reply Alex Rønne Petersen <alex lycus.org> writes:
On 12-06-2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?

Which is a Good Thing (TM). It would /require/ formalization of 
the language once and for all.

--
Alex Rønne Petersen
alex lycus.org
http://lycus.org
Jun 12 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/12/2012 12:47 PM, Alex Rønne Petersen wrote:
 On 12-06-2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?

Which is a Good Thing (TM). It would /require/ formalization of the language once and for all.

I do not see how this conclusion could be reached.
Jun 12 2012
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On 12/06/2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?

We need it anyway at some point. AST macros are another example. 
It would also greatly simplify compiler writing if the D 
interpreter could be provided as a lib (and so run on top of a 
dib file). I want to mention that LLVM IR + metadata can do a 
really good job here. In addition, the LLVM people are working on 
a JIT backend, if you know what I mean ;)
Jun 12 2012
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/12/2012 03:54 PM, deadalnix wrote:
 On 12/06/2012 12:23, Tobias Pankrath wrote:
 Currently .di-files are compiler independent. If this should hold for
 dib-files, too, we'll need a standard ast structure, won't we?

We need it anyway at some point.

Plain D code is already a perfectly fine standard AST structure.
  AST macro is another example.

AST macros may refer to AST structures by their representations as D code.
 It would also greatly simplify compiler writing if the D interpreter
 could be provided as lib (and so run on top of dib file).

I don't think so. Writing the interpreter is a rather straightforward part of the compiler implementation. Why would you want to run it on top of a '.dib' file anyway? Serializing/deserializing the AST is too much overhead.
 I want to mention that LLVM IR + metadata can do a really good job here.
 In addition, LLVM people are working on a JIT backend, if you know what
 I mean ;)

Interpreting manually is not harder than CTFE-compatible LLVM IR code generation, but the LLVM JIT could certainly be leveraged to improve compilation speeds.
Jun 12 2012
prev sibling next sibling parent reply Don Clugston <dac nospam.com> writes:
On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

Is that actually true? My recollection is that the original 
motivation was only goal (2), but I was fairly new to D at the 
time (2005).

Here's the original post where it was implemented:
http://www.digitalmars.com/d/archives/digitalmars/D/29883.html
and it got partially merged into DMD 0.141 (Dec 4 2005), first 
usable in DMD 0.142.

Personally I believe that .di files are *totally* the wrong 
approach for goal (1). I don't think goals (1) and (2) have 
anything in common at all with each other, except that C tried to 
achieve both of them using header files. It's an OK solution for 
(1) in C, it's a failure in C++, and a complete failure in D.

IMHO: If we want goal (1), we should try to achieve goal (1), and 
stop pretending it's in any way related to goal (2).
Jun 12 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining

 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).

I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.

It allows us to essentially say that APIs are covered by the 
DDoc-generated files. Not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).

Right, seeing yet another newbie hit it every day is a clear 
indication of a simple fact: people would like to think & work in 
modules rather than seeing the guts of old and crappy OBJ file 
technology. Linking with C != using C tools everywhere.
Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?

Compressors have tried doing the Huffman thing on source code 
tokens with some success.
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?

-- Dmitry Olshansky
Jun 12 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12.06.2012 22:47, Adam Wilson wrote:
 On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky
 <dmitry.olsh gmail.com> wrote:

 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining

 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).

I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.

it allows us to essentially being able to say that APIs are covered in the DDoc generated files. Not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).

Right, seeing yet another newbie hit it everyday is a clear indication of a simple fact: people would like to think & work in modules rather then seeing guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.

I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms?
Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?

compressors tried doing the Huffman thing on source code tokens with a certain success.

I don't see the value of compression. Lexing would already reduce the size significantly and compression would only add to processing times. Disk is cheap.

I/O is not. On-the-fly (de)compression is an increasingly 
interesting direction these days: the less you read/write, the 
faster you get. Knowing beforehand the relative frequency 
distribution of keywords is a boon. Yet I agree that it's 
premature at the moment.
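The keyword-frequency point can be illustrated with a toy sketch 
(the frequency table below is made up, and this is not how any D 
compiler stores source; Python is used only for illustration):

```python
import heapq

def huffman_code_lengths(freqs):
    """Compute Huffman code lengths for a token -> frequency map.
    More frequent tokens end up with shorter codes."""
    # Heap entries: (subtree frequency, tie-breaker, {token: depth so far})
    heap = [(f, i, {tok: 0}) for i, (tok, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained token one level deeper.
        merged = {tok: depth + 1 for tok, depth in {**c1, **c2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]
```

With a table like {"auto": 50, "immutable": 5, "rare_ident": 1}, 
"auto" gets a 1-bit code while the rare tokens get 2 bits: that 
is exactly the kind of win a known keyword distribution buys.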
 Beyond that though, this is absolutely the direction D must head in. In
 my mind the DI generation patch was mostly just a stop-gap to bring
 DI-gen up-to-date with the current system thereby giving us enough time
 to tackle the (admittedly huge) task of building COFF into the backend,
 emitting the lexed source into a special section and then giving the
 compiler *AND* linker the ability to read out the source. For example
 giving the linker the ability to read out source code essentially
 requires a brand-new linker. Although, it is my personal opinion that
 the linker should be integrated with the compiler and done as one step,
 this way the linker could have intimate knowledge of the source and
 would enable some spectacular LTO options. If only DMD were written in
 D, then we could really open the compile speed throttles with an MT
 build model...

-- Dmitry Olshansky
Jun 12 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-06-12 14:09, foobar wrote:

 This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?

Can't the same be done with OMF? I'm not saying I want to keep OMF. -- /Jacob Carlborg
Jun 12 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On 12/06/2012 14:39, foobar wrote:
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?

LLVM is definitively something I look at more and more. It is a great weapon for D IMO.
Jun 12 2012
prev sibling next sibling parent "foobar" <foo bar.com> writes:
On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd 
 like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large 
 files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining

 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).

I absolutely agree with the above and would also add that goal 
(4) is an anti-feature. In order to get a human readable version 
of the API the programmer should use *documentation*. D claims 
that one of its goals is to make it a breeze to provide 
documentation by bundling a standard tool - DDoc. There's no need 
to duplicate this just to provide another format when DDoc itself 
is supposed to be format agnostic. This is a solved problem since 
the 80's (e.g. Pascal units).

Per Adam's post, the issue is tied to DMD's use of OMF/optlink 
which we all would like to get rid of anyway. Once we're in 
proper COFF land, couldn't we just store the required metadata 
(binary AST?) in special sections in the object files themselves?

Another related question - AFAIK the LLVM folks did/are doing 
work to make their implementation less platform-dependent. Could 
we leverage this in ldc to store LLVM bit code as D libs which 
still retain enough info for the compiler to replace header files?
Jun 12 2012
prev sibling next sibling parent "Adam Wilson" <flyboynw gmail.com> writes:
On Tue, 12 Jun 2012 06:46:44 -0700, Jacob Carlborg <doob me.com> wrote:

 On 2012-06-12 14:09, foobar wrote:

 This is a solved problem since the 80's (E.g. Pascal units). Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?

Can't the same be done with OMF? I'm not saying I want to keep OMF.

OMF doesn't support custom sections, and I think a custom section 
is the right way to handle this. I found the Borland OMF docs a 
while back to verify this.

--
Adam Wilson
IRC: LightBender
Project Coordinator
The Horizon Project
http://www.thehorizonproject.org/
Jun 12 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to suggest
 further ideas.
 As far as I understand, di interface files try to achieve these conflicting
goals:

 1) speed up compilation by avoiding having to reparse large files over and
over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.
Jun 12 2012
parent reply Don Clugston <dac nospam.com> writes:
On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually, for two reasons:
(a) Is lexing + parsing really a significant part of the 
compilation time? Has anyone done some solid profiling?
(b) Wasn't one of the goals of D's module system supposed to be 
that you could import a symbol table? Why not just implement 
that? Seems like that would be much faster than .di files can 
ever be.
Jun 13 2012
next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.06.2012 13:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>  wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

Lexing and parsing are minuscule tasks in comparison to the three 
semantic runs done on the code.

I added speed counters into the glue code of GDC some time ago:
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/

And here is the relevant report to go with it:
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

Example: std/xml.d
Module::parse      : 0.01 ( 0%)
Module::semantic   : 0.50 ( 9%)
Module::semantic2  : 0.02 ( 0%)
Module::semantic3  : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 
seconds), it spent almost 10% of its time running the first 
semantic analysis.

But that was the D2 frontend / phobos as of September 2010. I 
should re-run a report on updated times and draw some 
comparisons. :~)
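As a sanity check, the percentages can be recomputed from the raw 
pass timings quoted above (5.22 s total; the published report 
rounds to whole percents, so small differences are expected):

```python
def pass_percentages(timings, total):
    """Express each compiler pass's wall time as a percentage of the
    total build time, rounded to one decimal place."""
    return {name: round(100.0 * seconds / total, 1)
            for name, seconds in timings.items()}

# Timings (in seconds) from the std/xml.d example in the report.
timings = {
    "Module::parse": 0.01,
    "Module::semantic": 0.50,
    "Module::semantic2": 0.02,
    "Module::semantic3": 0.04,
    "Module::genobjfile": 0.10,
}
shares = pass_percentages(timings, total=5.22)
```

Module::semantic comes out at about 9.6% of the 5.22-second 
total, matching the "almost 10%" claim, while parsing is roughly 
0.2%.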

Is time spent on I/O accounted for in the parse step? And where is the rest spent :) -- Dmitry Olshansky
Jun 13 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 13.06.2012 14:16, Iain Buclaw wrote:
 On 13 June 2012 10:45, Dmitry Olshansky<dmitry.olsh gmail.com>  wrote:
 On 13.06.2012 13:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>    wrote:
 On 12/06/12 18:46, Walter Bright wrote:

 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

Lexing and parsing are minuscule tasks in comparison to the three 
semantic runs done on the code. I added speed counters into the 
glue code of GDC some time ago. 
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ 
And here is the relevant report to go with it. 
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf 
Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 
0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 
0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it 
took to compile the one file (5.22 seconds), it spent almost 10% 
of its time running the first semantic analysis. But that was the 
D2 frontend / phobos as of September 2010. I should re-run a 
report on updated times and draw some comparisons. :~)

Is time spent on I/O accounted for in the parse step? And where is the rest spent :)

It would be, the counter starts before the files are even touched, and ends after they are closed.

Ok, then parsing is indistinguishable from I/O, and together they 
are only a tiny fraction of the whole. Great info, thanks.
 The rest of the time spent is in the GCC backend, going through the
 some 60+ code passes and outputting the assembly to file.

Damn, I like DMD :) -- Dmitry Olshansky
Jun 13 2012
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
On 13/06/2012 11:37, Iain Buclaw wrote:
 On 13 June 2012 09:07, Don Clugston<dac nospam.com>  wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

Lexing and parsing are minuscule tasks in comparison to the three 
semantic runs done on the code. I added speed counters into the 
glue code of GDC some time ago. 
http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ 
And here is the relevant report to go with it. 
http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf 
Example: std/xml.d Module::parse : 0.01 ( 0%) Module::semantic : 
0.50 ( 9%) Module::semantic2 : 0.02 ( 0%) Module::semantic3 : 
0.04 ( 1%) Module::genobjfile : 0.10 ( 2%) For the entire time it 
took to compile the one file (5.22 seconds), it spent almost 10% 
of its time running the first semantic analysis. But that was the 
D2 frontend / phobos as of September 2010. I should re-run a 
report on updated times and draw some comparisons. :~) Regards

Nice numbers! They also show that the slowest part is the 
backend. Can you get some numbers on a recent version of D? And 
on some different D code bases (comparing template-intensive code 
against non-template code, for instance, would be nice)?
Jun 13 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

It is for debug builds.
 (b) Wasn't one of the goals of D's module system supposed to be that you could
 import a symbol table? Why not just implement that? Seems like that would be
 much faster than .di files can ever be.

Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement.
Jun 13 2012
parent reply Don Clugston <dac nospam.com> writes:
On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me that slow parsing is a C++ problem which D already solved.
 (b) Wasn't one of the goals of D's module system supposed to be that
 you could
 import a symbol table? Why not just implement that? Seems like that
 would be
 much faster than .di files can ever be.

Yes, it is designed so you could just import a symbol table. It is done as source code, however, because it's trivial to implement.

It has those nasty side-effects listed under (3) though.
Jun 14 2012
next sibling parent reply Don Clugston <dac nospam.com> writes:
On 14/06/12 10:10, Jonathan M Davis wrote:
 On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
 On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me that slow parsing is a C++ problem which D already solved.

If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis

I don't think Phobos should use .di files at all. I don't think there are any cases where we want to conceal code. The performance benefit you would get is completely negligible. It doesn't even reduce the number of files that need to be loaded, just the length of each one. I think that, for example, improving the way that array literals are dealt with would have at least as much impact on compilation time. For the DMD backend, fixing up the treatment of comma expressions would have a much bigger impact than getting lexing and parsing time to zero. And we're well set up for parallel compilation. There's no shortage of things we can do to improve compilation time. Using di files for speed seems a bit like jettisoning the cargo to keep the ship afloat. It works but you only do it when you've got no other options.
Jun 14 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/14/2012 11:58 PM, Don Clugston wrote:
 And we're well set up for parallel compilation. There's no shortage of things
we
 can do to improve compilation time.

The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome!
 Using di files for speed seems a bit like jettisoning the cargo to keep the
ship
 afloat. It works but you only do it when you've got no other options.

.di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.
Jun 16 2012
parent deadalnix <deadalnix gmail.com> writes:
On 17/06/2012 00:41, Walter Bright wrote:
 On 6/14/2012 11:58 PM, Don Clugston wrote:
 And we're well set up for parallel compilation. There's no shortage of
 things we
 can do to improve compilation time.

The language is carefully designed, so that at least in theory all the passes could be done in parallel. I've got the file reads in parallel, but I'd love to have the lexing, parsing, semantic, optimization, and code gen all done in parallel. Wouldn't that be awesome!
 Using di files for speed seems a bit like jettisoning the cargo to
 keep the ship
 afloat. It works but you only do it when you've got no other options.

.di files don't make a whole lotta sense for small files, but the bigger they get, the more they are useful. D needs to be scalable to enormous project sizes.

The key point here is project size. I wouldn't expect individual file sizes to grow in any significant manner.
Jun 19 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.
 Yes, it is designed so you could just import a symbol table. It is done
 as source code, however, because it's trivial to implement.

It has those nasty side-effects listed under (3) though.

I don't think they're nasty or are side effects.
Jun 16 2012
parent reply Don Clugston <dac nospam.com> writes:
On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.
 Yes, it is designed so you could just import a symbol table. It is done
 as source code, however, because it's trivial to implement.

It has those nasty side-effects listed under (3) though.

I don't think they're nasty or are side effects.

They are new problems which people ask for solutions for. And they are far more difficult to solve than the original problem.
Jun 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.
Jun 18 2012
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06/19/2012 02:47 AM, Chris Cain wrote:
 On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:
 Same here, I wish there were a standardized pre-lexed-token "binary"
 file-format, would benefit all text editors also, as they need to lex
 it anyway to perform color syntax highlighting.

If I were to make my own language, I'd forego a human-readable format and just have the "language" be defined as a big machine-readable AST.

http://de.wikipedia.org/wiki/Lisp ?
 You'd have to have an IDE, but it could display the code in just about
 any way the person wants (syntax, style, etc).

This could be done even if the language's source code storage format is human-readable.
 Syntax highlighting would be instantaneous and there would be fewer
 errors made by programmers (maybe ...). Plus it'd be unbelievably easy
 to implement things like auto-completion.

Parsing is not a huge issue. Depending on how powerful the language is, auto-completion may depend on full code analysis.
Jun 18 2012
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 19.06.2012 09:43, Kagamin wrote:
 On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 Yeah, but I can't escape that lingering feeling that lexing is
 slow.

 I was fairly disappointed that asynchronously reading the
 source files didn't have a measurable effect most of the time.


 I don't even understand all this rage about asynchronicity, if
 the program has nothing to do until it reads the data,

the lexing and parsing process can be asynchronous - it will be faster on multiple cores because there is no dependency between separate lexing/parsing threads - why lex/parse in sequence then?
 asynchronicity won't help you in the slightest. Anyway everything
 is stuck while the device performs DMA.

yeah, down to the hardware level - but there are caches etc. out there - it's not as if multithreaded file reading is always as fast as synchronous reading, nor is asynchronous file reading always faster - it's somewhere in between :)
Jun 19 2012
prev sibling next sibling parent dennis luehring <dl.soluz gmx.net> writes:
On 18.06.2012 19:53, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.

so you started your lexing and parsing in separate threads for each file - where was synchronization needed? Have you measured which parts of the code make it behave like synchronous reading - or is it the file reading itself?
Jun 19 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On 18/06/2012 19:53, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.

It is kind of religious. We need data.
Jun 19 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-06-13 13:47, Iain Buclaw wrote:

 std.datetime is one reason for me to run it again. I can imagine that
 *that* module will have an impact on parse times.  But I'm still
convinced that the majority of the compile time in the frontend is
spent in the first semantic pass, and not the read/parser stage. :~)

You should try the Objective-C/D bridge; that took quite a while to compile. Although it will probably not compile any more, as it hasn't been updated. I think it was only for D1 as well. I think it was mostly templates, so I guess that would mean some of the semantic passes. -- /Jacob Carlborg
Jun 13 2012
prev sibling next sibling parent Guillaume Chatelet <chatelet.guillaume gmail.com> writes:
 So parsing time has taken quite a hit since I last did any reports on
 compilation speed of building phobos.

So maybe my post about "keeping imports clean" wasn't as irrelevant as I thought. http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 -- Guillaume
Jun 16 2012
prev sibling parent deadalnix <deadalnix gmail.com> writes:
On 16/06/2012 11:18, Iain Buclaw wrote:
 On 13 June 2012 12:47, Iain Buclaw<ibuclaw ubuntu.com>  wrote:
 On 13 June 2012 12:33, Kagamin<spam here.lot>  wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---

Oh and let it import .d files, not .di

std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not the read/parser stage. :~)

Rebuilt a compile log with latest gdc as of writing on the 2.059 frontend / library.
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf

Notes about it:
- GCC has 4 new time counters:
  - phase setup (time spent loading the compile time environment)
  - phase parsing (time spent in the frontend)
  - phase generate (time spent in the backend)
  - phase finalize (time spent cleaning up and exiting)
- The phase parsing stage is broken down into 5 components:
  - Module::parse
  - Module::semantic
  - Module::semantic2
  - Module::semantic3
  - Module::genobjfile
- Module::read, Module::parse and Module::importAll, which were separate in the report I did 2 years ago, are now counted as part of just the one parsing stage, to make it a little bit more balanced. :-)

I'll post a tl;dr later on it.

Thank you very much for your work.
Jun 19 2012
prev sibling next sibling parent "Adam Wilson" <flyboynw gmail.com> writes:
On Tue, 12 Jun 2012 05:23:16 -0700, Dmitry Olshansky  
<dmitry.olsh gmail.com> wrote:

 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining

 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).

I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.

it allows us to essentially say that APIs are covered in the DDoc-generated files, not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).

Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.

I completely agree with this. The interactions between the D module system and D toolchain are utterly confusing to newcomers, especially those from other C-like languages. There are better ways, see .NET Assemblies and Pascal Units. These problems were solved decades ago. Why are we still using 40-year-old paradigms?
  >Per Adam's
 post, the issue is tied to DMD's use of OMF/optlink which we all would
 like to get rid of anyway. Once we're in proper COFF land, couldn't we
 just store the required metadata (binary AST?) in special sections in
 the object files themselves?

compressors tried doing the Huffman thing on source code tokens with a certain success.

I don't see the value of compression. Lexing would already reduce the size significantly, and compression would only add to processing times. Disk is cheap. Beyond that though, this is absolutely the direction D must head in. In my mind the DI generation patch was mostly just a stop-gap to bring DI-gen up-to-date with the current system, thereby giving us enough time to tackle the (admittedly huge) task of building COFF into the backend, emitting the lexed source into a special section, and then giving the compiler *AND* linker the ability to read out the source. For example, giving the linker the ability to read out source code essentially requires a brand-new linker. Although, it is my personal opinion that the linker should be integrated with the compiler and done as one step; this way the linker could have intimate knowledge of the source and would enable some spectacular LTO options. If only DMD were written in D, then we could really open the compile speed throttles with an MT build model...
 Another related question - AFAIK the LLVM folks did/are doing work to
 make their implementation less platform-depended. Could we leverage this
 in ldc to store LLVM bit code as D libs which still retain enough info
 for the compiler to replace header files?


-- Adam Wilson IRC: LightBender Project Coordinator The Horizon Project http://www.thehorizonproject.org/
Jun 12 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 12 June 2012 at 12:23:21 UTC, Dmitry Olshansky wrote:
 On 12.06.2012 16:09, foobar wrote:
 On Tuesday, 12 June 2012 at 11:09:04 UTC, Don Clugston wrote:
 On 12/06/12 11:07, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); 
 I'd like to
 suggest further ideas.
 As far as I understand, di interface files try to achieve 
 these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large 
 files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining

 4) be human readable

Is that actually true? My recollection is that the original motivation was only goal (2), but I was fairly new to D at the time (2005). Here's the original post where it was implemented: http://www.digitalmars.com/d/archives/digitalmars/D/29883.html and it got partially merged into DMD 0.141 (Dec 4 2005), first usable in DMD0.142 Personally I believe that.di files are *totally* the wrong approach for goal (1). I don't think goal (1) and (2) have anything in common at all with each other, except that C tried to achieve both of them using header files. It's an OK solution for (1) in C, it's a failure in C++, and a complete failure in D. IMHO: If we want goal (1), we should try to achieve goal (1), and stop pretending its in any way related to goal (2).

I absolutely agree with the above and would also add that goal (4) is an anti-feature. In order to get a human readable version of the API the programmer should use *documentation*. D claims that one of its goals is to make it a breeze to provide documentation by bundling a standard tool - DDoc. There's no need to duplicate this just to provide another format when DDoc itself supposed to be format agnostic.

first, BUT it allows us to essentially say that APIs are covered in the DDoc-generated files, not header files etc.
 This is a solved problem since the 80's (E.g. Pascal units).

Right, seeing yet another newbie hit it every day is a clear indication of a simple fact: people would like to think & work in modules rather than seeing the guts of old and crappy OBJ file technology. Linking with C != using C tools everywhere.

Back in the 90's I only moved 100% away from Turbo Pascal into C land when I started using Linux at the University, and eventually spent some time doing C++ as well. It still baffles me that in 2012 we still need to rely on crappy C linker tooling, when in the 80's we already had languages with proper modules. Now we have many mainstream languages with proper modules, but many of them live in VM land. Oberon, Go and Delphi/Free Pascal seem to be the only languages with native code generation compilers that offer the binary-only modules solution, while many rely on some form of .di files.
Jun 13 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 09:07, Don Clugston <dac nospam.com> wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

Lexing and Parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/ And here is the relevant report to go with it. http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf

Example: std/xml.d
Module::parse      : 0.01 ( 0%)
Module::semantic   : 0.50 ( 9%)
Module::semantic2  : 0.02 ( 0%)
Module::semantic3  : 0.04 ( 1%)
Module::genobjfile : 0.10 ( 2%)

For the entire time it took to compile the one file (5.22 seconds) - it spent almost 10% of its time running the first semantic analysis. But that was the D2 frontend / phobos as of September 2010. I should re-run a report on updated times and draw some comparisons. :~)

Regards
--
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 13 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 10:45, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:
 On 13.06.2012 13:37, Iain Buclaw wrote:
On 13 June 2012 09:07, Don Clugston <dac nospam.com> wrote:
 On 12/06/12 18:46, Walter Bright wrote:

 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:

1) speed up compilation by avoiding having to reparse large files over





 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work




 in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

Lexing and Parsing are minuscule tasks in comparison to the three semantic runs done on the code. I added speed counters into the glue code of GDC some time ago. http://iainbuclaw.wordpress.com/2010/09/18/implementing-speed-counters-in-gdc/


And here is the relevant report to go with it.
 http://iainbuclaw.files.wordpress.com/2010/09/d2-time-report2.pdf


 Example: std/xml.d
 Module::parse : 0.01 ( 0%)
 Module::semantic : 0.50 ( 9%)
 Module::semantic2 : 0.02 ( 0%)
 Module::semantic3 : 0.04 ( 1%)
 Module::genobjfile : 0.10 ( 2%)

 For the entire time it took to compile the one file (5.22 seconds) -
 it spent almost 10% of it's time running the first semantic analysis.


But that was the D2 frontend / phobos as of September 2010. I should
 re-run a report on updated times and draw some comparisons. :~)

Is time spent on I/O accounted for in the parse step? And where is the rest of the time

 spent :)

It would be, the counter starts before the files are even touched, and ends after they are closed. The rest of the time spent is in the GCC backend, going through some 60+ code passes and outputting the assembly to file.
--
Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 13 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
The measurements should be done for modules being imported, not 
the module being compiled.
Something like this.
---
import std.algorithm;
import std.stdio;
import std.typecons;
import std.datetime;

int ok;
---
Jun 13 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not 
 the module being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---

Oh and let it import .d files, not .di
Jun 13 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the module
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---

Oh and let it import .d files, not .di

std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not the read/parser stage. :~) -- Iain Buclaw *(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 13 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Wednesday, 13 June 2012 at 11:47:31 UTC, Iain Buclaw wrote:
 std.datetime is one reason for me to run it again. I can 
 imagine that
 *that* module will have an impact on parse times.  But I'm still
 persistent that the majority of the compile time in the 
 frontend is
 done in the first semantic pass, and not the read/parser stage. 
 :~)

Probably. Also test with -fsyntax-only, if it works and runs the semantic passes.
Jun 13 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, June 14, 2012 10:03:05 Don Clugston wrote:
 On 13/06/12 16:29, Walter Bright wrote:
 On 6/13/2012 1:07 AM, Don Clugston wrote:
 On 12/06/12 18:46, Walter Bright wrote:
 On 6/12/2012 2:07 AM, timotheecour wrote:
 There's a current pull request to improve di file generation
 (https://github.com/D-Programming-Language/dmd/pull/945); I'd like to
 suggest
 further ideas.
 As far as I understand, di interface files try to achieve these
 conflicting goals:
 
 1) speed up compilation by avoiding having to reparse large files over
 and over.
 2) hide implementation details for proprietary reasons
 3) still maintain source code in some form to allow inlining and CTFE
 4) be human readable

(4) was not a goal. A .di file could very well be a binary file, but making it look like D source enabled them to be loaded with no additional implementation work in the compiler.

I don't understand (1) actually. For two reasons: (a) Is lexing + parsing really a significant part of the compilation time? Has anyone done some solid profiling?

It is for debug builds.

Iain's data indicates that it's only a few % of the time taken on semantic1(). Do you have data that shows otherwise? It seems to me that slow parsing is a C++ problem which D already solved.

If this is the case, is there any value at all to using .di files in druntime or Phobos other than in cases where we're specifically trying to hide implementation (e.g. with the GC)? Or do we still end up paying the semantic cost for importing the .d files such that using .di files would still help with compilation times? - Jonathan M Davis
Jun 14 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 14 June 2012 at 08:11:02 UTC, Jonathan M Davis wrote:
 Or do we still end up paying the semantic
 cost for importing the .d files such that using .di files would 
 still help with
 compilation times?

Oh, right, the module can use mixins and CTFE, so it should be semantically checked, but the semantic check may be minimal just like in the case of a .di file.
Jun 14 2012
prev sibling next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, June 15, 2012 08:58:55 Don Clugston wrote:
 I don't think Phobos should use .di files at all. I don't think there
 are any cases where we want to conceal code.
 
 The performance benefit you would get is completely negligible. It
 doesn't even reduce the number of files that need to be loaded, just the
 length of each one.
 
 I think that, for example, improving the way that array literals are
 dealt with would have at least as much impact on compilation time.
 For the DMD backend, fixing up the treatment of comma expressions would
 have a much bigger impact than getting lexing and parsing time to zero.
 
 And we're well set up for parallel compilation. There's no shortage of
 things we can do to improve compilation time.
 
 Using di files for speed seems a bit like jettisoning the cargo to keep
 the ship afloat. It works but you only do it when you've got no other
 options.

On several occasions, Walter has expressed the desire to make Phobos use .di files like druntime does, otherwise I probably would never have considered it. Personally, I don't want to bother with it unless there's a large benefit from it, so if we're sure that the gain is minimal, then I say that we should just leave it all as .d files. Most of Phobos would have to have its implementation left in any .di files anyway so that inlining and CTFE could work. - Jonathan M Davis
Jun 15 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 13 June 2012 12:47, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the modules
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---

Oh and let it import .d files, not .di

std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not in the read/parse stage. :~)

Rebuilt a compile log with the latest gdc as of writing, on the 2.059 frontend/library.
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf

Notes about it:
- GCC has 4 new time counters:
  - phase setup (time spent loading the compile time environment)
  - phase parsing (time spent in the frontend)
  - phase generate (time spent in the backend)
  - phase finalize (time spent cleaning up and exiting)
- The phase parsing stage is broken down into 5 components:
  - Module::parse
  - Module::semantic
  - Module::semantic2
  - Module::semantic3
  - Module::genobjfile
- Module::read, Module::parse and Module::importAll, which were separate in the report I did 2 years ago, are now counted as part of just the one parsing stage, to make it a little more balanced. :-)

I'll post a tl;dr later on it.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 16 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 16 June 2012 10:18, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:47, Iain Buclaw <ibuclaw ubuntu.com> wrote:
 On 13 June 2012 12:33, Kagamin <spam here.lot> wrote:
 On Wednesday, 13 June 2012 at 11:29:45 UTC, Kagamin wrote:
 The measurements should be done for modules being imported, not the modules
 being compiled.
 Something like this.
 ---
 import std.algorithm;
 import std.stdio;
 import std.typecons;
 import std.datetime;

 int ok;
 ---

Oh and let it import .d files, not .di

std.datetime is one reason for me to run it again. I can imagine that *that* module will have an impact on parse times. But I'm still convinced that the majority of the compile time in the frontend is spent in the first semantic pass, and not in the read/parse stage. :~)

Rebuilt a compile log with the latest gdc as of writing, on the 2.059 frontend/library.
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report32_2059.pdf
http://iainbuclaw.files.wordpress.com/2012/06/d2time_report64_2059.pdf

Notes about it:
- GCC has 4 new time counters:
  - phase setup (time spent loading the compile time environment)
  - phase parsing (time spent in the frontend)
  - phase generate (time spent in the backend)
  - phase finalize (time spent cleaning up and exiting)
- The phase parsing stage is broken down into 5 components:
  - Module::parse
  - Module::semantic
  - Module::semantic2
  - Module::semantic3
  - Module::genobjfile
- Module::read, Module::parse and Module::importAll, which were separate in the report I did 2 years ago, are now counted as part of just the one parsing stage, to make it a little more balanced. :-)

I'll post a tl;dr later on it.

tl;dr

Total number of source files compiled: 207
Total time to build druntime and phobos: 78.08 seconds
Time spent parsing: 17.15 seconds
Average time spent parsing: 0.08 seconds
Time spent running semantic passes: 10.04 seconds
Time spent generating backend AST: 2.15 seconds
Time spent in backend: 48.62 seconds

So parsing time has taken quite a hit since I last did any reports on the compilation speed of building phobos. I suspect most of that comes from the loading of symbols from all imports, and that there have been some large additions to phobos recently which provide a constant bottleneck if one chooses to compile one source at a time, as the apparently large amount of time spent parsing does not show when compiling everything at once:

Module::parse: 0.58 seconds (1%)
Module::semantic: 0.24 seconds (1%)
Module::semantic2: 0.01 seconds (0%)
Module::semantic3: 2.85 seconds (6%)
Module::genobjfile: 1.24 seconds (3%)
TOTAL: 47.06 seconds

Considering that the entire phobos library is some 165K lines of code, I don't see why people aren't laughing about just how quick the frontend is at parsing. :~)

Regards
--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
Jun 16 2012
prev sibling next sibling parent "Daniel" <wyrlon gmx.net> writes:
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

taken on semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.

Same here. I wish there were a standardized pre-lexed-token "binary" file format; it would benefit all text editors as well, since they need to lex the source anyway to perform syntax highlighting.
Jun 18 2012
prev sibling next sibling parent "Chris Cain" <clcain uncg.edu> writes:
On Monday, 18 June 2012 at 18:05:59 UTC, Daniel wrote:
 Same here, I wish there were a standardized pre-lexed-token 
 "binary" file-format, would benefit all text editors also, as 
 they need to lex it anyway to perform color syntax highlighting.

If I were to make my own language, I'd forgo a human-readable format and just define the "language" as a big machine-readable AST. You'd have to have an IDE, but it could display the code in just about any way the person wants (syntax, style, etc.). Syntax highlighting would be instantaneous, and there would be fewer errors made by programmers (maybe...). Plus, it'd be unbelievably easy to implement things like auto-completion.
Jun 18 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Monday, 18 June 2012 at 17:54:40 UTC, Walter Bright wrote:
 Yeah, but I can't escape that lingering feeling that lexing is 
 slow.

 I was fairly disappointed that asynchronously reading the 
 source files didn't have a measurable effect most of the time.

I don't even understand all this rage about asynchronicity: if the program has nothing to do until it reads the data, asynchronicity won't help you in the slightest. And anyway, everything is stalled while the device performs DMA.
Jun 19 2012
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Tuesday, 19 June 2012 at 01:47:27 UTC, Timon Gehr wrote:
 Parsing is not a huge issue. Depending on how powerful the 
 language is, auto-completion may depend on full code analysis.

Yep, pegged runs at compile time.
Jun 19 2012
prev sibling next sibling parent Iain Buclaw <ibuclaw ubuntu.com> writes:
On 16 June 2012 22:17, Guillaume Chatelet <chatelet.guillaume gmail.com> wrote:
 So parsing time has taken quite a hit since I last did any reports on
 compilation speed of building phobos.

So maybe my post about "keeping import clean" wasn't as irrelevant as I thought. http://www.digitalmars.com/d/archives/digitalmars/D/Keeping_imports_clean_162890.html#N162890 -- Guillaume

I think its relevance is only geared towards projects that compile one file at a time - i.e., I'd expect all gdc users to be compiling this way, as whole-program compilation using gdc still needs some rigorous testing first. If there is a particularly large module, or a set of large modules, that is persistently being imported, then you will see a notable constant slowdown in the compilation of each file.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
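As a sketch of what "keeping imports clean" can look like in practice (the module and function names here are hypothetical): D supports selective and scoped imports, which narrow both the symbols pulled in and where the import has to be processed at all.

```d
module app;

// Top-level selective import: only the named symbol is brought
// into scope, and every compilation of this module pays for it.
import std.stdio : writeln;

void report(int[] data)
{
    // Scoped selective import: std.algorithm is only needed inside
    // this function, so the import is confined to this scope and is
    // only processed when the function body is analyzed.
    import std.algorithm : sort;
    sort(data);
    writeln(data);
}
```

Moving rarely-used imports into the functions that need them keeps the constant per-file import cost described above from being paid by every module in the project.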
Jun 19 2012
prev sibling next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 Jun 2012 13:53:43 -0400, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.

I have found that my project, which has a huge number of symbols (and large ones), compiles much slower than I would expect. Perhaps you have forgotten about this issue: http://d.puremagic.com/issues/show_bug.cgi?id=4900 Maybe fixing this still doesn't help parsing; I'm not sure. -Steve
Jun 25 2012
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
On Mon, 18 Jun 2012 19:53:43 +0200, Walter Bright  
<newshound2 digitalmars.com> wrote:

 On 6/18/2012 6:07 AM, Don Clugston wrote:
 On 17/06/12 00:37, Walter Bright wrote:
 On 6/14/2012 1:03 AM, Don Clugston wrote:
 It is for debug builds.

semantic1(). Do you have data that shows otherwise?

Nothing recent, it's mostly from my C++ compiler testing.

But you argued in your blog that C++ parsing is inherently slow, and you've fixed those problems in the design of D. And as far as I can tell, you were extremely successful! Parsing in D is very, very fast.

Yeah, but I can't escape that lingering feeling that lexing is slow. I was fairly disappointed that asynchronously reading the source files didn't have a measurable effect most of the time.

Lexing is definitely taking a big part of debug compilation time. I haven't profiled the compiler for some time now, but here are some thoughts.

- Speeding up the identifier hash table: there was always a profile spike at StringTable::lookup, though it has shrunk since you increased the bucket count.
- Memory mapping the source file saves a copy for UTF-8 sources; this is by far the fastest way to read a source file.
- Parallel reading/parsing doesn't help much if most of the source files are read during import semantic.

I'm regularly hitting other bottlenecks, so I don't think that lexing is #1. When compiling std.range with unittests, for example, more than 50% of the compile time is spent checking for existing template instantiations, using O(N^2)/2 compares of template arguments. If we managed to fix http://d.puremagic.com/issues/show_bug.cgi?id=7469 we could efficiently use the mangled name as the key.
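A hedged sketch of the kind of code that stresses that instantiation lookup (the pipeline itself is an illustrative example, not taken from std.range): each chained range call produces a structurally distinct template instantiation, and the compiler must compare each new one against the table of existing instantiations before creating it.

```d
import std.algorithm : filter, map;
import std.range : iota;

// Each step in the chain wraps the previous range type, producing a
// new, nested template instantiation (roughly MapResult!(f,
// FilterResult!(g, MapResult!(h, ...)))). With many such chains in a
// module, the quadratic comparison of instantiations adds up.
auto pipeline()
{
    return iota(0, 100)
        .map!(x => x * x)
        .filter!(x => x % 2 == 0)
        .map!(x => x + 1);
}
```

This is why keying the instantiation table by mangled name, as suggested above, would help: a hash lookup on one string replaces a pairwise comparison of whole template argument lists.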
Jun 25 2012