digitalmars.D - [GSoC] RFC: Thrift project proposal (draft)

reply David Nadlinger <see klickverbot.at> writes:
Hi all,

I am putting together a Google Summer of Code project proposal regarding 
the Apache Thrift idea (see the ideas page[1]), which I intend to 
officially submit as soon as the application period opens. You can find 
my first draft at http://klickverbot.at/code/gsoc/thrift/.

While I would love to hear any opinions, two specific questions:

Walter, could you as the organization admin please have a look at this 
if it meets your formal expectations (the application template section 
of the Digital Mars GSoC profile is still empty)?

Andrei, as you are the one behind the original suggestion, would you 
mind having a quick glance at the proposal? Do you have any experience 
with Thrift in production use from your work at Facebook?

David


P.S.: I am notoriously bad at writing »About me« sections, but from
reading around a bit I figured a GSoC application should include one…



[1] http://www.prowiki.org/wiki4d/wiki.cgi?GSOC_2011_Ideas
Mar 24 2011
next sibling parent reply "Robert Jacques" <sandford jhu.edu> writes:
First and foremost, I would strongly recommend against looking at Thrift's internals; if you do, the project _should not_ be submitted to Phobos (Thrift is under the Apache License 2.0, which isn't compatible with the Boost License). Alternatively, you could aim to get the library into etc.*, or simply make it a D source project. I do feel that aiming for Phobos would strengthen your application, though.

As for the project itself, I agree with you that, due to certain well-known CTFE bugs, you probably wouldn't be able to parse anything more than the simplest Thrift IDL at compile time today. But one of the major advantages of CTFE is that there is no difference between regular D functions and CTFE D, so you can develop a full Thrift IDL parser/code generator in D and then use it as part of a build today and as an input to a string mixin tomorrow. I think playing up D's strengths, and showing that you are coding with an eye to the future, would strengthen your application. Currently, your proposal sounds like a simple port of a C++ library to D. This may be what you intend to do, but if so, you should clarify this in your proposal.

Regarding your writing, it's fairly solid, though it feels a bit too familiar for a formal proposal of work. (Though this might just be my academic background talking.) Also, I noticed a tendency for in-lined footnotes, à la "besides further working out the details of the project,", or "I’d expect to further improve both the code generator and the binding code, along with the accompanying documentation.". I'd recommend focusing on the big things you want to do (like contacting the D and Thrift communities, working on documentation and unit tests, etc.) and leaving out the expected day-to-day stuff. (I.e., put the big rocks in the jar first and leave the gravel, sand and water for later: andrew.goenardi.com/big-rocks-and-a-jar)

While I don't have the time for a mentorship, I have been working on an update to std.json, std.variant/algebraic as well as my own binary serialization library, and am willing to share code and/or talk serialization/de-serialization design.
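
The "build today, string mixin tomorrow" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: generateStruct and its "name:type" input format are hypothetical stand-ins, far simpler than a real Thrift IDL parser, and the sketch assumes a compiler whose CTFE handles string concatenation and slicing correctly.

```d
import std.stdio : writeln;

// Hypothetical, drastically simplified stand-in for a Thrift code generator:
// turns "fieldName:type" entries into the source text of a D struct.
string generateStruct(string name, string[] fields)
{
    string code = "struct " ~ name ~ " {\n";
    foreach (f; fields)
    {
        // Find the colon by hand; every entry is assumed to contain one.
        size_t colon = 0;
        while (colon < f.length && f[colon] != ':')
            ++colon;
        code ~= "    " ~ f[colon + 1 .. $] ~ " " ~ f[0 .. colon] ~ ";\n";
    }
    return code ~ "}\n";
}

// "Tomorrow": the generator runs inside the compiler and its output is mixed in.
mixin(generateStruct("User", ["id:int", "name:string"]));

void main()
{
    // "Today": the very same function runs as ordinary runtime code,
    // e.g. as a build step that writes the generated source to a file.
    writeln(generateStruct("User", ["id:int", "name:string"]));

    // The mixed-in struct is a normal D type.
    auto u = User(42, "david");
    assert(u.id == 42 && u.name == "david");
}
```

The point of the design is that nothing in generateStruct is specific to compile time, so how much of the pipeline moves into the compiler can be decided later.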
Mar 24 2011
parent reply David Nadlinger <see klickverbot.at> writes:
Hello Robert,

thank you for taking the time to read my proposal.

On 3/25/11 5:48 AM, Robert Jacques wrote:
 First and foremost, I would strongly recommend against looking at
 Thrifts internals; if you do, the project _should not_ be submitted to
 Phobos. (Thrift is Apache License 2.0 which isn't compatible with the
 Boost License). Alternatively, you could aim to get the library into
 etc.*, or simply make it a D source project. I do feel that aiming for
 Phobos would strengthen your application though.

This can certainly be discussed, but I don't think including this project in Phobos would be the best choice, at least as long as an external »interface compiler«, i.e. generator, is used. I would rather try to make it a part of the official Thrift project. This is how Thrift support was done for other languages, and having the code generator implementation in a different project than the library it targets does not seem like a wise thing to do.

Although I'm not a lawyer, I have been involved with D long enough to be aware of a large part of the issues which can originate from Phobos being Boost-licensed. If we decide that we want to have Thrift support in Phobos itself, it would, strictly speaking, become hairy with regard to IP anyway, because at least as far as I can see, some protocol details are in fact implementation-defined. Figuring these protocol details out from the code is just what I meant to do anyway; I'll clarify the draft with regard to this.

 […] But one of the major advantages of CTFE is that there is no difference
 between regular D functions and CTFE D, so you can develop a full Thrift IDL
 parser/code generator in D and then use it as part of a build today and an
 input to a string mixin tomorrow. I think playing up D's strengths, and that you
 are coding with an eye to

To be honest, I don't think this will be possible with D CTFE in the near future until somebody steps forward and radically improves the current CTFE implementation (thinking of it, this might be a nice project for GSoC as well).

To back my pessimism a bit: I was doing a simple CTFE implementation of Gaussian elimination some weeks ago. Coming up with a version DMD would accept for compile-time values took me something like ten minutes, complete with runtime unittests. However, it took two more afternoons of debugging (and two new wrong-code Bugzilla entries) before the CTFE results would actually match the runtime values computed by the same piece of code. And that was for code specifically written to be CTFE-friendly.

In my experience, trying to reuse non-trivial pieces of normal runtime code not written with CTFE in mind results in even more problems. For example, you can't even really use std.algorithm if you want your code to run under CTFE at the moment. These issues make me skeptical about whether taking a possible future CTFE implementation into account is worth the hassle, even more so given the scope of the project (the official Thrift parser is something like 3.5 kLOC, with another 4 kLOC for the actual C++ code generator).

 Regarding your writing, it's fairly solid, though it feels a bit too
 familiar for a formal proposal of work.

Yes, I am aware that it is written in a rather colloquial style, decidedly too colloquial if I were to apply e.g. for a research grant. But as this, as far as I know, is not even going to leave the D community, I was not at all sure about the right level of formality. Thanks for the suggestion, though, as I was planning to give it a stylistic overhaul before the official submission anyway.
 While I don't have the time for a mentorship, I have been working on an
 update to std.json, std.variant/algebraic as well as my own binary
 serialization library, and am willing to share code and/or talk
 serialization/de-serialization design.

Thank you for the offer; I'll certainly contact you if this project is approved. Being able to build on a solid JSON library will probably also be helpful for this project, as Thrift includes a JSON-based protocol.

David
Mar 25 2011
next sibling parent reply Don <nospam nospam.com> writes:
David Nadlinger wrote:
 To be honest, I don't think this will be possible with D CTFE in the near future until somebody steps forward and radically improves the current CTFE implementation (thinking of it, this might be a nice project for GSoC as well).

I'm giving CTFE a *major* overhaul right now. I don't know if I'll be finished in time for the next compiler release, but definitely by the release after that. Most importantly, bug 1330, which is the root cause of almost all of the problems, will be fixed. I hope to move CTFE out of the "experimental feature" category.
Mar 26 2011
next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 3/26/11 5:16 PM, Don wrote:
 I'm giving CTFE a *major* overhaul right now. I don't know if I'll be
 finished in time for the next compiler release, but definitely by the
 release after that. Most importantly, bug 1330, which is the root cause
 of almost all of the problems, will be fixed. I hope to move CTFE out
 the "experimental feature" category.

That's great news – do you plan to put your work in progress up at GitHub somewhere before the official release? I'm playing around with CTFE quite a bit at the moment and plan to have a stab at making the basic parts of std.algorithm CTFE-able soon (Steve, did you find time to look at the Appender issue yet?), so I'd be glad to test any improvements…

David
Mar 26 2011
parent reply Don <nospam nospam.com> writes:
David Nadlinger wrote:
That's great news – do you plan to put your work in progress up at GitHub somewhere before the official release?

Yes, definitely. All my fixes go into my fork of dmd on github. My CTFE work is progressing quite well. Simple test cases like the one in bug 1330 are working (and all the existing tests still pass, of course). It will be a while before I publish it to github, though -- the code is VERY untidy, and lots of stuff isn't implemented yet.
 I'm playing around with 
 CTFE quite a bit at the moment and plan to have a stab at making the 
 basic parts of std.algorithm CTFE-able soon (Steve, did you find time to 
 look at the Appender issue yet?), so I'd be glad to test any improvements…

My changes will make a *lot* more things work in CTFE. I recommend against spending much time making things CTFE-able right now.
Mar 26 2011
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/26/11 3:01 PM, Don wrote:
My changes will make a *lot* more things work in CTFE. I recommend against spending much time making things CTFE-able right now.

This is absolutely awesome. Compile-time evaluation is a key strategic feature of D. Thank you!

Two questions: do you plan to allow class object creation a la new Widget? Also, since the upcoming features will be in time for GSoC projects, could you write a brief documentation project describing the scope of your improvements?

Thanks again,

Andrei
Mar 26 2011
parent reply Don <nospam nospam.com> writes:
Andrei Alexandrescu wrote:
This is absolutely awesome. Compile-time evaluation is a key strategic feature of D. Thank you! Two questions - do you plan to allow class object creation a la new Widget?

Eventually. That requires some form of class literal to be created inside the compiler, so it's a bit more work.
 Also, since the upcoming features will be in time for GSoC 
 projects, could you write a brief documentation project describing the 
 scope of your improvements?

My plan at this stage is just to overhaul the existing functionality (so that everything that currently sort-of works or seems to work, actually DOES work).
Mar 26 2011
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/27/2011 12:25 AM, Don wrote:
Eventually. That requires some form of class literal to be created inside the compiler, so it's a bit more work.

Sounds great. Most of the advanced CTFE applications that I'm thinking of involve referential data types. Right now only arrays offer that in CTFE space, which is quite limiting.

Andrei
Mar 27 2011
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Don:

 I'm giving CTFE a *major* overhaul right now.

Thank you Don, you are doing a lot for the improvement of the D compiler :-)
 Most importantly, bug 1330, which is the root cause 
 of almost all of the problems, will be fixed. I hope to move CTFE out 
 the "experimental feature" category.

That seems to be the biggest source of problems for compile-time code. Another compile-time thing I'd like to see improved is printing (bug 3952).

Bye,
bearophile
Mar 26 2011
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 3/26/2011 12:16 PM, Don wrote:
 I'm giving CTFE a *major* overhaul right now. I don't know if I'll be
 finished in time for the next compiler release, but definitely by the
 release after that. Most importantly, bug 1330, which is the root cause
 of almost all of the problems, will be fixed. I hope to move CTFE out
 the "experimental feature" category.

This is great news, I'm looking forward to it. Thanks for the hard work. Out of curiosity, can you give a brief overview of what new things CTFE will be usable for?
Mar 26 2011
parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
This is great news, I'm looking forward to it. Thanks for the hard work. Out of curiosity, can you give a brief overview of what new things CTFE will be usable for?

The basic problem with the current implementation of CTFE is that it uses copy-on-write. This means that references (including dynamic arrays) don't work properly -- they just copy a snapshot of the thing they are referencing. This is bug 1330. It also means it burns up memory like you wouldn't believe.

I'm changing CTFE to use in-place modification. This fixes all those issues. But this is obviously a fairly intense change, and will take quite a lot of time to iron out all the corner cases. So that's all I'm planning on doing right now.

But once that's done, it will be straightforward to implement other reference types, such as classes and pointers (pointer arithmetic will be restricted to pointers which point to array members). Once classes are implemented, it's straightforward to do exceptions. So, pretty much everything.

I've been planning on doing this for over a year, but while Walter was working on 64-bit, I felt that I was the only one working on the showstopper wrong-code bugs and regressions, so I put this important-but-not-urgent stuff aside.
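
A small test along the following lines illustrates the reference semantics in question. It is a sketch rather than the test case attached to bug 1330, and fillViaAlias is a hypothetical name. The assumption is that, once arrays behave as references under CTFE, the compile-time and runtime results agree; with snapshot copying, a write through the second slice would presumably not show up in the first.

```d
// Fills a buffer through a second slice that refers to the same memory.
int[] fillViaAlias()
{
    auto data = new int[4];
    int[] view = data;      // no copy: both slices refer to the same elements
    foreach (i; 0 .. view.length)
        view[i] = cast(int) (i * 10);
    return data;            // with reference semantics: [0, 10, 20, 30]
}

// Evaluated by the compiler (CTFE); must match the runtime result below.
static assert(fillViaAlias() == [0, 10, 20, 30]);

void main()
{
    // Same function, evaluated at runtime.
    assert(fillViaAlias() == [0, 10, 20, 30]);
}
```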
Mar 26 2011
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
On 3/26/2011 4:16 PM, Don wrote:
The basic problem with the current implementation of CTFE is that it uses copy-on-write. This means that references (including dynamic arrays) don't work properly -- they just copy a snapshot of the thing they are referencing. This is bug 1330. It also means it burns up memory like you wouldn't believe.

Right. IIUC there's also no way to free the memory from copies that are no longer referenced. I can see where this would leak memory like a sieve.
 I'm changing CTFE to use in-place modification. This fixes all those
 issues. But this is obviously a fairly intense change, and will take
 quite a lot of time to iron out all the corner cases. So that's all I'm
 planning on doing right now.

This is a _huge_ improvement, but does it address the issue of freeing memory or is that beyond the scope?
 But once that's done, it will be straightforward to implement other
 reference types, such as classes and pointers (pointer arithmetic will
 be restricted to pointers which point to array members). Once classes
 are implemented, it's straightforward to do exceptions. So, pretty much
 everything.

Excellent.
 I've been planning on doing this for over a year, but while Walter was
 working on 64-bit, I felt that I was the only one working on the
 showstopper wrong-code bugs and regressions, so I put this
 important-but-not-urgent stuff aside.

Agreed. I love the 64-bit support (I've been using it for real work and it's surprisingly solid) but the pace of fixing miscellaneous bugs was understandably glacial while it was being implemented.
Mar 26 2011
parent reply Don <nospam nospam.com> writes:
dsimcha wrote:
Right. IIUC there's also no way to free the memory from copies that are no longer referenced. I can see where this would leak memory like a sieve.

That's not the big problem, actually. The issue is that x[7]=6; duplicates x, even if x has 10K elements. Now consider:

    for(int i=0; i<x.length; ++i) x[i]=3;
    // creates 100M new elements!! Should create none, or 10K at most.
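
The 100M figure is just 10K writes times 10K elements duplicated per write. The following sketch is a purely illustrative cost model (elementsCopiedPerWrite is a hypothetical helper and does not inspect the compiler in any way):

```d
// Counts the elements duplicated when every indexed write copies the whole array.
ulong elementsCopiedPerWrite(ulong n)
{
    ulong copied = 0;
    foreach (i; 0 .. n)
        copied += n;    // one full duplicate of an n-element array per x[i] = 3
    return copied;
}

// 10K writes to a 10K-element array: 100M elements duplicated in total.
static assert(elementsCopiedPerWrite(10_000) == 100_000_000);

void main()
{
    assert(elementsCopiedPerWrite(10_000) == 100_000_000);
}
```

With in-place modification the same loop duplicates nothing, which is the "should create none" case.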
 I'm changing CTFE to use in-place modification. This fixes all those
 issues. But this is obviously a fairly intense change, and will take
 quite a lot of time to iron out all the corner cases. So that's all I'm
 planning on doing right now.

This is a _huge_ improvement, but does it address the issue of freeing memory or is that beyond the scope?

Outside the scope, but it will use an order of magnitude less memory in the first place, in the cases which are causing the biggest problems (such as the one I showed above).
Mar 26 2011
parent Don <nospam nospam.com> writes:
Robert Jacques wrote:
 On Sun, 27 Mar 2011 08:36:39 -0400, spir <denis.spir gmail.com> wrote:
 
 On 03/26/2011 09:57 PM, Don wrote:
 The basic problem with the current implementation of CTFE is that it
 uses copy-on-write. This means that references (including dynamic
 arrays) don't work properly -- they just copy a snapshot of the thing
 they are referencing. This is bug 1330. It also means it burns up memory
 like you wouldn't believe.

Right. IIUC there's also no way to free the memory from copies that are no longer referenced. I can see where this would leak memory like a sieve.

That's not the big problem, actually. The issue is that x[7]=6; duplicates x, even if x has 10K elements. Now consider: for(int i=0; i<x.length; ++i) x[i]=3; // creates 100M new elements!! Should create none, or 10K at most.

Hello Don,

I don't understand your point. I once implemented a toy dynamic language, using the common trick of boxed elements (à la Lisp). But I wanted to maintain value semantics as standard. A cheap way to do that is copy-on-write; it is actually cheap since simple, atomic elements are never copied (since they cannot be changed in place), thus one just needs to trace complex elements (array-lists & named tuples in my case):

    x := [1,2,3]    // create the array value, assign its ref
    y := x          // copy the ref, mark the value as shared
    x[1] := 0       // copy the value, reassign the ref, then change

But the new value is not shared, thus:

    x[1] := 1       // change only

So in your loop example, at most one array copy happens (iff it was shared). This is, as far as I know, what is commonly called copy-on-write. There is no need to copy the value over and over again on every change if it is not multiply referenced, and no one does that, I guess.

Side note: assignments of the form "y := x" are really special, at least conceptually, but also practically when pointers or refs enter the game. I call them "symbol assignments" as the source is a symbol.

Denis
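
The scheme Denis describes can be modelled in a few lines of D. This is a hypothetical toy model, not code from his language or from DMD: Box, Value, share, set and copiesInLoop are all illustrative names, and a single "shared" flag stands in for real reference tracking.

```d
// A boxed array value plus the "shared" mark from the description above.
final class Box
{
    int[] items;
    bool isShared;          // set once a second reference to this value exists
    this(int[] items) { this.items = items; }
}

struct Value
{
    Box box;

    // y := x   -> copy the ref, mark the value as shared
    Value share()
    {
        box.isShared = true;
        return Value(box);
    }

    // x[i] := v -> copy the value only if it is shared, then change it.
    // Returns true if a full copy was made. (Simplification: the old box stays
    // marked as shared; a real implementation would track reference counts.)
    bool set(size_t i, int v)
    {
        bool copied = false;
        if (box.isShared)
        {
            box = new Box(box.items.dup);   // fresh, unshared copy
            copied = true;
        }
        box.items[i] = v;
        return copied;
    }
}

// Runs the loop from the discussion and counts the full array copies.
size_t copiesInLoop(size_t n, bool aliasedFirst)
{
    auto x = Value(new Box(new int[n]));
    Value y;
    if (aliasedFirst)
        y = x.share();
    size_t copies = 0;
    foreach (i; 0 .. n)
        if (x.set(i, 3))
            ++copies;
    return copies;
}

void main()
{
    assert(copiesInLoop(1000, false) == 0);  // never shared: no copy at all
    assert(copiesInLoop(1000, true) == 1);   // shared once: exactly one copy
}
```

Under this model the loop costs at most one copy, which is the behaviour Denis expects from copy-on-write, in contrast to the per-write duplication described for the existing CTFE implementation.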

Hi Denis, What Don is explaining is not how you should implement copy-on-write, etc., but the actual implementation of arrays in DMD's CTFE system.

 Right now, any access to an array in CTFE causes the entire array to be 
 duplicated, which is a major memory and performance issue, to say 
 nothing of the fact that D arrays are supposed to have reference, not 
 value semantics. I don't know how or why this behavior was ever 
 introduced, only that it is awesome that Don is fixing it.

I believe it was a quick hack to get things working. But it needs to disappear.
Mar 27 2011
prev sibling next sibling parent reply David Nadlinger <see klickverbot.at> writes:
On 3/26/11 9:16 PM, Don wrote:
 The basic problem with the current implementation of CTFE is that it
 uses copy-on-write. This means that references (including dynamic
 arrays) don't work properly -- they just copy a snapshot of the thing
 they are referencing. This is bug 1330. It also means it burns up memory
 like you wouldn't believe.

 I'm changing CTFE to use in-place modification. This fixes all those
 issues. But this is obviously a fairly intense change, and will take
 quite a lot of time to iron out all the corner cases. So that's all I'm
 planning on doing right now.

 But once that's done, it will be straightforward to implement other
 reference types, such as classes and pointers (pointer arithmetic will
 be restricted to pointers which point to array members). Once classes
 are implemented, it's straightforward to do exceptions. So, pretty much
 everything.

First of all, let me say again that I am really looking forward to your changes, as I even considered having a go at solving the referencing issue myself for a while (but without sound knowledge of the compiler internals, this is an even harder thing to pull off).

Do I understand correctly that your changes wouldn't introduce some form of real compile-time memory management, but alleviate the need for it by fixing bug 1330 and related ones, thus cutting down on the ridiculous amount of copying going on today?

And finally, although I know such questions are tough to answer: do you have a rough estimate of how long it will take you to get the basic set of changes ready for testing? This is somewhat relevant for me, as the coding period for GSoC is going to start in about two months from now, and I think that with current DMD, doing the Thrift compiler in CTFE might be infeasible due to memory usage.

Thanks a lot for your work,

David
Mar 26 2011
parent David Nadlinger <see klickverbot.at> writes:
On 3/26/11 9:59 PM, David Nadlinger wrote:
[…] Do I understand correctly that your changes wouldn't introduce some form of real compile-time memory management, but alleviate the need for it by fixing bug 1330 and related ones, thus cutting down on the ridiculous amount of copying going on today?

Ah, forget that part; I wasn't aware of David's post asking the same question (and your answer to it) when I wrote this message.

David
Mar 26 2011
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-03-26 21:16, Don wrote:
 dsimcha wrote:
 On 3/26/2011 12:16 PM, Don wrote:
 I'm giving CTFE a *major* overhaul right now. I don't know if I'll be
 finished in time for the next compiler release, but definitely by the
 release after that. Most importantly, bug 1330, which is the root cause
 of almost all of the problems, will be fixed. I hope to move CTFE out of
 the "experimental feature" category.

This is great news, I'm looking forward to it. Thanks for the hard work. Out of curiosity, can you give a brief overview of what new things CTFE will be usable for?

The basic problem with the current implementation of CTFE is that it uses copy-on-write. This means that references (including dynamic arrays) don't work properly -- they just copy a snapshot of the thing they are referencing. This is bug 1330. It also means it burns up memory like you wouldn't believe.

I'm changing CTFE to use in-place modification. This fixes all those issues. But this is obviously a fairly intense change, and will take quite a lot of time to iron out all the corner cases. So that's all I'm planning on doing right now.

But once that's done, it will be straightforward to implement other reference types, such as classes and pointers (pointer arithmetic will be restricted to pointers which point to array members). Once classes are implemented, it's straightforward to do exceptions. So, pretty much everything.

I've been planning on doing this for over a year, but while Walter was working on 64-bit, I felt that I was the only one working on the showstopper wrong-code bugs and regressions, so I put this important-but-not-urgent stuff aside.

Will the time it takes to compile heavy uses of CTFE be affected by this (positively or negatively)?

-- 
/Jacob Carlborg
Mar 27 2011
parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On 2011-03-27 08:41, Robert Jacques wrote:
 On Sun, 27 Mar 2011 06:06:48 -0400, Jacob Carlborg <doob me.com> wrote:
 On 2011-03-26 21:16, Don wrote:
 dsimcha wrote:
 On 3/26/2011 12:16 PM, Don wrote:
 I'm giving CTFE a *major* overhaul right now. I don't know if I'll be
 finished in time for the next compiler release, but definitely by the
 release after that. Most importantly, bug 1330, which is the root
 cause
 of almost all of the problems, will be fixed. I hope to move CTFE out of
 the "experimental feature" category.

This is great news, I'm looking forward to it. Thanks for the hard work. Out of curiosity, can you give a brief overview of what new things CTFE will be usable for?

The basic problem with the current implementation of CTFE is that it uses copy-on-write. This means that references (including dynamic arrays) don't work properly -- they just copy a snapshot of the thing they are referencing. This is bug 1330. It also means it burns up memory like you wouldn't believe.

I'm changing CTFE to use in-place modification. This fixes all those issues. But this is obviously a fairly intense change, and will take quite a lot of time to iron out all the corner cases. So that's all I'm planning on doing right now.

But once that's done, it will be straightforward to implement other reference types, such as classes and pointers (pointer arithmetic will be restricted to pointers which point to array members). Once classes are implemented, it's straightforward to do exceptions. So, pretty much everything.

I've been planning on doing this for over a year, but while Walter was working on 64-bit, I felt that I was the only one working on the showstopper wrong-code bugs and regressions, so I put this important-but-not-urgent stuff aside.

Will the time it takes to compile heavy uses of CTFE be affected by this (positively or negatively)?

Any string-heavy CTFE should see a major improvement in performance.

Yeah. Considering how memory-heavy CTFE tends to be, I'd expect that such a massive drop in memory consumption would almost always result in a performance improvement. However, we could end up being surprised by how it actually performs, since the way an application's performance characteristics change as you modify it can sometimes be very surprising. I would generally expect it to improve performance though, not harm it. And it should definitely make some CTFE which currently fails due to a lack of memory actually work.

- Jonathan M Davis
Mar 27 2011
prev sibling parent spir <denis.spir gmail.com> writes:
On 03/26/2011 09:57 PM, Don wrote:
 The basic problem with the current implementation of CTFE is that it
 uses copy-on-write. This means that references (including dynamic
 arrays) don't work properly -- they just copy a snapshot of the thing
 they are referencing. This is bug 1330. It also means it burns up memory
 like you wouldn't believe.

Right. IIUC there's also no way to free the memory from copies that are no longer referenced. I can see where this would leak memory like a sieve.

That's not the big problem, actually. The issue is that

    x[7]=6;

duplicates x, even if x has 10K elements. Now consider:

    for(int i=0; i<x.length; ++i) x[i]=3;
    // creates 100M new elements!! Should create none, or 10K at most.

Hello Don,

I don't understand your point. I once implemented a toy dynamic language, using the common trick of boxed elements (à la Lisp). But I wanted to maintain value semantics as standard. A cheap way to do that is copy-on-write; it is actually cheap since simple, atomic elements are never copied (since they cannot be changed in place), thus one just needs to trace complex elements (array-lists & named tuples in my case):

    x := [1,2,3]   // create the array value, assign its ref
    y := x         // copy the ref, mark the value as shared
    x[1] := 0      // copy the value, reassign the ref, then change

But the new value is not shared, thus:

    x[1] := 1      // change only

So in your loop example, at most one array copy happens (iff it was shared). This is, as far as I know, what is commonly called copy-on-write. There is no need to copy the value over and over again on every change if it is not multiply referenced, and no one does that, I guess.

Side note: assignments of the form "y := x" are really special, at least conceptually, but also practically when pointers or refs enter the game. I call them "symbol assignments" as the source is a symbol.

Denis
-- 
_________________
vita es estrany
spir.wikidot.com
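The sharing-flag scheme described in this message can be sketched in D roughly as follows. This is only an illustration of the idea under stated assumptions: `Box`, `Sym`, `shareRef` and `copiesDuringFill` are hypothetical names invented here, and the code models the toy language's marking of shared values, not DMD's CTFE interpreter.

```d
// Sketch of copy-on-write with a "shared" mark. All names are hypothetical.

// The boxed array value that symbols refer to.
final class Box
{
    int[] items;
    bool isShared; // set once a second symbol refers to this value

    this(int[] items) { this.items = items; }
}

// A symbol holding a reference to a boxed value.
struct Sym
{
    Box box;
    size_t copies; // how many times this symbol had to copy its value

    // y := x   (copy the ref, mark the value as shared)
    Sym shareRef()
    {
        box.isShared = true;
        return Sym(box);
    }

    // x[i] := v   (copy first, but only if the value is shared)
    void set(size_t i, int v)
    {
        if (box.isShared)
        {
            box = new Box(box.items.dup); // fresh value, not shared
            ++copies;
        }
        box.items[i] = v;
    }
}

// Runs a fill loop over x and reports how many array copies were made.
size_t copiesDuringFill(bool startShared)
{
    auto x = Sym(new Box([1, 2, 3]));
    Sym y;
    if (startShared)
        y = x.shareRef();
    foreach (i; 0 .. x.box.items.length)
        x.set(i, 0);
    // y, if it was created, must still see the original value
    assert(!startShared || y.box.items == [1, 2, 3]);
    return x.copies;
}

static assert(copiesDuringFill(false) == 0); // unshared: modified in place
static assert(copiesDuringFill(true) == 1);  // shared: exactly one copy

void main() {}
```

The flag is deliberately conservative: once a value has been marked as shared it stays marked, so the old value may be copied once more than strictly necessary, but never once per write.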
Mar 27 2011
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2011-03-25 00:46, David Nadlinger wrote:
 Hi all,

 I am putting together a Google Summer of Code project proposal regarding
 the Apache Thrift idea (see the ideas page[1]), which I intend to
 officially submit as soon as the application period opens. You can find
 my first draft at http://klickverbot.at/code/gsoc/thrift/.

 While I would love to hear any opinions, two specific questions:

 Walter, could you as the organization admin please have a look at this
 if it meets your formal expectations (the application template section
 of the Digital Mars GSoC profile is still empty)?

 Andrei, as you are the one behind the original suggestion, would you
 mind having a quick glance at the proposal? Do you have any experience
 with Thrift in production use from your work at Facebook?

 David


 P.S.: I am notoriously bad at writing »About me« sections, but from
 reading around a bit I figured a GSoC application should include one…



 [1] http://www.prowiki.org/wiki4d/wiki.cgi?GSOC_2011_Ideas

Don't know if this will be any problem with the Thrift protocol, especially since C++ is supported, but D has very limited runtime reflection support, making it unnecessarily hard to implement serialization.

-- 
/Jacob Carlborg
Mar 25 2011
parent reply David Nadlinger <see klickverbot.at> writes:
On 3/25/11 3:04 PM, Jacob Carlborg wrote:
 Don't know if this will be any problem with the Thrift protocol,
 especially since C++ is supported, but D has very limited runtime
 reflection support, making it unnecessarily hard to implement serialization.

Thrift and other, similar projects (like Google's Protocol Buffers) go the other way round anyway – you first define the data formats and RPC interfaces, and then use code generated from the definition to work with them in your application.

David
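As a rough illustration of the "definition first, generated code second" direction in D terms, here is a minimal sketch that turns a field list into a struct declaration at compile time. Everything in it is an assumption made for illustration: `Field`, `genStruct` and `User` are hypothetical names, the field list stands in for a parsed definition, and none of this reflects Thrift's actual IDL or the output of its code generator.

```d
// Hypothetical description of one field in a data definition.
struct Field
{
    string type;
    string name;
}

// Builds the source text of a struct declaration from a field list.
// It is an ordinary function, so it can be evaluated through CTFE.
string genStruct(string name, Field[] fields)
{
    string code = "struct " ~ name ~ "\n{\n";
    foreach (f; fields)
        code ~= "    " ~ f.type ~ " " ~ f.name ~ ";\n";
    return code ~ "}\n";
}

// The generated text is compiled into the program by a string mixin.
mixin(genStruct("User", [Field("int", "id"), Field("string", "name")]));

// The application then works with the generated type directly,
// without needing runtime reflection.
static assert(is(typeof(User.init.id) == int));
static assert(is(typeof(User.init.name) == string));

void main()
{
    auto u = User(1, "example");
    assert(u.id == 1 && u.name == "example");
}
```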
Mar 25 2011
parent Jacob Carlborg <doob me.com> writes:
On 2011-03-25 15:28, David Nadlinger wrote:
 On 3/25/11 3:04 PM, Jacob Carlborg wrote:
 Don't know if this will be any problem with the Thrift protocol,
 especially since C++ is supported, but D has very limited runtime
 reflection support, making it unnecessarily hard to implement serialization.

Thrift and other, similar projects (like Google's Protocol Buffers) go the other way round anyway – you first define the data formats and RPC interfaces, and then use code generated from the definition to work with them in your application.

David

Ok, I see.

-- 
/Jacob Carlborg
Mar 25 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 26 Mar 2011 16:57:34 -0400, Don <nospam nospam.com> wrote:
 dsimcha wrote:
 On 3/26/2011 4:16 PM, Don wrote:
 I'm changing CTFE to use in-place modification. This fixes all those
 issues. But this is obviously a fairly intense change, and will take
 quite a lot of time to iron out all the corner cases. So that's all I'm
 planning on doing right now.

memory or is that beyond the scope?

Outside the scope, but it will use an order of magnitude less memory in the first place, in the cases which are causing the biggest problems (such as the one I showed above).

How hard would it be for the compiler to allocate all the memory for a CTFE evaluation on a second heap, dup the final output and then trash the entire heap? Or is that how CTFE already works? Also, thanks a bunch for working on this bug.
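The "second heap" idea in this question can be sketched as a scratch region that is discarded in one step once the final result has been copied out. This is a hedged illustration only: `Scratch`, `evaluate` and `squares` are hypothetical names, the pool holds plain `int`s to keep the example short, and nothing here describes how DMD actually allocates memory during CTFE.

```d
// Hypothetical scratch heap: a fixed pool handed out front to back.
struct Scratch
{
    int[] pool;
    size_t used;

    // Hands out the next n elements of the pool.
    int[] alloc(size_t n)
    {
        assert(used + n <= pool.length, "scratch heap exhausted");
        auto block = pool[used .. used + n];
        used += n;
        return block;
    }

    // Discards every allocation at once.
    void reset() { used = 0; }
}

// Stand-in for one evaluation: works in scratch memory, returns a copy.
int[] evaluate(ref Scratch heap, size_t n)
{
    auto tmp = heap.alloc(n);
    foreach (i, ref e; tmp)
        e = cast(int)(i * i);
    auto result = tmp.dup; // dup the final output out of the scratch heap
    heap.reset();          // then trash the entire heap
    return result;
}

// Convenience wrapper that owns the scratch heap for one evaluation.
int[] squares(size_t n)
{
    auto heap = Scratch(new int[64]);
    return evaluate(heap, n);
}

static assert(squares(4) == [0, 1, 4, 9]);

void main() {}
```

The attraction of such a design is that intermediate values need no individual tracking: only the final output has to outlive the evaluation.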
Mar 26 2011
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/24/11 4:46 PM, David Nadlinger wrote:
 Hi all,

 I am putting together a Google Summer of Code project proposal regarding
 the Apache Thrift idea (see the ideas page[1]), which I intend to
 officially submit as soon as the application period opens. You can find
 my first draft at http://klickverbot.at/code/gsoc/thrift/.

 While I would love to hear any opinions, two specific questions:

 Walter, could you as the organization admin please have a look at this
 if it meets your formal expectations (the application template section
 of the Digital Mars GSoC profile is still empty)?

 Andrei, as you are the one behind the original suggestion, would you
 mind having a quick glance at the proposal? Do you have any experience
 with Thrift in production use from your work at Facebook?

This is a strong proposal that I will back up. I have shared it inside Facebook and two fellow engineers offered to help with Thrift-related questions you might have.

Andrei
Mar 26 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 27 Mar 2011 08:36:39 -0400, spir <denis.spir gmail.com> wrote:

 On 03/26/2011 09:57 PM, Don wrote:
 The basic problem with the current implementation of CTFE is that it
 uses copy-on-write. This means that references (including dynamic
 arrays) don't work properly -- they just copy a snapshot of the thing
 they are referencing. This is bug 1330. It also means it burns up  
 memory
 like you wouldn't believe.

Right. IIUC there's also no way to free the memory from copies that are no longer referenced. I can see where this would leak memory like a sieve.

That's not the big problem, actually. The issue is that

    x[7]=6;

duplicates x, even if x has 10K elements. Now consider:

    for(int i=0; i<x.length; ++i) x[i]=3;
    // creates 100M new elements!! Should create none, or 10K at most.

Hello Don,

I don't understand your point. I once implemented a toy dynamic language, using the common trick of boxed elements (à la Lisp). But I wanted to maintain value semantics as standard. A cheap way to do that is copy-on-write; it is actually cheap since simple, atomic elements are never copied (since they cannot be changed in place), thus one just needs to trace complex elements (array-lists & named tuples in my case):

    x := [1,2,3]   // create the array value, assign its ref
    y := x         // copy the ref, mark the value as shared
    x[1] := 0      // copy the value, reassign the ref, then change

But the new value is not shared, thus:

    x[1] := 1      // change only

So in your loop example, at most one array copy happens (iff it was shared). This is, as far as I know, what is commonly called copy-on-write. There is no need to copy the value over and over again on every change if it is not multiply referenced, and no one does that, I guess.

Side note: assignments of the form "y := x" are really special, at least conceptually, but also practically when pointers or refs enter the game. I call them "symbol assignments" as the source is a symbol.

Denis

Hi Denis,

What Don is explaining is not how you should implement copy-on-write, etc., but the actual implementation of arrays in DMD's CTFE system. Right now, any access to an array in CTFE causes the entire array to be duplicated, which is a major memory and performance issue, to say nothing of the fact that D arrays are supposed to have reference, not value, semantics. I don't know how or why this behavior was ever introduced, only that it is awesome that Don is fixing it.
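To make the numbers concrete, Don's loop can be written as a small function that is forced through CTFE. This is a sketch only; `fill` and `filled` are hypothetical names. Under the behavior described in this thread, every `x[i] = value` would duplicate the whole array, so 10K writes to a 10K-element array touch on the order of 10K × 10K = 100M elements. With in-place modification the array is allocated once and each iteration writes a single element.

```d
// Hypothetical helper, written only to illustrate the loop under discussion.
int[] fill(size_t n, int value)
{
    auto x = new int[n];          // the only allocation that is needed
    for (size_t i = 0; i < x.length; ++i)
        x[i] = value;             // one element written per iteration
    return x;
}

// An enum initialiser must be known at compile time, so this call
// is evaluated by CTFE rather than at run time.
enum filled = fill(10_000, 3);

static assert(filled.length == 10_000);
static assert(filled[7] == 3);

void main() {}
```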
Mar 27 2011
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sun, 27 Mar 2011 06:06:48 -0400, Jacob Carlborg <doob me.com> wrote:

 On 2011-03-26 21:16, Don wrote:
 dsimcha wrote:
 On 3/26/2011 12:16 PM, Don wrote:
 I'm giving CTFE a *major* overhaul right now. I don't know if I'll be
 finished in time for the next compiler release, but definitely by the
 release after that. Most importantly, bug 1330, which is the root  
 cause
 of almost all of the problems, will be fixed. I hope to move CTFE out of
 the "experimental feature" category.

This is great news, I'm looking forward to it. Thanks for the hard work. Out of curiosity, can you give a brief overview of what new things CTFE will be usable for?

The basic problem with the current implementation of CTFE is that it uses copy-on-write. This means that references (including dynamic arrays) don't work properly -- they just copy a snapshot of the thing they are referencing. This is bug 1330. It also means it burns up memory like you wouldn't believe.

I'm changing CTFE to use in-place modification. This fixes all those issues. But this is obviously a fairly intense change, and will take quite a lot of time to iron out all the corner cases. So that's all I'm planning on doing right now.

But once that's done, it will be straightforward to implement other reference types, such as classes and pointers (pointer arithmetic will be restricted to pointers which point to array members). Once classes are implemented, it's straightforward to do exceptions. So, pretty much everything.

I've been planning on doing this for over a year, but while Walter was working on 64-bit, I felt that I was the only one working on the showstopper wrong-code bugs and regressions, so I put this important-but-not-urgent stuff aside.

Will the time it takes to compile heavy uses of CTFE be affected by this (positively or negatively)?

Any string-heavy CTFE should see a major improvement in performance.
Mar 27 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
I remember that a few months ago I tried using CTFE and import
expressions to load a .def file and generate at compile-time a runtime
DLL loading mechanism in a class which would load a DLL file and
create wrapper functions for DLL functions. It would also add
try{}catch{} blocks based on a naming scheme and if -debug was
enabled. But I've had some big issues with string handling at
compile-time.

I'm not sure if I was doing something wrong or if CTFE was just
inadequate at the time (I do remember having some trouble using
foreach loops and some unfriendly CTFE error messages). I'll give it
another shot soon. Anyhow, it's great seeing someone working to
improve CTFE. Thanks, Don!
Mar 27 2011
prev sibling next sibling parent David Nadlinger <see klickverbot.at> writes:
I just revised the proposal and submitted it via Google's official 
interface, so don't be confused if you can't find it on my website any 
longer.

David
Mar 28 2011
prev sibling next sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
On 3/27/11, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:
 I remember that a few months ago I tried using CTFE and import
 expressions to load a .def file and generate at compile-time a runtime
 DLL loading mechanism in a class which would load a DLL file and
 create wrapper functions for DLL functions.

Found it. It doesn't actually load a .def file, which wouldn't make much sense anyway, since a .def file doesn't contain much except a list of symbol names. It generates code that links function pointers to a DLL at runtime, and creates wrapper functions which take care of calling the C code.

First I'd create a struct with a list of function prototypes. Then I'd just mixin() a string inside a class. The function that creates the string to be mixed in first checks the return values of the function prototypes, and based on that it can add code that throws on invalid values.

Here's an example of a class generated at compile time:
https://gist.github.com/892698

or if that doesn't display right:
http://dl.dropbox.com/u/9218759/result.d

Of course much more could be done here. The generated functions could take strings instead of char pointers and call toStringz on them when calling a function pointer. And we could use ref instead of pointers for other parameters.
Mar 29 2011
prev sibling parent Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Ok this is the thing that really gets me with CTFE:

void printFields(T)(T t)
{
    enum fields = [__traits(allMembers, T)];

    foreach (string field; fields)
    {
        mixin("writeln(t." ~ to!string(field) ~ ");");      // fail
        mixin("writeln(t." ~ to!string(fields[0]) ~ ");");  // ok
    }
}

Even though the foreach loop will work, `field` can't be accessed.

Once we have that working, it's heaven.
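One approach that, as far as I know, works with current compilers is to iterate over the `__traits(allMembers, T)` sequence itself instead of over an array built from it. A `foreach` over such a sequence is unrolled at compile time, so `field` is a compile-time constant in every iteration and can be used inside `mixin` or `__traits(getMember, ...)`. The sketch below assumes a hypothetical `Point` struct with only data members; `fieldNames` is likewise a made-up helper.

```d
import std.stdio : writeln;

// Hypothetical example type containing only data members.
struct Point
{
    int x;
    int y;
}

void printFields(T)(T t)
{
    // Iterating the sequence directly unrolls the loop at compile time,
    // so `field` is a constant string in each iteration.
    foreach (field; __traits(allMembers, T))
    {
        mixin("writeln(t." ~ field ~ ");");
        // Equivalent without a string mixin:
        // writeln(__traits(getMember, t, field));
    }
}

// Collects the member names; usable at compile time as well.
string[] fieldNames(T)()
{
    string[] names;
    foreach (field; __traits(allMembers, T))
        names ~= field;
    return names;
}

static assert(fieldNames!Point() == ["x", "y"]);

void main()
{
    printFields(Point(1, 2)); // prints 1, then 2
}
```

For types that also have member functions or nested types, the loop body would need a `static if` to skip members that cannot be passed to `writeln`.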
Mar 31 2011