digitalmars.D - DDMD as showcase?

Xavier Bigand <flamaros.xavier gmail.com> writes:
First, is there any progress on the DDMD project, or on some other D
bootstrap?

There are a lot of threads and debates about D's GC. I wonder whether a
project like DDMD could prove to every D user that the GC is perfectly
usable for system applications. If it is not, will it be improved during
DDMD development to the point where it satisfies almost everybody?

As far as I know, DMD doesn't release memory; this will have to be fixed
in DDMD to match the memory management most applications have to apply.

I see many interesting points in DDMD:
  - prove that a GC-based compiler doesn't take longer to optimize than
the C++ version (and is capable of reaching the same performance)
  - reveal more language issues or gaps in Phobos
  - be easier to maintain and update
Feb 10 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Xavier Bigand"  wrote in message news:ldbpum$1pov$1 digitalmars.com...

 First, is there any progress on the DDMD project, or on some other D
 bootstrap?

The old ddmd project is pretty much dead AFAIK (http://www.dsource.org/projects/ddmd) and was never up to date with the current compiler.

For about a year I've been working on automatically converting the compiler source from C++ to D. The conversion has produced a working compiler on win32/linux32/linux64 (other platforms need trivial patches) that compiles druntime/phobos/the test suite without errors. The current effort is around cleaning up the C++ source to produce higher quality D code. The next major step is to actually switch development to the D version.

Outstanding patches: https://github.com/D-Programming-Language/dmd/pull/1980
Conversion tool: https://github.com/yebblies/magicport2
You can see some of the recent patches (marked DDMD) here: https://github.com/yebblies?tab=contributions&period=monthly

 There are a lot of threads and debates about D's GC. I wonder whether a
 project like DDMD could prove to every D user that the GC is perfectly
 usable for system applications. If it is not, will it be improved during
 DDMD development to the point where it satisfies almost everybody?

There are no planned GC modifications for the DDMD project. It may result in some compiler devs taking more of an interest in the GC.
 As far as I know, DMD doesn't release memory; this will have to be fixed
 in DDMD to match the memory management most applications have to apply.

Yes, DDMD will use the GC, although it currently has it disabled due to a segfault I haven't tracked down yet.
 I see many interesting points in DDMD:
   - prove that a GC-based compiler doesn't take longer to optimize than
 the C++ version (and is capable of reaching the same performance)
   - reveal more language issues or gaps in Phobos

Maybe, but so far additions to the language have been minimal, and DDMD does not currently use phobos. It is also slower than the C++ version, part of which is due to the GC being slower than the bump-pointer allocator used in the C++ dmd.
   - will be easier to maintain and update

Exactly!
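
For readers unfamiliar with the term, the bump-pointer allocation mentioned above can be sketched roughly as follows. This is only an illustration, not the allocator from the dmd source: the class name `BumpArena`, its fixed capacity, and its interface are all assumptions made for the example. The point it shows is why such an allocator is hard for a GC to beat on raw speed: an allocation is little more than rounding up an offset and adding to it, and nothing is ever freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical bump-pointer arena (illustrative only, not dmd's code).
// Allocation just advances an offset; memory is released all at once
// when the arena itself is destroyed.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), offset_(0) {}

    // Returns `size` bytes at an offset aligned to `align` (a power of two,
    // assumed to be no larger than alignof(std::max_align_t)), or nullptr
    // when the arena is exhausted.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return nullptr;  // out of space; a real arena might grab a new chunk
        }
        offset_ = aligned + size;  // the "bump"
        return buffer_.get() + aligned;
    }

    // Number of bytes handed out so far, including alignment padding.
    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

The trade-off is the one raised earlier in the thread: memory is never returned while the process runs, which is acceptable for a short-lived compiler run but not for most applications.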
Feb 10 2014
"Adam Wilson" <flyboynw gmail.com> writes:
On Mon, 10 Feb 2014 20:19:22 -0800, Daniel Murphy  
<yebbliesnospam gmail.com> wrote:

 "Xavier Bigand"  wrote in message news:ldbpum$1pov$1 digitalmars.com...

 There are a lot of threads and debates about D's GC. I wonder whether a
 project like DDMD could prove to every D user that the GC is perfectly
 usable for system applications. If it is not, will it be improved during
 DDMD development to the point where it satisfies almost everybody?

There are no planned GC modifications for the DDMD project. It may result in some compiler devs taking more of an interest in the GC.

The GC itself is an orthogonal issue to the compiler. The way I see it, once the compiler can output precise information about the heap, stack, and registers, you can build any GC you want without the compiler requiring any knowledge of the GC.
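
One way to picture the "precise information" idea is a per-type pointer map: a small record saying which words of an object hold pointers, so that a collector can follow exactly those words and ignore everything else. The sketch below is a hypothetical illustration only. `TypeLayout`, `pointerMask` and `pointersIn` are invented names, and the encoding is an assumption, not what dmd or druntime emit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout record a compiler could emit for each type:
// bit i of pointerMask is set when word i of the object holds a pointer.
struct TypeLayout {
    std::size_t words;          // object size in machine words (at most 64 here)
    std::uint64_t pointerMask;  // one bit per word
};

// Returns the values of the pointer-bearing words of one object.
// Words whose bit is clear are never inspected, so an integer that
// merely looks like an address cannot keep garbage alive.
std::vector<std::uintptr_t> pointersIn(const std::uintptr_t* object,
                                       const TypeLayout& layout) {
    assert(layout.words <= 64);
    std::vector<std::uintptr_t> found;
    for (std::size_t i = 0; i < layout.words; ++i) {
        if ((layout.pointerMask >> i) & 1u) {
            found.push_back(object[i]);
        }
    }
    return found;
}
```

A conservative scanner would have to treat every word as a potential pointer; with a map like this the collector only needs the table, not any knowledge of the compiler that produced it.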

--
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator
Feb 10 2014
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/11/14, 6:32 AM, Jakob Ovrum wrote:

D code has different allocation patterns from Java and C#. In idiomatic D, young GC-allocated objects are probably much fewer.

I agree this is a good hypothesis (without having measured). My suspicion is a good GC for D is different from a good GC for Java or C#.

Andrei
Feb 11 2014
Paulo Pinto <pjmlp progtools.org> writes:
On 11.02.2014 20:58, Adam Wilson wrote:

Another thing that you didn't mention is that most GC-based environments provide tooling for GC tuning, which, as far as I am aware, still doesn't exist for D. The Java, .NET and Haskell ones are quite good.

--
Paulo
Feb 11 2014
"thedeemon" <dlang thedeemon.com> writes:
On Tuesday, 11 February 2014 at 04:36:28 UTC, Adam Wilson wrote:

 The GC itself is an orthogonal issue to the compiler. The way I 
 see it, once the compiler can output precise information about 
 the heap, stack, and registers, you can build any GC you want 
 without the compiler requiring any knowledge of the GC.

If you want a fast GC it needs to be generational, i.e. most of the time it scans just the portion of the heap where young objects live (because most objects die young) rather than scanning the whole heap each time (as the current D GC does). However, in a mutable language that young/old generation split usually requires write barriers, so the compiler must emit code differently: each time a pointer field of a heap object is mutated, the code must check whether the new value is a link from the old generation to the young generation and remember that link (or just mark the page for scanning). So to have a generational GC in a mutable language you need to change the codegen as well. At least this is how most mature GCs work.
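
To make the write-barrier point concrete, here is a toy model of the "mark the page for scanning" variant (often called card marking). Everything in it is an assumption chosen for illustration: the heap is a flat array of slots, addresses are indices, everything below `youngStart` counts as the old generation, and a card covers 16 slots. It is not taken from any real collector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNull = static_cast<std::size_t>(-1);  // "null pointer"
constexpr std::size_t kCardSize = 16;                         // slots per card (assumed)

// Hypothetical toy heap: each slot stores the index it points to, or kNull.
struct ToyHeap {
    std::vector<std::size_t> slots;
    std::vector<bool> dirtyCards;  // one flag per card of the heap
    std::size_t youngStart;        // first index of the young generation

    ToyHeap(std::size_t size, std::size_t youngStartIndex)
        : slots(size, kNull),
          dirtyCards((size + kCardSize - 1) / kCardSize, false),
          youngStart(youngStartIndex) {}

    bool isYoung(std::size_t index) const {
        return index != kNull && index >= youngStart;
    }

    // The write barrier: what the compiler would have to emit in place of
    // a plain pointer store `slots[field] = target`.
    void writePointer(std::size_t field, std::size_t target) {
        assert(field < slots.size());
        assert(target == kNull || target < slots.size());
        slots[field] = target;
        if (!isYoung(field) && isYoung(target)) {
            // Old-to-young link: remember the card so a young-generation
            // collection can find this reference without scanning the
            // whole old generation.
            dirtyCards[field / kCardSize] = true;
        }
    }
};
```

A young-generation collection would then scan the roots plus only the dirty cards. The extra branch on every pointer store is the codegen change referred to above.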
Feb 11 2014
"Flamaros" <flamaros.xavier gmail.com> writes:
On Tuesday, 11 February 2014 at 04:19:12 UTC, Daniel Murphy wrote:
 "Xavier Bigand"  wrote in message 
 news:ldbpum$1pov$1 digitalmars.com...

 There are a lot of threads and debates about D's GC. I wonder
 whether a project like DDMD could prove to every D user that
 the GC is perfectly usable for system applications. If it is
 not, will it be improved during DDMD development to the point
 where it satisfies almost everybody?

There are no planned GC modifications for the DDMD project. It may result in some compiler devs taking more of an interest in the GC.

That's the idea. In any case, we need some proof of the validity of having a GC in a system language.
Feb 11 2014
"Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 11 February 2014 at 12:07:35 UTC, Flamaros wrote:
 On Tuesday, 11 February 2014 at 04:19:12 UTC, Daniel Murphy 
 wrote:
 "Xavier Bigand"  wrote in message 
 news:ldbpum$1pov$1 digitalmars.com...

 There are a lot of threads and debates about D's GC. I wonder
 whether a project like DDMD could prove to every D user that
 the GC is perfectly usable for system applications. If it is
 not, will it be improved during DDMD development to the point
 where it satisfies almost everybody?

There are no planned GC modifications for the DDMD project. It may result in some compiler devs taking more of an interest in the GC.

That's the idea. In any case, we need some proof of the validity of having a GC in a system language.

It has already been proven by Oberon, Modula-3 and Sing#, among others.

http://www.inf.ethz.ch/personal/wirth/ProjectOberon/
http://cseweb.ucsd.edu/~savage/papers/Wcsss96m3os.pdf
http://research.microsoft.com/en-us/projects/Singularity/

The Oberon one was even used by ETHZ employees on their workstations in the mid-90s. That no OS vendor pushed a mainstream OS with them is another matter.

--
Paulo
Feb 11 2014
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Tuesday, 11 February 2014 at 10:29:58 UTC, thedeemon wrote:
 On Tuesday, 11 February 2014 at 04:36:28 UTC, Adam Wilson wrote:

 The GC itself is an orthogonal issue to the compiler. The way 
 I see it, once the compiler can output precise information 
 about the heap, stack, and registers, you can build any GC you 
 want without the compiler requiring any knowledge of the GC.

If you want a fast GC it needs to be generational, i.e. most of the time it scans just the portion of the heap where young objects live (because most objects die young) rather than scanning the whole heap each time (as the current D GC does). However, in a mutable language that young/old generation split usually requires write barriers, so the compiler must emit code differently: each time a pointer field of a heap object is mutated, the code must check whether the new value is a link from the old generation to the young generation and remember that link (or just mark the page for scanning). So to have a generational GC in a mutable language you need to change the codegen as well. At least this is how most mature GCs work.

D code has different allocation patterns from Java and C#. In idiomatic D, young GC-allocated objects are probably much fewer.
Feb 11 2014
"thedeemon" <dlang thedeemon.com> writes:
On Tuesday, 11 February 2014 at 14:32:10 UTC, Jakob Ovrum wrote:

 D code has different allocation patterns from Java and C#. In 
 idiomatic D, young GC-allocated objects are probably much fewer.

From Java - most probably. From C# - less so, because C# also has value types living on the stack. In all of them, little temporary strings and arrays are often the main garbage generators, and most of them die young. Just think how many temporary arrays are allocated and become garbage while you grow a single array variable with "a ~= x;" in a loop. And before you say "Appender", think of associative arrays too.

In the case of D it's also hard to decide which allocation pattern is idiomatic, since D can be used so differently in different applications.
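
As a rough illustration of how growing one array in a loop leaves short-lived buffers behind, the sketch below simulates appends under a simple capacity-doubling policy. That policy is an assumption made for the example. D's runtime has its own growth strategy and can sometimes extend a block in place, so the exact numbers will differ; `GrowthStats` and `simulateAppends` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Counts what an append loop leaves behind under an assumed doubling policy.
struct GrowthStats {
    std::size_t reallocations;    // buffers abandoned as garbage
    std::size_t garbageElements;  // total capacity of those abandoned buffers
};

// Simulates appending `count` elements one at a time to an array that
// doubles its capacity whenever it is full. No memory is actually
// allocated; only the bookkeeping is modelled.
GrowthStats simulateAppends(std::size_t count) {
    GrowthStats stats{0, 0};
    std::size_t capacity = 0;
    for (std::size_t length = 0; length < count; ++length) {
        if (length == capacity) {
            if (capacity != 0) {
                // The old buffer is copied out of and then dropped.
                ++stats.reallocations;
                stats.garbageElements += capacity;
            }
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
    }
    return stats;
}
```

Under these assumptions, appending 1000 elements abandons 10 buffers holding 1023 element slots in total, so the garbage produced is about the size of the final array even though only one array variable is ever live.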
Feb 11 2014
"Adam Wilson" <flyboynw gmail.com> writes:
On Tue, 11 Feb 2014 08:33:59 -0800, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 2/11/14, 6:32 AM, Jakob Ovrum wrote:

D code has different allocation patterns from Java and C#. In idiomatic D, young GC-allocated objects are probably much fewer.

I agree this is a good hypothesis (without having measured). My suspicion is a good GC for D is different from a good GC for Java or C#.

Andrei

I'm not so sure about that. That might be true for Java, but C# is a stack-based language with value types. However, I think that, as with C#, we often forget about the temporaries we implicitly allocate: strings, arrays, closures (and lambdas in C#), the ~= operator, etc. These highly ephemeral semantics are also quite common in C#. I imagine that this makes D's allocation patterns much closer to C# than to Java.

I was thinking about this last night and, as I continue reading the GC Handbook, I think I understand more about why MS did what they did with the .NET GC. First of all, they used every algorithm in that book in one way or another. For example, the Large Object Heap is a simple Mark-Sweep because there tend to be relatively few nodes to check and fragmentation is much lower than in the ephemeral generations. However, they enabled opt-in compaction in the latest release because the large size of each node meant that fragmentation became a problem more quickly in long-running processes.

Also, the more I dive into it, the more I think that thread-local GC is a bad idea. As I understand it, the point is to reduce the overall pause on any one thread by reducing the scope of the heap to collect. However, I would argue that the common case is a program with a few threads that dominate the majority of running time and many ephemeral threads created for quick work (an incoming message over a socket, for example). In this case you are still going to have large heaps for the dominant threads, and most likely heaps that are never collected on the ephemeral threads. This means that a few threads will still have noticeable pause times, and we've significantly increased compiler complexity to support thread-local GC on all threads, and probably hammered thread startup time to do it.

I could go on, but my point is that at the end of the day, if you want performant collections, you end up using every trick in the book. The mixture may be slightly different, but I would suggest that the mixture is going to be slightly different based on the type of app even within the same language, which is why .NET provides two modes for the collector, Server and Workstation, and Java has four.

So saying that D's collector will be different is naturally obvious, but I don't think it will be significantly different from C#'s, as implied. We still have roughly similar allocation patterns, with roughly similar use cases, and will most likely end up building in every algorithm available and then tuning the mixture of those algorithms to meet D's needs.

--
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator
Feb 11 2014
"Brad Anderson" <eco gnuk.net> writes:
On Tuesday, 11 February 2014 at 04:19:12 UTC, Daniel Murphy wrote:
 "Xavier Bigand"  wrote in message 
 news:ldbpum$1pov$1 digitalmars.com...

 First, is there any progress on the DDMD project, or on some
 other D bootstrap?

 The old ddmd project is pretty much dead AFAIK (http://www.dsource.org/projects/ddmd) and was never up to date with the current compiler.

 For about a year I've been working on automatically converting the compiler source from C++ to D. The conversion has produced a working compiler on win32/linux32/linux64 (other platforms need trivial patches) that compiles druntime/phobos/the test suite without errors. The current effort is around cleaning up the C++ source to produce higher quality D code. The next major step is to actually switch development to the D version.

 Outstanding patches: https://github.com/D-Programming-Language/dmd/pull/1980
 Conversion tool: https://github.com/yebblies/magicport2
 You can see some of the recent patches (marked DDMD) here: https://github.com/yebblies?tab=contributions&period=monthly

By the way, what is the plan for all the outstanding pull requests that are still in C++?
Feb 11 2014
"Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Brad Anderson"  wrote in message 
news:cbyfliriblkuxnioimwg forum.dlang.org...

 By the way, what is the plan for all the outstanding pull requests that 
 are still in C++?

In theory they can be converted automatically by applying, converting and diffing each commit one by one. Even if that doesn't work out, the codebases will be nearly identical immediately after conversion and converting the commits over manually won't be too difficult.
Feb 11 2014