
digitalmars.D - C++ compiler vs D compiler

reply Abdulhaq <alynch4047 gmail.com> writes:
Perhaps the answer to this is obvious, but what's harder to write 
from scratch - a C++ compiler or a D compiler? :-)

We know Walter wrote a C++ compiler single-handedly. Does anyone 
else recall the C++ Grandmaster qualification, the free course 
where participants got to write a complete C++ compiler from 
scratch? I think it's dead now, but I can't find any real info 
about it despite some serious googling. What are the chances of 
anyone single-handedly writing a D compiler from scratch in the 
future? I know deadalnix is writing SDC in D - sounds interesting. 
Is the D language well enough documented / specified for a 
complete D implementation to even be possible (as things stand 
now)?
Oct 03 2015
next sibling parent Cauterite <cauterite gmail.com> writes:
On Saturday, 3 October 2015 at 10:45:29 UTC, Abdulhaq wrote:
 Perhaps the answer to this is obvious, but what's harder to 
 write from scratch - a C++ compiler or a D compiler? :-)
I suspect writing a C++ compiler would be more difficult, unless you take some shortcuts. The language's grammar is inherently hard to parse, and its semantics are overwhelmingly complex, with countless corner cases to consider for every feature. Though the first hurdle would be actually reading the standard: last I checked, the C++14 standard was a 1300-page document.

D's situation is far better: a context-free grammar, mostly straightforward semantics, and (I believe) fewer 'disruptive' features - those which interact badly with other features. Implementing D would still be a huge undertaking, due to the sheer quantity of language features, but I imagine there would be *far* fewer tricky parts.

Disclaimer: I have no experience writing compiler backends (and limited experience writing frontends).
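To make the parsing point concrete, a small illustration of my own (just a sketch, not from the spec): D's template instantiation uses !( ) instead of angle brackets, so the parser never needs semantic information to decide whether < and > are comparison operators or template brackets - the guess a C++ parser has to make for something like a<b,c>(d).

// D resolves instantiation purely syntactically: Pair!(int, string)
// can only be a template instantiation, never a comparison chain.
struct Pair(A, B){ A first; B second; }

void main(){
    Pair!(int, string) p;            // unambiguously an instantiation
    auto q = Pair!(int, int)(1, 2);  // construct with field values
}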
Oct 03 2015
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 10/03/2015 12:45 PM, Abdulhaq wrote:
 Perhaps the answer to this is obvious, but what's harder to write from
 scratch - a C++ compiler or a D compiler? :-)

 We know Walter wrote a C++ compiler single-handedly. Does anyone else
 recall the C++ Grandmaster qualification, the free course where
 participants got to write a complete C++ compiler from scratch? I think
 it's dead now, but I can't find any real info about it despite some
 serious googling. What are the chances of anyone single-handedly writing
 a D compiler from scratch in the future? I know deadalnix is writing SDC
 in D - sounds interesting.
I also started a similar project four years ago (it's currently just an incomplete compiler front-end), but I have not been able to work much on it during the last year. (It only compiles with DMD 2.060, though, because of issues similar to what I discuss below.)
 Is the D language well enough documented / specified for a complete D
 implementation to even be possible (as things stand now)?
Well, not really. The main impediment to a fully formal specification is the interplay of forward references and the meta-programming system. (DMD just does a best-effort kind of thing, where the common-case scenarios for which there were bug reports work, but in general the semantics of the resulting code can depend on things like the order in which modules are passed on the command line, or perfectly valid code is rejected with a "forward reference error".) The documentation just specifies that everything works, but there is no consistent way to interpret it. E.g.:

static if(!is(typeof(x))) enum y=2;
static if(!is(typeof(y))) enum x=2;

Arbitrarily abstruse examples can be constructed; e.g., this is from my test suite:

struct TestInvalidInheritance{
    class A{ int string; } // error
    template Mixin(string s){
        mixin("alias "~s~" Mixin;");
    }
    class D: Mixin!({D d = new E; return d.foo();}()){
        int foo(int x){ return 2; }
        string foo(){ return "X"; }
    }
    class E: D{
        override int foo(int x){ return super.foo(x); }
        override string foo(){ return "A"; }
    }
}

(I currently accept this code if the declaration of 'int string' in class A is removed. Otherwise the code is analyzed until the point where it is clear that A is in D's superclass chain, and it is also clear that the symbol 'string', which it was necessary to resolve in order to discover this fact, was resolved incorrectly. The compiler then gives up and prints an error message:

example.d:3:18: error: declaration of 'string' is invalid
    class A{ int string; } // error
                 ^─────
example.d:9:9: note: this lookup on subclass 'D' should have resolved to it
        string foo(){ return "X"; }
        ^─────
)

I and SDC have different ways to deal with those kinds of examples, and I think the SDC way does not work (unless things have changed since I last looked at it). It assumes that declarations can be ordered and that it is fine to depend on the order of declarations. My implementation is designed to be independent of declaration order and to reject the cases where there is no single obvious and consistent interpretation of the program. The drawbacks currently are:

- It is overly conservative in some cases, especially when string mixins are involved. E.g., the following code has only one consistent interpretation, but it is rejected (as is any reordering of those declarations):

enum x = "enum xx = q{int y = 0;};";
struct SS{
    mixin(xx);
    mixin(x);
}

- The current implementation is somewhat slow. IIRC, N-fold recursive template instantiation currently runs in Ω(N²). It's clear that this needs to be improved. If this is to be adopted as the official solution, it should not make the compiler any slower, at least in the common case.

There are also some other, more minor issues. For example, when the language specification speaks about "memory safety", it is really unclear what this means, as the language designers seem to think that it is fine to have undefined behaviour in a section of code that is "verified memory safe".
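To make the order dependence of the first example observable, here is a quick probe (my own sketch; the printed values reflect what I'd expect from DMD's current best-effort pass, not anything the spec promises):

static if(!is(typeof(x))) enum y=2;
static if(!is(typeof(y))) enum x=2;

pragma(msg, is(typeof(x)));  // likely "false": x never gets declared
pragma(msg, is(typeof(y)));  // likely "true": the top-to-bottom pass
                             // finds no x, so it declares y

Evaluating the two declarations in the opposite order would print the opposite pair - and both readings are internally consistent, which is exactly what a specification would have to rule on.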
Oct 03 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/3/2015 8:43 AM, Timon Gehr wrote:
 There are also some other, more minor issues. For example, when the
 language specification speaks about "memory safety", it is really
 unclear what this means, as the language designers seem to think that
 it is fine to have undefined behaviour in a section of code that is
 "verified memory safe".
Memory safety means no memory corruption is possible.
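For instance, everything @safe rejects is aimed at that one guarantee. A minimal sketch (the second function is deliberately ill-formed, to show what gets refused):

@safe void ok(int[] a){
    foreach (i; 0 .. a.length)
        a[i] = 0;                  // bounds-checked access: allowed
}

@safe void notOk(int* p){
    ++p;                           // error: pointer arithmetic is not
                                   // allowed in @safe code
    auto q = cast(int*) 0xdead;    // error: casting an integer to a
                                   // pointer is not allowed in @safe
}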
Oct 03 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Saturday, 3 October 2015 at 19:43:13 UTC, Walter Bright wrote:
 On 10/3/2015 8:43 AM, Timon Gehr wrote:
 There are also some other, more minor issues. For example, 
 when the language specification speaks about "memory safety", 
 it is really unclear what this means, as the language 
 designers seem to think that it is fine to have undefined 
 behaviour in a section of code that is "verified memory safe".
Memory safety means no memory corruption is possible.
Therefore, there can be no undefined behavior in safe code.
Oct 03 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/3/2015 12:49 PM, deadalnix wrote:
 On Saturday, 3 October 2015 at 19:43:13 UTC, Walter Bright wrote:
 On 10/3/2015 8:43 AM, Timon Gehr wrote:
 There are also some other, more minor issues. For example, when the
 language specification speaks about "memory safety", it is really
 unclear what this means, as the language designers seem to think that
 it is fine to have undefined behaviour in a section of code that is
 "verified memory safe".
Memory safety means no memory corruption is possible.
Therefore, there can be no undefined behavior in safe code.
Overflowing an int is undefined behavior, but it is not memory corruption.
Oct 03 2015
parent deadalnix <deadalnix gmail.com> writes:
On Sunday, 4 October 2015 at 01:26:53 UTC, Walter Bright wrote:
 Memory safety means no memory corruption is possible.
Therefore, there can be no undefined behavior in safe code.
Overflowing an int is undefined behavior, but it is not memory corruption.
Overflowing an int is defined behavior in D. But let's say it were undefined behavior - would it then be guaranteed to be memory safe? At first glance it may seem so, but I wouldn't bet on it. I'm sure one can find cases where the optimizer will generate something unsafe based on undefined behavior. Things like (see the sketch below):

1/ Oh, let's remove that runtime check, as I can prove it always passes.
2/ Oh, let's remove this code path, as undefined behavior allows me to.
3/ Well fuck, the invariant for 1/ is now broken because of the missing code path, and the code is not memory safe.

You'd be surprised how much tens of perfectly reasonable transformations, run one after another, can do to your code once you throw undefined behavior in.

TL;DR: Even if it doesn't touch memory itself, undefined behavior can lead to memory-unsafe things.
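Here is that chain written out in D syntax - purely hypothetical, since D actually defines signed overflow to wrap; pretend for a moment the optimizer may assume len + 1 never overflows:

@system void append(int* buf, int len, int value){
    // Intended guard: reject len == int.max, where len + 1 would
    // wrap around to a negative value.
    if (len + 1 < len)
        return;
    // 1/ Treating overflow as undefined, the optimizer "proves"
    //    len + 1 < len can never hold and deletes the guard.
    // 2/ The early-return code path is gone.
    // 3/ For len == int.max, the wrapped, negative index reaches
    //    this write: out of bounds, i.e. memory corruption.
    buf[len + 1] = value;
}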
Oct 03 2015