www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - D frontend

reply Timon Gehr <timon.gehr gmx.ch> writes:
As there has been some interest at DConf, I have pushed the experimental 
D frontend I have been hacking on to github under the boost licence (for 
compatibility with DMD).

https://github.com/tgehr/d-compiler

It is not yet feature-complete (e.g. constructors, destructors, 
exception handling,... is missing), but it handles e.g. large portions 
of templates and CTFE. (CTFE uses a byte code interpreter.)

One major goal of this project is to figure out a good way to handle 
non-trivial compile-time dependency structures (i.e. solving the 
"forward reference error" problem). As it does explicit dependency 
tracking on AST nodes, it can possibly be used for fine-grained 
(AST-level) incremental compilation in the future.

Unfortunately the only version of DMD that builds the project is 2.060 
(due to forward reference errors with newer versions).

It is not exactly well documented at the moment, so feel free to contact 
me if something does not make sense. (There are some really ugly hacks 
working around the lack of UDAs in 2.060, for example.)
May 06 2016
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/6/16 11:34 AM, Timon Gehr wrote:
 As there has been some interest at DConf, I have pushed the experimental
 D frontend I have been hacking on to github under the boost licence (for
 compatibility with DMD).

 https://github.com/tgehr/d-compiler

 It is not yet feature-complete (e.g. constructors, destructors,
 exception handling,... is missing), but it handles e.g. large portions
 of templates and CTFE. (CTFE uses a byte code interpreter.)

 One major goal of this project is to figure out a good way to handle
 non-trivial compile-time dependency structures (i.e. solving the
 "forward reference error" problem). As it does explicit dependency
 tracking on AST nodes, it can possibly be used for fine-grained
 (AST-level) incremental compilation in the future.

 Unfortunately the only version of DMD that builds the project is 2.060
 (due to forward reference errors with newer versions).

 It is not exactly well documented at the moment, so feel free to contact
 me if something does not make sense. (There are some really ugly hacks
 working around the lack of UDAs in 2.060, for example.)
Great news! I think an excellent first step would be to get you or another contributor to reduce the forward reference matter to a small size. That would help a fix. -- Andrei
May 06 2016
prev sibling next sibling parent Elie Morisse <syniurge gmail.com> writes:
On Friday, 6 May 2016 at 09:34:39 UTC, Timon Gehr wrote:
 One major goal of this project is to figure out a good way to 
 handle non-trivial compile-time dependency structures (i.e. 
 solving the "forward reference error" problem). As it does 
 explicit dependency tracking on AST nodes, it can possibly be 
 used for fine-grained (AST-level) incremental compilation in 
 the future.
Great initiative! While mapping big C++ librairies with a lot of templates with Calypso I still hit forward referencing errors from time to time. Some of them were easily fixed but I feel that making the whole thing more robust is a complex task that can't be achieved through small fixes/hacks, it needs a drastic solution like yours.
May 06 2016
prev sibling parent reply Elie Morisse <syniurge gmail.com> writes:
Timon, any update on this? What are the insights you gained with 
your frontend?

I recently reported two cases without a simple fix:

https://issues.dlang.org/show_bug.cgi?id=17656
https://issues.dlang.org/show_bug.cgi?id=17194#c1

and have seen a lot more referencing errors with Calypso, 
especially when this gets enabled: 
https://github.com/Syniurge/Calypso/commit/1e1ae319e32120bd9ef0009716ddabed92f69ac2

Calypso makes its mapped C++ symbols go through the same 
importAll -> semantic1,2,3 route that D symbols take. Ultimately 
this is mostly useless work that should be skipped, the reason it 
currently works this way being that I wasn't familiar yet with 
the DMD source code when I started. But what this hard and 
ungrateful work has also been doing (and many large libraries are 
blocked by this) is exposing a seemingly infinite number of bogus 
forward/circular/misc referencing DMD errors.
Those errors boil down to semantic calls getting triggered at the 
wrong time, on symbols that the caller doesn't really depend upon.

Because most of the time, the semantic() call on the LHS of 
DotXXXExp, inside AggregateDeclaration.determineSize, etc. is 
there in case there are:
  - mixins to expand
  - attributes whose members have to be added to the parent symtab
  - if LHS is a template to instantiate

These are (AFAIK) the only cases where the symtab of the LHS or 
the aggregate may get altered, and if I understand correctly 
that's what the semantic call is checking before searching for 
the RHS or determining the aggregate fields and then its size.

So would splitting semantic() into determineMembers() preceding 
the rest of semantic() be worth exploring? The thing is, this 
would help in most cases but I can imagine scenarios where simply 
splitting may not be enough. Example:

enum E { OOO = S.UUU }

import std.conv;
string genMember1() { return "enum G8H9 = " ~ 
(cast(int)E.OOO).to!string; }
string genMember2() { return "enum UUU = 1;"; }

struct S {
     mixin(genMember1());
     mixin(genMember2());
}

We'll have S.determineMembers -> E.OOO.semantic -> 
S.determineMembers, and although in this case the value of OOO 
may be interpreted to 1, at this point the compiler can't easily 
know whether mixins will generate zero, one or more UUU members 
or not. To attenuate the problem determineMembers() could be made 
be callable multiple times (recursively), each time starting from 
where the previous (on-going) call left off, so in this 
particular case the second S.determineMembers call would expand 
the second mixin to enum UUU = 1. But then how does the compiler 
knows and react if genMember1 generate a new UUU member? Ok a 
second UUU enum will error, but what if UUU was a function and 
genMember1() generates a more specialized overload of UUU? I.e:

enum E { OOO = S.UUU(1) }

import std.conv;
string genMember1() { return "static int UUU(int n) { return n; 
}; enum G8H9 = " ~ (cast(int)E.OOO).to!string; }
string genMember2() { return "static int UUU(int n, int a = 5) { 
return n + 5; }"; }

struct S {
     mixin(genMember1());
     mixin(genMember2());
}

At this point well it's getting a bit contrived, so maybe it's 
not really worth finding a solution to make this compile (but 
ideally the compiler should still warn the user).

Should I try splitting semantic() and make a PR? It might be a 
lot of work, so I'd like to know if this makes sense first.
Jul 16
next sibling parent Elie Morisse <syniurge gmail.com> writes:
I'll start working on this tomorrow, so if you believe this 
effort will be in vain, please leave a comment.
Jul 20
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 17.07.2017 03:16, Elie Morisse wrote:
 Timon, any update on this?
Not really. I did not have much time to work in this in the past year. (Also, it is not very convenient to work with DMD 2.060.)
 What are the insights you gained with your 
 frontend?
 ...
What I have implemented works well but still has a few limitations, e.g. the following code does not compile: enum x = "enum xx = q{int y = 0;};"; struct SS{ mixin(xx); mixin(x); // error } I believe this should be allowed. An easy but slightly ugly solution would be to sequentialize analysis of string mixins even in scopes that have no ordering. (Then the above would still not compile, but it would compile once we change the order of the two string mixins.) The idea behind the algorithm used by my frontend is to analyze all AST nodes concurrently, blocking them as soon as they need information that is not yet available. This continues until the analysis has finished or all AST nodes are blocked. In that case, the compiler examines the top strongly connected components of the dependency graph (i.e. the smallest set of nodes that everything else depends on). Some kinds of analysis dependencies can be discharged by assuming a natural default result. (For example, we can assume by default that a certain member is not present, or that a more specialized overload does not exist, etc.) The assumptions are made simultaneously for all nodes in the top strongly connected components and analysis proceeds. The compiler prints an error if any assumptions end up being violated.
 I recently reported two cases without a simple fix:
 
 https://issues.dlang.org/show_bug.cgi?id=17656
 https://issues.dlang.org/show_bug.cgi?id=17194#c1
 ...
Those are easily handled by my algorithm.
 and have seen a lot more referencing errors with Calypso, especially 
 when this gets enabled: 
 https://github.com/Syniurge/Calypso/commit/1e1ae319e32120bd9ef0
09716ddabed92f69ac2 
 
 
 Calypso makes its mapped C++ symbols go through the same importAll -> 
 semantic1,2,3 route that D symbols take. Ultimately this is mostly 
 useless work that should be skipped, the reason it currently works this 
 way being that I wasn't familiar yet with the DMD source code when I 
 started. But what this hard and ungrateful work has also been doing (and 
 many large libraries are blocked by this) is exposing a seemingly 
 infinite number of bogus forward/circular/misc referencing DMD errors.
 Those errors boil down to semantic calls getting triggered at the wrong 
 time, on symbols that the caller doesn't really depend upon.
 ...
I don't think that in the end there will be a better solution than explicitly tracking what depends on what.
 Because most of the time, the semantic() call on the LHS of DotXXXExp, 
 inside AggregateDeclaration.determineSize, etc. is there in case there are:
   - mixins to expand
Also, conditional compilation like static if, I guess.
   - attributes whose members have to be added to the parent symtab
   - if LHS is a template to instantiate
 
 These are (AFAIK) the only cases where the symtab of the LHS or the 
 aggregate may get altered, and if I understand correctly that's what the 
 semantic call is checking before searching for the RHS or determining 
 the aggregate fields and then its size.
 ...
I handle more things, for example inheritance, or adding members to aggregates whose instances have already been used in CTFE etc., but those are probably not a big deal for your use case.
 So would splitting semantic() into determineMembers() preceding the rest 
 of semantic() be worth exploring?
Probably yes, but it will at most be a partial solution.
 The thing is, this would help in most 
 cases but I can imagine scenarios where simply splitting may not be 
 enough. Example:
 
 enum E { OOO = S.UUU }
 
 import std.conv;
 string genMember1() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string; }
 string genMember2() { return "enum UUU = 1;"; }
 
 struct S {
      mixin(genMember1());
      mixin(genMember2());
 }
genMember1() is missing a semicolon in the code it generates, but otherwise this works with what I have.
 
 We'll have S.determineMembers -> E.OOO.semantic -> S.determineMembers, 
 and although in this case the value of OOO may be interpreted to 1, at 
 this point the compiler can't easily know whether mixins will generate 
 zero, one or more UUU members or not. To attenuate the problem 
 determineMembers() could be made be callable multiple times 
 (recursively), each time starting from where the previous (on-going) 
 call left off,
How do you make sure you don't go on indefinitely and overflow the stack?
 so in this particular case the second S.determineMembers 
 call would expand the second mixin to enum UUU = 1. But then how does 
 the compiler knows and react if genMember1 generate a new UUU member? Ok 
 a second UUU enum will error, but what if UUU was a function and 
 genMember1() generates a more specialized overload of UUU? I.e:
 
 enum E { OOO = S.UUU(1) }
 
 import std.conv;
 string genMember1() { return "static int UUU(int n) { return n; }; enum 
 G8H9 = " ~ (cast(int)E.OOO).to!string; }
 string genMember2() { return "static int UUU(int n, int a = 5) { return 
 n + 5; }"; }
 
 struct S {
      mixin(genMember1());
      mixin(genMember2());
 }
  > At this point well it's getting a bit contrived, so maybe it's not
 really worth finding a solution to make this compile
My opinion is that the following is good enough: enum E { OOO = S.UUU(1) } string genMember1() { return "static int UUU(int n) { return n; };"; } string genMember2() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string~";"; } string genMember3() { return "static int UUU(int n, int a = 5) { return n + 5; }"; } struct S { mixin(genMember1()); mixin(genMember2()); mixin(genMember3()); } It is possible to make many cases like yours (that have spurious dependencies through string values) work too, but I don't think it is remotely worth the effort. One would need to e.g. symbolically compute the mixed in string and partially evaluate the parser to extract some declarations before we know the entire mixed in string.
 (but ideally the 
 compiler should still warn the user).
 ...
My frontend says: <mixin test.d:7>:1:1: error: introduction of new overload is invalid static int UUU(int n) { return n; }; enum G8H9 = 6; ^────────────────────────────────── test.d:1:16: note: overload set was sealed here enum E { OOO = S.UUU(1) } ^──── In general, the frontend is designed such that compiling the original code with all mixins and conditional compilation expanded always gives the same results as compiling the original code. (And if it is not able to achieve this, the frontend prints an error.)
 Should I try splitting semantic() and make a PR? It might be a lot of 
 work, so I'd like to know if this makes sense first.
It's hard to tell for me, as I am not extremely familiar with DMD's internals. If this solves your issues with Calypso, I'd say, go for it. (Also, who knows, you might fix enough forward reference issues to make my frontend compile with the latest DMD again.)
Jul 21