
D.gnu - Study of GCC frontend and Walter's DMD compiler sources

reply andy <acoliver apache.org> writes:
Hi,

I've been studying GCC and Walter's sources.  The GCC sources are very 
much of the C world.  (specifically of the "my entire program is a 
series of #DEFINEs" persuasion).  The dmd sources are of course in C++, 
but make judicious use of C++ features.

I think it would be incredibly difficult to write a GCC front end in 
C++.  Therefore I think we'll have to port the D front end to straight C 
first, while attempting to parallel Walter's overall object structure. 
It's unfortunate because it would be nice to have non-divergent 
versions, but I don't think there is an easier way to do it.  Any 
thoughts?  (practical thoughts only please)

The GCC front-end sources are spartanly documented; however, I posted a 
series of links, one of which is the documentation the GCC COBOL front 
end folks produced.

I've put a couple tasks on the sourceforge task manager.  I think our 
steps are (though not necessarily in this order):

1. Port DMD front end sources to C - explained above.

2. Create an overall framework/coupling for working with the GCC front 
end.  This serves two purposes.  First of all, to isolate us from the 
changing GCC front-end interface.  Second of all, the GCC front end 
stuff is ugly; we should be able to encapsulate it and produce something 
that allows us to manage it at a higher level so that we only have to 
get intimate with it once.  The rest of the time we'll work above it.

3. Attach the front end to the coupling.

As I've said, I'm pretty rusty with both C and C++, so I'm hoping some 
other folks will jump in early on.  (Which is unfortunately just the 
opposite of how things usually go ;-) ).

Over the next while (unless someone shows me a practical way to attach 
the two), I'll start committing a port of the D front end to C to the 
project CVS at sourceforge.net/projects/brightd

Thoughts, comments?

-Andy
Jun 02 2002
parent reply Andy Walker <Andy_member pathlink.com> writes:
In article <3CFA4813.6030306 apache.org>, andy says...
Hi,

I've been studying GCC and Walter's sources.  The GCC sources are very 
much of the C world.  (specifically of the "my entire program is a 
series of #DEFINEs" persuasion).  The dmd sources are of course in C++, 
but make judicious use of C++ features.

Back from Site.  Finally.  Lots of very long days.  I did have a day off and a
few evenings, so I have been working on Dgnu.

DMD sources in C++:  I have spent a couple of days trying to find ways to
deal with this.  My heart sank when I started looking at the DMD code and
finally realized that this was all C++, not C.  (I actually started coding C++
before I used C, so obvious incompatibilities do not jump out at me until
I compile the thing and thousands of error messages appear.)

One option is a sort of awk scriptie C++ to C compiler.  It does not have to be
a full-function compiler, because Walter does not use every possible C++
syntax in his code.  Still, it will have to properly handle all the non-C
characteristics.  I have started this, but it is coming along very slowly.
On the nice side, it will only have to work on the released portion of DMD,
which is, IIRC, about 27,000 lines of code.  Minuscule, really, considering
what it does.

Another option is to find a C++ to C compiler and use it.  I have done some
searching on the web, but I have not yet found anything.  (It is out there,
because I recall having seen it, I just do not recall where.)  The real
concern I have about this is that it will probably turn Walter's elegant code
into garbage.  This offends me.

Another option is to just grit my teeth and rewrite it all by hand into C, then
hook it up to the GCC system.  That way, I can maintain much of Walter's
elegance, and still get something working in a reasonable time.

Comments?

Andy Walker
Jun 09 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Andy Walker" <Andy_member pathlink.com> wrote in message
news:ae1gb7$1oih$1 digitaldaemon.com...
D is all written in C++. But it doesn't use many C++ features - the only
grief will likely be the heavy use of virtual functions. Most member
functions can be rewritten from:
    f->foo(x);
to:
    Foo_foo(f, x);

Single inheritance:
    class A { ... };
    class B : A { ... };
can be:
    typedef struct A { ... } A;
    typedef struct B { A a; ... } B;

Virtual functions can be handled by having a:
    void **vptr;
as the first member of each struct, and then initialize it in the
constructor to the vtbl[] for that struct. Calling it would require a macro
for each function:
    class A
    {    virtual int foo(args...);
    };
    A *a;
    a->foo(args);
would be:
    typedef int (*A_foo_fp)(args...);
    #define A_foo(this)    (*(A_foo_fp)((this)->vptr[0]))
    A_foo(a)(args...);

Constructors:
    class A
    {    A(args...);
    };
would be:
    A *A_ctor(A *this, args...);

Following these conventions will make the C code have a good correspondence
with the C++ code.
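
The conventions above can be tried out on a toy class pair. The following is only a minimal sketch under stated assumptions, not code from DMD: the members `base` and `extra`, the `_impl` function names, `A_bar`, and `demo_virtual_call` are all made up for illustration. It also departs from the post in two small ways: the table holds a generic function-pointer type rather than `void *` (ISO C does not guarantee conversions between `void *` and function pointers), and the object pointer is named `self` and passed explicitly as the first argument.

```c
#include <assert.h>

/* C++ shape being mimicked (hypothetical, for illustration only):
 *   class A     { int base;  virtual int foo(int x); int bar(int x); A(int); };
 *   class B : A { int extra; int foo(int x);         B(int, int); };
 */

typedef void (*vfunc)(void);          /* generic slot type for the vtbl[] */

typedef struct A {
    const vfunc *vptr;                /* first member: vtbl[] of the dynamic type */
    int base;
} A;

typedef struct B {
    A a;                              /* single inheritance: base struct comes first */
    int extra;
} B;

/* Slot 0 of every vtbl holds foo; the macro fetches and casts it. */
typedef int (*A_foo_fp)(A *self, int x);
#define A_foo(self) ((A_foo_fp)((self)->vptr[0]))

static int A_foo_impl(A *self, int x) { return self->base + x; }

static int B_foo_impl(A *self, int x)
{
    B *b = (B *)self;                 /* valid because A is the first member of B */
    return self->base + b->extra + x;
}

static const vfunc A_vtbl[] = { (vfunc)A_foo_impl };
static const vfunc B_vtbl[] = { (vfunc)B_foo_impl };

/* Constructors set the vptr, then the fields. */
A *A_ctor(A *self, int base)
{
    self->vptr = A_vtbl;
    self->base = base;
    return self;
}

B *B_ctor(B *self, int base, int extra)
{
    A_ctor(&self->a, base);           /* run the base constructor first */
    self->a.vptr = B_vtbl;            /* then override the table */
    self->extra = extra;
    return self;
}

/* Non-virtual member: f->bar(x) becomes A_bar(f, x). */
int A_bar(A *self, int x) { return self->base * x; }

int demo_virtual_call(void)
{
    A a;
    B b;
    A *as_base;

    A_ctor(&a, 10);
    B_ctor(&b, 10, 5);
    as_base = &b.a;

    assert(A_foo(&a)(&a, 1) == 11);   /* dispatches to A_foo_impl */
    return A_foo(as_base)(as_base, 1); /* dispatches to B_foo_impl */
}
```

Because the base struct is the first member, a `B *` can be handed to any code expecting an `A *` by taking `&b.a`, which is what keeps the C layout parallel to the C++ class hierarchy.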
Jun 10 2002
parent Andy Walker <Andy_member pathlink.com> writes:
In article <ae1obm$22ci$1 digitaldaemon.com>, Walter says...
 One option is a sort of awk scriptie C++ to C compiler.  It does not have
to be
 a full-function compiler, because Walter does not use every possible C++
 syntax in his code.  
D is all written in C++. But it doesn't use much of C++ features - the only
grief will likely be the heavy use of virtual functions.

I had that feeling when I looked at it. Didn't do any statistics, though.

Most member functions can be rewritten from:
    f->foo(x);
to:
    Foo_foo(f, x);

Single inheritance:
    class A { ... };
    class B : A { ... };
can be:
    typedef struct A { ... } A;
    typedef struct B { A a; ... } B;

Virtual functions can be handled by having a:
    void **vptr;
as the first member of each struct, and then initialize it in the
constructor to the vtbl[] for that struct. Calling it would require a macro
for each function:
    class A
    {    virtual int foo(args...);
    };
   A *a;
   a->foo(args);
would be:
    typedef int (*A_foo_fp)(args...);
    #define A_foo(this)    (*(A_foo_fp)((this)->vptr[0]))
    A_foo(a)(args...);

Constructors:
    class A
    {    A(args...);
    };
would be:
    A *A_ctor(A *this, args...);

Following these conventions will make the C code have a good correspondence
with the C++ code.

That was not really how I was thinking about approaching it. I tend to be too
brute force. Yours is so much better that I will do it your way. This will be
relatively easy to implement with the awk script.

Andy Walker
Jun 11 2002
prev sibling next sibling parent reply andy <acoliver apache.org> writes:
 
 Back from Site.  Finally.  Lots of very long days.  I did have a day off and a
 few evenings, so I have been working on Dgnu.
 
 DMD sources in C++:  I have spent a couple of days trying to find ways to 
 deal with this.  My heart sank when I started looking at the DMD code and 
 finally realized that this was all C++, not C.  (I actually started coding C++
 before I used C, so obvious incompatibilities do not jump out at me until 
 I compile the thing and thousands of error messages appear.)
 
Yeah, at first glance I saw structs, some c and h files, no cpps and I thought "great, it's in C"... Then I took a more in-depth look and gasped.
 One option is a sort of awk scriptie C++ to C compiler.  It does not have to
be 
 a full-function compiler, because Walter does not use every possible C++
 syntax in his code.  Still, it will have to properly handle all the non-C 
 characteristics.  I have started this,  but it is coming along very slowly.  
 On the nice side, it will only have to work on the released portion of DMD, 
 which is, IIRC, about 27,000 lines of code.  Minuscule, really, considering
 what it does.  
 
I kinda like this idea because we can sync up more easily later.
 Another option is to find a C++ to C compiler and use it.  I have done some
 searching on the web, but I have not yet found anything.  (It is out there, 
 because I recall having seen it, I just do not recall where.)  The real 
 concern I have about this is that it will probably turn Walter's elegant code 
 into garbage.  This offends me.
 
yes, and we'd have to hook code into it, which would require us to read it... yuck.
 Another option is to just grit my teeth and rewrite it all by hand into C,
then 
 hook it up to the GCC system.  That way, I can maintain much of Walter's 
 elegance, and still get something working in a reasonable time.  
 
If you have a good handle on how to do the code generator, then I favor that
idea. I don't really know awk but I'll do my best to catch up. And it's
probably about time I learned.

It might help, though Walter has very few hooks to the backend, if Walter and
we could agree on interfaces to the back end so that this can be easily
plugged in and out in parallel. Meaning we could get to where we just run the
C-ize program on Walter's code and do our own work on our "backend" (mid layer
between D frontend and GNU). This would also potentially allow others to do
the same, perhaps for other compilers. Granted this is a utopian view, and I'm
sure we'd spend time modifying the thing for subsequent releases, but I think
the pain could be minimized into a small non-bleeding ulcer.

Writing our own divergent C version and maintaining it would be more painful,
I think. If we can get to where the generated C version just hooks into the
backend (or rather our abstraction of it), that would be sweet. The only way
it would sound like a real good idea to write it by hand is if Walter were
willing to switch from C++ to C. (That would produce the least gap and achieve
higher portability, but he may have reasons for not wishing to do such.)

-andy
Jun 10 2002
next sibling parent reply "Martin M. Pedersen" <mmp www.moeller-pedersen.dk> writes:
"andy" <acoliver apache.org> wrote in message
news:3D04D8C5.50909 apache.org...

 Another option is to just grit my teeth and rewrite it all by hand into
C, then
 hook it up to the GCC system.  That way, I can maintain much of Walter's
 elegance, and still get something working in a reasonable time.
This was my first idea, because I would like a bison/C solution. However, I
came to the conclusion that D cannot be specified as a LALR(1) grammar as
required by bison. The problem I ran into was the deeply recursive
Parser::is... methods. Other LALR(N) parser generators exist, but I have no
experience with them.
 One option is a sort of awk scriptie C++ to C compiler.  It does not
have to be
 a full-function compiler, because Walter does not use every possible C++
 syntax in his code.  Still, it will have to properly handle all the
non-C
 characteristics.  I have started this,  but it is coming along very
slowly.
 On the nice side, it will only have to work on the released portion of
DMD,
 which is, IIRC, about 27,000 lines of code.  Minuscule, really,
considering
 what it does.
Before trying to make it C code, we need it to compile as the C++ sources
they are. That is not trivial either. Tonight I have had luck making "parse.c"
and "lexer.c" compile with MSVS6 using only automated changes from an AWK
script (attached). One of the problems is that the source is incomplete.
Until now I have been missing:

    Id
    StringTable
    StringValue
    lstring

I expect I will run into more later. Only Walter knows why they are not part
of the archive. Maybe it is just a mishap, or maybe he has his reasons not to
publish them. But it is, BTW, also part of the reason I started out
reimplementing the thing from scratch.
 I kinda like this idea because we can sync up more easily later.
That would be great. I guess that the parser is still open for major changes,
so a mechanism that will allow us to easily keep the implementation in sync
will be of great value. I don't know how difficult it would be, or if there
are reasons for Walter not to do so, but it would be very helpful if Walter
provided us with a package that was complete enough to compile and link - at
least with DMC.

Regards,
Martin M. Pedersen

[Attachments: fixsrc.awk, common.h (uuencoded; contents not recoverable here)]
Jun 10 2002
next sibling parent "Walter" <walter digitalmars.com> writes:
"Martin M. Pedersen" <mmp www.moeller-pedersen.dk> wrote in message
news:ae3a7m$qt0$1 digitaldaemon.com...
 Until now I have been missing:

     Id
     StringTable
     StringValue
     lstring
These are pretty trivial, they are simple associative arrays. I had thought that the equivalent gcc functionality would be better to use for a gcc port. Perhaps I'm wrong.
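
For readers who want something to compile against in the meantime, here is a rough stand-in for a string table as a simple associative array. It is a sketch only, and everything in it is an assumption: the real DMD `StringTable` and `StringValue` are not shown in this thread, so the field names, the function names (`StringTable_init`, `StringTable_lookup`, `StringTable_update`, `StringTable_free`), the fixed bucket count, and the hash function are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRINGTABLE_BUCKETS 64        /* arbitrary size chosen for the sketch */

typedef struct StringValue {
    struct StringValue *next;         /* chain of entries sharing a bucket */
    void *ptrvalue;                   /* payload slot for the caller */
    size_t len;
    char *str;                        /* owned, NUL-terminated copy of the key */
} StringValue;

typedef struct StringTable {
    StringValue *buckets[STRINGTABLE_BUCKETS];
} StringTable;

static size_t hash_bytes(const char *s, size_t len)
{
    size_t h = 5381;
    size_t i;
    for (i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h % STRINGTABLE_BUCKETS;
}

void StringTable_init(StringTable *t)
{
    size_t i;
    for (i = 0; i < STRINGTABLE_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Returns the entry for the key, or NULL if it is not present. */
StringValue *StringTable_lookup(StringTable *t, const char *s, size_t len)
{
    StringValue *sv = t->buckets[hash_bytes(s, len)];
    for (; sv != NULL; sv = sv->next)
        if (sv->len == len && memcmp(sv->str, s, len) == 0)
            return sv;
    return NULL;
}

/* Returns the existing entry, or inserts a new one; NULL if out of memory. */
StringValue *StringTable_update(StringTable *t, const char *s, size_t len)
{
    size_t h;
    StringValue *sv = StringTable_lookup(t, s, len);
    if (sv != NULL)
        return sv;

    sv = malloc(sizeof *sv);
    if (sv == NULL)
        return NULL;
    sv->str = malloc(len + 1);
    if (sv->str == NULL) {
        free(sv);
        return NULL;
    }
    memcpy(sv->str, s, len);
    sv->str[len] = '\0';
    sv->len = len;
    sv->ptrvalue = NULL;

    h = hash_bytes(s, len);
    sv->next = t->buckets[h];         /* push onto the front of the chain */
    t->buckets[h] = sv;
    return sv;
}

void StringTable_free(StringTable *t)
{
    size_t i;
    for (i = 0; i < STRINGTABLE_BUCKETS; i++) {
        StringValue *sv = t->buckets[i];
        while (sv != NULL) {
            StringValue *next = sv->next;
            free(sv->str);
            free(sv);
            sv = next;
        }
        t->buckets[i] = NULL;
    }
}

/* Returns 1 if the same key maps to one entry and a missing key is not found. */
int demo_string_table(void)
{
    StringTable t;
    StringValue *first, *second;
    int ok;

    StringTable_init(&t);
    first = StringTable_update(&t, "foo", 3);
    second = StringTable_update(&t, "foo", 3);
    assert(first != NULL);

    ok = first == second
      && strcmp(first->str, "foo") == 0
      && StringTable_lookup(&t, "bar", 3) == NULL;

    StringTable_free(&t);
    return ok;
}
```

A GCC port could replace a stand-in like this with whatever equivalent GCC already provides, as suggested above.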
Jun 10 2002
prev sibling parent Andy Walker <Andy_member pathlink.com> writes:
In article <ae3a7m$qt0$1 digitaldaemon.com>, Martin M. Pedersen says...
"andy" <acoliver apache.org> wrote in message
news:3D04D8C5.50909 apache.org...

<snip> Yeeahooo! More Toys! Andy Walker
Jun 11 2002
prev sibling parent reply Andy Walker <Andy_member pathlink.com> writes:
In article <3D04D8C5.50909 apache.org>, andy says...

<snip> 
 
If you have a good handle on how to do the code generator, then I favor that idea. I don't really know awk but I'll do my best to catch up. And it's probably about time I learned.

I have A handle. Whether it is good or not remains to be seen. Walter's posts earlier today will be a big help.

Awk is sort of fun, really. I will be happy to give pointers and advice, if you are interested.

It might help, though Walter has very few hooks to the backend, if 
Walter and we could agree on interfaces to the back end so that this can 
be easily plugged in and out in parallel.  Meaning we could get to where 
we just run the C-ize program on Walter's code and do our own work on our 
"backend" (mid layer between D frontend and GNU).  This would also 
potentially allow others to do the same perhaps for other compilers. 
Granted this is a utopian view, and I'm sure we'd spend time modifying 
the thing for subsequent releases, but I think the pain could be 
minimized into a small non-bleeding ulcer.
I am happy to share the awk scripts. Once I get them working better, I will post them as "freely given to the public". I am really not too concerned about adapting to release changes. There will be some work, but not bad, and no matter where you go, there will be work related to release changes.
Writing our own divergent C version and maintaining it would be more 
painful I think.  If we can get to where the generated C version just 
hooks into the backend (or rather our abstraction of it), that would be 
sweet.  The only way it would sound like a real good idea to write it by 
hand is if Walter were willing to switch from C++ to C.  (that would 
produce the least gap and achieve higher portability, but he may have 
reasons for not wishing to do such).
It is just selfishness, but I do not want Walter to ever think about how he would change his coding practices to conform to my needs. I love the help, but that Bright D front-end is just really pretty to me. It does what it is supposed to do. I would hate for Walter to muck it up, trying to adapt to some arbitrary format. And for the record, GNU CC is one very arbitrary format.
-andy
<snip> Andy Walker
Jun 11 2002
parent andy <acoliver apache.org> writes:
</snip>

 I have A handle.  Whether it is good or not remains to be seen.  Walter's 
 posts earlier today will be a big help.
 
 Awk is sort of fun, really. I will be happy to give pointers and advice, if you
 are interested.
 
cool. Yes, sounds good. <snip/>
 
 I am happy to share the awk scripts.  Once I get them working better, I will
 post them as "freely given to the public".  I am really not too concerned 
 about adapting to release changes.  There will be some work, but not bad, 
 and no matter where you go, there will be work related to release changes.
 
Cool. Don't worry about getting them finished, just conceptually ready. I'll work on it.
 
 It is just selfishness, but I do not want Walter to ever think about how he 
 would change his coding practices to conform to my needs.  I love the help, 
 but that Bright D front-end is just really pretty to me.  It does what it is
 supposed
 to do.  I would hate for Walter to muck it up, trying to adapt to some
arbitrary
 format.  And for the record, GNU CC is one very arbitrary format.
Think I'm a GNU apologist or something? Not really. I don't like the premise of the GPL.

-Andy
 
-andy
<snip> Andy Walker
Jun 11 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
Greg Comeau sells a C++ -> C translator (Comeau C++) that may be useful. I
don't think the C output will be editable, but since C++ can interface to
C, it should be possible to write the glue code in C, and then call that
from the C++.
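
One way to picture that arrangement is a small glue header written in C and guarded so that the C++ front end can include it and call the functions with C linkage. The sketch below is purely illustrative: `glue_context`, `glue_init`, and `glue_declare_function` are hypothetical names, and the function bodies only count calls where a real glue layer would talk to GCC.

```c
#include <assert.h>
#include <stddef.h>

/* ---- what would live in a shared header (hypothetical glue.h) ---- */
#ifdef __cplusplus
extern "C" {                          /* seen only by the C++ front end */
#endif

typedef struct glue_context {
    size_t functions_declared;        /* stand-in for real back-end state */
} glue_context;

void glue_init(glue_context *ctx);
int  glue_declare_function(glue_context *ctx, const char *name);

#ifdef __cplusplus
}
#endif

/* ---- what would live in the C glue source ---- */
void glue_init(glue_context *ctx)
{
    ctx->functions_declared = 0;
}

/* Returns 1 if the declaration was accepted, 0 for a missing or empty name. */
int glue_declare_function(glue_context *ctx, const char *name)
{
    if (name == NULL || name[0] == '\0')
        return 0;
    ctx->functions_declared++;        /* a real layer would call into GCC here */
    return 1;
}

/* Exercises the interface the way a caller on the C++ side might. */
int demo_glue(void)
{
    glue_context ctx;

    glue_init(&ctx);
    assert(glue_declare_function(&ctx, "main") == 1);
    assert(glue_declare_function(&ctx, "") == 0);
    return (int)ctx.functions_declared;
}
```

The point of the guard is that the same header serves both sides: the C compiler sees plain declarations, while the C++ compiler is told not to mangle the names.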
Jun 10 2002
parent Andy Walker <Andy_member pathlink.com> writes:
In article <ae2ri6$9a3$1 digitaldaemon.com>, Walter says...
Greg Comeau sells a C++ -> C translator (Comeau C++) that may be useful. I
don't think the C output will be editable, but since C++ can interface to
C, it should be possible to write the glue code in C, and then call that
from the C++.
Thank you. I will look at that. Andy Walker
Jun 11 2002