D - Project looks very promising - Let's get a compiler out fast :)

reply "Michael Gaskins" <mbgaski clemson.edu> writes:
The language specifications look very good to me (I do mostly Java coding
right now but I've also done a fair amount of C and just a tad of C++).
Let's get a compiler for this thing (alpha level, beta level, whatever) to
stave off all the 'vaporware' shouters that will surely come.
Aug 16 2001
next sibling parent reply "Robert W. Cunningham" <rwc_2001 yahoo.com> writes:
Michael Gaskins wrote:

 The language specifications look very good to me (I do mostly Java coding
 right now but I've also done a fair amount of C and just a tad of C++).
 Let's get a compiler for this thing (alpha level, beta level, whatever) to
 stave off all the 'vaporware' shouters that will surely come.

Of possibly greater importance is a language test suite, usable both for regression testing of the real compiler, and for uncovering and explaining thorny language issues by using *real* examples (rather than guesstimates of what the D language spec *really* means).

And to make such a suite useful, we need a tool that will recognize valid D programs, even if we can't compile them to machine code and execute them.

Is D LALR? If so, then let's use LEX/YACC (or Flex/Bison) to whip up a quick parser that simply outputs state information (and possibly some crude C equivalents instead of machine code). At a minimum, the parser could be a binary function that accepts text, where the return value would simply indicate parse success (a valid program). Semantic actions could be used to handle many non-LALR language features, as well as documenting violations.

Such a tool would allow everyone to learn and write D (and develop regression suites) long before anything close to a full D compiler is ready. It may even help if code generation is postponed until after the grammar and feature set have stabilized.

I remember writing a compiler for the VAX (yes, decades ago) for an obscure proprietary language. Our 4 person team focused on getting the grammar parsing right first. When we finally knew we were properly handling all the key test cases, the team then split to implement code generation, modularity, compiler options, proper error handling, and many related items. After spending two months as a team on the parser, I was then able to write an optimized code generator on my own in a little over a month. (Well, the VAX instruction set and machine architecture was a dream to work with - the best of the CISC CPUs.)

With a known-good parser, everything else seems vastly easier.

-BobC
Aug 16 2001
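
The "binary function that accepts text" suggested in the post above can be sketched in a few lines. This is only an illustration: the grammar is a toy expression grammar invented here (it is not D's grammar), the parser is hand-written recursive descent rather than the LEX/YACC approach the post proposes, and the names Recognizer and parses are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy grammar (an assumption for illustration, not D's grammar):
//   expr := term (('+' | '-') term)*
//   term := WORD | '(' expr ')'
//   WORD := a run of letters, digits and underscores
class Recognizer {
public:
    explicit Recognizer(const std::string& text) : src_(text) {}

    // True when the whole input is exactly one valid expression.
    bool accepts() {
        pos_ = 0;
        return expr() && atEnd();
    }

private:
    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    bool atEnd() {
        skipSpaces();
        return pos_ == src_.size();
    }
    bool eat(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    bool word() {
        skipSpaces();
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isalnum(static_cast<unsigned char>(src_[pos_])) ||
                src_[pos_] == '_')) ++pos_;
        return pos_ > start;  // at least one character consumed
    }
    bool term() {
        if (eat('(')) return expr() && eat(')');
        return word();
    }
    bool expr() {
        if (!term()) return false;
        while (eat('+') || eat('-')) {
            if (!term()) return false;
        }
        return true;
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// The "binary function": text in, parse success out.
bool parses(const std::string& text) { return Recognizer(text).accepts(); }
```

A regression suite in this style would be a list of input strings paired with the expected true/false answer, which can be written long before any code generator exists.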
next sibling parent reply "Walter" <walter digitalmars.com> writes:
D is designed to be easy to parse. The semantic routines are a little
trickier.

The only real trick in the parser is distinguishing a cast from a
parenthesized expression,
and a declaration from a statement.

I considered an alternate syntax for these to make them easier to parse, but
it just doesn't look right. I'm too used to C, I guess. For example:

    cast(int) expr

instead of:

    (int) expr

and:

    var int foo;

instead of:

    int foo;

-Walter
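
A rough sketch of why the cast case is the tricky one, written in C++ over an already-lexed token list. Everything here is hypothetical (the names Prefix, classifySyntaxOnly and classifyWithSymbols, and the short keyword list); it is not how any D compiler is implemented. It only shows that "cast(a) - b" can be decided from the tokens, while "(a) - b" needs to know whether a names a type.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class Prefix { Cast, NotCast, Ambiguous };

// A word is a name or keyword: letters, digits, underscores, not starting with a digit.
static bool isWord(const std::string& t) {
    if (t.empty() || std::isdigit(static_cast<unsigned char>(t[0]))) return false;
    for (unsigned char c : t)
        if (!std::isalnum(c) && c != '_') return false;
    return true;
}

// Decide from tokens alone whether a statement starts with a cast.
Prefix classifySyntaxOnly(const std::vector<std::string>& t) {
    // Assumed, deliberately incomplete list of built-in type keywords.
    static const std::set<std::string> typeKeywords = {
        "char", "short", "int", "long", "float", "double"};

    if (t.size() >= 2 && t[0] == "cast" && t[1] == "(")
        return Prefix::Cast;  // keyword form: no symbol table needed
    if (t.size() >= 4 && t[0] == "(" && isWord(t[1]) && t[2] == ")") {
        if (typeKeywords.count(t[1])) return Prefix::Cast;  // e.g. (int) expr
        // "(a) - b": cast of -b to type a, or a minus b? Flagged conservatively;
        // a real parser would also look at the token that follows.
        return Prefix::Ambiguous;
    }
    return Prefix::NotCast;
}

// The ambiguity goes away only when the parser may ask what a name means.
Prefix classifyWithSymbols(const std::vector<std::string>& t,
                           const std::set<std::string>& knownTypes) {
    Prefix p = classifySyntaxOnly(t);
    if (p != Prefix::Ambiguous) return p;
    return knownTypes.count(t[1]) ? Prefix::Cast : Prefix::NotCast;
}
```

With the tokens ( a ) - b, the second function answers Cast when a is in knownTypes and NotCast otherwise, which is exactly the dependence on semantic information that the cast(int) form avoids.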


Aug 18 2001
next sibling parent reply Axel Kittenberger <axel dtone.org> writes:
 
     cast(int) expr
 
 instead of:
 
     (int) expr
 

How about:

    <int> expr

Greater-than/less-than brackets are widely accepted for surrounding types. This not only eases parsing for the compiler; to my eyes it also makes the source easier for humans to read.
Aug 18 2001
parent reply Christophe de Dinechin <descubes earthlink.net> writes:
Axel Kittenberger wrote:

     cast(int) expr

 instead of:

     (int) expr

 How about:

     <int> expr

 Greater-than/less-than brackets are widely accepted for surrounding types.
 This not only eases parsing for the compiler; to my eyes it also makes the
 source easier for humans to read.

<sarcastic>Oh, yes, let's make the same mistake with angle brackets that was made in C++; angle brackets just look so great.</sarcastic>

Now, please parse for me:

    if (x < < v > <o> 3)

and compare it to

    if (x << v > <o> 3)

Oh, and what if o is a type? Is not a type? Right, so readable :-)

Christophe
Aug 18 2001
parent reply Axel Kittenberger <axel dtone.org> writes:
Sarcasm is not good; there is no need to dig that out this early.

     if (x < < v > <o> 3)

 and compare it to
     if (x << v > <o> 3)

Leaving out brackets where they are supposed to be is no way to argue about languages. I could speak very weird English that still has valid grammar; that doesn't make English a bad language. We could also start to discuss what

    a = b+++c;

yields, or whether

    a = b++++c

is valid syntax; same with the old dangling else problem.

    if (x < (<v><o> 3))

is better, and so is:

    if ((x << v) > (<o> 3))

I see this as at least as good as it would be traditionally:

    if (x < ((v)(o) 3))

and

    if ((x << v) > ((o) 3))

And honestly, typecasts and comparisons are not something you'll encounter every day in the same line. Normally one casts objects and compares integer types. Yes, I know there are cases where you have to compare signed with unsigned integers, and there you need a typecast in the same line, but normally if you choose your types right this can be avoided.

 Oh, and what if o is a type? Is not a type?

I don't understand this.

- Axel
Aug 18 2001
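
For reference, the b+++c question from the post above has a definite answer in C and C++: the lexer always takes the longest possible token, so the text is read as b ++ + c. The function name below is made up for the illustration.

```cpp
// Returns the value assigned to a; b is updated through the reference.
int sumWithPostIncrement(int& b, int c) {
    int a = b+++c;  // lexed as  b ++ + c,  i.e. (b++) + c
    // int bad = b++++c;  // lexed as  b ++ ++ c,  which does not parse
    return a;
}
```

Starting from b = 1 and c = 2 this yields a = 3 and leaves b at 2. The commented-out line is rejected by the compiler because the same longest-token rule turns it into two postfix increments followed by a stray operand.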
parent reply "Rajiv Bhagwat" <dataflow vsnl.com> writes:
Hey guys,

As an aside, did you know that Walter once won an 'obfuscated c' contest?

We don't want 'D' to be a language for which such contests are held<g>!
-- Rajiv


Aug 18 2001
parent reply "Walter" <walter digitalmars.com> writes:
Rajiv Bhagwat wrote in message <9llau3$99d$1 digitaldaemon.com>...
 Hey guys,

 As an aside, did you know that Walter once won an 'obfuscated c' contest?

My dirty secret is out!

 We don't want 'D' to be a language for which such contests are held<g>!

As one wag once said, you can write FORTRAN in any language.
Aug 18 2001
parent John Fletcher <J.P.Fletcher aston.ac.uk> writes:
Walter wrote:

 Rajiv Bhagwat wrote in message <9llau3$99d$1 digitaldaemon.com>...
 Hey guys,

 As an aside, did you know that Walter once won an 'obfuscated c' contest?

 My dirty secret is out!

 We don't want 'D' to be a language for which such contests are held<g>!

 As one wag once said, you can write FORTRAN in any language.

One of my research students wrote a FORTRAN program which wrote FORTRAN.

John
Aug 21 2001
prev sibling parent reply Russell Bornschlegel <kaleja estarcion.com> writes:
Walter wrote:
 The only real trick in the parser is distinguishing a cast from a
 parenthesized expression,
 and a declaration from a statement.
 
 I considered an alternate syntax for these to make them easier to parse, but
 it just doesn't look right. I'm too used to C, I guess. For example:
 
     cast(int) expr
 
 instead of:
 
     (int) expr
 

This I could live with. Is the C++ style cast of the form:

    int(expr)

difficult to parse?

 and:
 
     var int foo;
 
 instead of:
 
     int foo;

This one hurts my C-brain a bit more. :)
Aug 18 2001
parent reply "Walter" <walter digitalmars.com> writes:
Russell Bornschlegel wrote in message <3B7EC050.A74713DA estarcion.com>...
 Is the C++ style cast of the form:

     int(expr)

 difficult to parse?

Yes, because the grammar is too ambiguous. A goal of D is to make parsing independent of the symbol table. Can't do that and support C++ style casts.
Aug 18 2001
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
The only other problem with C++ style casts is that they require making
typedefs; without them you can't cast to int*, for example.  Pascal had the
same problem.  Not a huge problem, admittedly.

Sean

Nov 03 2001
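
The typedef point in the post above is easy to see in C++ itself. In the small sketch below, IntPtr and readThrough are names invented for the example; the function-style cast only accepts a single type name, so a pointer type has to be given one first.

```cpp
using IntPtr = int*;  // alias (the modern spelling of a typedef) for the pointer type

int readThrough(void* raw) {
    // IntPtr p = int*(raw);  // rejected: "int*" is not a single type name
    IntPtr p = IntPtr(raw);   // function-style cast through the alias
    return *p;
}
```

Passing the address of an int holding 42 returns 42. The C-style spelling (int*)raw needs no alias, which is the asymmetry the post describes.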
parent reply "Walter" <walter digitalmars.com> writes:
I'm still in a bit of a quandary about casts. Should casting be:

    (type)expression

or:

    cast(type)expression

? The latter is far easier to parse. I'm so used to the former I am
reluctant to give it up. A kludgy compromise might be allowing the former
only for types that start with a keyword, not a typedef.

Dec 15 2001
next sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Walter" <walter digitalmars.com> wrote in message
news:9vhegn$1l01$1 digitaldaemon.com...

 I'm still in a bit of a quandary about casts. Should casting be:

     (type)expression

 or:

     cast(type)expression

 ? The latter is far easier to parse. I'm so used to the former I am
 reluctant to give it up. A kludgy compromise might be allowing the former
 only for types that start with a keyword, not a typedef.

Personally, I've found myself using the cast() form. But the compromise you've mentioned seems fine as well.
Dec 16 2001
prev sibling next sibling parent reply Axel Kittenberger <axel dtone.org> writes:
Walter wrote:

 I'm still in a bit of a quandary about casts. Should casting be:
 
     (type)expression
 
 or:
 
     cast(type)expression

May I brainstorm a bit? Personally I like the:

    <type> expression

syntax, and it is easily parseable in an LALR grammar, as it requires only one token of lookahead. However, I've seen people having feelings against it.

How about using the right apostrophe? Is it already used by something different?

    `type` expression

Well, viewed purely, type casting can best be seen as a kind of function, can't it? It takes one input value and returns another output value, sometimes different from the input. Then the above expression should rather look like this:

    cast(type, expression)

or the Pascal form

    type(expression)

But types cannot normally be function parameters; they are something different. I think type casting should use the same syntax as generic programming in the same language. It's the same paradigm: calling a function whose implementation is dependent on the type you specify. How about then:

    cast<type>(expression)

?
Dec 16 2001
parent "Pavel Minayev" <evilone omen.ru> writes:
"Axel Kittenberger" <axel dtone.org> wrote in message
news:9vhomg$1re4$1 digitaldaemon.com...
 Walter wrote:

 I'm still in a bit of a quandary about casts. Should casting be:

     (type)expression

 or:

     cast(type)expression

 May I brainstorm a bit? Personally I like the:

     <type> expression

 syntax, and it is easily parseable in an LALR grammar, as it requires only
 one token of lookahead. However, I've seen people having feelings against
 it.

It's also hard to distinguish from a normal expression containing the < and > operators.

 How about using the right apostrophe? Is it already used by something
 different?

     `type` expression

 Well, viewed purely, type casting can best be seen as a kind of function,
 can't it? It takes one input value and returns another output value,
 sometimes different from the input. Then the above expression should
 rather look like this:

     cast(type, expression)

It doesn't seem "right" to me =)

 or the Pascal form

     type(expression)

This is, IMHO, acceptable, since there are no temporaries in D, so the syntax is unused.

 But types cannot normally be function parameters; they are something
 different. I think type casting should use the same syntax as generic
 programming in the same language. It's the same paradigm: calling
 a function whose implementation is dependent on the type you specify.
 How about then:

     cast<type>(expression)

Back to C++ days... I just hate angle brackets! And anyhow, what's the problem with the way it's done now?
Dec 16 2001
prev sibling next sibling parent reply la7y6nvo shamko.com writes:
"Walter" <walter digitalmars.com> writes:

 I'm still in a bit of a quandary about casts. Should casting be:
 
     (type)expression
 
 or:
 
     cast(type)expression
 
 ? The latter is far easier to parse. I'm so used to the former I am
 reluctant to give it up. A kludgy compromise might be allowing the former
 only for types that start with a keyword, not a typedef.

Please let D abandon the abominable syntax that C uses for casts. I know, I know, people are used to it, but this is one case where compatibility should be given second place to forward progress.

On the compromise idea - If a new syntax is introduced then the old syntax should be eliminated. C++ made the mistake of adding a new syntax for casting while also retaining the old, and that's been a source of confusion. Also, having two forms which are sometimes but not always interchangeable is asking for trouble. Again, I know about the compatibility arguments, but let's try to make some forward progress with D in this area.

As for what syntax to use.. One thing about C that's always seemed rather poorly thought out is the funny way that prefix operators and postfix operators interact. (Incidentally, C++ retains these problems and makes them worse with a prefix 'new' operator. But I digress.) If one thinks of casting as a kind of operator - admittedly one that doesn't usually execute any instructions - I think it makes more sense to put the cast operator after the operand, as for example

    operand.as(type)

or if necessary

    (expression).as(type)

If the syntax for expressions were expanded to include a syntax for types, with semantics allowing some run-time representation for values of type Type (which seems like a good idea independent of casting), then no special syntax is needed for casting. Indeed if this were so then the only difference between the casting 'as' operators and regular (method) functions is that 'as' is overloaded on return type.

Note how nicely this functional form cascades:

    x[i].as(Window).redraw()

Compare that to:

    (cast(Window) x[i]).redraw()

Perhaps it's my many years of using object-oriented programming languages, but the first form seems much easier to understand than the second.

Let me add a couple of disclaimers as anti-flame insurance :)

    (1) There is a fundamental and important difference between the notion
    of type and the notion of class.  The comments above gloss over that
    distinction.  It's important that some implementation along these
    lines not do that.

    (2) C uses the same syntax for "casting" (like changing one pointer
    type into another with no bit changes) and "conversion" (changing an
    integer value into a floating point value, bits definitely change).
    It seems obvious that these notions, although related, are different
    operations and should have distinct syntaxes.  Or at least different
    operator names.

Hopefully the comments and suggestions made above find some resonance amongst the readers of the newsgroup and potential users of D.
Dec 16 2001
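
The cascading postfix form proposed above can be approximated in C++ today, which gives a feel for how it reads. This is only a sketch under assumptions: Widget, Window and redrawFirst are invented names, and since C++ cannot pass a type as an ordinary argument, as<Window>() stands in for the proposed as(Window). The poster's idea of types as run-time values is not modelled.

```cpp
struct Widget {
    virtual ~Widget() = default;

    // Postfix, checked cast: throws std::bad_cast if *this is not a T.
    template <class T>
    T& as() { return dynamic_cast<T&>(*this); }
};

struct Window : Widget {
    int redraws = 0;
    int redraw() { return ++redraws; }  // returns how often it has been redrawn
};

// Reads left to right, like x[i].as(Window).redraw() in the post.
int redrawFirst(Widget* x[]) {
    return x[0]->as<Window>().redraw();
}
```

The prefix equivalent would be dynamic_cast<Window&>(*x[0]).redraw(), where the reader has to jump back to the front of the expression to find the cast.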
parent reply Axel Kittenberger <axel dtone.org> writes:
 
     (2) C uses the same syntax for "casting" (like changing one pointer
     type into another with no bit changes) and "conversion" (changing an
     integer value into a floating point value, bits definitely change).
     It seems obvious that these notions, although related, are different
     operations and should have distinct syntaxes.  Or at least different
     operator names.

True, in fact there are 4 casts I know of: conversion cast, upcast, downcast and reinterpret cast. C++ decides from the context which one to use, and that can be very error prone. A const cast is only a finer issue, which I do not count as a separate cast form.

    class Borg b* = (Corg) a*;

This can produce different code depending on whether you included "Corg.h" and so defined the class. It once took me several days to find a bug where the code was meant to do an upcast (the pointer should be decremented by the cast), but the class definition was not included in that file, so the compiler did a reinterpret cast, pointing at nonsense in the end.

- Axel
Dec 16 2001
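
The kind of bug described above can be reproduced in miniature. In this C++ sketch the classes A, B and C and the function names are invented for the example. With multiple inheritance, a cast that knows the class layout moves the pointer to the base subobject, while a reinterpreting cast keeps the address unchanged. The exact layout is up to the compiler, so the comparison is an observation about common ABIs, not a language guarantee.

```cpp
struct A { int a = 1; };
struct B { int b = 2; };
struct C : A, B {};  // B is the second base, so it usually sits at a nonzero offset

// Conversion that knows the layout: the address is adjusted to the B part of c.
B* adjustedCast(C* c) { return static_cast<B*>(c); }

// Type paint: same address with a new label. Using the result as a B is a bug.
B* paintedCast(C* c) { return reinterpret_cast<B*>(c); }

// True when the two casts produce different addresses for the same object.
bool castsDisagree() {
    C c;
    return static_cast<void*>(adjustedCast(&c)) !=
           static_cast<void*>(paintedCast(&c));
}
```

A C-style cast such as (B*)ptr picks the first behaviour when C is fully defined at that point. When C is only forward-declared, compilers generally fall back to the second, which matches the missing-include story in the post.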
parent "Walter" <walter digitalmars.com> writes:
"Axel Kittenberger" <axel dtone.org> wrote in message
news:9vj699$2l18$1 digitaldaemon.com...
 True, in fact there are 4 casts I know of: conversion cast, upcast,
 downcast and reinterpret cast. C++ decides from the context which one to
 use, and that can be very error prone. A const cast is only a finer issue,
 which I do not count as a separate cast form.

     class Borg b* = (Corg) a*;

 This can produce different code depending on whether you included
 "Corg.h" and so defined the class. It once took me several days to find a
 bug where the code was meant to do an upcast (the pointer should be
 decremented by the cast), but the class definition was not included in
 that file, so the compiler did a reinterpret cast, pointing at nonsense in
 the end.

D shouldn't suffer from that problem, as there shouldn't be any forward referenced class names. D will just use one cast form, not the 4 different ones. (There is no const cast in D, because there is no const type modifier.) To do a type paint, cast it to void* first and then cast the result.
Dec 16 2001
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:9vhegn$1l01$1 digitaldaemon.com...
 I'm still in a bit of a quandary about casts. Should casting be:

     (type)expression

 or:

     cast(type)expression

 ? The latter is far easier to parse. I'm so used to the former I am
 reluctant to give it up. A kludgy compromise might be allowing the former
 only for types that start with a keyword, not a typedef.

There's a related issue for declarations, is:

    a * b;

a declaration or an expression? Currently, the ambiguity is resolved with the rule "if it will parse as a declaration, it is a declaration". I'm not particularly thrilled with that, as it requires lookahead in the parser. One solution is to require a "var" keyword in front of declarations. But to my eyes, typing all those var's in is annoying:

    void func()
    {
        var int i,j;
        var X* y;
        ....
    }
Dec 16 2001
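
The rule "if it will parse as a declaration, it is a declaration" can be sketched as a trial parse over a token list. The C++ below is a hypothetical illustration, not the D front end: the names Stmt, parsesAsDeclaration and classify are made up, and the only declaration shape it knows is name, any number of '*', name, ';'.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Stmt { Declaration, Expression };

// A name: letters, digits, underscores, not starting with a digit.
static bool isName(const std::string& t) {
    if (t.empty()) return false;
    unsigned char first = static_cast<unsigned char>(t[0]);
    if (!std::isalpha(first) && first != '_') return false;
    for (unsigned char c : t)
        if (!std::isalnum(c) && c != '_') return false;
    return true;
}

// Trial parse for the pattern:  NAME '*'* NAME ';'
bool parsesAsDeclaration(const std::vector<std::string>& t) {
    std::size_t i = 0;
    if (i >= t.size() || !isName(t[i])) return false;
    ++i;
    while (i < t.size() && t[i] == "*") ++i;
    if (i >= t.size() || !isName(t[i])) return false;
    ++i;
    return i + 1 == t.size() && t[i] == ";";  // nothing may follow the ';'
}

// No symbol table is consulted: the shape of the statement decides.
Stmt classify(const std::vector<std::string>& t) {
    return parsesAsDeclaration(t) ? Stmt::Declaration : Stmt::Expression;
}
```

The tokens a * b ; come out as a declaration and a * b + c ; as an expression, but the decision is only known once the scan has reached the end of the statement. That scan is the lookahead cost mentioned in the post.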
next sibling parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Walter" <walter digitalmars.com> wrote in message
news:9vj9l0$2n1j$2 digitaldaemon.com...
 There's a related issue for declarations, is:

     a * b;

 a declaration or an expression? Currently, the ambiguity is resolved with
 the rule "if it will parse as a declaration, it is a declaration". I'm not
 particularly thrilled with that, as it requires lookahead in the parser.

If "a" is type in current scope, it's a declaration;
If "a" is something else, it's an expression, so:

    class Apple { }

    int main()
    {
        Apple * x;    // x is pointer to apple
        int Apple;
        Apple * y;    // multiply Apple by y
    }
Dec 16 2001
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:9vk5gl$4hk$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:9vj9l0$2n1j$2 digitaldaemon.com...
 There's a related issue for declarations, is:

     a * b;

 a declaration or an expression? Currently, the ambiguity is resolved with
 the rule "if it will parse as a declaration, it is a declaration". I'm not
 particularly thrilled with that, as it requires lookahead in the parser.

 If "a" is type in current scope, it's a declaration;
 If "a" is something else, it's an expression, so:

     class Apple { }

     int main()
     {
         Apple * x;    // x is pointer to apple
         int Apple;
         Apple * y;    // multiply Apple by y
     }

The trouble there is I am trying to separate the syntactic from the semantic. Recognizing that an identifier is a type requires semantic analysis (i.e. building a symbol table). The separation of the two functions will make it easy to write things like source code formatters and analyzers.
Dec 17 2001
next sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
I guess you can't have everything.  Either that or you have the coder
specify 'var' in front of all declarations.  I for one wouldn't mind that,
since it clears up so many other things.  I did just that in fact when
programming Pascal for all those years.  I was far more upset about having
to type 'begin' and 'end' and 'then' and 'do' all over the place than I ever
was about telling the compiler I'm about to do a variable declaration.

If you ask me, anyone writing a source code analyzer or formatter can easily
check for variable declarations and typedefs so it knows what symbols are
what at current scope.  It doesn't even really need to keep track of the
exact type, just that a variable or type has been declared there with that
name.  Source beautifiers built into the IDE are something I'd really love to
see, so if doing that means the IDE has to keep track of declarations, well,
that's nothing VB or VC hasn't been doing for many years now already.  Hell,
Instant Pascal way back on the Apple II did that.  So did Think Pascal on
the Mac.  Then again Pascal used the var keyword.

Sean
Dec 17 2001
parent "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9vke8m$amq$1 digitaldaemon.com...
 I guess you can't have everything.  Either that or you have the coder
 specify 'var' in front of all declarations.  I for one wouldn't mind that,
 since it clears up so many other things.  I did just that in fact when
 programming Pascal for all those years.  I was far more upset about having
 to type 'begin' and 'end' and 'then' and 'do' all over the place than I
 ever was about telling the compiler I'm about to do a variable declaration.

Having the "if it parses as a declaration, it is a declaration" rule does seem to work.

 If you ask me, anyone writing a source code analyzer or formatter can
 easily check for variable declarations and typedefs so it knows what
 symbols are what at current scope.  It doesn't even really need to keep
 track of the exact type, just that a variable or type has been declared
 there with that name.  Source beautifiers built into the IDE are something
 I'd really love to see, so if doing that means the IDE has to keep track of
 declarations, well, that's nothing VB or VC hasn't been doing for many
 years now already.  Hell, Instant Pascal way back on the Apple II did that.
 So did Think Pascal on the Mac.  Then again Pascal used the var keyword.

VC has had millions of man-hours of development in it - and with access to a full blown compiler, you can tell if an identifier is a type or not. To do so in D, you'd have to find all the imports, parse them all, etc., just like in a full blown compiler to be able to tell with certainty if it's a type or not. You can fake it and work 93% of the time, but getting that last 7% right requires the whole thing. Most C beautifiers that aren't hooked into a full blown compiler get it right about 93% of the time. Heck, many of them don't even handle the \ line splice completely correctly. With D, the idea is it can be gotten 100% correct with only a modest effort by one person.
Dec 17 2001
prev sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Walter" <walter digitalmars.com> wrote in message
news:9vk9nj$8eq$3 digitaldaemon.com...

 The trouble there is I am trying to separate the syntactic from the
 semantic. Recognizing that an identifier is a type requires semantic
 analysis (i.e. building a symbol table). The separation of the two
 functions will make it easy to write things like source code formatters
 and analyzers.

Yes, I see. Personally, I wouldn't mind typing "var" here and there; I got quite used to it in Pascal and then UnrealScript...
Dec 17 2001
prev sibling parent reply "Roberto Mariottini" <rmariottini lycosmail.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:9vk5gl$4hk$1 digitaldaemon.com...
 If "a" is type in current scope, it's a declaration;
 If "a" is something else, it's an expression, so:

     class Apple { }

     int main()
     {
         Apple * x;    // x is pointer to apple
         int Apple;
         Apple * y;    // multiply Apple by y
     }

Please, save me from this! ;-)
Dec 17 2001
parent reply "Walter" <walter digitalmars.com> writes:
"Roberto Mariottini" <rmariottini lycosmail.com> wrote in message
news:9vkbg9$9d5$1 digitaldaemon.com...
 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:9vk5gl$4hk$1 digitaldaemon.com...
 If "a" is type in current scope, it's a declaration;
 If "a" is something else, it's an expression, so:

     class Apple { }

     int main()
     {
         Apple * x;    // x is pointer to apple
         int Apple;
         Apple * y;    // multiply Apple by y
     }

Please, save me from this! ;-)

Currently, D issues the following error message for the Apple*y:

    symbol 'Apple' is not a type
Dec 17 2001
parent reply Axel Kittenberger <axel dtone.org> writes:
Walter wrote:

 
 "Roberto Mariottini" <rmariottini lycosmail.com> wrote in message
 news:9vkbg9$9d5$1 digitaldaemon.com...
 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:9vk5gl$4hk$1 digitaldaemon.com...
 If "a" is type in current scope, it's a declaration;
 If "a" is something else, it's an expression, so:

     class Apple { }

     int main()
     {
         Apple * x;    // x is pointer to apple
         int Apple;



In my opinion 'int Apple' should rather raise an error if Apple is already defined as a type. Shadowing is nice in theory, and looks cool in a compiler implementation, but it is bad in practice: it is error prone for the programmer, who easily mixes up a variable and a type when the meaning of a name changes with context.
Dec 17 2001
next sibling parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Hear, hear.

Axel Kittenberger wrote:

 Walter wrote:

 "Roberto Mariottini" <rmariottini lycosmail.com> wrote in message
 news:9vkbg9$9d5$1 digitaldaemon.com...
 "Pavel Minayev" <evilone omen.ru> wrote in message
 news:9vk5gl$4hk$1 digitaldaemon.com...
 If "a" is type in current scope, it's a declaration;
 If "a" is something else, it's an expression, so:

     class Apple { }

     int main()
     {
         Apple * x;    // x is pointer to apple
         int Apple;



 In my opinion 'int Apple' should rather raise an error if Apple is already
 defined as a type. Shadowing is nice in theory, and looks cool in a
 compiler implementation, but it is bad in practice: it is error prone for
 the programmer, who easily mixes up a variable and a type when the meaning
 of a name changes with context.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
Dec 17 2001
parent reply Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Think, Russ!  Think, then speak!

I do want to say that I agree with Axel that "shadowing" is not a good
thing...maybe even a Bad Thing.  But now that I've thought about it, I've
realized that it still doesn't solve the parser problem...the parser has no way
of knowing if it's a valid statement or not.

For that, what about just making no-effect lines syntax errors (except for
null statements)?  Then anything that *could* be a declaration (regardless of
context) *must* be one.

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
Dec 17 2001
next sibling parent "Walter" <walter digitalmars.com> writes:
"Russ Lewis" <spamhole-2001-07-16 deming-os.org> wrote in message
news:3C1E6A51.E60BF91A deming-os.org...
 For that, what about just making no-effect lines to be syntax errors (except for
 null statements)?  Then any thing that *could* be a declaration (regardless of
 context) *must* be one.

Hmm. That is a good thought. The no effect error is an irritant in C++, as I use macros that depend on the compiler optimizing away no effect expressions, as in:

    #define printf 1 || printf

and:

    #define assert(e) 0

but in D these become irrelevant, so maybe no effect errors are practical.
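A rough sketch of what a "no effect" check might look like, in Python rather than D. Everything here is an assumption made for illustration: expressions are modelled as nested tuples `(op, operand, ...)`, and the names `has_effect` and `SIDE_EFFECT_OPS` are hypothetical. It is not how any real D compiler does it.

```python
# Operators assumed (for this sketch only) to have side effects.
SIDE_EFFECT_OPS = {"=", "+=", "-=", "++", "--", "call"}

def has_effect(expr):
    """Return True if the expression tree contains a side-effecting operator.

    expr is either a string (a name or literal) or a tuple (op, operand, ...).
    The check is conservative: a call counts as an effect even when it
    could never be reached at run time.
    """
    if isinstance(expr, str):
        return False
    op, *operands = expr
    if op in SIDE_EFFECT_OPS:
        return True
    return any(has_effect(o) for o in operands)

# "a * b;" read as an expression does nothing, so under the proposed rule
# it would have to be a declaration.
print(has_effect(("*", "a", "b")))
# "x = a * b;" assigns, so it is a legitimate expression statement.
print(has_effect(("=", "x", ("*", "a", "b"))))
```

Under such a rule the statement `a * b;` can only be a declaration, because as an expression it would be rejected anyway.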
Dec 17 2001
prev sibling parent reply Axel Kittenberger <axel dtone.org> writes:
Russ Lewis wrote:

 Think, Russ!  Think, then speak!
 
 I do want to say that I agree with Axel that "shadowing" is not a good
 thing...maybe even a Bad Thing.  But now that I've thought about it, I've
 realized that it still doesn't solve the parser problem...the parser has
 no way of knowing if it's a valid statement or not.
 
 For that, what about just making no-effect lines to be syntax errors
 (except for
 null statements)?  Then any thing that *could* be a declaration
 (regardless of context) *must* be one.

Not necessarily true; the lexer can do type lookups and return different tokens, as mine did, for example. That is how I solved the grammatical problem of how to differentiate, e.g.:

    asdf[2][2] a;
    asdf[2][2] = a;

Is "asdf" meant to be the type (declaring a 2x2 array of it) or an identifier accessing the [2][2] element of it? Yes, it is technically distinguishable, but not with 1 token of lookahead, or with any fixed size of lookahead. To distinguish whether "asdf" is a type or a variable you need 3 * (number of array dimensions) tokens of lookahead, where the number of dimensions is normally unlimited.

- Axel
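A minimal sketch of the lexer-lookup idea, written in Python rather than D. The token kinds (`TYPE_NAME`, `IDENTIFIER`, ...), the function `lex` and the `known_types` set are hypothetical names chosen for this example; a real lexer would consult the compiler's symbol table for the current scope.

```python
import re

# Hypothetical token kinds for this sketch.
TYPE_NAME = "TYPE_NAME"
IDENTIFIER = "IDENTIFIER"
NUMBER = "NUMBER"
PUNCT = "PUNCT"

TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>[\[\]=;*]))"
)

def lex(source, known_types):
    """Split source into (kind, text) pairs.

    A word that appears in known_types is returned as TYPE_NAME,
    any other word as IDENTIFIER, so the parser can tell a
    declaration from an expression by looking at the first token only.
    """
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        pos = m.end()
        if m.group("word"):
            word = m.group("word")
            kind = TYPE_NAME if word in known_types else IDENTIFIER
            tokens.append((kind, word))
        elif m.group("num"):
            tokens.append((NUMBER, m.group("num")))
        else:
            tokens.append((PUNCT, m.group("punct")))
    return tokens

# With asdf known as a type, the statement starts with a TYPE_NAME token.
print(lex("asdf[2][2] a;", {"asdf"})[0])
# With no such type in scope, the same spelling is a plain IDENTIFIER.
print(lex("asdf[2][2] = a;", set())[0])
```

The price of this approach is that the lexer depends on the symbol table, so the source can no longer be tokenized without knowing which names are types.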
Dec 18 2001
parent "Walter" <walter digitalmars.com> writes:
"Axel Kittenberger" <axel dtone.org> wrote in message
news:9voc5a$2vrg$1 digitaldaemon.com...
 Not necessarly true, the lexer can do type lookups, and return differnt
 tokens as i.e. mine did. That how I solved the grammatical problem how to
 differentiate ie.

 asdf[2][2] a;
 asdf[2][2] = a;

 Is "asdf" meant to be the type (declaring an 2x2 array of it) or a
 identifier accessing the 2/2th element of it? Yes, it is technically
 distinguishable but not with 1 token lookahead, or with any definitive size
 of lookahead. To distinguish "asdf" between type or variable you need 3 *
 field dimenions look ahead, where the dimension is normally unlimited.

 - Axel

That's why I wound up implementing arbitrary lookahead. -Walter
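A sketch of how arbitrary lookahead can settle the question, in Python rather than D. This is only an illustration of the rule "if it will parse as a declaration, it is a declaration"; the function name `looks_like_declaration` and the list-of-strings token format are assumptions, not the actual D parser.

```python
def looks_like_declaration(tokens):
    """Peek ahead over one statement's tokens without consuming them.

    Accepts: identifier, then any number of '*' or bracketed '[...]'
    suffixes, then another identifier. That shape is treated as a
    declaration; anything else is left to the expression parser.
    """
    i = 0
    if i >= len(tokens) or not tokens[i].isidentifier():
        return False
    i += 1
    while i < len(tokens):
        if tokens[i] == "*":
            i += 1
        elif tokens[i] == "[":
            depth = 0
            while i < len(tokens):
                if tokens[i] == "[":
                    depth += 1
                elif tokens[i] == "]":
                    depth -= 1
                    if depth == 0:
                        i += 1  # step past the closing bracket
                        break
                i += 1
            else:
                return False  # ran out of tokens: unbalanced brackets
        else:
            break
    return i < len(tokens) and tokens[i].isidentifier()

print(looks_like_declaration(["asdf", "[", "2", "]", "[", "2", "]", "a", ";"]))
print(looks_like_declaration(["asdf", "[", "2", "]", "[", "2", "]", "=", "a", ";"]))
```

The scan is linear in the length of the statement, which is why the number of array dimensions or `*`s does not need a fixed bound.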
Dec 18 2001
prev sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Axel Kittenberger" <axel dtone.org> wrote in message
news:9vlm2o$1389$1 digitaldaemon.com...

 In my opinion 'int Apple' should better raise an error if Apple is defined
 as type. Shadowing is nice in theory, and looks cool in compiler
 implementation, but is bad in practice, as it is error prone for the
 programmer who easy mismatches a variable/type when it's understanding
 differes from context.

This might be true if Apple was a local class. But what if it resides in another module? In this case, disallowing shadowing would result in breaking code in one module when another module gets a new type in it - definitely unacceptable...
Dec 17 2001
prev sibling parent reply "Roberto Mariottini" <rmariottini lycosmail.com> writes:
"Walter" <walter digitalmars.com> ha scritto nel messaggio
news:9vj9l0$2n1j$2 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:9vhegn$1l01$1 digitaldaemon.com...
 I'm still in a bit of a quandary about casts. Should casting be:

     (type)expression

 or:

     cast(type)expression

 ? The latter is far easier to parse. I'm so used to the former I am
 reluctant to give it up. A kludgy compromise might be allowing the former
 only for types that start with a keyword, not a typedef.


I prefer the keyword way. It's always easier to find a cast in a million-operand expression (like the ones one of my co-workers writes over and over) if the editor highlights the keyword in the middle of it. As for the syntax, I once thought about it for days, without finding a good solution. I think the type should be separated from the expression for clarity. So (type)cast(expression) is better for me (though ugly...), or cast[type](expression) or (type)##(expression), etc... I still don't have the right idea :-(
 There's a related issue for declarations, is:

     a * b;

 a declaration or an expression? Currently, the ambiguity is resolved with
 the rule "if it will parse as a declaration, it is a declaration". I'm not
 particularly thrilled with that, as it requires lookahead in the parser. One
 solution is to require a "var" keyword in front of declarations. But to my
 eyes, typing all those var's in is annoying:

     void func()
     {    var int i,j;
          var X* y;
         ....
     }

This is another thing I always thought of. Again, I found no better way. The fact is that typing 'var' or something at every declaration is a big waste of typing. But:

    var {
        int i;
        X* j;
    }    // <-- this should not limit the scope :-(

    i : int;
    j : X*;             // good ole Pascal! But
    a : int = 6 * K;    // where's the initializer? after the type :-(

    # int i;
    # X* j;

An operator is not always better than a keyword :-(
Anything anyone?
Dec 17 2001
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
I just thought of something.  Yes, typing var is kinda annoying.  But how
about if one could type the keyword 'var' fairly infrequently, such as:

var:
   int a;
   int b;
   Foo myfoo;

Or perhaps:

var
{
   int a;
   int b;
   Foo myfoo;
}

Or even perhaps this (I think I like this one best):

var int a, int b, Foo myfoo; // can declare multiple vars of different types with one var keyword

or

var int a, b, Foo myfoo, c; // b is also an int, but myfoo and c are both Foo's

Whaddya think?  Maybe we could substitute 'public' or 'private' for the 'var'
keyword instead?

public  // statements can't go inside public blocks.
{
  int a, b;
  Foo myfoo;
}

public:  // begin a public declara
  int i, j;
for (i=0; i<5; ++i)   // this statement terminates the previous public
section.  However here we run into the original problem again.  Perhaps
disallow the 'label:' style attribute format?
{
   j += i;
}
public:
  int c;

Sean

 There's a related issue for declarations, is:

     a * b;

 a declaration or an expression? Currently, the ambiguity is resolved


 the rule "if it will parse as a declaration, it is a declaration". I'm


 particularly thrilled with that, as it requires lookahead in the parser.

 solution is to require a "var" keyword in front of declarations. But to


 eyes, typing all those var's in is annoying:

     void func()
     {    var int i,j;
          var X* y;
         ....
     }

This is another thing I always thought of. Again, I found no better way. The fact is that typing 'var' or something at every declaration is a big waste of typing. But:

    var {
        int i;
        X* j;
    }    // <-- this should not limit the scope :-(

    i : int;
    j : X*;             // good ole Pascal! But
    a : int = 6 * K;    // where's the initializer? after the type :-(

    # int i;
    # X* j;

An operator is not always better than a keyword :-(
Anything anyone?

Dec 17 2001
next sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9vkeno$b0s$1 digitaldaemon.com...

 Or perhaps:

 var
 {
    int a;
    int b;
    Foo myfoo;
 }

Sounds good.
 Or even perhaps this (I think I like this one best):

 var int a, int b, Foo myfoo; // can declare multiple vars of different

 with one var keyword

 or

 var int a, b, Foo myfoo, c; // b is also an int, but myfoo and c are both
 Foo's

Yeah... looks somewhat like VB.NET declarations, but I like the idea. And should have no problem being parsed.
 Whaddya think?  Maybe we could subsitute 'public' or 'private' for the

 keyword instead?

"local"?
Dec 17 2001
prev sibling parent "Walter" <walter digitalmars.com> writes:
It's an intriguing idea. The : version won't work, though, as you'd need
some other keyword to turn it off when the statements begin.

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9vkeno$b0s$1 digitaldaemon.com...
 I just thought of something.  Yes, typing var is kinda annoying.  But how
 about if one could type the keyword 'var' fairly infrequently, such as:

 var:
    int a;
    int b;
    Foo myfoo;

 Or perhaps:

 var
 {
    int a;
    int b;
    Foo myfoo;
 }

 Or even perhaps this (I think I like this one best):

 var int a, int b, Foo myfoo; // can declare multiple vars of different

 with one var keyword

 or

 var int a, b, Foo myfoo, c; // b is also an int, but myfoo and c are both
 Foo's

 Whaddya think?  Maybe we could subsitute 'public' or 'private' for the

 keyword instead?

 public  // statements can't go inside public blocks.
 {
   int a, b;
   Foo myfoo;
 }

 public:  // begin a public declara
   int i, j;
 for (i=0; i<5; ++i)   // this statement terminates the previous public
 section.  However here we run into the original problem again.  Perhaps
 disallow the 'label:' style attribute format?
 {
    j += i;
 }
 public:
   int c;

 Sean

 There's a related issue for declarations, is:

     a * b;

 a declaration or an expression? Currently, the ambiguity is resolved


 the rule "if it will parse as a declaration, it is a declaration". I'm


 particularly thrilled with that, as it requires lookahead in the



 One
 solution is to require a "var" keyword in front of declarations. But



 my
 eyes, typing all those var's in is annoying:

     void func()
     {    var int i,j;
          var X* y;
         ....
     }

This is another thing I always thought of. Again, I found no better way. The fact is that typing 'var' or something at every declaration is a big waste of typing. But:

    var {
        int i;
        X* j;
    }    // <-- this should not limit the scope :-(

    i : int;
    j : X*;             // good ole Pascal! But
    a : int = 6 * K;    // where's the initializer? after the type :-(

    # int i;
    # X* j;

An operator is not always better than a keyword :-(
Anything anyone?


Dec 17 2001
prev sibling next sibling parent a <a b.c> writes:
Walter wrote:
 
 I'm still in a bit of a quandary about casts. Should casting be:
 
     (type)expression
 
 or:
 
     cast(type)expression
 
 ? The latter is far easier to parse. I'm so used to the former I am
 reluctant to give it up. A kludgy compromise might be allowing the former
 only for types that start with a keyword, not a typedef.

If you ever allow any form of generic programming (templates, etc.) this could come back and haunt you. Also, it may cause unnecessary breakage if someone switches code from using int to some FOO_t sort of datatype. You've gone to such extremes to do things one way (at least in the first release) that it seems odd to have two ways for this. I don't like the way the second form looks like a function call followed by an expression, but I can't give a good justification for the identifier in parens followed by an expression either. Go with the cast keyword and keep it clean.

Dan
Dec 16 2001
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Walter wrote:

 I'm still in a bit of a quandary about casts. Should casting be:
 
     (type)expression
 
 or:
 
     cast(type)expression
 
...

Casts are one of my least liked features of C and C++. Better is a language design that, to the greatest extent possible, eliminates the need for casts. Still, given the occasional need, I prefer the form:

    cast(type, expression)

Or even:

    cast("type", expression)

though that seems a bit strange. But so does the idea of passing a type as an argument (unless the language should be extended to make that a "normal" activity). The thing that bothers me about it is that in this expression the type appears to be a parameter, but normally types aren't allowed to be parameters, and this feels ... unnatural.
Jan 02 2002
parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Charles Hixson wrote:

 Walter wrote:
 
 I'm still in a bit of a quandary about casts. Should casting be:

     (type)expression

 or:

     cast(type)expression

 ...


On reading other posts I encountered the proposal to use the syntax:

    expression.as(type)

This syntax would be quite clear, but seems to be problematic as it would require that a method be allowed to return a variable type, e.g.:

    (e.as(int) < e.as(Point))

clearly this is somewhat problematical. OTOH, this same problem is exhibited when one examines an alternate syntax, e.g.:

    (cast (int, e) < cast (Point, e) )

but I suppose that cast may more clearly be considered a "special form". I dislike special forms, but perhaps they are necessary unless type is a "first class" entity in the language.

Now clearly either of the examples that I used should cause an error (the types int and Point being incommensurate), and this would be true no matter what the syntax. The inconsistency arises because one can usually depend on methods and functions to return one particular type (or at least a descendant of some particular type). So it "feels wrong" to have the syntax used in a contrary manner.

The "cleanest way" around this that occurs to me is to eliminate the cast operation, and require the creation of a union for each dually accessed entity. But for this to be acceptable the frequency with which a cast had to be used would need to be quite low.
Jan 02 2002
parent "Pavel Minayev" <evilone omen.ru> writes:
"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:3C335780.6070609 earthlink.net...

 This syntax would be quite clear, but seems to be problematic as
 it would require that a method be allowed to return a variable
 type, e.g.:
      (e.as(int) < e.as(Point))

I don't see any problem: e.as(int) returns a value of type int. e.as(Point) returns a value of type Point. Of course, as() is a built-in method then.
 The "cleanest way" around this that occurs to me is to eliminate
 the cast operation, and require the creation of a union for each
 dually accessed entity.  But for this to be acceptable the
 frequency with which a cast had to be used would need to be
 quite low.

Man, have you ever tried to code straight WinAPI programs like that? Besides, D (as I understand it) is not a "cleanest" language - it's a _practical_ language. Which means that it tries to make our lives easier rather than "cleaner" (the latter is important as well, of course!). Forbidding casts is definitely not the right step in this direction...
Jan 02 2002
prev sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
 Is D LALR?  If so, then let's use LEX/YACC (or Flex/Bison) to whip up a
 quick parser that simply outputs state information (and possibly some

 C equivalents instead of machine code).  At a minimum, the parser could be

 binary function that accepts text, where the return value would simply
 indicate parse success (a valid program).  Semantic actions could be used

 handle many non-LALR language features, as well as documenting violations.

If it's at all possible, I'd recommend making D have an LL(k) grammar for some small k, as this will vastly simplify the parser needed and allow a lot more tools to manipulate D code. Please don't force people to use LEX/YACC to parse D. My script languages have always been LL(2) or LL(1) and I can get away with coding the parser by hand. I think Antlr generates LL(k) parsers too.

Sean
Nov 03 2001
next sibling parent reply Axel Kittenberger <axel dtone.org> writes:
 If it's at all possible, I'd recommend making D have an LL(k) grammar for
 some small k, as this will vastly simplify the parser needed and allow
 alot
 more tools to manipulate D code.  Please don't force people to use
 LEX/YACC
 to parse D.  My script languages have always been LL(2) or LL(1) and I can
 get away with coding the parser by hand.  I think Antlr generates LL(k)
 parsers too.

Do you know any OpenSource LL(n) parser generators for C? - Axel -- |D) http://www.dtone.org
Nov 04 2001
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
Antlr is evidently Public Domain:

http://www.antlr.org/rights.html

Sean

"Axel Kittenberger" <axel dtone.org> wrote in message
news:9s5d02$11j9$1 digitaldaemon.com...
 If it's at all possible, I'd recommend making D have an LL(k) grammar


 some small k, as this will vastly simplify the parser needed and allow
 alot
 more tools to manipulate D code.  Please don't force people to use
 LEX/YACC
 to parse D.  My script languages have always been LL(2) or LL(1) and I


 get away with coding the parser by hand.  I think Antlr generates LL(k)
 parsers too.

Do you know any OpenSource LL(n) parser generators for C? - Axel -- |D) http://www.dtone.org

Nov 05 2001
parent brucedickey micron.com writes:
FYI:

My favorite parser-generator is ProGrammar from http://www.programmar.com/. It's
not open source or free, but it is inexpensive, LL(n), and can examine the
parse tree in productions, making it a linear bounded automaton! Alas, it does
not generate source, but a proprietary grammar file. You have to link to a
supplied .lib or use the ActiveX. Currently available for Windows, but I have an
alpha for Linux that works fine. This one builds a parse tree and provides an
API to walk it and extract data. It's for C/C++, VB, Delphi.

I've used an LALR parser-generator in the past. I would never consider using
Bison or anything that's not LL(n) again.

Bruce

In article <9s6s97$2bem$1 digitaldaemon.com>, Sean L. Palmer says...
Antlr is evidently Public Domain:

http://www.antlr.org/rights.html

Sean

"Axel Kittenberger" <axel dtone.org> wrote in message
news:9s5d02$11j9$1 digitaldaemon.com...
 If it's at all possible, I'd recommend making D have an LL(k) grammar


 some small k, as this will vastly simplify the parser needed and allow
 alot
 more tools to manipulate D code.  Please don't force people to use
 LEX/YACC
 to parse D.  My script languages have always been LL(2) or LL(1) and I


 get away with coding the parser by hand.  I think Antlr generates LL(k)
 parsers too.

Do you know any OpenSource LL(n) parser generators for C? - Axel -- |D) http://www.dtone.org


Jul 18 2002
prev sibling next sibling parent reply "Walter" <walter digitalmars.com> writes:
I can never remember the definitions of those grammars. The current D
parser, however, is pretty simple and is implemented as recursive descent. I
have little use for LEX/YACC. The code output always seems to require a
little hand editing, and then automating the build process doesn't work.

Lexers are so simple anyway I can't see a reason to use LEX.


"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9s2c77$1v6e$1 digitaldaemon.com...
 Is D LALR?  If so, then let's use LEX/YACC (or Flex/Bison) to whip up a
 quick parser that simply outputs state information (and possibly some

 C equivalents instead of machine code).  At a minimum, the parser could


 a
 binary function that accepts text, where the return value would simply
 indicate parse success (a valid program).  Semantic actions could be


 to
 handle many non-LALR language features, as well as documenting


 If it's at all possible, I'd recommend making D have an LL(k) grammar for
 some small k, as this will vastly simplify the parser needed and allow

 more tools to manipulate D code.  Please don't force people to use

 to parse D.  My script languages have always been LL(2) or LL(1) and I can
 get away with coding the parser by hand.  I think Antlr generates LL(k)
 parsers too.

 Sean

Dec 15 2001
parent reply Axel Kittenberger <axel dtone.org> writes:
Walter wrote:

 I can never remember the definitions of those grammars. The current D
 parser, however, is pretty simple and is implemented as recursive descent.
 I have little use for LEX/YACC. The code output always seems to require a
 little hand editting, and then automating the build process doesn't work.

Well, the currently active GNU projects are called "flex" and "bison". Bison requires hand editing? That's not true in my eyes. First, bison primarily creates only the parser tables, not much more, not much less. If you want to make changes in the parser, you can change the bison.simple template, once and for all. There are some limits in bison, as it doesn't allow you to specify how the actions are coded, which matters if you are doing non-C parsers.

However, I agree that hand-writing a parser can have its advantages. It's only too complicated for a small brain like mine :o) I personally never got unary operators right this way; the mini languages I once wrote parsers for by hand quickly got out of my control, while bison parsers seem to be pretty easy once you get it. However, figuring out how to resolve some shift/reduce conflicts can sometimes be just as tedious.
 Lexers are so simple anyway I can't see a reason to use LEX.

I totally agree on that. However, it personally confuses me somewhat why there are two different tools for spelling and for grammar; after all, it is in principle the same, isn't it? In one case your tokens are the alphabet, and in the other your tokens are the words output by the lexer grammar.

- Axel
Dec 16 2001
parent reply "Walter" <walter digitalmars.com> writes:
"Axel Kittenberger" <axel dtone.org> wrote in message
news:9vho6a$1rdg$1 digitaldaemon.com...
 Walter wrote:
 I can never remember the definitions of those grammars. The current D
 parser, however, is pretty simple and is implemented as recursive


 I have little use for LEX/YACC. The code output always seems to require


 little hand editting, and then automating the build process doesn't


 Well the today active GNU projects are called "flex" and "bison". Bison
 requires hand editting? Thats not true in my eyes. First primary bison
 creates only the parser tables, not much more not much less. If you want

 make changes in the parser, you can change the bison.simple template. Once
 for all. There are some limits in bison, as it doesn't allow you to

 how the actions are coded, which is interesting if you are doing non-C
 parsers.

My experience with GNU bison under Win32 is the output generated many warnings from the compiler. The port of it to Win32 was incomplete. I didn't want to rely on a decent port of bison existing on each platform I wanted to port the product to. Even worse, in the end it didn't save any time.

Ok, so I'm not remotely a bison expert, and perhaps these are non-issues to someone who has taken the time to thoroughly learn it.
 However I agree that hand writing a parser can have it's advantages. It's
 only too complicated for a small brain like mine :o) I personally never

 unary operators right this way, the miny languages I once wrote the parser
 by hand quickly came out of my control, while bison parsers seems to be
 pretty easy you got it, however sometimes configuring out how to resolve
 some shift/reduce conflicuts can be as tedious.

A hand-tuned parser can be very fast as well <g>.
 Lexers are so simple anyway I can't see a reason to use LEX.

 I totally agree on that. However it personally confuses me somewhat why
 there are all two different tools for spelling and for grammar, after all it is in principle the same or? Only once your tokens are the alphabet and once your tokens are the words outputed by the lexer grammar.

In principle they are the same, in practice not. You're not really after a data graph being built by the lexer, just a token stream. Whereas with a parser, a syntax graph is the desired result. Also, despite what compiler textbooks imply, lexing and parsing are only minor parts of a compiler. The real work is in the semantic analysis, optimization, and code generation. For example, the lexer in D is 1400 lines and the parser is 2600. Of course, the goal was to make them simple <g>.
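To make the lexer/parser distinction concrete, here is a small hand-written recursive descent parser in Python (not D, and not the D parser itself). The lexer returns a flat token list; the parser turns it into a nested tuple tree. The grammar, the `Parser` class and the `"neg"` node name are all assumptions made for this sketch. It also shows one common way to handle unary minus: give it its own rule between `term` and `primary`.

```python
import re

def tokenize(text):
    """Lexer: flat list of tokens, no structure."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)

class Parser:
    """Parser: builds a nested tuple tree from the token list."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):      # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next()
            node = (op, node, self.term())
        return node

    def term(self):      # term := unary (('*' | '/') unary)*
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.next()
            node = (op, node, self.unary())
        return node

    def unary(self):     # unary := '-' unary | primary
        if self.peek() == "-":
            self.next()
            return ("neg", self.unary())
        return self.primary()

    def primary(self):   # primary := NUMBER | NAME | '(' expr ')'
        tok = self.next()
        if tok == "(":
            node = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok.isdigit() or tok.isidentifier()):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree

print(tokenize("a - -b * 2"))
print(parse("a - -b * 2"))
```

Each grammar rule becomes one method, which is what makes this style easy to write and debug by hand.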
Dec 16 2001
parent Axel Kittenberger <axel dtone.org> writes:
Walter wrote:

 My experience with GNU bison under Win32 is the output generated many
 warnings from the compiler. The port of it to Win32 was incomplete. I
 didn't want to rely on a decent port of bison existing on each platform I
 wanted to port the product to. Even worse, in the end it didn't save any
 time.
 
 Ok, so I'm not remotely a bison expert, and perhaps these are non-issues
 to someone who has taken the time to thoroughly learn it.

I had no problems whatsoever compiling and running bison with MSVC++ (back in the days when I still used it).

- Axel
Dec 16 2001
prev sibling parent Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
"Sean L. Palmer" wrote:

 If it's at all possible, I'd recommend making D have an LL(k) grammar for
 some small k, as this will vastly simplify the parser needed and allow alot
 more tools to manipulate D code.  Please don't force people to use LEX/YACC
 to parse D.  My script languages have always been LL(2) or LL(1) and I can
 get away with coding the parser by hand.  I think Antlr generates LL(k)
 parsers too.

I don't think that LL(k) is possible (without hacks). The problem is the following:

    <identifier> ****...*** <identifier>

Note that there could be any number of *'s. At this point, a parser CANNOT determine how to parse it. Is it the beginning of a declaration, or is it an expression? If it is a type, then it should be parsed as:

    (((((...((<identifier> *) *) ... *) *) *) *) *) <identifier>

but if it is an expression, then the first * is a multiplication, and the rest are dereferencing operators:

    ( <identifier> * (* (* (* ... (* (* <identifier> ))...))))

Obviously, it is legal for that string of *'s to be arbitrarily long. Thus, in order to write an LL(k) parser, you have to parse a rule that matches an arbitrarily long string of *'s, and then use that either in an expression or a declaration. It's conceivable, but ugly.

I proposed an alternate declaration syntax (move the *'s to the left) that would eliminate this problem. I think that it might make D LALR(2)...

--
The Villagers are Online! villagersonline.com

.[ (the fox.(quick,brown)) jumped.over(the dog.lazy) ]
.[ (a version.of(English).(precise.more)) is(possible) ]
?[ you want.to(help(develop(it))) ]
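The two readings can be spelled out mechanically. The following Python sketch (the function names are hypothetical, and it only builds the bracketed strings, it does not parse anything) shows that the same token sequence nests to the left as a declaration and to the right as an expression:

```python
def parse_as_declaration(type_name, stars, var_name):
    """Each '*' wraps the type so far: ((T *) *) x."""
    if stars < 1:
        raise ValueError("need at least one '*'")
    t = type_name
    for _ in range(stars):
        t = f"({t} *)"
    return f"{t} {var_name}"

def parse_as_expression(left, stars, right):
    """First '*' multiplies; the rest dereference the right operand."""
    if stars < 1:
        raise ValueError("need at least one '*'")
    operand = right
    for _ in range(stars - 1):
        operand = f"(* {operand})"
    return f"({left} * {operand})"

# The same tokens "a * * * b" under the two interpretations:
print(parse_as_declaration("a", 3, "b"))   # (((a *) *) *) b
print(parse_as_expression("a", 3, "b"))    # (a * (* (* b)))
```

Since the nesting direction differs, the parser cannot commit to either tree until it knows whether the first identifier names a type.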
Jul 22 2002
prev sibling parent "Walter" <walter digitalmars.com> writes:
Michael Gaskins wrote in message <9li9i4$o6j$1 digitaldaemon.com>...
The language specifications look very good to me (I do mostly Java coding
right now but I've also done a fair amount of C and just a tad of C++).
Lets get a compiler for this thing (alpha level, beta level, whatever) to
starve off all the 'vaporware' shouters that will surely come.

The compiler does exist, but it is too embarrassingly buggy to post right now <g>.
Aug 18 2001