digitalmars.D - Looking for champion - std.lang.d.lex

Walter Bright (15/15) Oct 21 2010 As we all know, tool support is important for D's success. Making tools ...

Ellery Newcomer (3/3) Oct 21 2010 and how about

Jonathan M Davis (13/18) Oct 21 2010 That would seem like a good idea (though part of me cringes at the idea ...

Don (6/26) Oct 22 2010 In the long term, the requirements for CTFE will be pretty much:

Jonathan M Davis (10/32) Oct 21 2010 You mean that you're going to make someone actually pull out their compi...

bearophile (6/8) Oct 21 2010 You may open the project here:

Russel Winder (14/22) Oct 21 2010 Of course using BitBucket or Launchpad may well be more likely to get
Jonathan M Davis (9/20) Oct 21 2010 I've never actually used Mercurial or Bazaar. I do use git all the time ...

Walter Bright (4/8) Oct 21 2010 Not really, you can just use the dmd lexer source as a guide. Should be

Jonathan M Davis (17/19) Oct 21 2010 Does this mean that you want a pseudo-port of the C++ front end's lexer ...

Walter Bright (10/29) Oct 21 2010 Yes, but not a straight port. The C++ version has things in it that are

Jonathan M Davis (6/43) Oct 22 2010 Okay. Good to know. I'll start looking at the C++ front end some time in...

Lutger (4/10) Oct 22 2010 If you are gonna port from the C++ front end, there is already a port

dolive (2/38) Oct 22 2010 dmd2.050 October will release it ? thank's

dolive (2/22) Oct 22 2010 Do you have Scintilla for D ?

dolive (2/27) Oct 22 2010 Should be port Scintilla to D.

BLS (6/25) Oct 22 2010 Why not creating a DLL/so based Lexer/Parser based on the existing DMD

Walter Bright (2/6) Oct 22 2010 I've done things like that before, they're even more work.
Jacob Carlborg (8/38) Oct 22 2010 I think it would be better to create a lexer/parser in D and have it in

Nick Sabalausky (3/42) Oct 22 2010 *cough* DDMD

Jacob Carlborg (6/50) Oct 23 2010 I know, I would more than love to see DDMD becoming the official D

Tomek Sowiński (20/39) Oct 22 2010 s =

Walter Bright (3/11) Oct 22 2010 Lexers are so simple, it is less work to just build them by hand than us...

Andrei Alexandrescu (4/15) Oct 22 2010 I wrote a C++ lexer. It wasn't at all easy except if I compared it

Andrei Alexandrescu (21/55) Oct 22 2010 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer

Nick Sabalausky (15/38) Oct 22 2010 FWIW, I've been converting my Goldie lexing/parsing library/toolset (
Sean Kelly (2/19) Oct 22 2010 What about, say, floating-point literals? It seems like the first eleme...

Andrei Alexandrescu (11/30) Oct 22 2010 Yah, with regard to such regular patterns (strings, comments, numbers,

Sean Kelly (5/43) Oct 23 2010 For the second, that may push the work of recognizing some lexical

Sean Kelly (3/50) Oct 23 2010 Or maybe not. A /* could be CommentBegin. I'll have to think on it a bit

Sean Kelly (4/58) Oct 23 2010 I still think it won't work. The stuff inside the comment would come

Andrei Alexandrescu (19/62) Oct 23 2010 I was thinking comments could be easily caught by simple routines:

Walter Bright (12/15) Oct 23 2010 I agree, a set of "canned" and heavily optimized lexing functions for co...

Andrei Alexandrescu (7/23) Oct 23 2010 I don't see these two in tension. "General" does not need entail

Walter Bright (6/11) Oct 23 2010 In general I agree with you, but that is a major project to do that and ...

Sean Kelly (3/73) Oct 23 2010 Ah so the only issue is identifying the first set for a lexical element,
Nick Sabalausky (3/65) Oct 23 2010 What's wrong with regexes? That's pretty typical for lexers.

Andrei Alexandrescu (6/9) Oct 23 2010 I mentioned that using regexes is possible but would make it much more

Nick Sabalausky (20/29) Oct 23 2010 I see. Maybe a lexer 2.0 thing.

Walter Bright (2/3) Oct 23 2010 They don't handle recursion.

Nick Sabalausky (19/22) Oct 23 2010 Neither do plain-old strings. But regexes will get you farther than plai...

Nick Sabalausky (4/27) Oct 23 2010 And FWIW, I was already thnking about making some improvements to Goldie...

Nick Sabalausky (5/36) Oct 23 2010 But that's all if you want generalized lexing or parsing though. If you ...

bearophile (4/7) Oct 23 2010 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s...

Nick Sabalausky (7/15) Oct 23 2010 I'd certainly hope so. If it isn't, then that would probably mean DMD's ...

Denis Koroskin (3/24) Oct 24 2010 Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must...

Nick Sabalausky (5/33) Oct 24 2010 According to a random file I picked out of trunk, it's dual-licensed wit...

Nick Sabalausky (4/40) Oct 24 2010 That does surprise me though, since I'm pretty sure Phobos is Boost Lice...

Walter Bright (4/6) Oct 24 2010 Phobos is Boost licensed to enable maximum usage for any purpose.

Jacob Carlborg (8/26) Oct 24 2010 As Walter wrote in the first post of this thread: "generally follow

Walter Bright (3/5) Oct 23 2010 The problem is I never have used parser/lexer generators, so I am not re...

Nick Sabalausky (36/41) Oct 23 2010 Understandable.

Walter Bright (3/5) Oct 24 2010 It looks nice, but in clicking around on FAQ, documentation, getting sta...

Nick Sabalausky (36/41) Oct 24 2010 Well, that's because that program (GOLD Parser Builder) is just a tool t...

Walter Bright (2/2) Oct 24 2010 It looks like a solid engine, and a nice tool. Does it belong as part of...
Walter Bright (3/4) Oct 24 2010 One question I have is how does it compare with Spirit? That would be it...

div0 (6/10) Oct 24 2010 Spirit is a LL parser, so it's not really suitable for human edited
Nick Sabalausky (18/22) Oct 24 2010 Can't say I'm really familiar with Spirit. From a brief lookover, these ...

Walter Bright (4/25) Oct 24 2010 Does Goldie have (like Spirit) a set of canned routines for things like ...

Nick Sabalausky (18/44) Oct 24 2010 No, but such things can easily be provided in the docs for simple

Walter Bright (11/39) Oct 24 2010 In the regexp code, I provided special regexes for email addresses and U...

Nick Sabalausky (22/33) Oct 24 2010 I'm not sure what exectly you're suggesting in these two paragraphs? (Or...

Walter Bright (8/48) Oct 25 2010 Are all tokens returned as strings?

Nick Sabalausky (34/42) Oct 25 2010 Goldie's lexer (and parser) are based on the GOLD system (

Walter Bright (11/60) Oct 25 2010 Consider a string literal, say "abc\"def". With Goldie's method, I infer...

Nick Sabalausky (60/75) Oct 25 2010 Yea, that is true. With that string in the input, the value given to the...

Walter Bright (8/85) Oct 25 2010 Probably that's why I don't use lexer generators. Building lexers is the...

Nick Sabalausky (30/36) Oct 26 2010 I've taken a deeper look at Spirit's docs:

bearophile (4/5) Oct 26 2010 I have not used Spirit, but from what I have read, it doesn't scale (the...

Nick Sabalausky (17/22) Oct 26 2010 I think that's just because it's C++ though. I'd bet a D lib that worked...
Leandro Lucarella (11/18) Oct 26 2010 I can confirm that, at least for Spirit 1, and for simple things it

dennis luehring (8/19) Oct 26 2010 yupp - Spirit feels right on the integration-side, but becomes more and

dennis luehring (4/25) Oct 26 2010 that combined with compiletime-features something like the bsn-parse do

Nick Sabalausky (41/54) Oct 26 2010 Goldie (and any GOLD-based system, really) should scale up pretty well. ...

Jacob Carlborg (6/25) Oct 26 2010 I don't have much knowledge in this area but isn't this what a

Tomek Sowiński (31/58) Oct 22 2010 ,
Bruno Medeiros (10/49) Nov 19 2010 Agreed, of all the things desired for D, a D tokenizer would rank pretty...

Jonathan M Davis (16/66) Nov 19 2010 We want to make it easy for tools to be built to work on and deal with D...

Bruno Medeiros (9/68) Nov 19 2010 And by providing a lexer and a parser outside the standard library,

Jonathan M Davis (20/90) Nov 19 2010 A,

Bruno Medeiros (25/42) Nov 19 2010 Eh? That license argument doesn't make sense: if the lexer and parser

Todd VanderVeen (7/10) Nov 19 2010 I agree. I do like the suggestion for developing the D grammar in Antlr ...

Bruno Medeiros (5/15) Nov 19 2010 See the comment I made below, to Michael Stover. (

Jonathan M Davis (50/78) Nov 19 2010 It's very different to have D implementation of something - which is bas...

Bruno Medeiros (13/62) Nov 24 2010 There are some misunderstandings here. First, the DMD front-end is

Andrei Alexandrescu (3/52) Nov 19 2010 Even C has strtok.

Bruno Medeiros (6/61) Nov 24 2010 That's just a fancy splitter, I wouldn't call that a proper tokenizer. I...

Bruno Medeiros (4/66) Nov 24 2010 In other words, a lexer, that might be a better term in this context.

bearophile (8/14) Oct 23 2010 This is a quite long talk by Steve Yegge that I've just seen (linked fro...

bearophile (2/4) Oct 23 2010 Sorry, the Reddit thread:
Nick Sabalausky (4/35) Oct 23 2010 I haven't looked at the video, but that sounds like the direction I've h...
Bruno Medeiros (31/45) Nov 24 2010 Hum, very interesting topic! A few disjoint comments:

Andrew Wiley (5/48) Nov 24 2010 be used in this way. The Eclipse plugin for Scala (and I assume the Netb...

Bruno Medeiros (8/52) Nov 25 2010 Interesting, very wise of them to do that.

Nick Sabalausky (9/10) Oct 26 2010 I'm curious, is your reason for this purely to avoid allocations during

Walter Bright (10/19) Oct 26 2010 It's one big giant reason. Storage allocation gets unbelievably costly i...

bearophile (5/7) Oct 26 2010 Java was designed to be simple! Simple means to have a more uniform sema...

Walter Bright (15/29) Oct 26 2010 So was Pascal. See the thread about how useless it was as a result.

retard (9/23) Oct 27 2010 Blablabla.. this nostalgic lesson reminded me, have you even started

Walter Bright (4/11) Oct 27 2010 If that were true, why are Java char/int/double types value types, not a...

bearophile (12/22) Oct 27 2010 purposes, different behaviors, etc.

Walter Bright (3/7) Oct 27 2010 So, there is "value" in value types after all. I confess I have no idea ...

bearophile (5/7) Oct 27 2010 I am not arguing against them in absolute. They are good in some situati...

Bruno Medeiros (22/28) Nov 19 2010 I've been hearing that a lot, but I find this to be excessively
Bruno Medeiros (4/9) Nov 19 2010 There's good simple, and there's bad simple...

Nick Sabalausky (67/78) Oct 26 2010 Honestly, I'm not entirely certain whether or not Goldie actually needs ...

Walter Bright (7/9) Oct 26 2010 I use a tagged variant for the token struct.

retard (3/15) Oct 27 2010 This is why the basic data structure in functional languages, algebraic

Walter Bright (3/5) Oct 27 2010 I think you recently demonstrated otherwise, as proven by the widespread...

retard (4/10) Oct 27 2010 I don't understand your logic -- Widespread use of Java proves that

Walter Bright (7/18) Oct 27 2010 You told me that widespread use of Java proved that nothing more complex...

retard (9/31) Oct 27 2010 I only meant that the widespead adoption of Java shows how the public at...

Walter Bright (6/14) Oct 27 2010 Choice of a language has numerous factors, so you cannot dismiss one fac...

retard (10/28) Oct 27 2010 I don't think I said anything that contradicts that.

Nick Sabalausky (7/9) Oct 27 2010 The public at large is convinced that "Java is fast now, really!". So I'...

Todd D. VanderVeen (15/17) Oct 27 2010 Legacy in the sense that C is perhaps.

retard (14/17) Oct 27 2010 Probably the top 10 names are more or less correct there, but some funny...

Don (6/28) Oct 28 2010 I reckon Fortran is the one to look at it. If Tiobe's stats were

Matthias Pleh (18/46) Oct 28 2010 There was an article in the Ct-Magazin (German) where they took a closer...

Bruno Medeiros (16/26) Nov 19 2010 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...

bearophile (4/7) Nov 19 2010 Java on Adroid is not going well, there is a Oracle->Google lawsuit in p...

Andrew Wiley (7/13) Nov 19 2010 I have to agree with Bruno here, Java isn't going anywhere soon. It has ...

Nick Sabalausky (16/34) Nov 23 2010 To be clear, I meant Java the language, not Java the VM. But yea, you're...

Michael Stover (15/47) Nov 19 2010 As for D lexers and tokenizers, what would be nice is to

Bruno Medeiros (11/24) Nov 19 2010 Yes, that would be much better. It would be directly and immediately

Michael Stover (4/32) Nov 19 2010 so that was 4 months ago - how do things currently stand on that initiat...

Matthias Pleh (5/34) Nov 20 2010 There is a project with an antlr D-grammar in work.
Bruno Medeiros (15/44) Nov 24 2010 I don't know about Ellery, as you can see in that thread he/she(?)

Ellery Newcomer (12/24) Nov 24 2010 Normally I go by 'it'.

Bruno Medeiros (12/42) Nov 24 2010 I didn't meant to offend or anything, I was just unsure of that. To me

Ellery Newcomer (4/9) Nov 24 2010 None taken; I'm just laughing at you. As I understand it, though,
bearophile (4/6) Nov 24 2010 In Python newsgroups I have seen few women, now and then, but in the D n...

Daniel Gibson (5/14) Nov 24 2010 At my university there are *very* few woman studying computer science.

Nick Sabalausky (4/23) Nov 25 2010 See, that's the #1 worst thing about the field of programming: Total sau...
Bruno Medeiros (12/31) Nov 26 2010 It is well know that there is a big gender gap in CS with regards to

Bruno Medeiros (11/42) Nov 19 2010 "the widespead adoption of Java shows how the public at large cares very...

dolive (2/22) Feb 26 2011 intense support! Someone to do it?

Jonathan M Davis (7/33) Feb 26 2011 ed

dolive (3/35) Feb 26 2011 thanks, make an all out effort !

Walter Bright <newshound2 digitalmars.com> writes:

As we all know, tool support is important for D's success. Making tools easier 
to build will help with that.

To that end, I think we need a lexer for the standard library - std.lang.d.lex. 
It would be helpful in writing color syntax highlighting filters, pretty 
printers, repl, doc generators, static analyzers, and even D compilers.

It should:

1. support a range interface for its input, and a range interface for its output
2. optionally not generate lexical errors, but just try to recover and continue
3. optionally return comments and ddoc comments as tokens
4. the tokens should be a value type, not a reference type
5. generally follow along with the C++ one so that they can be maintained in
   tandem

It can also serve as the basis for creating a JavaScript implementation that
can be embedded into web pages for syntax highlighting, and eventually an
std.lang.d.parse.

Anyone want to own this?
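
To make requirements 1, 3 and 4 above a bit more concrete, here is a minimal sketch of what such a lexer could look like. It is not the proposed std.lang.d.lex: the names `Token`, `TokKind`, `Lexer` and `keepComments` are hypothetical, the input is simplified to a `string` rather than an arbitrary range, and only ASCII identifiers, decimal integers and `//` comments are recognized.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.range : isInputRange;

// Hypothetical token kinds; a real D lexer needs many more.
enum TokKind { identifier, number, comment, other }

// A struct is a value type in D (requirement 4).
struct Token
{
    TokKind kind;
    string text; // slice of the source, no copy
}

// Hypothetical lexer: an input range of Tokens over a string.
struct Lexer
{
    private string src;
    private bool keepComments; // requirement 3: comments are optional
    private Token cur;
    private bool done;

    this(string src, bool keepComments = false)
    {
        this.src = src;
        this.keepComments = keepComments;
        popFront(); // prime the first token
    }

    @property bool empty() const { return done; }
    @property Token front() { return cur; }

    void popFront()
    {
        for (;;)
        {
            // Skip whitespace between tokens.
            while (src.length && isWhite(src[0])) src = src[1 .. $];
            if (src.length == 0) { done = true; return; }

            size_t n = 1; // length of the token being scanned
            auto kind = TokKind.other;
            if (isAlpha(src[0]) || src[0] == '_')
            {
                kind = TokKind.identifier;
                while (n < src.length && (isAlphaNum(src[n]) || src[n] == '_')) ++n;
            }
            else if (isDigit(src[0]))
            {
                kind = TokKind.number;
                while (n < src.length && isDigit(src[n])) ++n;
            }
            else if (src.length >= 2 && src[0 .. 2] == "//")
            {
                kind = TokKind.comment;
                while (n < src.length && src[n] != '\n') ++n;
            }
            cur = Token(kind, src[0 .. n]);
            src = src[n .. $];
            // Drop comments unless the caller asked for them.
            if (kind == TokKind.comment && !keepComments) continue;
            return;
        }
    }
}

static assert(isInputRange!Lexer);
```

Because `Lexer` satisfies `isInputRange`, it can be consumed with `foreach (tok; Lexer(source, true))` or passed to range algorithms; anything not recognized is returned as a one-character `other` token instead of raising an error, which is one possible reading of requirement 2.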

Oct 21 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

and how about

6. ctfe compatible

?

Oct 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday 21 October 2010 15:12:41 Ellery Newcomer wrote:
 and how about
 
 6. ctfe compatible
 
 ?

That would seem like a good idea (though part of me cringes at the idea of a
program specifically running the lexer (and possibly the parser) as part of its
own compilation process), but for the main purpose of being used for tools for
D, that would seem completely unnecessary. So, I'd say that it would be a good
idea to make it CTFE-able if it is at all reasonable to do so but that if
making it CTFE-able would harm the design for more typical use, then it
shouldn't be made CTFE-able. Personally, I don't have a good feel for exactly
what is CTFE-able though, so I have no idea how easy it would be to make it
CTFE-able. However, it does seem like a good idea if it's reasonable to do so.
And if it's not, hopefully as dmd's CTFE capabilities become more advanced, it
will become possible to do so.

- Jonathan M Davis

Oct 21 2010

Don <nospam nospam.com> writes:

Jonathan M Davis wrote:
 On Thursday 21 October 2010 15:12:41 Ellery Newcomer wrote:
 and how about

 6. ctfe compatible

 ?

 
 That would seem like a good idea (though part of me cringes at the idea of a
 program specifically running the lexer (and possibly the parser) as part of
 its own compilation process), but for the main purpose of being used for tools
 for D, that would seem completely unnecessary. So, I'd say that it would be a
 good idea to make it CTFE-able if it is at all reasonable to do so but that if
 making it CTFE-able would harm the design for more typical use, then it
 shouldn't be made CTFE-able. Personally, I don't have a good feel for exactly
 what is CTFE-able though, so I have no idea how easy it would be to make it
 CTFE-able. However, it does seem like a good idea if it's reasonable to do so.
 And if it's not, hopefully as dmd's CTFE capabilities become more advanced, it
 will become possible to do so.
 
 - Jonathan M Davis

In the long term, the requirements for CTFE will be pretty much:
1. the function must be @safe (e.g., no asm).
2. the function must be pure
3. the compiler must have access to the source code

You'll probably satisfy all those requirements anyway.
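
As an illustration of those three conditions, here is a small sketch of a function that is `@safe`, `pure`, and has its source visible to the compiler, so it can be evaluated at compile time. `countWords` is a hypothetical helper invented for this example, not part of any proposed lexer, and the sketch does not claim that current compilers actually require these attributes for CTFE.

```d
// Hypothetical helper: counts runs of identifier-like characters.
// It is @safe (no asm, no pointer tricks) and pure (no global state).
size_t countWords(string src) pure @safe nothrow
{
    size_t count = 0;
    bool inWord = false;
    foreach (char c; src) // iterate code units, no decoding
    {
        immutable isWordChar =
            (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9') || c == '_';
        if (isWordChar && !inWord) ++count; // a new word starts here
        inWord = isWordChar;
    }
    return count;
}

// Initializing an enum forces the call to run during compilation (CTFE).
enum wordsAtCompileTime = countWords("int x = 42;");
static assert(wordsAtCompileTime == 3);
```

The same function can still be called normally at run time; nothing in it is specific to compile-time use.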

Oct 22 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, October 21, 2010 15:01:21 Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.
 
 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and even
 D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its
 output
 2. optionally not generate lexical errors, but just try to recover
 and continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained
 in tandem
 
 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.
 
 Anyone want to own this?

You mean that you're going to make someone actually pull out their compiler 
book? ;)

I'd love to do this (lexers and parsers are great fun IMHO - it's the code
generation that isn't so fun), but I'm afraid that I'm busy enough at the
moment that if I take it on, it won't get done very quickly. It is so very
tempting though...

So, as long as you're not in a hurry, I'm up for it, but I can't guarantee 
anything even approaching fast delivery.

- Jonathan M Davis

Oct 21 2010

bearophile <bearophileHUGS lycos.com> writes:

Jonathan M Davis:

 So, as long as you're not in a hurry, I'm up for it, but I can't guarantee 
 anything even approaching fast delivery.

You may open the project here:
http://github.com/
And then other people may help you along the way.

Bye,
bearophile

Oct 21 2010

Russel Winder <russel russel.org.uk> writes:

On Thu, 2010-10-21 at 19:51 -0400, bearophile wrote:
 Jonathan M Davis:

 So, as long as you're not in a hurry, I'm up for it, but I can't guarantee
 anything even approaching fast delivery.

 You may open the project here:
 http://github.com/
 And then other people may help you along the way.

Of course using BitBucket or Launchpad may well be more likely to get
support as Mercurial and Bazaar are so much more usable than Git.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel russel.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder

Oct 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday, October 21, 2010 17:24:34 Russel Winder wrote:
 On Thu, 2010-10-21 at 19:51 -0400, bearophile wrote:
 Jonathan M Davis:
 So, as long as you're not in a hurry, I'm up for it, but I can't
 guarantee anything even approaching fast delivery.

 
 You may open the project here:
 http://github.com/
 And then other people may help you along the way.

 
 Of course using BitBucket or Launchpad may well be more likely to get
 support as Mercurial and Bazaar are so much more usable than Git.

I've never actually used Mercurial or Bazaar. I do use git all the time though.
I quite like it. Now, it could be that Mercurial or Bazaar is better (like I
said, I haven't used them), but I do find git to be quite usable. The simple
fact that I can just create a repository in place instead of having to set up
a separate location for a repository (like you have to do with svn) is a
_huge_ improvement. I didn't really use source control on my personal projects
before git. git actually makes it easy enough to do so that I do it all the
time now.

- Jonathan M Davis

Oct 21 2010

Walter Bright <newshound2 digitalmars.com> writes:

Jonathan M Davis wrote:
 You mean that you're going to make someone actually pull out their compiler 
 book? ;)

Not really, you can just use the dmd lexer source as a guide. Should be 
straightforward.

 So, as long as you're not in a hurry, I'm up for it, but I can't guarantee 
 anything even approaching fast delivery.

As long as it gets done!

Oct 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday 21 October 2010 15:01:21 Walter Bright wrote:
 5. generally follow along with the C++ one so that they can be maintained
 in tandem

Does this mean that you want a pseudo-port of the C++ front end's lexer to D
for this? Or are you looking for just certain pieces of it to be similar?

I haven't looked at the front end code yet, so I don't know how it works there,
but I wouldn't expect it to use ranges, for instance, so I would expect that
the basic design would naturally stray a bit from whatever was done in C++
simply by doing things in fairly idiomatic D. And if I do look at the front end
to see how that's done, there's the issue of the license. As I understand it,
the front end is LGPL, and Phobos is generally Boost, which would mean that I
would be looking at LGPL-licensed code when designing Boost-licensed code, even
though it wouldn't really be copying the code per se since it's a change of
language (though if you did the whole front end, obviously the license issue
can be waived quite easily).

License issues aside, however, I do think that it would make sense for
std.lang.d.lex to do things similarly to the C++ front end, even if there are
a number of basic differences.

- Jonathan M Davis

Oct 21 2010

Walter Bright <newshound2 digitalmars.com> writes:

Jonathan M Davis wrote:
 On Thursday 21 October 2010 15:01:21 Walter Bright wrote:
 5. generally follow along with the C++ one so that they can be maintained
 in tandem

 
 Does this mean that you want a pseudo-port of the C++ front end's lexer to D
 for this? Or are you looking for just certain pieces of it to be similar?

Yes, but not a straight port. The C++ version has things in it that are 
unnecessary for the D version, like the external string table (should use an 
associative array instead), the support for lookahead can be put in the parser, 
doesn't tokenize comments, etc.
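
As a rough sketch of the associative-array suggestion, keyword recognition could be a single lookup in a built-in AA instead of an external string table. The names `TokKind`, `KeywordTable` and `classify` are hypothetical, only three keywords are listed, and how dmd's own string table is organized is not shown in this thread.

```d
// Hypothetical subset of token kinds.
enum TokKind { identifier, if_, else_, while_ }

// Hypothetical keyword table backed by a built-in associative array.
struct KeywordTable
{
    private TokKind[string] table;

    static KeywordTable create()
    {
        KeywordTable kt;
        kt.table = [
            "if":    TokKind.if_,
            "else":  TokKind.else_,
            "while": TokKind.while_,
        ];
        return kt;
    }

    // Returns the keyword's kind, or TokKind.identifier for any other word.
    TokKind classify(string word) const
    {
        // `in` yields a pointer to the stored value, or null if absent.
        if (auto p = word in table)
            return *p;
        return TokKind.identifier;
    }
}
```

A lexer would scan an identifier-shaped word first and then call `classify` on the slice, so the table stays private to the lexer module, which fits the goal of keeping everything in one file.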

Essentially I'd like the D lexer to be self-contained in one file.

 I haven't looked at the front end code yet, so I don't know how it works
 there, but I wouldn't expect it to use ranges, for instance, so I would
 expect that the basic design would naturally stray a bit from whatever was
 done in C++ simply by doing things in fairly idiomatic D. And if I do look at
 the front end to see how that's done, there's the issue of the license. As I
 understand it, the front end is LGPL, and Phobos is generally Boost, which
 would mean that I would be looking at LGPL-licensed code when designing
 Boost-licensed code, even though it wouldn't really be copying the code per se
 since it's a change of language (though if you did the whole front end,
 obviously the license issue can be waived quite easily).

Since the license is mine, I can change the D version to the Boost license, no 
problem.


 License issues aside, however, I do think that it would make sense for
 std.lang.d.lex to do things similarly to the C++ front end, even if there are
 a number of basic differences.

Yup. The idea is the D version lexes exactly the same grammar as the dmd one. 
The easiest way to ensure that is to do equivalent logic.

Oct 21 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Thursday 21 October 2010 23:55:42 Walter Bright wrote:
 Jonathan M Davis wrote:
 On Thursday 21 October 2010 15:01:21 Walter Bright wrote:
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 
 Does this mean that you want a pseudo-port of the C++ front end's lexer
 to D for this? Or are you looking for just certain pieces of it to be
 similar?

 
 Yes, but not a straight port. The C++ version has things in it that are
 unnecessary for the D version, like the external string table (should use
 an associative array instead), the support for lookahead can be put in the
 parser, doesn't tokenize comments, etc.
 
 Essentially I'd like the D lexer to be self-contained in one file.
 
 I haven't looked at the front end code yet, so I don't know how it works
 there, but I wouldn't expect it to use ranges, for instance, so I would
 expect that the basic design would naturally stray a bit from whatever
 was done in C++ simply by doing things in fairly idiomatic D. And if I
 do look at the front end to see how that's done, there's the issue of
 the license. As I understand it, the front end is LGPL, and Phobos is
 generally Boost, which would mean that I would be looking at
 LGPL-licensed code when designing Boost-licensed, even though it
 wouldn't really be copying the code per se since it's a change of
 language (though if you did the whole front end, obviously the license
 issue can be waived quite easily).

 
 Since the license is mine, I can change the D version to the Boost license,
 no problem.
 
 License issues aside, however, I do think that it would make sense for
 std.lang.d.lex to do things similarly to the C++ front end, even if
 there are a number of basic differences.

 
 Yup. The idea is the D version lexes exactly the same grammar as the dmd
 one. The easiest way to ensure that is to do equivalent logic.

Okay. Good to know. I'll start looking at the C++ front end some time in the
next few days, but like I said, I really don't know how much time I'm going to
be able to spend on it, so it won't necessarily be quick. However, porting
logic should be much faster than doing it from scratch.

- Jonathan M Davis

Oct 22 2010

Lutger <lutger.blijdestijn gmail.com> writes:

Jonathan M Davis wrote:

...
 Okay. Good to know. I'll start looking at the C++ front end some time in
 the next few days, but like I said, I really don't know how much time I'm
 going to be able to spend on it, so it won't necessarily be quick.
 However, porting logic should be much faster than doing it from scratch.
 
 - Jonathan M Davis

If you are gonna port from the C++ front end, there is already a port 
called ddmd which may give you a head start: www.dsource.org/projects/ddmd

Oct 22 2010

dolive <dolive89 sina.com> writes:

Walter Bright wrote:

 Jonathan M Davis wrote:
 On Thursday 21 October 2010 15:01:21 Walter Bright wrote:
 5. generally follow along with the C++ one so that they can be maintained
 in tandem

 
 Does this mean that you want a pseudo-port of the C++ front end's lexer to D
 for this? Or are you looking for just certain pieces of it to be similar?

 
 Yes, but not a straight port. The C++ version has things in it that are 
 unnecessary for the D version, like the external string table (should use an 
 associative array instead), the support for lookahead can be put in the
 parser, doesn't tokenize comments, etc.
 
 Essentially I'd like the D lexer to be self-contained in one file.
 
 I haven't looked at the front end code yet, so I don't know how it works
 there, but I wouldn't expect it to use ranges, for instance, so I would
 expect that the basic design would naturally stray a bit from whatever was
 done in C++ simply by doing things in fairly idiomatic D. And if I do look at
 the front end to see how that's done, there's the issue of the license. As I
 understand it, the front end is LGPL, and Phobos is generally Boost, which
 would mean that I would be looking at LGPL-licensed code when designing
 Boost-licensed code, even though it wouldn't really be copying the code per se
 since it's a change of language (though if you did the whole front end,
 obviously the license issue can be waived quite easily).

 
 Since the license is mine, I can change the D version to the Boost license, no 
 problem.
 
 
 License issues aside, however, I do think that it would make sense for
 std.lang.d.lex to do things similarly to the C++ front end, even if there are
 a number of basic differences.

 
 Yup. The idea is the D version lexes exactly the same grammar as the dmd one. 
 The easiest way to ensure that is to do equivalent logic.

Will dmd 2.050 be released in October? Thanks.

Oct 22 2010

dolive <dolive89 sina.com> writes:

Walter Bright wrote:

 As we all know, tool support is important for D's success. Making tools easier 
 to build will help with that.
 
 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty
 printers, repl, doc generators, static analyzers, and even D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its
 output
 2. optionally not generate lexical errors, but just try to recover and continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained in
 tandem
 
 It can also serve as the basis for creating a javascript implementation that
 can be embedded into web pages for syntax highlighting, and eventually an
 std.lang.d.parse.
 
 Anyone want to own this?

Do you have Scintilla for D?

Oct 22 2010

dolive <dolive89 sina.com> writes:

dolive wrote:

 Walter Bright wrote:
 
 As we all know, tool support is important for D's success. Making tools easier 
 to build will help with that.
 
 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty
 printers, repl, doc generators, static analyzers, and even D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its
 output
 2. optionally not generate lexical errors, but just try to recover and continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained in
 tandem
 
 It can also serve as the basis for creating a javascript implementation that
 can be embedded into web pages for syntax highlighting, and eventually an
 std.lang.d.parse.
 
 Anyone want to own this?

 
 Do you have Scintilla for D?
 

Scintilla should be ported to D.

Oct 22 2010

BLS <windevguy hotmail.de> writes:

Why not create a DLL/.so-based lexer/parser on top of the existing DMD
front end? It would always be up to date. Necessary steps: functional
wrappers around the C++ classes, implementing the visitor pattern (AST),
and creating std.lex and std.parse.

my 2 cents

On 22/10/2010 00:01, Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.

 Anyone want to own this?

Oct 22 2010

Walter Bright <newshound2 digitalmars.com> writes:

BLS wrote:
 Why not creating a DLL/so based Lexer/Parser based on the existing DMD 
 front end.? It could be always up to date. Necessary Steps. functional 
 wrappers around C++ classes, Implementing the visitor pattern (AST), 
 create std.lex and std.parse..

I've done things like that before, they're even more work.

Oct 22 2010

Jacob Carlborg <doob me.com> writes:

On 2010-10-22 17:37, BLS wrote:
 Why not creating a DLL/so based Lexer/Parser based on the existing DMD
 front end.? It could be always up to date. Necessary Steps. functional
 wrappers around C++ classes, Implementing the visitor pattern (AST),
 create std.lex and std.parse..

 my 2 cents

 On 22/10/2010 00:01, Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.

 Anyone want to own this?


I think it would be better to create a lexer/parser in D and have it in 
the standard library. Then one could begin the process of porting the 
DMD frontend using this library. Then hopefully the DMD frontend will be 
written in D and use this new library, so there is one code base that is 
always up to date.

-- 
/Jacob Carlborg

Oct 22 2010

"Nick Sabalausky" <a a.a> writes:

"Jacob Carlborg" <doob me.com> wrote in message 
news:i9spln$lbj$1 digitalmars.com...
 On 2010-10-22 17:37, BLS wrote:
 Why not creating a DLL/so based Lexer/Parser based on the existing DMD
 front end.? It could be always up to date. Necessary Steps. functional
 wrappers around C++ classes, Implementing the visitor pattern (AST),
 create std.lex and std.parse..

 my 2 cents

 On 22/10/2010 00:01, Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.

 Anyone want to own this?


 I think it would be better to create a lexer/parser in D and have it in 
 the standard library. Then one could begin the process of porting the DMD 
 frontend using this library. Then hopefully the DMD frontend will be 
 written in D and use this new library, being one code base and will always 
 be up to date.

*cough* DDMD

Oct 22 2010

Jacob Carlborg <doob me.com> writes:

On 2010-10-22 22:42, Nick Sabalausky wrote:
 "Jacob Carlborg"<doob me.com>  wrote in message
 news:i9spln$lbj$1 digitalmars.com...
 On 2010-10-22 17:37, BLS wrote:
 Why not creating a DLL/so based Lexer/Parser based on the existing DMD
 front end.? It could be always up to date. Necessary Steps. functional
 wrappers around C++ classes, Implementing the visitor pattern (AST),
 create std.lex and std.parse..

 my 2 cents

 On 22/10/2010 00:01, Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.

 Anyone want to own this?


 I think it would be better to create a lexer/parser in D and have it in
 the standard library. Then one could begin the process of porting the DMD
 frontend using this library. Then hopefully the DMD frontend will be
 written in D and use this new library, being one code base and will always
 be up to date.

 *cough* DDMD

I know. I would more than love to see DDMD become the official D 
compiler, but if that happens I would still like the frontend to be 
based on the lexer/parser library in Phobos.

-- 
/Jacob Carlborg

Oct 23 2010

Tomek Sowiński <just ask.me> writes:

On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.

 Anyone want to own this?

Interesting idea. Here's another: D will soon need bindings for CORBA,
Thrift, etc, so lexers will have to be written all over to grok interface
files. Perhaps a generic tokenizer which can be parametrized with a
lexical grammar would bring more ROI, I got a hunch D's templates are
strong enough to pull this off without any source code generation ala
JavaCC. The books I read on compilers say tokenization is a solved
problem, so the theory part on what a good abstraction should be is done.
What you think?

--
Tomek

Oct 22 2010

Walter Bright <newshound2 digitalmars.com> writes:

Tomek Sowiński wrote:
 Interesting idea. Here's another: D will soon need bindings for CORBA, 
 Thrift, etc, so lexers will have to be written all over to grok 
 interface files. Perhaps a generic tokenizer which can be parametrized 
 with a lexical grammar would bring more ROI, I got a hunch D's templates 
 are strong enough to pull this off without any source code generation 
 ala JavaCC. The books I read on compilers say tokenization is a solved 
 problem, so the theory part on what a good abstraction should be is 
 done. What you think?

Lexers are so simple, it is less work to just build them by hand than use lexer 
generator tools.

Oct 22 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/22/10 14:17 CDT, Walter Bright wrote:
 Tomek Sowiński wrote:
 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's
 templates are strong enough to pull this off without any source code
 generation ala JavaCC. The books I read on compilers say tokenization
 is a solved problem, so the theory part on what a good abstraction
 should be is done. What you think?

 Lexers are so simple, it is less work to just build them by hand than
 use lexer generator tools.

I wrote a C++ lexer. It wasn't at all easy except if I compared it 
against the work necessary to build a full compiler.

Andrei

Oct 22 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com>
 wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer 
generator.

I have in mind the entire implementation of a simple design, but never 
had the time to execute on it. The tokenizer would work like this:

alias Lexer!(
     "+", "PLUS",
     "-", "MINUS",
     "+=", "PLUS_EQ",
     ...
     "if", "IF",
     "else", "ELSE"
     ...
) DLexer;

Such a declaration generates numeric values DLexer.PLUS etc. and 
generates efficient code that extracts a stream of tokens from a 
stream of text. Each token in the token stream has the ID and the text.

Comments, strings etc. can be handled in one of several ways but that's 
a longer discussion.

The undertaking is doable but nontrivial.


Andrei

Oct 22 2010
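
For readers who want something concrete, here is a minimal sketch of how the `Lexer!(...)` declaration proposed above might be realized in present-day D. It is not Andrei's design: the `Token` struct, the `idList` helper, the `next` function and the extra `UNKNOWN` id are all names assumed for illustration, and the lookup is a plain longest-match scan over the table rather than the efficient generated code the post has in mind.

```d
import std.algorithm.searching : startsWith;

/// A token is a plain value: a numeric ID plus the matched text.
struct Token
{
    int id;
    string text;
}

/// Builds the text "enum : int { UNKNOWN, PLUS, ... }" from the names in `pairs`.
/// (Hypothetical helper, evaluated at compile time.)
string idList(pairs...)()
{
    string s = "enum : int { UNKNOWN";
    static foreach (i; 0 .. pairs.length / 2)
        s ~= ", " ~ pairs[2 * i + 1];
    return s ~ " }";
}

/// Hypothetical generator: `pairs` alternates token spelling and token name.
struct Lexer(pairs...)
{
    static assert(pairs.length % 2 == 0, "expected (spelling, name) pairs");

    mixin(idList!pairs());   // declares Lexer.PLUS, Lexer.MINUS, ...

    /// Longest-match lookup of a fixed spelling at the front of `input`.
    /// Returns id UNKNOWN and empty text when nothing in the table matches.
    static Token next(string input)
    {
        Token best;
        static foreach (i; 0 .. pairs.length / 2)
        {
            if (input.startsWith(pairs[2 * i])
                && pairs[2 * i].length > best.text.length)
                best = Token(cast(int)(i + 1), input[0 .. pairs[2 * i].length]);
        }
        return best;
    }
}

alias DLexer = Lexer!(
    "+", "PLUS",
    "-", "MINUS",
    "+=", "PLUS_EQ",
    "if", "IF",
    "else", "ELSE");

unittest
{
    // "+=" wins over "+" because the longer spelling is preferred.
    assert(DLexer.next("+= 1") == Token(DLexer.PLUS_EQ, "+="));
    assert(DLexer.next("else {").id == DLexer.ELSE);
    assert(DLexer.next("?").id == DLexer.UNKNOWN);
}
```

Note that this sketch would also report `IF` for the input `iffy`; telling keywords from identifiers is exactly the kind of thing the rest of the thread discusses.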

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:i9spsa$ll0$1 digitalmars.com...
 On 10/22/10 14:02 CDT, Tomek Sowinski wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com>
 wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer 
 generator.

FWIW, I've been converting my Goldie lexing/parsing library/toolset ( 
http://www.dsource.org/projects/goldie ) to D2/Phobos, and that should have 
a release sometime in the next couple months or so.

I'm not sure it would really be appropriate for Phobos since it's pretty 

range-ified yet, probably doesn't use Phobos coding conventions, and relies 
on one of my other libraries/tools.

But it does do generalized lexing/parsing (LALR) via the GOLD ( 
http://www.devincook.com/goldparser/ ) grammar file formats, can optionally 
generate source files for better compile-time checking (for instance, so 
Token!"<Statemnt>" will generate a compile-time error), has full 
documentation, and I'm working on a tool/lib that will compile the grammars 
without having to use the Windows/GUI-based GOLD Parser Builder tool.

Oct 22 2010

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu Wrote:
 
 I have in mind the entire implementation of a simple design, but never 
 had the time to execute on it. The tokenizer would work like this:
 
 alias Lexer!(
      "+", "PLUS",
      "-", "MINUS",
      "+=", "PLUS_EQ",
      ...
      "if", "IF",
      "else", "ELSE"
      ...
 ) DLexer;
 
 Such a declaration generates numeric values DLexer.PLUS etc. and 
 generates an efficient code that extracts a stream of tokens from a 
 stream of text. Each token in the token stream has the ID and the text.

What about, say, floating-point literals?  It seems like the first element of a
pair might have to be a regex pattern.

Oct 22 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 I have in mind the entire implementation of a simple design, but never
 had the time to execute on it. The tokenizer would work like this:

 alias Lexer!(
       "+", "PLUS",
       "-", "MINUS",
       "+=", "PLUS_EQ",
       ...
       "if", "IF",
       "else", "ELSE"
       ...
 ) DLexer;

 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the text.

 What about, say, floating-point literals?  It seems like the first element of
a pair might have to be a regex pattern.


Yah, with regard to such regular patterns (strings, comments, numbers, 
identifiers) there are at least two possibilities that I see:

1. Go the full route of allowing regexen in the definition. This is very 
hard because you need to generate an efficient (N|D)FA during compilation.

2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the 
compile-time table matches, just call onUnrecognizedString(). In 
conjunction with a few simple specialized functions, that makes it very 
simple to define arbitrarily complex lexers where the bulk of the work 
(and the most tedious part) is done by the D compiler.


Andrei

Oct 22 2010
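
The second option above (a fixed table plus a "fallthrough" routine) could look roughly like the sketch below. Only the name `onUnrecognizedString` comes from the post; its signature, the `Entry` and `Token` types, the run-time `table` and the token kind names are assumptions for illustration. The sketch handles ASCII input only and uses a run-time table where the proposal would use a compile-time one.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isAlpha, isAlphaNum, isDigit;

struct Token { string kind; string text; }

struct Entry { string text; string kind; }

/// Stand-in for the compile-time table of fixed spellings.
immutable Entry[] table = [
    Entry("+", "PLUS"),
    Entry("-", "MINUS"),
    Entry("+=", "PLUS_EQ"),
];

/// Fallthrough hook: called when no fixed spelling matches.
/// Recognizes identifiers and decimal integers; anything else is an error token.
Token onUnrecognizedString(string input)
{
    size_t n = 0;
    if (isAlpha(input[0]) || input[0] == '_')
    {
        while (n < input.length && (isAlphaNum(input[n]) || input[n] == '_'))
            ++n;
        return Token("IDENTIFIER", input[0 .. n]);
    }
    if (isDigit(input[0]))
    {
        while (n < input.length && isDigit(input[n]))
            ++n;
        return Token("INTEGER", input[0 .. n]);
    }
    return Token("ERROR", input[0 .. 1]);
}

/// Tries the table first (longest match), then falls through to the hook.
Token nextToken(string input)
{
    assert(input.length > 0, "caller must not pass empty input");
    Token best;
    foreach (e; table)
    {
        string text = e.text;
        if (input.startsWith(text) && text.length > best.text.length)
            best = Token(e.kind, input[0 .. text.length]);
    }
    return best.text.length ? best : onUnrecognizedString(input);
}

unittest
{
    assert(nextToken("+=x") == Token("PLUS_EQ", "+="));
    assert(nextToken("foo_1+") == Token("IDENTIFIER", "foo_1"));
    assert(nextToken("42;") == Token("INTEGER", "42"));
}
```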

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:
 
 alias Lexer!(
       "+", "PLUS",
       "-", "MINUS",
       "+=", "PLUS_EQ",
       ...
       "if", "IF",
       "else", "ELSE"
       ...
 ) DLexer;
 
 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the
 text.

 
 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.

 
 
 Yah, with regard to such regular patterns (strings, comments, numbers,
 identifiers) there are at least two possibilities that I see:
 
 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.
 
 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of the
 work (and the most tedious part) is done by the D compiler.

For the second, that may push the work of recognizing some lexical
elements into the parser. For example, a comment may be defined as /**/,
which if there is no lexical definition of a comment means that it
parses as four distinct valid tokens, div mul mul div.

Oct 23 2010

Sean Kelly <sean invisibleduck.org> writes:

Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:
 
 alias Lexer!(
       "+", "PLUS",
       "-", "MINUS",
       "+=", "PLUS_EQ",
       ...
       "if", "IF",
       "else", "ELSE"
       ...
 ) DLexer;
 
 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the
 text.

 
 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.

 
 
 Yah, with regard to such regular patterns (strings, comments,
 numbers,
 identifiers) there are at least two possibilities that I see:
 
 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.
 
 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of
 the
 work (and the most tedious part) is done by the D compiler.

 
 For the second, that may push the work of recognizing some lexical
 elements into the parser. For example, a comment may be defined as
 /**/,
 which if there is no lexical definition of a comment means that it
 parses as four distinct valid tokens, div mul mul div.

Or maybe not. A /* could be CommentBegin. I'll have to think on it a bit
more.

Oct 23 2010

Sean Kelly <sean invisibleduck.org> writes:

Sean Kelly <sean invisibleduck.org> wrote:
 Sean Kelly <sean invisibleduck.org> wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:
 
 alias Lexer!(
       "+", "PLUS",
       "-", "MINUS",
       "+=", "PLUS_EQ",
       ...
       "if", "IF",
       "else", "ELSE"
       ...
 ) DLexer;
 
 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from
 a
 stream of text. Each token in the token stream has the ID and the
 text.

 
 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.

 
 
 Yah, with regard to such regular patterns (strings, comments,
 numbers,
 identifiers) there are at least two possibilities that I see:
 
 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.
 
 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in
 the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of
 the
 work (and the most tedious part) is done by the D compiler.

 
 For the second, that may push the work of recognizing some lexical
 elements into the parser. For example, a comment may be defined as
 /**/,
 which if there is no lexical definition of a comment means that it
 parses as four distinct valid tokens, div mul mul div.

 
 Or maybe not. A /* could be CommentBegin. I'll have to think on it a
 bit
 more.

I still think it won't work. The stuff inside the comment would come
through as a string of random tokens. Also, the // comment is EOL
sensitive, and this info isn't normally communicated to the parser.

Oct 23 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/23/10 11:44 CDT, Sean Kelly wrote:
 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:

 alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "if", "IF",
        "else", "ELSE"
        ...
 ) DLexer;

 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the
 text.

 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.


 Yah, with regard to such regular patterns (strings, comments, numbers,
 identifiers) there are at least two possibilities that I see:

 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.

 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of the
 work (and the most tedious part) is done by the D compiler.

 For the second, that may push the work of recognizing some lexical
 elements into the parser. For example, a comment may be defined as /**/,
 which if there is no lexical definition of a comment means that it
 parses as four distinct valid tokens, div mul mul div.

I was thinking comments could be easily caught by simple routines:

alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "/*", q{parseNonNestedComment("*/")},
        "/+", q{parseNestedComment("+/")},
        "//", q{parseOneLineComment()},
        ...
        "if", "IF",
        "else", "ELSE",
        ...
) DLexer;

During compilation, such non-tokens are recognized as code by the lexer 
generator and called appropriately. A comprehensive library of such 
routines completes a useful library.


Andrei

Oct 23 2010
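
The three comment routines named in the table above are simple to write by hand. The sketch below is one possible shape for them, not the proposed library: the post passes only the closing delimiter, whereas here each function also takes the remaining input (assumed to start at the opening delimiter) and returns the slice covering the whole comment. Those signatures, and returning all of the input for an unterminated comment, are assumptions.

```d
import std.algorithm.searching : startsWith;
import std.string : indexOf;

/// `input` starts with a 2-character opener such as "/*".
/// Returns the whole comment, or all of `input` if it is never closed.
string parseNonNestedComment(string input, string closer)
{
    auto end = input[2 .. $].indexOf(closer);   // search after the opener
    return end < 0 ? input : input[0 .. 2 + end + closer.length];
}

/// `input` starts with `opener`; nested opener/closer pairs are balanced.
/// Returns the whole comment, or all of `input` if it is never closed.
string parseNestedComment(string input, string opener, string closer)
{
    size_t depth = 0, i = 0;
    while (i < input.length)
    {
        if (input[i .. $].startsWith(opener))
        {
            ++depth;
            i += opener.length;
        }
        else if (input[i .. $].startsWith(closer))
        {
            i += closer.length;
            if (--depth == 0)
                return input[0 .. i];
        }
        else
            ++i;
    }
    return input;
}

/// `input` starts with "//"; the comment runs up to, not including, the newline.
string parseOneLineComment(string input)
{
    auto end = input.indexOf('\n');
    return end < 0 ? input : input[0 .. end];
}

unittest
{
    assert(parseNonNestedComment("/* a */ b", "*/") == "/* a */");
    assert(parseNestedComment("/+ a /+ b +/ c +/ d", "/+", "+/")
        == "/+ a /+ b +/ c +/");
    assert(parseOneLineComment("// hi\nx") == "// hi");
}
```

Because each routine consumes the comment body itself, the text inside a comment never reaches the token table, which addresses the "div mul mul div" concern raised earlier in the thread.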

Walter Bright <newshound2 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 During compilation, such non-tokens are recognized as code by the lexer 
 generator and called appropriately. A comprehensive library of such 
 routines completes a useful library.

I agree, a set of "canned" and heavily optimized lexing functions for common 
things like identifiers, numbers, comments, etc., would make a lexing library 
much more practical.

Those will work great for inventing DSLs, but for existing languages, the
trouble is that the different languages have subtle variations on how they
handle them. For example, D's numeric literals allow embedded underscores. Go
doesn't overflow on numeric literals. Javascript has some wacky rules to
distinguish a comment from a regex. Some languages allow \uNNNN letters in
identifiers.

So while a general purpose lexing library will be very useful, for lexing D
code (and Java, Javascript, etc.) a custom one will probably be much more
practical.

Oct 23 2010
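
To make the "subtle variations" point concrete, here is a small sketch of a scanner for decimal integer literals with embedded underscores, as D allows (e.g. `1_000_000`). The function names are made up for illustration, and the rules are deliberately simplified: no hex, binary or floating-point forms, no suffixes, and no overflow check.

```d
import std.ascii : isDigit;

/// Returns the decimal literal at the front of `input`, or an empty slice
/// if `input` does not start with a digit. Underscores are accepted after
/// the first digit.
string scanDecimalLiteral(string input)
{
    if (input.length == 0 || !isDigit(input[0]))
        return input[0 .. 0];
    size_t n = 0;
    while (n < input.length && (isDigit(input[n]) || input[n] == '_'))
        ++n;
    return input[0 .. n];
}

/// Converts the literal text to a value, skipping '_' separators.
/// Assumes the value fits in a ulong.
ulong decimalValue(string literal)
{
    ulong v = 0;
    foreach (c; literal)
        if (c != '_')
            v = v * 10 + (c - '0');
    return v;
}

unittest
{
    assert(scanDecimalLiteral("1_000_000;") == "1_000_000");
    assert(decimalValue("1_000_000") == 1_000_000);
    assert(scanDecimalLiteral("_x").length == 0);
}
```

A "canned" number routine shared between languages would need a switch for this behaviour, since the same input would be lexed differently by a language that does not allow the separator.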

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/23/10 13:41 CDT, Walter Bright wrote:
 Andrei Alexandrescu wrote:
 During compilation, such non-tokens are recognized as code by the
 lexer generator and called appropriately. A comprehensive library of
 such routines completes a useful library.

 I agree, a set of "canned" and heavily optimized lexing functions for
 common things like identifiers, numbers, comments, etc., would make a
 lexing library much more practical.

 Those will work great for inventing DSLs, but for existing languages,
 the trouble is that the different languages have subtle variations on
 how they handle them. For example, D's numeric literals allow embedded
 underscores. Go doesn't overflow on numeric literals. Javascript has
 some wacky rules to distinguish a comment from a regex. The \uNNNN
 letters allowed in identifiers in some languages.

 So while a general purpose lexing library will be very useful, for
 lexing D code (and Java, Javascript, etc.) a custom one will probably be
 much more practical.

I don't see these two in tension. "General" does not need to entail 
"unsuitable for subtle particularities". It is more difficult, but not 
impossible. Again, a general parser that takes care of the 90% of the 
drudgework and gives enough hooks to do the remaining 10%, all as 
efficient as hand-written code.

Andrei

Oct 23 2010

Walter Bright <newshound2 digitalmars.com> writes:

Andrei Alexandrescu wrote:
 I don't see these two in tension. "General" does not need entail 
 "unsuitable for subtle particularities". It is more difficult, but not 
 impossible. Again, a general parser that takes care of the 90% of the 
 drudgework and gives enough hooks to do the remaining 10%, all as 
 efficient as hand-written code.


In general I agree with you, but it is a major project to do that and make it
general, efficient, and easy to use - and then, one has to make a D lexer out
of it. In the meantime, we have a lexer for D that would be straightforward to
adapt to be a D library module. The only decision that has to be made is what
the API to it will be.

Oct 23 2010
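
As one possible starting point for the API question, the sketch below shows tokens as a value type (a `struct`) delivered through an input range, two of the requirements from the original proposal. The names `Token` and `TokenRange` are hypothetical, the input is a plain `string` rather than an arbitrary range, and the tokenization itself is a toy (ASCII words and single characters), so this illustrates only the interface shape and not a D lexer.

```d
import std.ascii : isAlphaNum, isWhite;
import std.range.primitives : isInputRange;

/// Value-type token: the matched text and its byte offset in the source.
struct Token
{
    string text;
    size_t offset;
}

/// Hypothetical input range of tokens over a string.
struct TokenRange
{
    private string src;
    private size_t pos;
    private Token cur;
    private bool done;

    this(string s)
    {
        src = s;
        popFront();   // prime the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return cur; }

    void popFront()
    {
        // Skip whitespace between tokens.
        while (pos < src.length && isWhite(src[pos]))
            ++pos;
        if (pos == src.length)
        {
            done = true;
            return;
        }
        const start = pos;
        if (isAlphaNum(src[pos]) || src[pos] == '_')
        {
            while (pos < src.length && (isAlphaNum(src[pos]) || src[pos] == '_'))
                ++pos;
        }
        else
        {
            ++pos;   // any other character is a one-character token
        }
        cur = Token(src[start .. pos], start);
    }
}

// The struct satisfies the standard input range interface.
static assert(isInputRange!TokenRange);

unittest
{
    auto r = TokenRange("x1 + y");
    assert(r.front == Token("x1", 0));
    r.popFront();
    assert(r.front == Token("+", 3));
    r.popFront();
    assert(r.front == Token("y", 5));
    r.popFront();
    assert(r.empty);
}
```

Since the range satisfies `isInputRange`, it can be fed to `foreach` and to the usual `std.algorithm` functions without any extra glue.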

Sean Kelly <sean invisibleduck.org> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 On 10/23/10 11:44 CDT, Sean Kelly wrote:
 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:
 
 alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "if", "IF",
        "else", "ELSE"
        ...
 ) DLexer;
 
 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from
 a
 stream of text. Each token in the token stream has the ID and the
 text.

 
 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.

 
 
 Yah, with regard to such regular patterns (strings, comments,
 numbers,
 identifiers) there are at least two possibilities that I see:
 
 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.
 
 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in
 the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of
 the
 work (and the most tedious part) is done by the D compiler.

 
 For the second, that may push the work of recognizing some lexical
 elements into the parser. For example, a comment may be defined as
 /**/,
 which if there is no lexical definition of a comment means that it
 parses as four distinct valid tokens, div mul mul div.

 
 I was thinking comments could be easily caught by simple routines:
 
 alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "/*", q{parseNonNestedComment("*/")},
        "/+", q{parseNestedComment("+/")},
        "//", q{parseOneLineComment()},
        ...
        "if", "IF",
        "else", "ELSE",
        ...
 ) DLexer;
 
 During compilation, such non-tokens are recognized as code by the
 lexer generator and called appropriately. A comprehensive library of
 such routines completes a useful library.

Ah, so the only issue is identifying the first set for a lexical element,
in essence. That works.

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:i9v8vq$2gvh$1 digitalmars.com...
 On 10/23/10 11:44 CDT, Sean Kelly wrote:
 Andrei Alexandrescu<SeeWebsiteForEmail erdani.org>  wrote:
 On 10/22/10 16:28 CDT, Sean Kelly wrote:
 Andrei Alexandrescu Wrote:
 I have in mind the entire implementation of a simple design, but
 never
 had the time to execute on it. The tokenizer would work like this:

 alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "if", "IF",
        "else", "ELSE"
        ...
 ) DLexer;

 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the
 text.

 What about, say, floating-point literals?  It seems like the first
 element of a pair might have to be a regex pattern.


 Yah, with regard to such regular patterns (strings, comments, numbers,
 identifiers) there are at least two possibilities that I see:

 1. Go the full route of allowing regexen in the definition. This is
 very hard because you need to generate an efficient (N|D)FA during
 compilation.

 2. Pragmatically allow "fallthrough" routines, i.e. if nothing in the
 compile-time table matches, just call onUnrecognizedString(). In
 conjunction with a few simple specialized functions, that makes it
 very simple to define arbitrarily complex lexers where the bulk of the
 work (and the most tedious part) is done by the D compiler.

 For the second, that may push the work of recognizing some lexical
 elements into the parser. For example, a comment may be defined as /**/,
 which if there is no lexical definition of a comment means that it
 parses as four distinct valid tokens, div mul mul div.

 I was thinking comments could be easily caught by simple routines:

 alias Lexer!(
        "+", "PLUS",
        "-", "MINUS",
        "+=", "PLUS_EQ",
        ...
        "/*", q{parseNonNestedComment("*/")},
        "/+", q{parseNestedComment("+/")},
        "//", q{parseOneLineComment()},
        ...
        "if", "IF",
        "else", "ELSE",
        ...
 ) DLexer;

 During compilation, such non-tokens are recognized as code by the lexer 
 generator and called appropriately. A comprehensive library of such 
 routines completes a useful library.

What's wrong with regexes? That's pretty typical for lexers.

Oct 23 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 10/23/10 16:39 CDT, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:i9v8vq$2gvh$1 digitalmars.com...
 What's wrong with regexes? That's pretty typical for lexers.

I mentioned that using regexes is possible but would make it much more 
difficult to generate good quality lexers.

Besides, regexen are IMHO quite awkward at expressing certain things 
that can be easily parsed by hand, such as comments or recursive comments.
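
To illustrate the "easily parsed by hand" point, here is a minimal sketch of a hand-written scanner for D's nesting /+ ... +/ comments. The function name `nestedCommentLength` is made up for this example and is not part of any proposed API; it only tracks a depth counter, which is exactly the part a plain regular expression cannot express.

```d
// Returns the length in code units of the nesting comment at the start
// of `s`, or 0 if `s` does not begin with "/+" or the comment never closes.
size_t nestedCommentLength(string s)
{
    if (s.length < 2 || s[0 .. 2] != "/+")
        return 0;

    size_t depth = 0;
    size_t i = 0;
    while (i + 1 < s.length)
    {
        if (s[i] == '/' && s[i + 1] == '+')      // opener: one level deeper
        {
            ++depth;
            i += 2;
        }
        else if (s[i] == '+' && s[i + 1] == '/') // closer: one level up
        {
            --depth;
            i += 2;
            if (depth == 0)
                return i;
        }
        else
            ++i;
    }
    return 0; // ran out of input while still inside a comment
}

void main()
{
    string comment = "/+ a /+ b +/ c +/";
    assert(nestedCommentLength(comment ~ " int x;") == comment.length);
    assert(nestedCommentLength("/+ never closed /+ +/") == 0);
    assert(nestedCommentLength("int x;") == 0);
}
```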

Andrei

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:i9vlep$8ao$1 digitalmars.com...
 On 10/23/10 16:39 CDT, Nick Sabalausky wrote:
 "Andrei Alexandrescu"<SeeWebsiteForEmail erdani.org>  wrote in message
 news:i9v8vq$2gvh$1 digitalmars.com...
 What's wrong with regexes? That's pretty typical for lexers.

 I mentioned that using regexes is possible but would make it much more 
 difficult to generate good quality lexers.

I see. Maybe a lexer 2.0 thing.

 Besides, regexen are IMHO quite awkward at expressing certain things that 
 can be easily parsed by hand, such as comments

//[^\n]*\n

/\*([^*]|\*[^*/])*\*/

Pretty simple as far as regexes go, and I'm far from a regex expert. Plus 
there's nothing stopping the use of a vastly improved regex syntax like GOLD 
uses ( 
http://www.devincook.com/goldparser/doc/grammars/define-terminals.htm ). In 
that, the two regexes above would look like:

{LineCommentChar} = {Printable} - {LF}
LineComment = '//' {LineCommentChar}* {LF}

{BlockCommentChar} = {Printable} - [*]
{BlockCommentCharNoSlash} = {BlockCommentChar} - [/]
BlockComment = '/*' ({BlockCommentChar} | '*' {BlockCommentCharNoSlash})* 
'*/'

And further syntactical improvement is easy to imagine, such as in-line 
character set creation.

 or recursive comments.

Granted, although I think there is precedent for regex engines that can
handle matched nested pairs just fine.
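
For anyone who wants to try comment patterns of this kind, here is a small sketch using Phobos' std.regex. The patterns are written for this example rather than copied from the post: they are anchored with ^, the line-comment pattern leaves the newline unconsumed, and the block-comment pattern uses \*+ so that a run of asterisks right before the closing slash is still accepted.

```d
import std.regex : matchFirst, regex;

void main()
{
    // Anchored with ^ so a match must begin at the start of the input.
    auto lineComment  = regex(`^//[^\n]*`);
    auto blockComment = regex(`^/\*([^*]|\*+[^*/])*\*+/`);

    auto m1 = matchFirst("// hi\nint x;", lineComment);
    assert(!m1.empty && m1.hit == "// hi");

    // Stops at the first "*/", even though more text and another "*/" follow.
    auto m2 = matchFirst("/* a ** b **/ x */", blockComment);
    assert(!m2.empty && m2.hit == "/* a ** b **/");
}
```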

Oct 23 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 What's wrong with regexes?

They don't handle recursion.

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:i9vn3l$bd1$2 digitalmars.com...
 Nick Sabalausky wrote:
 What's wrong with regexes?

 They don't handle recursion.

Neither do plain-old strings. But regexes will get you farther than plain 
strings before needing to resort to customized lexing.

But I'm a big data-driven fan anyway. If you're not, then I can see why it
wouldn't seem as appealing as it does to me.

In any case, if I have a chance I might see about adapting my Goldie ( 
www.dsource.org/projects/goldie ) library to more Phobos-friendly 
requirements. It's already a fully-usable lexer/parser (and the lexer/parser 
parts can be used independently), with a complete grammar description
language and I already have misc related tools written. And it's mostly 
working on D2 already (just need the next DMD because it has a fix for a bug 
that's a breaker for one of the tools). So if I can get it into a state more 
suitable for Phobos then that might end up putting things ahead of where 
they would be if someone just started from scratch. The initial versions 
might not be completely Phobos-ified, but it could definitely get there 
(especially if I had some guidance from people with more Phobos2 experience 
than me). Would Walter & co be interested in this? If not, I won't bother, 
but if so, then I may give it a shot.

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Nick Sabalausky" <a a.a> wrote in message 
news:ia01q3$1i1a$1 digitalmars.com...
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:i9vn3l$bd1$2 digitalmars.com...
 Nick Sabalausky wrote:
 What's wrong with regexes?

 They don't handle recursion.

 Neither do plain-old strings. But regexes will get you farther than plain 
 strings before needing to resort to customized lexing.

 But I'm a big data-driven fan anyway. If you're not, then I can see why it
 wouldn't seem as appealing as it does to me.

 In any case, if I have a chance I might see about adapting my Goldie ( 
 www.dsource.org/projects/goldie ) library to more Phobos-friendly 
 requirements. It's already a fully-usable lexer/parser (and the 
 lexer/parser parts can be used independently), with a complete grammar
 description language and I already have misc related tools written. And 
 it's mostly working on D2 already (just need the next DMD because it has a 
 fix for a bug that's a breaker for one of the tools). So if I can get it 
 into a state more suitable for Phobos then that might end up putting 
 things ahead of where they would be if someone just started from scratch. 
 The initial versions might not be completely Phobos-ified, but it could 
 definitely get there (especially if I had some guidance from people with 
 more Phobos2 experience than me). Would Walter & co be interested in this? 
 If not, I won't bother, but if so, then I may give it a shot.

And FWIW, I was already thinking about making some improvements to Goldie's
API anyway.

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Nick Sabalausky" <a a.a> wrote in message 
news:ia01sk$1i7s$1 digitalmars.com...
 "Nick Sabalausky" <a a.a> wrote in message 
 news:ia01q3$1i1a$1 digitalmars.com...
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:i9vn3l$bd1$2 digitalmars.com...
 Nick Sabalausky wrote:
 What's wrong with regexes?

 They don't handle recursion.

 Neither do plain-old strings. But regexes will get you farther than plain 
 strings before needing to resort to customized lexing.

 But I'm a big data-driven fan anyway. If you're not, then I can see why it
 wouldn't seem as appealing as it does to me.

 In any case, if I have a chance I might see about adapting my Goldie ( 
 www.dsource.org/projects/goldie ) library to more Phobos-friendly 
 requirements. It's already a fully-usable lexer/parser (and the 
 lexer/parser parts can be used independently), with a complete grammar
 description language and I already have misc related tools written. And 
 it's mostly working on D2 already (just need the next DMD because it has 
 a fix for a bug that's a breaker for one of the tools). So if I can get 
 it into a state more suitable for Phobos then that might end up putting 
 things ahead of where they would be if someone just started from scratch. 
 The initial versions might not be completely Phobos-ified, but it could 
 definitely get there (especially if I had some guidance from people with 
 more Phobos2 experience than me). Would Walter & co be interested in 
 this? If not, I won't bother, but if so, then I may give it a shot.

 And FWIW, I was already thinking about making some improvements to Goldie's
 API anyway.

But that's all if you want generalized lexing or parsing though. If you just 
want "lexing D code"/"parsing D code", then IMO anything other than adapting 
parts of DDMD would be the wrong way to go.

Oct 23 2010

bearophile <bearophileHUGS lycos.com> writes:

Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just 
 want "lexing D code"/"parsing D code", then IMO anything other than adapting 
 parts of DDMD would be the wrong way to go.

Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s)
willing?

Bye,
bearophile

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:ia0410$1lju$1 digitalmars.com...
 Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just
 want "lexing D code"/"parsing D code", then IMO anything other than adapting
 parts of DDMD would be the wrong way to go.

 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s) 
 willing?

I'd certainly hope so. If it isn't, then that would probably mean DMD's FE 
license is incompatible with Phobos. Which would be rather...weird.

In any case, I asked that and a couple other Q's here, but haven't gotten an 
answer yet:
http://www.dsource.org/forums/viewtopic.php?t=5627

Oct 23 2010

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:

 "bearophile" <bearophileHUGS lycos.com> wrote in message
 news:ia0410$1lju$1 digitalmars.com...
 Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just
 want "lexing D code"/"parsing D code", then IMO anything other than adapting
 parts of DDMD would be the wrong way to go.

 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s)
 willing?

 I'd certainly hope so. If it isn't, then that would probably mean DMD's FE
 license is incompatible with Phobos. Which would be rather...weird.

 In any case, I asked that and a couple other Q's here, but haven't gotten an
 answer yet:
 http://www.dsource.org/forums/viewtopic.php?t=5627

Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must  
be GPL too but I'm all for relicensing it as Boost.

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Denis Koroskin" <2korden gmail.com> wrote in message 
news:op.vk2na9bpo7cclz korden-pc...
 On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:

 "bearophile" <bearophileHUGS lycos.com> wrote in message
 news:ia0410$1lju$1 digitalmars.com...
 Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just
 want "lexing D code"/"parsing D code", then IMO anything other than adapting
 parts of DDMD would be the wrong way to go.

 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s)
 willing?

 I'd certainly hope so. If it isn't, then that would probably mean DMD's FE
 license is incompatible with Phobos. Which would be rather...weird.

 In any case, I asked that and a couple other Q's here, but haven't gotten an
 answer yet:
 http://www.dsource.org/forums/viewtopic.php?t=5627

 Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must 
 be GPL too but I'm all for relicensing it as Boost.

According to a random file I picked out of trunk, it's dual-licensed with 
GPL (not sure which version) and Artistic (also not sure which version)

http://www.dsource.org/projects/dmd/browser/trunk/src/access.c

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Nick Sabalausky" <a a.a> wrote in message 
news:ia0v9p$11p$1 digitalmars.com...
 "Denis Koroskin" <2korden gmail.com> wrote in message 
 news:op.vk2na9bpo7cclz korden-pc...
 On Sun, 24 Oct 2010 06:55:22 +0400, Nick Sabalausky <a a.a> wrote:

 "bearophile" <bearophileHUGS lycos.com> wrote in message
 news:ia0410$1lju$1 digitalmars.com...
 Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just
 want "lexing D code"/"parsing D code", then IMO anything other than adapting
 parts of DDMD would be the wrong way to go.

 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s)
 willing?

 I'd certainly hope so. If it isn't, then that would probably mean DMD's FE
 license is incompatible with Phobos. Which would be rather...weird.

 In any case, I asked that and a couple other Q's here, but haven't gotten an
 answer yet:
 http://www.dsource.org/forums/viewtopic.php?t=5627

 Sorry, I wasn't checking the forum. IIRC DMD license is GPL so DDMD must 
 be GPL too but I'm all for relicensing it as Boost.

 According to a random file I picked out of trunk, it's dual-licensed with 
 GPL (not sure which version) and Artistic (also not sure which version)

 http://www.dsource.org/projects/dmd/browser/trunk/src/access.c

That does surprise me though, since I'm pretty sure Phobos is Boost License. 
Anyone know why the difference?

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 That does surprise me though, since I'm pretty sure Phobos is Boost License. 
 Anyone know why the difference?

Phobos is Boost licensed to enable maximum usage for any purpose.

The dmd front end is GPL licensed in order to ensure it stays open source and
to discourage closed source forks.

Oct 24 2010

Jacob Carlborg <doob me.com> writes:

On 2010-10-24 04:55, Nick Sabalausky wrote:
 "bearophile"<bearophileHUGS lycos.com>  wrote in message
 news:ia0410$1lju$1 digitalmars.com...
 Nick Sabalausky:

 But that's all if you want generalized lexing or parsing though. If you just
 want "lexing D code"/"parsing D code", then IMO anything other than adapting
 parts of DDMD would be the wrong way to go.

 Is the DDMD licence compatible with the Phobos one? Is the DDMD author(s)
 willing?

 I'd certainly hope so. If it isn't, then that would probably mean DMD's FE
 license is incompatible with Phobos. Which would be rather...weird.

 In any case, I asked that and a couple other Q's here, but haven't gotten an
 answer yet:
 http://www.dsource.org/forums/viewtopic.php?t=5627

As Walter wrote in the first post of this thread: "generally follow 
along with the C++ one so that they can be maintained in tandem" and in 
another post: "Since the license is mine, I can change the D version to 
the Boost license, no problem." 
http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D&artnum=120221

-- 
/Jacob Carlborg

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 Would Walter & co be interested in this? If not, I won't bother, 
 but if so, then I may give it a shot.

The problem is I never have used parser/lexer generators, so I am not really in 
a good position to review it.

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia0cfv$22kp$1 digitalmars.com...
 Nick Sabalausky wrote:
 Would Walter & co be interested in this? If not, I won't bother, but if 
 so, then I may give it a shot.

 The problem is I never have used parser/lexer generators, so I am not 
 really in a good position to review it.

Understandable.

FWIW though, Goldie isn't really a lexer/parser generator per se. Traditional
lexer/parser generators like lex/yacc or ANTLR will actually generate the 
source code for a lexer or parser. Goldie just has a single lexer and 
parser, both already pre-written. They're just completely data-driven:

Compared to the generators, Goldie's lexer is more like a general regex 
engine that simultaneously matches against multiple pre-compiled "regexes". 
By pre-compiled, I mean turned into a DFA - which is currently done by a 
separate non-source-available tool I didn't write, but I'm going to be 
writing my own version soon. By "regexes", I mean they're functionally 
regexes, but they're written in a much easier-to-read syntax than the 
typical PCRE.
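
To make the "data-driven" idea concrete, here is a minimal sketch of a lexer that is driven purely by a DFA table and recognizes two token kinds at once. It is not Goldie's code or its table format; the states, the `transitions` and `accepts` tables, and the `lexOne` function are all invented for this illustration.

```d
enum CharClass { digit, letter, other }

CharClass classify(char c)
{
    if (c >= '0' && c <= '9')
        return CharClass.digit;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return CharClass.letter;
    return CharClass.other;
}

// Rows are states, columns are CharClass values; -1 means "no transition".
// State 0 = start, 1 = inside an Integer, 2 = inside an Identifier.
immutable int[3][3] transitions = [
    [ 1,  2, -1],
    [ 1, -1, -1],
    [ 2,  2, -1],
];

// Token name accepted in each state ("" means the state is not accepting).
immutable string[3] accepts = ["", "Integer", "Identifier"];

struct Match { string name; string text; }

// Runs the DFA from the start of `input` and keeps the longest accepted prefix.
Match lexOne(string input)
{
    int state = 0;
    Match best;
    foreach (i, char c; input)
    {
        state = transitions[state][classify(c)];
        if (state < 0)
            break;
        if (accepts[state].length)
            best = Match(accepts[state], input[0 .. i + 1]);
    }
    return best;
}

void main()
{
    auto num = lexOne("123abc");
    assert(num.name == "Integer" && num.text == "123");

    auto id = lexOne("x1 = 2");
    assert(id.name == "Identifier" && id.text == "x1");

    assert(lexOne("?").name.length == 0); // nothing matched
}
```

The loop in `lexOne` never changes when the grammar does; only the two tables would be regenerated, which is the property being described here.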

Goldie's parser is really just a rather typical (from what I understand) 
LALR parser. I don't know how much you know about LALR's, but the parser 
itself is naturally grammar-independent (at least as described in CS texts). 
Using an LALR involves converting the grammar completely into a table of 
states and lookaheads (single-token lookahead; unlike LL, any more than that 
is never really needed), and then the actual parser is directed entirely by 
that table (much like how regexes are converted to data, ie DFA, and then 
processed generically), so it's completely grammar-independent.

And, of course, the actual lexer and parser can be
optimized/rewritten/whatever with minimal impact on everything else.

If anyone's interested, further details are here(1):
http://www.devincook.com/goldparser/

Goldie does have optional code-generation capabilities, but it's entirely 
for the sake of providing a better statically-checked API tailored to your 
grammar (ex: to use D's type system to ensure at compile-time, instead of 
run-time, that token names are valid and that BNF rules you reference 
actually exist). It doesn't actually affect the lexer/parser in any 
non-trivial way.

(1): By that site's terminology, Goldie would technically be a "GOLD 
Engine", plus some additional tools. But, my current work on Goldie will cut 
that actual "GOLD Parser Builder" program completely out-of-the-loop (but it 
will still maintain compatibility with it for anyone who wants to use it).

Oct 23 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 If anyone's interested, further details are here(1):
 http://www.devincook.com/goldparser/

It looks nice, but in clicking around on FAQ, documentation, getting started, 
etc., I can't find any example code.

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia0pce$2pbk$1 digitalmars.com...
 Nick Sabalausky wrote:
 If anyone's interested, further details are here(1):
 http://www.devincook.com/goldparser/

 It looks nice, but in clicking around on FAQ, documentation, getting 
 started, etc., I can't find any example code.

Well, that's because that program (GOLD Parser Builder) is just a tool that 
takes in a grammar description file and spits out the lexer/parser DFA/LALR 
tables. Then you use any GOLD-compatible engine in any language (such as
Goldie) to load the DFA/LALR tables and use them to lex/parse. (But again, 
I'm currently working on code that will do that without having to use GOLD 
Parser Builder.)

Here are some specific links for Goldie. Keep in mind that:

1. I already have it pretty much converted to D2/Phobos in trunk (it used to
be D1/Tango),
2. The API is not final and definitely open to suggestions (I have a few
ideas already),
3. Any suggestions for improvements to the documentation are, of course,
welcome too,
4. Like I've said, in the next official release, using "GOLD Parser Builder"
won't actually be required.

Main Goldie Project page:
    http://www.dsource.org/projects/goldie

Documentation for latest official release:
    http://www.semitwist.com/goldiedocs/current/Docs/

Samples directory in trunk:
    http://www.dsource.org/projects/goldie/browser/trunk/src/samples

Slightly old documentation for the samples:
    http://www.semitwist.com/goldiedocs/current/Docs/SampleApps/

There are two "calculator" samples. They're the same, but correspond to the
two different styles Goldie supports. One, "dynamic", doesn't involve any 
source-code-generation step and can load and use any arbitrary grammar at 
runtime (neat usages of this are shown in the "ParseAnything" sample and in 
the "Parse" tool 
http://www.semitwist.com/goldiedocs/current/Docs/Tools/Parse/ ). The other,
"static", does involve generating some source-code (via a command-line tool),
but that gives you an API that's statically-checked against the grammar. The 
differences and pros/cons between these two styles are explained here (let 
me know if it's unclear):
    http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

It looks like a solid engine, and a nice tool. Does it belong as part of
Phobos? I don't know. What do other D users think?

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
     http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/

One question I have is how does it compare with Spirit? That would be its main 
counterpart in the C++ space.

Oct 24 2010

div0 <div0 sourceforge.net> writes:

On 24/10/2010 18:19, Walter Bright wrote:
 Nick Sabalausky wrote:
 http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/

 One question I have is how does it compare with Spirit? That would be
 its main counterpart in the C++ space.

Spirit is an LL parser, so it's not really suitable for human-edited
input as doing exact error reporting is tricky.

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia1ps7$1fq5$2 digitalmars.com...
 Nick Sabalausky wrote:
     http://www.semitwist.com/goldiedocs/current/Docs/APIOver/StatVsDyn/

 One question I have is how does it compare with Spirit? That would be its 
 main counterpart in the C++ space.

Can't say I'm really familiar with Spirit. From a brief lookover, these are
my impressions of the differences:

Spirit: Grammar is embedded into your source code as actual C++ code.
Goldie: Grammar is defined in a domain-specific language.
But either one could probably have a wrapper to work the other way.

Spirit: Uses (abuses?) operator overloading (Although, apparently SpiritD
doesn't inherit Spirit's operator overloading:
http://www.sstk.co.uk/spiritd.php )
Goldie: Operator overloading isn't really applicable, because of using a
DSL.

As they stand, Spirit seems like it could be pretty handy for simple, quick
little DSLs, e.g., things for which Goldie might seem like overkill. But
Goldie's interface could probably be improved to compete pretty well in
those cases. OTOH, Goldie's approach (being based on GOLD) has a deliberate
separation between grammar and parsing, which has its own benefits; for
instance, grammar definitions can be re-used for any purpose.

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 Can't say I'm really familiar with Spirit. From a brief lookover, these are
 my impressions of the differences:

 Spirit: Grammar is embedded into your source code as actual C++ code.
 Goldie: Grammar is defined in a domain-specific language.
 But either one could probably have a wrapper to work the other way.

 Spirit: Uses (abuses?) operator overloading (Although, apparently SpiritD
 doesn't inherit Spirit's operator overloading:
 http://www.sstk.co.uk/spiritd.php )
 Goldie: Operator overloading isn't really applicable, because of using a
 DSL.

 As they stand, Spirit seems like it could be pretty handy for simple, quick
 little DSLs, e.g., things for which Goldie might seem like overkill. But
 Goldie's interface could probably be improved to compete pretty well in
 those cases. OTOH, Goldie's approach (being based on GOLD) has a deliberate
 separation between grammar and parsing, which has its own benefits; for
 instance, grammar definitions can be re-used for any purpose.
 
 

Does Goldie have (like Spirit) a set of canned routines for things like numeric 
literals?

Can the D version of Goldie be turned into one file?

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia2duj$2j7e$1 digitalmars.com...
 Nick Sabalausky wrote:
 Can't say I'm really familiar with Spirit. From a brief lookover, these
 are my impressions of the differences:

 Spirit: Grammar is embedded into your source code as actual C++ code.
 Goldie: Grammar is defined in a domain-specific language.
 But either one could probably have a wrapper to work the other way.

 Spirit: Uses (abuses?) operator overloading (Although, apparently SpiritD
 doesn't inherit Spirit's operator overloading:
 http://www.sstk.co.uk/spiritd.php )
 Goldie: Operator overloading isn't really applicable, because of using a
 DSL.

 As they stand, Spirit seems like it could be pretty handy for simple,
 quick little DSLs, e.g., things for which Goldie might seem like overkill.
 But Goldie's interface could probably be improved to compete pretty well
 in those cases. OTOH, Goldie's approach (being based on GOLD) has a
 deliberate separation between grammar and parsing, which has its own
 benefits; for instance, grammar definitions can be re-used for any
 purpose.

 Does Goldie have (like Spirit) a set of canned routines for things like 
 numeric literals?

No, but such things can easily be provided in the docs for simple 
copy-paste. For instance:

DecimalLiteral = {Number} ({Number} | '_')*

HexLiteral = '0' [xX] ({Number} | [ABCDEFabcdef_])+

Identifier = ('_' | {Letter}) ('_' | {AlphaNumeric})*

{StringChar} = {Printable} - ["]
StringLiteral = '"' ({StringChar} | '\' {Printable})* '"'

All one would need to do to use those is copy-paste them into their grammar 
definition. Some sort of import mechanism could certainly be added though, 
to allow for selective import of pre-defined things like that.

There are many pre-defined character sets though (and others can be 
manually-created, of course): 
http://www.devincook.com/goldparser/doc/grammars/character-sets.htm

 Can the D version of Goldie be turned into one file?

Assuming just the library and not the included tools (many of which could be 
provided as part of the library, though), and not counting files generated 
for the static-style, then yes, but it would probably be a bit long.

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 Does Goldie have (like Spirit) a set of canned routines for things like 
 numeric literals?

 
 No, but such things can easily be provided in the docs for simple 
 copy-paste. For instance:
 
 DecimalLiteral = {Number} ({Number} | '_')*
 
 HexLiteral = '0' [xX] ({Number} | [ABCDEFabcdef_])+
 
 Identifier = ('_' | {Letter}) ('_' | {AlphaNumeric})*
 
 {StringChar} = {Printable} - ["]
 StringLiteral = '"' ({StringChar} | '\' {Printable})* '"'
 
 All one would need to do to use those is copy-paste them into their grammar 
 definition. Some sort of import mechanism could certainly be added though, 
 to allow for selective import of pre-defined things like that.

In the regexp code, I provided special regexes for email addresses and URLs. 
Those are hard to get right, so it's a large convenience to provide them.

Also, many literals can be fairly complex, and evaluating them can produce 
errors (such as integer overflow in the numeric literals). Having canned ones 
makes it much quicker for a user to get going.
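
As an illustration of what a "canned" literal routine has to deal with, here is a minimal sketch that evaluates decimal literal text (underscores allowed) and reports overflow instead of silently wrapping. The name `evalDecimal` and the choice of `uint` as the result type are assumptions made for this example; neither Goldie nor Phobos is claimed to provide such a function.

```d
// Hypothetical helper: evaluates decimal literal text such as "1_000".
// Returns false on malformed input or when the value exceeds uint.max.
bool evalDecimal(string text, out uint value)
{
    ulong acc = 0;
    bool sawDigit = false;
    foreach (char c; text)
    {
        if (c == '_')
            continue;                 // digit separators carry no value
        if (c < '0' || c > '9')
            return false;             // not a decimal digit
        acc = acc * 10 + (c - '0');
        sawDigit = true;
        if (acc > uint.max)
            return false;             // overflow: report it, do not wrap
    }
    if (!sawDigit)
        return false;
    value = cast(uint) acc;
    return true;
}

void main()
{
    uint v;
    assert(evalDecimal("1_000", v) && v == 1000);
    assert(evalDecimal("4294967295", v) && v == uint.max);
    assert(!evalDecimal("4294967296", v)); // one past uint.max
    assert(!evalDecimal("12x", v));
}
```

Accumulating in a `ulong` and checking after every digit keeps the overflow test simple, since the accumulator can never itself overflow before the check fires.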

I'm guessing that a numeric literal is returned as a string. Is this string 
allocated on the heap? If so, it's a performance problem. Storage allocation 
costs figure large when trying to lex millions of lines.


 There are many pre-defined character sets though (and others can be 
 manually-created, of course): 
 http://www.devincook.com/goldparser/doc/grammars/character-sets.htm
 
 Can the D version of Goldie be turned into one file?

 
 Assuming just the library and not the included tools (many of which could be 
 provided as part of the library, though), and not counting files generated 
 for the static-style, then yes, but it would probably be a bit long.

Long files aren't a problem. That's why we have .di files! I worry more about 
clutter.

Oct 24 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia34up$ldb$1 digitalmars.com...
 In the regexp code, I provided special regexes for email addresses and 
 URLs. Those are hard to get right, so it's a large convenience to provide 
 them.

 Also, many literals can be fairly complex, and evaluating them can produce 
 errors (such as integer overflow in the numeric literals). Having canned 
 ones makes it much quicker for a user to get going.

I'm not sure what exactly you're suggesting in these two paragraphs? (Or
just commenting?)

 I'm guessing that a numeric literal is returned as a string. Is this 
 string allocated on the heap? If so, it's a performance problem. Storage 
 allocation costs figure large when trying to lex millions of lines.

Good point. I've just checked and there is allocation going on for each 
terminal lexed. But thanks to D's awesomeness, I can easily fix that to just 
use a slice of the original source string. I'll do that...
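
A minimal sketch of the slicing approach, assuming a deliberately trivial whitespace-separated "lexer" (the `Token` struct and `splitWords` function are hypothetical and much simpler than Goldie's real `Token` class): each token's text is a slice of the source string, so no per-token copy of the characters is made.

```d
struct Token
{
    string text; // a slice of the source, not a copy
}

// Splits `src` on spaces; every token refers back into `src`.
Token[] splitWords(string src)
{
    Token[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        while (i < src.length && src[i] == ' ')
            ++i;                          // skip separators
        immutable start = i;
        while (i < src.length && src[i] != ' ')
            ++i;                          // scan one word
        if (i > start)
            tokens ~= Token(src[start .. i]);
    }
    return tokens;
}

void main()
{
    string src = "int x = 42";
    auto toks = splitWords(src);
    assert(toks.length == 4);
    assert(toks[3].text == "42");
    // The slice points into the original buffer (offset 8), so no text was copied.
    assert(toks[3].text.ptr == src.ptr + 8);
}
```

The token array itself still grows on the heap here; only the character data is shared with the source. The slices also keep the whole source string alive for as long as any token is referenced.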

 Long files aren't a problem. That's why we have .di files! I worry more 
 about clutter.

I really find long files to be a pain to read and edit. It would be nice if 

Then, modules with a lot of code could be broken down as appropriate for 
their maintainers without having to bother the users with the "module 
blah.all" workaround (which Goldie currently uses, but I realize isn't 
normal Phobos style). AIUI, .di files don't really solve that.

There is one other minor related issue, though. One of my big
principles for Goldie is flexibility. So in addition to the basic API that 
most people would use, I like to expose lower-level APIs for people who 
might want to sidestep certain parts of Goldie, or provide other 
less-typical but potentially useful things. But such things shouldn't be 
automatically imported for typical users, so that sort of stuff would be 
best left to a separate-but-related module.

Maybe it's just too late over here for me, but can you be more specific on 
"clutter"? Do you mean like API clutter?

Oct 24 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:ia34up$ldb$1 digitalmars.com...
 In the regexp code, I provided special regexes for email addresses and 
 URLs. Those are hard to get right, so it's a large convenience to provide 
 them.

 Also, many literals can be fairly complex, and evaluating them can produce 
 errors (such as integer overflow in the numeric literals). Having canned 
 ones makes it much quicker for a user to get going.

 
 I'm not sure what exactly you're suggesting in these two paragraphs? (Or
 just commenting?)

Does Goldie's lexer not convert numeric literals to integer values?


 I'm guessing that a numeric literal is returned as a string. Is this 
 string allocated on the heap? If so, it's a performance problem. Storage 
 allocation costs figure large when trying to lex millions of lines.

 
 Good point. I've just checked and there is allocation going on for each 
 terminal lexed. But thanks to D's awesomeness, I can easily fix that to just 
 use a slice of the original source string. I'll do that...

Are all tokens returned as strings?


 Long files aren't a problem. That's why we have .di files! I worry more 
 about clutter.

 
 I really find long files to be a pain to read and edit. It would be nice if 

 Then, modules with a lot of code could be broken down as appropriate for 
 their maintainers without having to bother the users with the "module 
 blah.all" workaround (which Goldie currently uses, but I realize isn't 
 normal Phobos style). AIUI, .di files don't really solve that.
 
 There is one other minor related issue, though. One of my big 
 principles for Goldie is flexibility. So in addition to the basic API that 
 most people would use, I like to expose lower-level APIs for people who 
 might want to sidestep certain parts of Goldie, or provide other 
 less-typical but potentially useful things. But such things shouldn't be 
 automatically imported for typical users, so that sort of stuff would be 
 best left to a separate-but-related module.

If I may suggest, leave the low level stuff out of the api until demand for it 
justifies it. It's hard to predict just what will be useful, so I suggest 
conservatism rather than kitchen sink. It can always be added later, but it's 
really hard to remove.


 Maybe it's just too late over here for me, but can you be more specific on 
 "clutter"? Do you mean like API clutter?

That too, but I meant a clutter of files. Long files aren't a problem with D.

Oct 25 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia3c3r$14k8$1 digitalmars.com...
 Does Goldie's lexer not convert numeric literals to integer values?

 Are all tokens returned as strings?

Goldie's lexer (and parser) are based on the GOLD system ( 
http://www.devincook.com/goldparser/ ) which is deliberately independent of 
both grammar and implementation language. As such, it doesn't know anything 
about what the specific terminals actually represent (There are 4 exceptions 
though: Comment tokens, Whitespace tokens, an "Error" token (ie, for lex 
errors), and the EOF token.) So the lexed data is always represented as a 
string.

Although, the lexer actually returns an array of "class Token" ( 
http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To 
get the original data that got lexed or parsed into that token, you call 
"toString()". (BTW, there are currently different "modes" of "toString()" 
for non-terminals, but I'm considering just ripping them all out and 
replacing them with a single "return a slice from the start of the first 
terminal to the end of the last terminal" - unless you think it would be 
useful to get a representation of the non-terminal's original data sans 
comments/whitespace, or with comments/whitespace converted to a single 
space.)

I'm not sure that calling "to!whatever(token.toString())" is really all that 
much of a problem for user code.
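For illustration, a minimal sketch of that user-side conversion. `Tok` is a hypothetical stand-in for Goldie's `Token` class, and its `toString()` is assumed here to return a slice of the original source. `std.conv.to` also covers the overflow case raised earlier in the thread, since it throws when the value doesn't fit:

```d
import std.conv : to, ConvException;

// Hypothetical stand-in for a Goldie token: it only holds a slice of the source.
struct Tok
{
    string text;
    string toString() const { return text; }
}

void main()
{
    string source = "x = 42 + 99999999999;";
    auto small = Tok(source[4 .. 6]);   // "42"
    auto big   = Tok(source[9 .. 20]);  // "99999999999", too large for int

    // The slice shares memory with the source; nothing was copied.
    assert(small.toString().ptr == source.ptr + 4);

    assert(to!int(small.toString()) == 42);

    bool overflowed = false;
    try
    {
        auto unused = to!int(big.toString());
    }
    catch (ConvException e) // ConvOverflowException derives from ConvException
    {
        overflowed = true;
    }
    assert(overflowed);
}
```

The cost is that the digits get scanned a second time by `to`, after the lexer has already scanned them once to find the token's end.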

 If I may suggest, leave the low level stuff out of the api until demand 
 for it justifies it. It's hard to predict just what will be useful, so I 
 suggest conservatism rather than kitchen sink. It can always be added 
 later, but it's really hard to remove.

That may be a good idea.

 That too, but I meant a clutter of files. Long files aren't a problem with 
 D.

Well, again, it may not be a problem with DMD, but I really think 
reading/editing a long file is a pain regardless of language. Maybe we just 
have different ideas of "long file"? To put it into numbers: At the moment, 
Goldie's library (not counting tools and the optional generated 
"static-mode" files) is about 3200 lines, including comment/blank lines. 
That size would be pretty unwieldy to maintain as a single source file, 
particularly since Goldie has a natural internal organization.

Personally, I'd much rather have a clutter of source files than a cluttered 
source file. (But of course, I don't go to Java extremes and put *every* 
tiny little thing in a separate file.) As long as the complexity of having 
multiple files isn't passed along to user code (hence the frequent "module 
foo.all" idiom), then I can't say I really see a problem.

Oct 25 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:ia3c3r$14k8$1 digitalmars.com...
 Does Goldie's lexer not convert numeric literals to integer values?

 Are all tokens returned as strings?

 
 Goldie's lexer (and parser) are based on the GOLD system ( 
 http://www.devincook.com/goldparser/ ) which is deliberately independent of 
 both grammar and implementation language. As such, it doesn't know anything 
 about what the specific terminals actually represent (There are 4 exceptions 
 though: Comment tokens, Whitespace tokens, an "Error" token (ie, for lex 
 errors), and the EOF token.) So the lexed data is always represented as a 
 string.
 
 Although, the lexer actually returns an array of "class Token" ( 
 http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To 
 get the original data that got lexed or parsed into that token, you call 
 "toString()". (BTW, there are currently different "modes" of "toString()" 
 for non-terminals, but I'm considering just ripping them all out and 
 replacing them with a single "return a slice from the start of the first 
 terminal to the end of the last terminal" - unless you think it would be 
 useful to get a representation of the non-terminal's original data sans 
 comments/whitespace, or with comments/whitespace converted to a single 
 space.)
 
 I'm not sure that calling "to!whatever(token.toString())" is really all that 
 much of a problem for user code.

Consider a string literal, say "abc\"def". With Goldie's method, I infer this 
string has to be scanned twice. Once to find its limits, and the second to 
convert it to the actual string. The latter is user code and will have to 
replicate whatever Goldie did.
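A rough sketch of the single-pass alternative, where a hand-written lexer finds the end of the literal and decodes escapes in the same loop. The names `lexString` and `StringLit` are hypothetical (this is not dmd's lexer code), and only a few escapes are handled:

```d
import std.array : appender;

/// Result of lexing one string literal: the decoded value and the index just past it.
struct StringLit
{
    string value;
    size_t end;
}

/// Scans a double-quoted literal starting at src[start], decoding escapes as it goes.
/// Only \n and \t are translated; any other escaped character (such as \" or \\)
/// is taken literally.
StringLit lexString(string src, size_t start)
{
    assert(src[start] == '"');
    auto buf = appender!string();
    size_t i = start + 1;
    while (i < src.length && src[i] != '"')
    {
        if (src[i] == '\\' && i + 1 < src.length)
        {
            ++i; // skip the backslash, then decode the escaped character
            switch (src[i])
            {
                case 'n': buf.put('\n'); break;
                case 't': buf.put('\t'); break;
                default:  buf.put(src[i]); break;
            }
        }
        else
            buf.put(src[i]);
        ++i;
    }
    if (i >= src.length)
        throw new Exception("unterminated string literal");
    return StringLit(buf.data, i + 1);
}

void main()
{
    auto src = `s = "abc\"def";`;
    auto lit = lexString(src, 4);
    assert(lit.value == `abc"def`);
    assert(src[lit.end] == ';');
}
```

This sketch always copies into a buffer; a real lexer could return a slice of the source when the literal contains no escapes.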


 If I may suggest, leave the low level stuff out of the api until demand 
 for it justifies it. It's hard to predict just what will be useful, so I 
 suggest conservatism rather than kitchen sink. It can always be added 
 later, but it's really hard to remove.

 
 That may be a good idea.

What Goldie will be compared against is Spirit. Spirit is a reasonably 
successful add-on to C++. Goldie doesn't have to do things the same way as 
Spirit (expression templates - ugh), but it should be as easy to use and at 
least as powerful.


 That too, but I meant a clutter of files. Long files aren't a problem with 
 D.

 
 Well, again, it may not be a problem with DMD, but I really think 
 reading/editing a long file is a pain regardless of language. Maybe we just 
 have different ideas of "long file"? To put it into numbers: At the moment, 
 Goldie's library (not counting tools and the optional generated 
 "static-mode" files) is about 3200 lines, including comment/blank lines. 
 That size would be pretty unwieldy to maintain as a single source file, 
 particularly since Goldie has a natural internal organization.

Actually, I think 3200 lines is of moderate, not large, size :-)


 Personally, I'd much rather have a clutter of source files than a cluttered 
 source file. (But of course, I don't go to Java extremes and put *every* 
 tiny little thing in a separate file.) As long as the complexity of having 
 multiple files isn't passed along to user code (hence the frequent "module 
 foo.all" idiom), then I can't say I really see a problem.

I tend to just not like having to constantly grep to see which file XXX is in.

Oct 25 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia59si$1r0j$1 digitalmars.com...
 Consider a string literal, say "abc\"def". With Goldie's method, I infer 
 this string has to be scanned twice. Once to find its limits, and the 
 second to convert it to the actual string.

Yea, that is true. With that string in the input, the value given to the 
user code will be:

assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});

That's a consequence of the grammar being separated from lexing/parsing 
implementation.

You're right that that does seem less than ideal. Although I'm not sure how 
to remedy that without losing the independence between grammar and 
lex/parse implementation that is the main point of the GOLD-based style.

But there's something I don't quite understand about the approach you're 
suggesting: You seem to be suggesting that a terminal be progressively 
converted into its final form *as* it's still in the process of being 
recognized by the DFA. Which means, you don't know *what* you're supposed to 
be converting it into *while* you're converting it. Which means, you have to 
be speculatively converting it into all types of tokens that the current DFA 
state could possibly be on its way towards accepting (also, the DFA would 
need to contain a record of possible terminals for each DFA state). And then 
the result is thrown away if it turns out to be a different terminal. Is 
this correct? If so, is there generally enough lexical difference between 
the terminals that need such treatment to compensate for the extra 
processing needed in situations that are closer to worst-case (that is, in 
comparison to Goldie's current approach)?

If all of that is so, then what would be your thoughts on this approach?:

Suppose Goldie had a way to associate an optional "simultaneous/lockstep 
conversion" to a type of terminal. For instance:

myLanguage.associateConversion("StringLiteral", new StringLiteralConverter());

Then, 'StringLiteralConverter' would be something that could be either 
user-provided or offered by Goldie (both ways would be supported). It would 
be some sort of class or something that had three basic functions:

class StringLiteralConverter : ITerminalConverter
{
    void process(dchar c) {...}

    // Or maybe this to make it possible to minimize allocations
    // in certain circumstances by utilizing slices:
    void process(dchar c, size_t indexIntoSource, string fullOriginalSource) {...}

    Variant emit() {...}
    void clear() {...}
}

Each state in the lexer's DFA would know which terminals it could possibly 
be processing. And for each of those terminals that has an associated 
converter, the lexer will call 'process()'. If a terminal is accepted, 
'emit' is called to get the final result (and maybe do any needed 
finalization first), and then 'clear' is called on all converters that had 
been used.
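To make the idea concrete, here is a small sketch under stated assumptions: the `ITerminalConverter` interface below is reconstructed from the three functions named above, the converter handles integer literals rather than strings to keep it short, and the loop in `main` merely stands in for the lexer's DFA loop. None of this is existing Goldie API:

```d
import std.variant : Variant;

// Hypothetical interface, following the three functions described in the post.
interface ITerminalConverter
{
    void process(dchar c);
    Variant emit();
    void clear();
}

// Accumulates an integer value digit by digit while the terminal is still
// being recognized. Overflow checking is left out of this sketch.
class IntLiteralConverter : ITerminalConverter
{
    private long value;

    void process(dchar c)
    {
        if (c >= '0' && c <= '9')
            value = value * 10 + (c - '0');
    }

    Variant emit() { return Variant(value); }
    void clear() { value = 0; }
}

void main()
{
    // Converters registered per terminal name, as associateConversion might do.
    ITerminalConverter[string] converters;
    converters["IntLiteral"] = new IntLiteralConverter();

    // Stand-in for the lexer: feed each character of the terminal in lockstep.
    auto conv = converters["IntLiteral"];
    foreach (dchar c; "1234")
        conv.process(c);

    Variant result = conv.emit(); // terminal accepted
    conv.clear();                 // reset for the next token
    assert(result.get!long == 1234);
}
```

Each character costs one virtual call per active converter, which is the overhead the questions below are concerned with.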

This feature would preclude the use of the actual "GOLD Parser Builder" 
program, but since I'm writing a tool to handle that functionality anyway, 
I'm not too concerned about that.

Do you think that would work? Would its benefits be killed by the overhead 
introduced? If so, could those overheads be sufficiently reduced without 
scrapping the general idea?

 What Goldie will be compared against is Spirit. Spirit is a reasonably 
 successful add-on to C++. Goldie doesn't have to do things the same way as 
 Spirit (expression templates - ugh), but it should be as easy to use and 
 at least as powerful.

Understood.

 Personally, I'd much rather have a clutter of source files than a 
 cluttered source file. (But of course, I don't go to Java extremes and 
 put *every* tiny little thing in a separate file.) As long as the 
 complexity of having multiple files isn't passed along to user code 
 (hence the frequent "module foo.all" idiom), then I can't say I really 
 see a problem.

 I tend to just not like having to constantly grep to see which file XXX is 
 in.

Diff'rent strokes, I guess. I've only ever had that problem with Tango, 
which seems to kinda follow from the Java-STD-lib school of API design (no 
offense intended, Tango guys). But if I'm working on something that involves 
different sections of a codebase, which is very frequent, then I find it to 
be quite a pain to constantly scroll all around instead of just Ctrl-Tabbing 
between open files in different tabs.

Oct 25 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:ia59si$1r0j$1 digitalmars.com...
 Consider a string literal, say "abc\"def". With Goldie's method, I infer 
 this string has to be scanned twice. Once to find its limits, and the 
 second to convert it to the actual string.

 
 Yea, that is true. With that string in the input, the value given to the 
 user code will be:
 
 assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});
 
 That's a consequence of the grammar being separated from lexing/parsing 
 implementation.
 
 You're right that that does seem less than ideal. Although I'm not sure how 
 to remedy that without losing the independence between grammar and 
 lex/parse implementation that is the main point of the GOLD-based style.
 
 But there's something I don't quite understand about the approach you're 
 suggesting: You seem to be suggesting that a terminal be progressively 
 converted into its final form *as* it's still in the process of being 
 recognized by the DFA. Which means, you don't know *what* you're supposed to 
 be converting it into *while* you're converting it. Which means, you have to 
 be speculatively converting it into all types of tokens that the current DFA 
 state could possibly be on its way towards accepting (also, the DFA would 
 need to contain a record of possible terminals for each DFA state). And then 
 the result is thrown away if it turns out to be a different terminal. Is 
 this correct? If so, is there generally enough lexical difference between 
 the terminals that need such treatment to compensate for the extra 
 processing needed in situations that are closer to worst-case (that is, in 
 comparison to Goldie's current approach)?

Probably that's why I don't use lexer generators. Building lexers is the 
simplest part of building a compiler, and I've always been motivated by trying 
to make it as fast as possible.

To specifically answer your question, yes, in the lexers I make, you know 
you're parsing a string, so you process it as you parse it.



 If all of that is so, then what would be your thoughts on this approach?:
 
 Suppose Goldie had a way to associate an optional "simultaneous/lockstep 
 conversion" to a type of terminal. For instance:
 
 myLanguage.associateConversion("StringLiteral", new StringLiteralConverter());
 
 Then, 'StringLiteralConverter' would be something that could be either 
 user-provided or offered by Goldie (both ways would be supported). It would 
 be some sort of class or something that had three basic functions:
 
 class StringLiteralConverter : ITerminalConverter
 {
     void process(dchar c) {...}
 
     // Or maybe this to make it possible to minimize allocations
     // in certain circumstances by utilizing slices:
     void process(dchar c, size_t indexIntoSource, string fullOriginalSource) {...}
 
     Variant emit() {...}
     void clear() {...}
 }
 
 Each state in the lexer's DFA would know which terminals it could possibly 
 be processing. And for each of those terminals that has an associated 
 converter, the lexer will call 'process()'. If a terminal is accepted, 
 'emit' is called to get the final result (and maybe do any needed 
 finalization first), and then 'clear' is called on all converters that had 
 been used.
 
 This feature would preclude the use of the actual "GOLD Parser Builder" 
 program, but since I'm writing a tool to handle that functionality anyway, 
 I'm not too concerned about that.
 
 Do you think that would work? Would its benefits be killed by the overhead 
 introduced? If so, could those overheads be sufficiently reduced without 
 scrapping the general idea?

I don't know. I'd have to study the issue for a while. I suggest taking a look 
at dmd's lexer and compare. I'm not sure what Spirit's approach to this is.


 What Goldie will be compared against is Spirit. Spirit is a reasonably 
 successful add-on to C++. Goldie doesn't have to do things the same way as 
 Spirit (expression templates - ugh), but it should be as easy to use and 
 at least as powerful.

 
 Understood.

Oct 25 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia5j41$2bnk$1 digitalmars.com...
 To specifically answer your question, yes, in the lexers I make, you know 
 you're parsing a string, so you process it as you parse it.

...

 I don't know. I'd have to study the issue for a while. I suggest taking a 
 look at dmd's lexer and compare. I'm not sure what Spirit's approach to 
 this is.

I've taken a deeper look at Spirit's docs:

In the older Spirit 1.x, the lexing is handled as part of the parsing. The 
structure of it definitely suggests it should be easy for it to do all 
token-conversion right as the string is being lexed, although I couldn't 
tell whether or not it actually did so (I'd have to look at the source). 
But, since Spirit 1.x doesn't handle lexing separately from parsing, I 
*think* backtracking (it *is* a backtracking parser) results in re-lexing, 
even for terminals that never get special processing, such as keywords (But 
I'm not completely certain because I don't have much experience with LL).

In Spirit 2.x, standard usage involves having the lexing separate from 
parsing. I didn't see anything at all in the docs for Spirit 2.x that seemed 
to suggest even the possibility of it processing tokens as they're lexed. 
However, Spirit is designed with heavy policy-based customizability in mind, 
so such a thing might still be possible in Spirit 2.x... But if so, it's 
definitely an advanced feature (or just really poorly documented).

I have thought of another way to get such an ability into Goldie, and it 
would be very easy to use, but it would also be fairly non-trivial to 
implement. And really, I'm starting to question again how important it would 
*really* be, at least initially. When I think of typical code, usually only 
a small amount of it is made up of the sorts of terminals that would 
need extra processing.

I have to admit, I still have no idea whether or not it would be worth it to 
get Goldie into Phobos. Maybe, maybe not, I dunno. I think popular opinion 
would probably be the best gauge of that. It seems like we're the only ones 
still in this thread, though...maybe that's a bad sign? ;) I do still think 
that if your primary goal is to provide parsing of D code through Phobos, 
then adapting DDMD would be the best bet. Goldie would be more appropriate 
if customized lexing/parsing is the goal.

Oct 26 2010

bearophile <bearophileHUGS lycos.com> writes:

Nick Sabalausky:

 I've taken a deeper look at Spirit's docs:

I have not used Spirit, but from what I have read, it doesn't scale
(compilation becomes far too slow as the system you have built gets
bigger).

Bye,
bearophile

Oct 26 2010

"Nick Sabalausky" <a a.a> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:ia6a0h$nst$1 digitalmars.com...
 Nick Sabalausky:

 I've taken a deeper look at Spirit's docs:

 I have not used Spirit, but from what I have read, it doesn't scale (the 
 compilation becomes too much slower when the system you have built becomes 
 bigger).

I think that's just because it's C++ though. I'd bet a D lib that worked the 
same way would probably do a lot better.

In any case, I started writing a comparison of the main fundamental 
differences between Spirit and Goldie, and it ended up kinda rambling and 
not so just-the-main-fundamentals. But the gist was: Spirit is very flexible 
in how grammars are defined and processed, and Goldie is very flexible in 
what you can do with a given grammar once it's written (ie, how much mileage 
you can get out of it without changing one line of grammar and without 
designing it from the start to be flexible). Goldie does get some of that 
"flexibility in what you can do with it" though by tossing in some features 
and some limitations/requirements that Spirit leaves as "if you want it, put 
it in yourself (more or less manually), otherwise you don't pay a price for 
it."

I think both approaches have their merits. Although I haven't a clue which 
is best for Phobos, or if Phobos even needs either.

Oct 26 2010

Leandro Lucarella <luca llucax.com.ar> writes:

bearophile, on October 26 at 06:20, you wrote to me:
 Nick Sabalausky:
 
 I've taken a deeper look at Spirit's docs:

 
 I have not used Spirit, but from what I have read, it doesn't scale
 (the compilation becomes too much slower when the system you have
 built becomes bigger).

I can confirm that, at least for Spirit 1. For simple things it
looks "nice" (on the C++ scale), but for real, more complex things, the
resulting code is really a mess.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
A can of diet coke will float in water
While a can of regular coke will sink

Oct 26 2010

dennis luehring <dl.soluz gmx.net> writes:

On 26.10.2010 15:55, Leandro Lucarella wrote:
 bearophile, on October 26 at 06:20, you wrote to me:
  Nick Sabalausky:

  >  I've taken a deeper look at Spirit's docs:

  I have not used Spirit, but from what I have read, it doesn't scale
  (the compilation becomes too much slower when the system you have
  built becomes bigger).

 I can confirm that, at least for Spirit 1, and for simple things it
 looks "nice" (in the C++ scale), but for real more complex things, the
 resulting code is really a mess.

yup - Spirit feels right on the integration side, but it gets more and
more evil as things get bigger

a compile-time EBNF-script parser would do better, especially when
the EBNF script comes in through a compile-time file include and can be
used/developed from outside, in an IDE like GOLD Parser's

a compile-time parser could generate the stub code like Spirit does,
but without being too deeply embedded in the language itself

Oct 26 2010

dennis luehring <dl.soluz gmx.net> writes:

On 26.10.2010 16:48, dennis luehring wrote:
 On 26.10.2010 15:55, Leandro Lucarella wrote:
  bearophile, on October 26 at 06:20, you wrote to me:
   Nick Sabalausky:

   >   I've taken a deeper look at Spirit's docs:

   I have not used Spirit, but from what I have read, it doesn't scale
   (the compilation becomes too much slower when the system you have
   built becomes bigger).

  I can confirm that, at least for Spirit 1, and for simple things it
  looks "nice" (in the C++ scale), but for real more complex things, the
  resulting code is really a mess.

 yup - Spirit feels right on the integration side, but it gets more and
 more evil as things get bigger

 a compile-time EBNF-script parser would do better, especially when
 the EBNF script comes in through a compile-time file include and can be
 used/developed from outside, in an IDE like GOLD Parser's

 a compile-time parser could generate the stub code like Spirit does,
 but without being too deeply embedded in the language itself

that, combined with compile-time features, would be something like what bsn-goldparser does

http://code.google.com/p/bsn-goldparser/

i think this all is very very doable in D

Oct 26 2010

"Nick Sabalausky" <a a.a> writes:

"dennis luehring" <dl.soluz gmx.net> wrote in message 
news:ia6s3b$1q90$1 digitalmars.com...
 On 26.10.2010 16:48, dennis luehring wrote:
 On 26.10.2010 15:55, Leandro Lucarella wrote:

 yup - Spirit feels right on the integration side, but it gets more and
 more evil as things get bigger


Goldie (and any GOLD-based system, really) should scale up pretty well. The 
only possible scaling-up issues would be:

1. Splitting a large grammar across multiple files is not yet supported (and 
if I do add support for that in Goldie, and I may, then the "GOLD Parser 
Builder" IDE wouldn't know how to handle it).


classic-style ASP, or anything that involves a preprocessing step that 
hasn't already been done) aren't really supported yet. Spirit 1.x should be 
able to handle that, at least in some cases. I think Spirit 2.x's separation 
of lexing and parsing may have some trouble with it though.

3. I haven't had a chance to add any sort of character set optimization yet, 
so grammars that allow a large amount of Unicode characters will probably be 
slow to generate into tables and slow to lex. At least until I get around to 
taking care of that.

I've never actually used Spirit, but its scaling up issues do seem to be a 
fairly fundamental issue with its design (particularly so since it's C++). 
Although they do say on their site that some C++ compilers can handle Spirit 
without compile time growing exponentially in relation to grammar 
complexity.

 a compile-time EBNF-script parser would do better, especially when
 the EBNF script comes in through a compile-time file include and can be
 used/developed from outside, in an IDE like GOLD Parser's


There's one problem with doing things via CTFE that us D folks often 
overlook: You can't use a build tool like make/rake/scons to detect when 
that particular data doesn't need to be recomputed and can thus be skipped. 
(Although it may be possible to manually make it work that way *if* CTFE 
gains the ability to access the filesystem.)

I'm not opposed to the idea of making Goldie's compiling-a-grammar (ie, 
"process a grammar into the appropriate tables") ctfe-able, but it does 
already work in a way that you only need to compile a grammar into tables 
when you change the grammar (and changing a grammar is needed less 
frequently in Goldie than in Spirit because in Goldie no processing code is 
ever embedded into the grammar.).
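As a small illustration of what "ctfe-able" table generation means in D, the sketch below builds a character-class table at compile time. `CharClass` and `buildTable` are hypothetical names and the table is far simpler than real GOLD-style lexer tables, but the mechanism (an ordinary function evaluated by the compiler to initialize a static) is the same:

```d
// Hypothetical character classes for a toy lexer table.
enum CharClass : ubyte { other, digit, letter }

// An ordinary function; it is run at compile time when used as a static initializer.
CharClass[256] buildTable()
{
    CharClass[256] t; // every entry starts as CharClass.other
    foreach (i; 0 .. 256)
    {
        if (i >= '0' && i <= '9')
            t[i] = CharClass.digit;
        else if ((i >= 'a' && i <= 'z') || (i >= 'A' && i <= 'Z') || i == '_')
            t[i] = CharClass.letter;
    }
    return t;
}

// Evaluated via CTFE: the finished table is baked into the binary.
static immutable CharClass[256] table = buildTable();

// Checked by the compiler, not at run time.
static assert(table['7'] == CharClass.digit);
static assert(table['_'] == CharClass.letter);
static assert(table[' '] == CharClass.other);

void main()
{
    assert(table['x'] == CharClass.letter);
}
```

The trade-off described above still applies: the table is recomputed on every compilation of the module that declares it, whereas a separate generation step can be skipped by a build tool when the grammar is unchanged.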

 that, combined with compile-time features, would be something like what bsn-goldparser does

 http://code.google.com/p/bsn-goldparser/

 i think this all is very very doable in D

Yea, I was pretty impressed with BSN. I definitely want to do something like 
that for Goldie, but I have a somewhat different idea in mind: I'm thinking 
of enhancing the grammar definition language so that all the information on 
how to construct an AST is right there in the grammar definition itself, and 
can thus be completely automated by Goldie. This would be in line with 
GOLD's philosophy and benefits of keeping the grammar definition separate 
from the processing code. And it would also be a step towards the idea I've 
had in mind since before Goldie was Goldie of being able to automate (or 
partially automate) generalized language translation/transformation.

Oct 26 2010

Jacob Carlborg <doob me.com> writes:

On 2010-10-26 04:44, Nick Sabalausky wrote:
 "Walter Bright"<newshound2 digitalmars.com>  wrote in message
 news:ia59si$1r0j$1 digitalmars.com...
 Consider a string literal, say "abc\"def". With Goldie's method, I infer
 this string has to be scanned twice. Once to find its limits, and the
 second to convert it to the actual string.

 Yea, that is true. With that string in the input, the value given to the
 user code will be:

 assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});

 That's a consequence of the grammar being separated from lexing/parsing
 implementation.

 You're right that that does seem less than ideal. Although I'm not sure how
 to remedy that without losing the independence between grammar and
 lex/parse implementation that is the main point of the GOLD-based style.

 But there's something I don't quite understand about the approach you're
 suggesting: You seem to be suggesting that a terminal be progressively
 converted into its final form *as* it's still in the process of being
 recognized by the DFA. Which means, you don't know *what* you're supposed to
 be converting it into *while* you're converting it.

I don't have much knowledge in this area but isn't this what a 
look-ahead is for? Just look ahead (hopefully) one character and decide 
what to convert to.
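In a hand-written lexer that is roughly what happens. The sketch below (with hypothetical names `Kind` and `peekKind`) picks the token kind from a single character of look-ahead. It assumes a toy token set in which the first character is decisive; a table-driven DFA lexer cannot always decide this early, which is the concern raised in the quoted text:

```d
import std.ascii : isAlpha, isDigit;

enum Kind { number, identifier, stringLit, other }

/// Decides which kind of token starts at src[pos] by looking at one character.
Kind peekKind(string src, size_t pos)
{
    immutable char c = src[pos];
    if (isDigit(c)) return Kind.number;
    if (isAlpha(c) || c == '_') return Kind.identifier;
    if (c == '"') return Kind.stringLit;
    return Kind.other;
}

void main()
{
    string src = `x1 = 42 + "s"`;
    assert(peekKind(src, 0) == Kind.identifier);
    assert(peekKind(src, 3) == Kind.other);
    assert(peekKind(src, 5) == Kind.number);
    assert(peekKind(src, 10) == Kind.stringLit);
}
```

Once the kind is known, the lexer can branch to a routine that converts that kind of token while scanning it.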

-- 
/Jacob Carlborg

Oct 26 2010

Tomek Sowiński <just ask.me> writes:

On 22-10-2010 at 21:48:49, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 I have in mind the entire implementation of a simple design, but never
 had the time to execute on it. The tokenizer would work like this:

 alias Lexer!(
      "+", "PLUS",
      "-", "MINUS",
      "+=", "PLUS_EQ",
      ...
      "if", "IF",
      "else", "ELSE"
      ...
 ) DLexer;

Yes. One remark: native language constructs scale better for a grammar:

enum TokenDef : string {
     Digit = "[0-9]",
     Letter = "[a-zA-Z_]",
     Identifier = Letter~'('~Letter~'|'~Digit~')',
     ...
     Plus = "+",
     Minus = "-",
     PlusEq = "+=",
     ...
     If = "if",
     Else = "else",
     ...
}
alias Lexer!TokenDef DLexer;

BTW, there's a bug related:
http://d.puremagic.com/issues/show_bug.cgi?id=2950

 Such a declaration generates numeric values DLexer.PLUS etc. and
 generates an efficient code that extracts a stream of tokens from a
 stream of text. Each token in the token stream has the ID and the text.

All good ideas.
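For what it's worth, a toy version of the pair-list design can be sketched in a few dozen lines. Everything below is an assumption made for illustration: the token IDs are plain indices rather than generated members such as DLexer.PLUS, matching is a naive longest-prefix search over literal strings (no regexes, and nothing like the efficient generated code described above), and only spaces are skipped:

```d
import std.algorithm : startsWith;

struct Token
{
    size_t id;    // index of the (text, name) pair that matched
    string text;  // slice of the input
}

// Hypothetical Lexer template taking alternating (text, name) string pairs.
struct Lexer(defs...)
{
    static assert(defs.length % 2 == 0, "expected (text, name) pairs");

    /// Name of the token with the given ID, e.g. "PLUS_EQ".
    static string name(size_t id)
    {
        static immutable string[] all = [defs];
        return all[2 * id + 1];
    }

    /// Splits src into tokens, always preferring the longest matching text.
    static Token[] lex(string src)
    {
        Token[] result;
        while (src.length)
        {
            if (src[0] == ' ') { src = src[1 .. $]; continue; }
            size_t bestId, bestLen;
            foreach (i, d; defs) // unrolled at compile time
            {
                static if (i % 2 == 0) // even entries are the token texts
                {
                    if (d.length > bestLen && src.startsWith(d))
                    {
                        bestId = i / 2;
                        bestLen = d.length;
                    }
                }
            }
            if (bestLen == 0)
                throw new Exception("unrecognized input at: " ~ src);
            result ~= Token(bestId, src[0 .. bestLen]);
            src = src[bestLen .. $];
        }
        return result;
    }
}

alias DLexer = Lexer!("+", "PLUS", "-", "MINUS", "+=", "PLUS_EQ",
                      "if", "IF", "else", "ELSE");

void main()
{
    auto toks = DLexer.lex("+= if -");
    assert(toks.length == 3);
    assert(DLexer.name(toks[0].id) == "PLUS_EQ" && toks[0].text == "+=");
    assert(DLexer.name(toks[1].id) == "IF");
    assert(DLexer.name(toks[2].id) == "MINUS");
}
```

The longest-match rule is what makes "+=" win over "+"; identifiers, comments and string literals would need real patterns, which is where the remaining discussion comes in.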

 Comments, strings etc. can be handled in one of several ways but that's
 a longer discussion.

The discussion's started anyhow. So what're the options?

-- 
Tomek

Oct 22 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com>
 wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

Agreed, of all the things desired for D, a D tokenizer would rank pretty 
low I think.

Another thing, even though a tokenizer generator would be much more 
desirable, I wonder if it is wise to have that in the standard library? 
It does not seem to be of wide enough interest to be in a standard 
library. (Out of curiosity, how many languages have such a thing in 
their standard library?)


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

We want to make it easy for tools to be built to work on and deal with D code.
An IDE, for example, needs to be able to tokenize and parse D code. A program
like lint needs to be able to tokenize and parse D code. By providing a lexer
and parser in the standard library, we are making it far easier for such tools
to be written, and they could be of major benefit to the D community. Sure, the
average program won't need to lex or parse D, but some will, and making it easy
to do will make it a lot easier for such programs to be written.

- Jonathan M Davis

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 21:27, Jonathan M Davis wrote:
 On Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

 We want to make it easy for tools to be built to work on and deal with D code.
 An IDE, for example, needs to be able to tokenize and parse D code. A program
 like lint needs to be able to tokenize and parse D code. By providing a lexer
 and parser in the standard library, we are making it far easier for such tools
 to be written, and they could be of major benefit to the D community. Sure, the
 average program won't need to lex or parse D, but some will, and making it easy
 to do will make it a lot easier for such programs to be written.

 - Jonathan M Davis

And by providing a lexer and a parser outside the standard library, 
wouldn't it make it just as easy for those tools to be written? What's 
the advantage of being in the standard library? I see only 
disadvantages: to begin with it potentially increases the time that 
Walter or other Phobos contributors may have to spend on it, even if 
it's just reviewing patches or making sure the code works.


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:
 On 19/11/2010 21:27, Jonathan M Davis wrote:
 On Friday 19 November 2010 13:03:53 Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover
 and continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's
 templates are strong enough to pull this off without any source code
 generation ala JavaCC. The books I read on compilers say tokenization
 is a solved problem, so the theory part on what a good abstraction
 should be is done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

 We want to make it easy for tools to be built to work on and deal with D
 code. An IDE, for example, needs to be able to tokenize and parse D
 code. A program like lint needs to be able to tokenize and parse D code.
 By providing a lexer and parser in the standard library, we are making
 it far easier for such tools to be written, and they could be of major
 benefit to the D community. Sure, the average program won't need to lex
 or parse D, but some will, and making it easy to do will make it a lot
 easier for such programs to be written.

 - Jonathan M Davis

 And by providing a lexer and a parser outside the standard library,
 wouldn't it make it just as easy for those tools to be written? What's
 the advantage of being in the standard library? I see only
 disadvantages: to begin with it potentially increases the time that
 Walter or other Phobos contributors may have to spend on it, even if
 it's just reviewing patches or making sure the code works.

If nothing else, it makes it easier to keep in line with dmd itself. Since the
dmd front end is LGPL, it's not possible to have a Boost port of it (like the
Phobos version will be) without Walter's consent. And I'd be surprised if he did
that for a third party library (though he seems to be pretty open on a lot of
that kind of stuff). Not to mention, Walter and the core developers are _exactly_
the kind of people that you want working on a lexer or parser of the language
itself, because they're the ones who work on it.

- Jonathan M Davis

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 22:02, Jonathan M Davis wrote:
 On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:
 On 19/11/2010 21:27, Jonathan M Davis wrote:

 And by providing a lexer and a parser outside the standard library,
 wouldn't it make it just as easy for those tools to be written? What's
 the advantage of being in the standard library? I see only
 disadvantages: to begin with it potentially increases the time that
 Walter or other Phobos contributors may have to spend on it, even if
 it's just reviewing patches or making sure the code works.

 If nothing, else, it makes it easier to keep in line with dmd itself. Since the
 dmd front end is LGPL, it's not possible to have a Boost port of it (like the
 Phobos version will be) without Walter's consent. And I'd be surprised if he did
 that for a third party library (though he seems to be pretty open on a lot of
 that kind of stuff). Not to mention, Walter and the core developers are _exactly_
 the kind of people that you want working on a lexer or parser of the language
 itself, because they're the ones who work on it.

 - Jonathan M Davis

Eh? That license argument doesn't make sense: if the lexer and parser
were to be based on DMD itself, then putting it in the standard library
is equivalent (in licensing terms) to licensing the lexer and parser
parts of DMD under Boost. More precisely, what I mean by equivalent is
that there is no reason why Walter would allow one thing and not the
other... (because in both cases he would have to issue that license)

As for your second argument, yes, Walter and the core developers would
be the most qualified people to work on it, no question about it. But my
point is, I don't think Walter and Phobos core devs should be working on
it, because it takes time away from other things that are much more
important. Their time is precious.
I think our main point of disagreement is just how important a D lexer 
and/or parser would be. I think it would be of very low interest, 
definitely not a "major benefit to the D community".

For starters, regarding its use in IDEs: I think we are *ages* away from
the point where an IDE based on D only will be able to compete with IDEs
based on Eclipse/Visual-Studio/Xcode/etc. I think much sooner we will
have a full D compiler written in D than a (competitive) D IDE written
in D. We barely have mature GUI libraries from what I understand.
(What may be more realistic is an IDE partially written in D, and
otherwise based on Eclipse/Visual-Studio/etc., but even so, I think it
would be hard to compete with other non-D IDEs)


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Todd VanderVeen <TDVanderVeen gmail.com> writes:

== Quote from Bruno Medeiros (brunodomedeiros+spam com.gmail)'s article
 I think much sooner we will
 have a full D compiler written in D than a (competitive) D IDE written
 in D.

I agree. I do like the suggestion for developing the D grammar in Antlr though,
and it is something I would be interested in working on. With this in hand, the
prospect of adding D support as was done for C++ to Eclipse or Netbeans becomes
much more feasible. Has a complete grammar been defined/compiled or is anyone
currently working in this direction? Having a robust IDE seems far more
important than whether it is written in D itself.

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 23:45, Todd VanderVeen wrote:
 == Quote from Bruno Medeiros (brunodomedeiros+spam com.gmail)'s article
 I think much sooner we will
 have a full D compiler written in D than a (competitive) D IDE written
 in D.

 I agree. I do like the suggestion for developing the D grammar in Antlr though
 and it is something I would be interested in working on. With this in hand, the
 prospect of adding D support as was done for C++ to Eclipse or Netbeans becomes
 much more feasible. Has a complete grammar been defined/compiled or is anyone
 currently working in this direction? Having a robust IDE seems far more
 important than whether it is written in D itself.

See the comment I made below, to Michael Stover. ( 
news://news.digitalmars.com:119/ic71pa$1lev$1 digitalmars.com )

-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, November 19, 2010 15:17:35 Bruno Medeiros wrote:
 On 19/11/2010 22:02, Jonathan M Davis wrote:
 On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:
 On 19/11/2010 21:27, Jonathan M Davis wrote:
 
 And by providing a lexer and a parser outside the standard library,
 wouldn't it make it just as easy for those tools to be written? What's
 the advantage of being in the standard library? I see only
 disadvantages: to begin with it potentially increases the time that
 Walter or other Phobos contributors may have to spend on it, even if
 it's just reviewing patches or making sure the code works.

 
 If nothing, else, it makes it easier to keep in line with dmd itself.
 Since the dmd front end is LGPL, it's not possible to have a Boost port
 of it (like the Phobos version will be) without Walter's consent. And
 I'd be surprised if he did that for a third party library (though he
 seems to be pretty open on a lot of that kind of stuff). Not to mention,
 Walter and the core developers are _exactly_ the kind of people that you
 want working on a lexer or parser of the language itself, because
 they're the ones who work on it.
 
 - Jonathan M Davis

 
 Eh? That license argument doesn't make sense: if the lexer and parser
 were to be based on DMD itself, then putting it in the standard library
 is equivalent (in licensing terms) to licensing the lexer and parser
 parts of DMD in Boost. More correctly, what I mean by equivalent, is
 that there no reason why Walter would allow one thing and not the
 other... (because on both cases he would have to issue that license)

It's very different to have a D implementation of something - which is based on a
C++ version but definitely different in some respects - be under Boost and
generally available, than to have the C++ implementation be under Boost -
particularly when the C++ version covers far more than just a lexer and parser.
Someone _could_ port the D code back to C++ and have that portion usable under
Boost, but that's a lot more work than just taking the C++ code and using it,
and only the portions of the compiler which were ported to D could be re-used
that way. And since the Boost code could be used in a commercial product while
the LGPL is more restrictive, it could make a definite difference.

I'm not a licensing expert, and I'm not an expert on what Walter does and
doesn't want done with his code, but he put the compiler front end under the
LGPL, not Boost, and he's given his permission to have the lexer alone ported to
D and put under the Boost license in the standard library, which is very
different from putting the entire front end under Boost. I expect that the parser
will follow eventually, but even if it does, that's still not the entire front
end. So, the difference in licenses does have a real impact. And no one
can take the LGPL C++ code and port it to D - for the standard library or
otherwise - without Walter's permission, because it's his copyright on the code.

As for the usefulness of a D lexer and parser, I've already had several programs
or functions which I've wanted to write which would require it, and the lack has
made them infeasible. For instance, I was considering posting a version of my
datetime code without the unit tests in it, so that it would be easier to read
the actual code (given the large number of unit tests), but I found that to
accurately do that, you need a lexer for D, so I gave up on it for the time
being. Having a function which stripped out unnecessary whitespace (and
especially newlines) for string mixins would be great (particularly since line
numbers get messed up with multi-line string mixins), but that would require a
CTFE-able D lexer to work correctly (though you might be able to hack together
something which would mostly work), which we don't have. The D lexer won't be
CTFE-able initially (though hopefully it will be once the CTFE capabilities of
dmd improve), so you still won't be able to do that once the lexer is done, but
it is a case where a lexer would be useful.


are huge. It will take time to get there, and we'll need more developers, but I 
don't think that it really makes sense to not put things in the standard library
because it might take more dev time - particularly when a D lexer is the sort of
thing that likely won't need much changing once it's done, since it would only
need to be changed when the language changed or when a bug with it was found
(which would likely equate to a bug in the compiler anyway), so ultimately, the
developer cost is likely fairly low. Additionally, Walter thinks that the
development costs will be lower to have it in the standard library with an
implementation similar to dmd's rather than having it separate. And it's his
call. So, it's going to get done. There are several people around here who
lament the lack of a D parser in Phobos at least periodically, and I think that it
will be good to have an appropriate lexer and parser for D in Phobos. Having
other 3rd party stuff - like antlr - is great too, but that's no reason not to
put it in the standard library.

I think that we're just going to have to agree to disagree on this one.

- Jonathan M Davis

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 20/11/2010 01:29, Jonathan M Davis wrote:
 On Friday, November 19, 2010 15:17:35 Bruno Medeiros wrote:
 On 19/11/2010 22:02, Jonathan M Davis wrote:
 On Friday, November 19, 2010 13:53:12 Bruno Medeiros wrote:
 On 19/11/2010 21:27, Jonathan M Davis wrote:

 And by providing a lexer and a parser outside the standard library,
 wouldn't it make it just as easy for those tools to be written? What's
 the advantage of being in the standard library? I see only
 disadvantages: to begin with it potentially increases the time that
 Walter or other Phobos contributors may have to spend on it, even if
 it's just reviewing patches or making sure the code works.

 If nothing, else, it makes it easier to keep in line with dmd itself.
 Since the dmd front end is LGPL, it's not possible to have a Boost port
 of it (like the Phobos version will be) without Walter's consent. And
 I'd be surprised if he did that for a third party library (though he
 seems to be pretty open on a lot of that kind of stuff). Not to mention,
 Walter and the core developers are _exactly_ the kind of people that you
 want working on a lexer or parser of the language itself, because
 they're the ones who work on it.

 - Jonathan M Davis

 Eh? That license argument doesn't make sense: if the lexer and parser
 were to be based on DMD itself, then putting it in the standard library
 is equivalent (in licensing terms) to licensing the lexer and parser
 parts of DMD in Boost. More correctly, what I mean by equivalent, is
 that there no reason why Walter would allow one thing and not the
 other... (because on both cases he would have to issue that license)

 It's very different to have D implementation of something - which is based on a
 C++ version but definitely different in some respects - be under Boost and
 generally available, and having the C++ implementation be under Boost -
 particularly when the C++ version covers far more than just a lexer and parser.
 Someone _could_ port the D code back to C++ and have that portion useable under
 Boost, but that's a lot more work than just taking the C++ code and using it,
 and it's only the portions of the compiler which were ported to D to which could
 be re-used that way. And since the Boost code could be used in a commercial
 product while the LGPL is more restricted, it could make a definite difference.

 I'm not a licensing expert, and I'm not an expert on what Walter does and
 doesn't want done with his code, but he put the compiler front end under the
 LGPL, not Boost, and he's given his permission to have the lexer alone ported to
 D and put under the Boost license in the standard library, which is very
 different from putting the entire front end under Boost. I expect that the parser
 will follow eventually, but even if it does, that's still not the entire front
 end. So, there is a difference in licenses does have a real impact. And no one
 can take the LGPL C++ code and port it to D - for the standard library or
 otherwise - without Walter's permission, because its his copyright on the code.

There are some misunderstandings here. First, the DMD front-end is
licensed under the GPL, not LGPL.
Second, more importantly, it is actually also licensed under the
Artistic license, a very permissive license. This is the basis for me
stating that almost certainly Walter would not mind licensing the DMD
parser and lexer under Boost, as it's actually not that different from
the Artistic license.



 are huge. It will take time to get there, and we'll need more developers, but I


bigger than Phobos, and yet they have no functionality for 
lexing/parsing their own languages (or any other for that matter)!


-- 
Bruno Medeiros - Software Engineer

Nov 24 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 11/19/10 1:03 PM, Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

Even C has strtok.

Andrei

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 23:39, Andrei Alexandrescu wrote:
 On 11/19/10 1:03 PM, Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's
 templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

 Even C has strtok.

 Andrei

That's just a fancy splitter; I wouldn't call that a proper tokenizer. I
meant something that, at the very least, would tokenize based on regular
expressions (and have heterogeneous tokens).
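
To make the distinction concrete, here is a rough sketch of a tokenizer that is driven by regular expressions and yields heterogeneous tokens, i.e. the kind of thing a plain splitter like strtok does not give you. It is written in Python purely for illustration; `make_lexer`, `Token`, `RULES` and `lex_toy` are names invented for this example, and the rule table is a toy grammar, not D's.

```python
import re
from typing import Iterator, NamedTuple, Union


class Token(NamedTuple):
    kind: str
    value: Union[int, str]   # heterogeneous payload: int for numbers, str otherwise


def make_lexer(rules):
    """Build a lexer from (kind, regex) pairs; earlier rules take priority."""
    master = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in rules))

    def lexer(source: str) -> Iterator[Token]:
        pos = 0
        while pos < len(source):
            m = master.match(source, pos)
            if m is None:
                raise SyntaxError(f"cannot tokenize at offset {pos}")
            pos = m.end()
            kind = m.lastgroup           # name of the rule that matched
            if kind == "SKIP":           # whitespace produces no token
                continue
            text = m.group()
            yield Token(kind, int(text) if kind == "INT" else text)

    return lexer


# Toy lexical grammar (an assumption for this example, not D's grammar).
RULES = [
    ("INT", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=;]"),
    ("SKIP", r"\s+"),
]
lex_toy = make_lexer(RULES)
```

Calling `list(lex_toy("x = 40 + 2;"))` gives identifier, operator and integer tokens with the integers already converted. Swapping in a different `RULES` table gives a lexer for a different format, which is roughly the "parametrized with a lexical grammar" idea mentioned earlier in the thread.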

-- 
Bruno Medeiros - Software Engineer

Nov 24 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 24/11/2010 13:30, Bruno Medeiros wrote:
 On 19/11/2010 23:39, Andrei Alexandrescu wrote:
 On 11/19/10 1:03 PM, Bruno Medeiros wrote:
 On 22/10/2010 20:48, Andrei Alexandrescu wrote:
 On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
 On 22-10-2010 at 00:01:21, Walter Bright <newshound2 digitalmars.com> wrote:

 As we all know, tool support is important for D's success. Making
 tools easier to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax
 highlighting filters, pretty printers, repl, doc generators, static
 analyzers, and even D compilers.

 It should:

 1. support a range interface for its input, and a range interface for
 its output
 2. optionally not generate lexical errors, but just try to recover
 and
 continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be
 maintained in tandem

 It can also serve as the basis for creating a javascript
 implementation that can be embedded into web pages for syntax
 highlighting, and eventually an std.lang.d.parse.

 Anyone want to own this?

 Interesting idea. Here's another: D will soon need bindings for CORBA,
 Thrift, etc, so lexers will have to be written all over to grok
 interface files. Perhaps a generic tokenizer which can be parametrized
 with a lexical grammar would bring more ROI, I got a hunch D's
 templates
 are strong enough to pull this off without any source code generation
 ala JavaCC. The books I read on compilers say tokenization is a solved
 problem, so the theory part on what a good abstraction should be is
 done. What you think?

 Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
 generator.

 Agreed, of all the things desired for D, a D tokenizer would rank pretty
 low I think.

 Another thing, even though a tokenizer generator would be much more
 desirable, I wonder if it is wise to have that in the standard library?
 It does not seem to be of wide enough interest to be in a standard
 library. (Out of curiosity, how many languages have such a thing in
 their standard library?)

 Even C has strtok.

 Andrei

 That's just a fancy splitter, I wouldn't call that a proper tokenizer. I
 meant something that, at the very least, would tokenize based on regular
 expressions (and have heterogenous tokens).

In other words, a lexer; that might be a better term in this context.

-- 
Bruno Medeiros - Software Engineer

Nov 24 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 As we all know, tool support is important for D's success. Making tools easier 
 to build will help with that.
 
 To that end, I think we need a lexer for the standard library - std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty 
 printers, repl, doc generators, static analyzers, and even D compilers.

This is a quite long talk by Steve Yegge that I've just seen (linked from Reddit):
http://vimeo.com/16069687

I don't suggest watching it all unless you are very interested in that topic.
But the most important point it makes is that, since big software companies
use several languages and programmers often don't want to change their
preferred IDE, there is a problem: given N languages and M editors/IDEs, the total
toolchain effort is N * M. That means N syntax highlighters, N indenters, N
refactoring suites, etc. Result: most languages have bad toolchains and most
IDEs handle only one or very few languages well.

So he has suggested the Grok project, which reduces the toolchain
effort to N + M. Each language needs to have one of each service: indenter,
highlighter, name resolver, refactoring tool, etc. Each IDE can then link (using a
standard interface provided by Grok) to those services and use them.

Grok is not available yet, and its development is at an early stage, but
after this talk I think it may be worthwhile to add to Phobos not just the D
lexer, but also other, somewhat higher-level things such as an indenter,
highlighter, name resolver, refactoring tool, etc. Even if they don't use the
standard universal interface used by Grok, I think they may speed up the
development of the D toolchain.

Bye,
bearophile

Oct 23 2010

bearophile <bearophileHUGS lycos.com> writes:

 This is a quite long talk by Steve Yegge that I've just seen (linked from
Reddit):
 http://vimeo.com/16069687

Sorry, the Reddit thread:
http://www.reddit.com/r/programming/comments/dvd9x/steve_yegge_on_scalable_programming_language/

Oct 23 2010

"Nick Sabalausky" <a a.a> writes:

"bearophile" <bearophileHUGS lycos.com> wrote in message 
news:i9vs3v$142e$1 digitalmars.com...
 Walter:

 As we all know, tool support is important for D's success. Making tools 
 easier
 to build will help with that.

 To that end, I think we need a lexer for the standard library - 
 std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty
 printers, repl, doc generators, static analyzers, and even D compilers.

 This is a quite long talk by Steve Yegge that I've just seen (linked from 
 Reddit):
 http://vimeo.com/16069687

 I don't suggest you to see it all unless you are very interested in that 
 topic. But the most important thing it says is that, given that big 
 software companies use several languages, and programmers often don't want 
 to change their preferred IDE, there is a problem: given N languages and M 
 editors/IDEs, total toolchain effort is N * M. That means N syntax 
 highlighters, N indenters, N refactoring suites, etc. Result: most 
 languages have bad toolchains and most IDEs manage very well only one or 
 very few languages.

 So he has suggested the Grok project, that allows to reduce the toolchain 
 effort to N + M. Each language needs to have one of each service: 
 indenter, highlighter, name resolver, refactory, etc. So each IDE may link 
 (using a standard interface provided by Grok) to those services and use 
 them.

 Today Grok is not available yet, and its development is at the first 
 stages, but after this talk I think that it may be positive to add to 
 Phobos not just the D lexer, but also other things, even a bit higher 
 level as an indenter, highlighter, name resolver, refactory, etc. Even if 
 they don't use the standard universal interface used by Grok I think they 
 may speed up the development of the D toolchain.

I haven't looked at the video, but that sounds like the direction I've had 
in mind for Goldie.

Oct 23 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 24/10/2010 00:46, bearophile wrote:
 Walter:

 As we all know, tool support is important for D's success. Making tools easier
 to build will help with that.

 To that end, I think we need a lexer for the standard library - std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty
 printers, repl, doc generators, static analyzers, and even D compilers.

 This is a quite long talk by Steve Yegge that I've just seen (linked from
Reddit):
 http://vimeo.com/16069687

 I don't suggest you to see it all unless you are very interested in that
topic. But the most important thing it says is that, given that big software
companies use several languages, and programmers often don't want to change
their preferred IDE, there is a problem: given N languages and M editors/IDEs,
total toolchain effort is N * M. That means N syntax highlighters, N indenters,
N refactoring suites, etc. Result: most languages have bad toolchains and most
IDEs manage very well only one or very few languages.

 So he has suggested the Grok project, that allows to reduce the toolchain
effort to N + M. Each language needs to have one of each service: indenter,
highlighter, name resolver, refactory, etc. So each IDE may link (using a
standard interface provided by Grok) to those services and use them.

 Today Grok is not available yet, and its development is at the first stages,
but after this talk I think that it may be positive to add to Phobos not just
the D lexer, but also other things, even a bit higher level as an indenter,
highlighter, name resolver, refactory, etc. Even if they don't use the standard
universal interface used by Grok I think they may speed up the development of
the D toolchain.

 Bye,
 bearophile


Hum, very interesting topic! A few disjoint comments:


(*) I'm glad to see another person, especially one who is "prominent" in 
the development community (like Andrei), discuss the importance of the 
toolchain, specifically IDEs, for emerging languages. Or for any language 
for that matter. At the beginning of the talk I was like "man, this is 
spot-on, that's what I've said before, I wish Walter would *hear* this"! 
LOL, imagine my surprise when I found that Walter was in fact *there*! 
(When I saw the talk I didn't even know this was at NWCPP, otherwise I 
might have suspected)


(*) I actually thought about some similar ideas before, for example, I 
thought about the idea of exposing some (if not all) of the 
functionality of DDT through the command-line (note that Eclipse can run 
headless, without any UI). And this would not be just semantic/indexer 
functionality, so for example:
   * DDoc generation, like Descent had at some point 
(http://www.mail-archive.com/digitalmars-d-announce puremagic.com/msg02734.html)
   * build functionality - only really interesting if the DDT builder 
becomes smarter, ie, does more useful stuff than what it does now.
   * semantic functionality: find-ref, code completion.


(*) I wish I had been at that talk; I would have liked to ask and discuss 
some things with Steve Yegge, particularly his comments about Eclipse's 
indexer. I became curious for details about what he thinks is wrong 
about Eclipse's indexer. Also, I wonder if he's not conflating "CDT's 
indexer" with "Eclipse indexer", because actually there is no such thing 
as a "Eclipse indexer". I'm gonna take a better look at the comments for 
this one.


(*) As for Grok itself, it looks potentially interesting, but I still 
have only a very vague impression of what it does (let alone *how*).


-- 
Bruno Medeiros - Software Engineer

Nov 24 2010

Andrew Wiley <debio264 gmail.com> writes:

 On 24/10/2010 00:46, bearophile wrote:

 Walter:


  As we all know, tool support is important for D's success. Making tools
 easier
 to build will help with that.

 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex.
 It would be helpful in writing color syntax highlighting filters, pretty
 printers, repl, doc generators, static analyzers, and even D compilers.

 This is a quite long talk by Steve Yegge that I've just seen (linked from
 Reddit):
 http://vimeo.com/16069687

 I don't suggest you to see it all unless you are very interested in that
 topic. But the most important thing it says is that, given that big software
 companies use several languages, and programmers often don't want to change
 their preferred IDE, there is a problem: given N languages and M
 editors/IDEs, total toolchain effort is N * M. That means N syntax
 highlighters, N indenters, N refactoring suites, etc. Result: most languages
 have bad toolchains and most IDEs manage very well only one or very few
 languages.

 So he has suggested the Grok project, that allows to reduce the toolchain
 effort to N + M. Each language needs to have one of each service: indenter,
 highlighter, name resolver, refactory, etc. So each IDE may link (using a
 standard interface provided by Grok) to those services and use them.

 Today Grok is not available yet, and its development is at the first
 stages, but after this talk I think that it may be positive to add to Phobos
 not just the D lexer, but also other things, even a bit higher level as an
 indenter, highlighter, name resolver, refactory, etc. Even if they don't use
 the standard universal interface used by Grok I think they may speed up the
 development of the D toolchain.

 Bye,
 bearophile

From watching this, I'm reminded that in the Scala world, the compiler can
be used in this way. The Eclipse plugin for Scala (and I assume the Netbeans
and IDEA plugins work similarly) is really just a wrapper around the
compiler because the compiler can be used as a library, allowing a rich IDE
with minimal effort because rather than implementing parsing and semantic
analysis, the IDE team can just query the compiler's data structures.

Nov 24 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 24/11/2010 18:48, Andrew Wiley wrote:
     On 24/10/2010 00:46, bearophile wrote:

         Walter:


             As we all know, tool support is important for D's success.
             Making tools easier
             to build will help with that.

             To that end, I think we need a lexer for the standard
             library - std.lang.d.lex.
             It would be helpful in writing color syntax highlighting
             filters, pretty
             printers, repl, doc generators, static analyzers, and even D
             compilers.


         This is a quite long talk by Steve Yegge that I've just seen
         (linked from Reddit):
         http://vimeo.com/16069687

         I don't suggest you to see it all unless you are very interested
         in that topic. But the most important thing it says is that,
         given that big software companies use several languages, and
         programmers often don't want to change their preferred IDE,
         there is a problem: given N languages and M editors/IDEs, total
         toolchain effort is N * M. That means N syntax highlighters, N
         indenters, N refactoring suites, etc. Result: most languages
         have bad toolchains and most IDEs manage very well only one or
         very few languages.

         So he has suggested the Grok project, that allows to reduce the
         toolchain effort to N + M. Each language needs to have one of
         each service: indenter, highlighter, name resolver, refactory,
         etc. So each IDE may link (using a standard interface provided
         by Grok) to those services and use them.

         Today Grok is not available yet, and its development is at the
         first stages, but after this talk I think that it may be
         positive to add to Phobos not just the D lexer, but also other
         things, even a bit higher level as an indenter, highlighter,
         name resolver, refactory, etc. Even if they don't use the
         standard universal interface used by Grok I think they may speed
         up the development of the D toolchain.

         Bye,
         bearophile


  From watching this, I'm reminded that in the Scala world, the compiler
 can be used in this way. The Eclipse plugin for Scala (and I assume the
 Netbeans and IDEA plugins work similarly) is really just a wrapper
 around the compiler because the compiler can be used as a library,
 allowing a rich IDE with minimal effort because rather than implementing
 parsing and semantic analysis, the IDE team can just query the
 compiler's data structures.

Interesting, very wise of them to do that.
But not very surprising, Scala is close to the Java world, so they (the 
Scala people) must have known how important it would be to have the best 
toolchain possible, in order to compete (with Java, JDT, also Visual 
Studio, etc.).

-- 
Bruno Medeiros - Software Engineer

Nov 25 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:i9qd8q$1ls4$1 digitalmars.com...
 4. the tokens should be a value type, not a reference type

I'm curious, is your reason for this purely to avoid allocations during 
lexing, or are there other reasons too?

If it's mainly to avoid allocations during lexing then, maybe I've 
understood wrong, but isn't D2 getting the ability to construct class 
objects in-place into pre-allocated memory? (or already has the ability?) If 
so, do you think just creating the tokens that way would likely be close 
enough?

Oct 26 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:i9qd8q$1ls4$1 digitalmars.com...
 4. the tokens should be a value type, not a reference type

 
 I'm curious, is your reason for this purely to avoid allocations during 
 lexing, or are there other reasons too?

It's one big giant reason. Storage allocation gets unbelievably costly in a 
lexer. Another is it makes tokens easy to copy. Another one is that classes are 
for polymorphic behavior. What kind of polymorphic behavior would one want with 
tokens?


 If it's mainly to avoid allocations during lexing then, maybe I've 
 understood wrong, but isn't D2 getting the ability to construct class 
 objects in-place into pre-allocated memory?

If you do that, might as well make them value types. The only reason classes 
exist is to support runtime polymorphism.


C++ made a vast mistake in failing to distinguish between value types and 
reference types. Java made a related mistake by failing to acknowledge that 
value types have any useful purpose at all (unless they are built-in).

Oct 26 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 Java made a related mistake by failing to acknowledge that 
 value types have any useful purpose at all (unless they are built-in).

Java was designed to be simple! Simple means to have a more uniform semantics.
Removing value types was a good idea if you want to simplify a language (and
remove a mountain of details from C++). And from the huge success of Java, I

a more complex than Java). The Java VM also is now often able to allocate not
escaping objects on the stack (escape analysis) regaining some of the lost
performance.

What I miss more in Java is not single structs (single values), but a way to
build an array of values (structs). Because using parallel arrays is not nice
at all. Even in Python using numPy you may create an array of structs (compound
value items).
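
For comparison, this is roughly what an array of compound values looks like in D. It is only a small sketch; `Particle` and `makeParticles` are hypothetical names made up for the example.

```d
// Hypothetical compound value type.
struct Particle
{
    float x = 0, y = 0;
    int   id;
}

// Builds n particles stored inline in one contiguous array.
Particle[] makeParticles(size_t n)
{
    auto ps = new Particle[](n); // a single allocation for all n values
    foreach (i, ref p; ps)
        p.id = cast(int) i;
    return ps;
}

unittest
{
    auto ps = makeParticles(3);
    ps[1].x = 2.5f;
    // Elements sit back to back: no per-element object, no indirection.
    assert(cast(ubyte*) &ps[1] - cast(ubyte*) &ps[0] == Particle.sizeof);
    assert(ps[2].id == 2);
}
```

With only reference types, the same data needs either one heap object per element or parallel arrays of x, y and id.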

Bye,
bearophile

Oct 26 2010

Walter Bright <newshound2 digitalmars.com> writes:

bearophile wrote:
 Walter:
 
 Java made a related mistake by failing to acknowledge that value types have
 any useful purpose at all (unless they are built-in).

 
 Java was designed to be simple! Simple means to have a more uniform
 semantics.

So was Pascal. See the thread about how useless it was as a result.

A hatchet is a very simple tool, easy to understand, and I could build a house 
with nothing but a hatchet. But it would make the house several times as 
expensive to build, and it would look like it was built with a hatchet.


 Removing value types was a good idea if you want to simplify a
 language (and remove a mountain of details from C++). And from the huge


 often able to allocate not escaping objects on the stack (escape analysis)
 regaining some of the lost performance.

The issue isn't just about lost performance. It's about proper encapsulation of 
a type. Value types and polymorphic types are different, have different 
purposes, different behaviors, etc. Conflating the two into the same construct 
makes for poor and confusing abstractions.

It shifts the problem out of the language and onto the programmer. It does NOT 
make the complexity go away.


 What I miss more in Java is not single structs (single values),

There's a lot more to miss than that. I find Java code tends to be excessively 
complex, and that's because it lacks expressive power. It was summed up for me 
by a colleague who said that one needs an IDE to program in Java because with 
one button it will auto-generate 100 lines of boilerplate.

Oct 26 2010

retard <re tard.com.invalid> writes:

Tue, 26 Oct 2010 21:39:32 -0700, Walter Bright wrote:

 bearophile wrote:
 Walter:
 
 Java made a related mistake by failing to acknowledge that value types
 have any useful purpose at all (unless they are built-in).

 
 Java was designed to be simple! Simple means to have a more uniform
 semantics.

 
 So was Pascal. See the thread about how useless it was as a result.

Blablabla.. this nostalgic lesson reminded me, have you even started 
studying the list of type system concepts I listed a few days ago? A new 
version with links is coming at some point.

 What I miss more in Java is not single structs (single values),

 
 There's a lot more to miss than that. I find Java code tends to be
 excessively complex, and that's because it lacks expressive power.

Adding structs to Java wouldn't fix that. You probably know that. 
Unifying structs and classes in a language like D and adding good escape 
analysis wouldn't worsen the performance that badly in general purpose 
applications. Java is mostly used for general purpose programming so your 
claims about usefulness and the need for extreme performance look silly.

Oct 27 2010

Walter Bright <newshound2 digitalmars.com> writes:

retard wrote:
 have you even started
 studying the list of type system concepts I listed few days ago.

Java has proved that such things aren't useful in programming languages :-)


 Adding structs to Java wouldn't fix that.  You probably know that.
 Unifying structs and classes in a language like D and adding good escape 
 analysis wouldn't worsen the performance that badly in general purpose 
 applications. Java is mostly used for general purpose programming so your 
 claims about usefulness and the need for extreme performance look silly.


If that were true, why are Java char/int/double types value types, not a 
reference type derived from Object?

Oct 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 So was Pascal. See the thread about how useless it was as a result.

But Java is probably currently the most used language, so I guess they have
created a simpler language, but not one as oversimplified as Pascal was.


 Value types and polymorphic types are different, have different
 purposes, different behaviors, etc.

Right.


 Conflating the two into the same construct makes for poor and confusing
 abstractions.

In Python there are (more or less) only objects, and they are managed "by name"
(similar to "by reference") and it works well enough.


 It shifts the problem out of the language and onto the programmer. It does NOT
 make the complexity go away.

This is partially true. The presence of just objects doesn't solve all
problems, so part of the complexity doesn't go away, it goes into the program.
On the other hand value semantics introduces a large amount of complexity by
itself (in C++ there is a huge amount of stuff about this semantics, and even
in D the design is unfinished still after ten years and after all the
experience with C++).

So in my opinion in the end the net result is that removing structs makes the
language+programs simpler.


 There's a lot more to miss than that. I find Java code tends to be excessively
 complex, and that's because it lacks expressive power. It was summed up for me
 by a colleague who said that one needs an IDE to program in Java because with
 one button it will auto-generate 100 lines of boilerplate.

Yes, clearly Java has several faults. It's far from perfect. But introducing
structs inside Java is in my opinion not going to solve those problems much.



 [...] If that were true, why are Java char/int/double types value types,
 not a reference type derived from Object?

For performance reasons, because originally Java didn't have the advanced
compilation strategies used today. Languages like Clojure that run on the
JavaVM use more reference types (for integer numbers too). 


After all this discussion I want to remind you that I am here because I like D
and I like D structs, unions and all that :-) I prefer to use D many times over
Java. And I agree that structs (or tagged unions) are better in D for the lexer
if you want the lexer to be quite fast.

Bye,
bearophile

Oct 27 2010

Walter Bright <newshound2 digitalmars.com> writes:

bearophile wrote:
 After all this discussion I want to remind you that I am here because I like
 D and I like D structs, unions and all that :-) I prefer to use D many times
 over Java. And I agree that structs (or tagged unions) are better in D for
 the lexer if you want the lexer to be quite fast.

So, there is "value" in value types after all. I confess I have no idea why you 
argue against them.

Oct 27 2010

bearophile <bearophileHUGS lycos.com> writes:

Walter:

 So, there is "value" in value types after all. I confess I have no idea why
you 
 argue against them.

I am not arguing against them in absolute terms. They are good in some situations and
not so good in other situations :-)

Compound value types are very useful in a certain kind of imperative low-level
language. But if you are designing a simpler language, it's better not to add
structs to it (and yes, in practice the world needs simpler languages too; not
everyone needs a Ferrari at every moment. And I believe that removing structs
makes the sum of the language plus its programs simpler on average).

Bye,
bearophile

Oct 27 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 27/10/2010 05:39, Walter Bright wrote:
 What I miss more in Java is not single structs (single values),

 There's a lot more to miss than that. I find Java code tends to be
 excessively complex, and that's because it lacks expressive power. It
 was summed up for me by a colleague who said that one needs an IDE to
 program in Java because with one button it will auto-generate 100 lines
 of boilerplate.

I've been hearing that a lot, but I find this to be excessively 
exaggerated. Can you give some concrete examples?

Because regarding excessive verbosity in Java, I can only remember three 
significant things at the moment (at least disregarding meta 
programming), and one of them is nearly as verbose in D as in Java:

  1) writing getters and setters for fields
  2) verbose syntax for closures. (need to use an anonymous class, outer 
variables must be final, and wrapped in an array if write access is needed)
  3) writing trivial constructors whose parameters mirror the fields, 
and then constructors assign the parameters to the fields.

I don't think 1 and 2 happen often enough to be that much of an 
annoyance. (unless you're one of those Java persons that thinks that 
directly accessing the public field of another class is a sin, instead 
every single field must have getters/setters and never ever be public...)

As an additional note, I don't think having an IDE auto-generate X lines 
of boilerplate code is necessarily a bad thing. It's only bad if the 
alternative of having a better language feature would actually save me 
coding time (whether initial coding, or subsequent modifications) or 
improve code understanding. _Isn't this what matters?_


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 27/10/2010 05:39, Walter Bright wrote:
 bearophile wrote:
 Walter:
 Java was designed to be simple! Simple means to have a more uniform
 semantics.

 So was Pascal. See the thread about how useless it was as a result.

There's good simple, and there's bad simple...


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

"Nick Sabalausky" <a a.a> writes:

"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:ia8321$vug$1 digitalmars.com...
 Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message 
 news:i9qd8q$1ls4$1 digitalmars.com...
 4. the tokens should be a value type, not a reference type

 I'm curious, is your reason for this purely to avoid allocations during 
 lexing, or are there other reasons too?

 It's one big giant reason. Storage allocation gets unbelievably costly in 
 a lexer. Another is it makes tokens easy to copy. Another one is that 
 classes are for polymorphic behavior. What kind of polymorphic behavior 
 would one want with tokens?

Honestly, I'm not entirely certain whether or not Goldie actually needs its 
tokens to be classes instead of structs, but I'll explain my current usage:

In the basic "dynamic" style, every token in Goldie, terminal or 
nonterminal, is of type "class Token" no matter what its symbol or part of 
grammar it represents. But Goldie has an optional "static" style which 
creates a class hierarchy, for the sake of compile-time type safety, with 
"Token" as the base.

Suppose you have a grammar named "simple" that's a series of one or more a's 
and b's separated by plus signs:

<Item> ::= 'a' | 'b'
<List> ::= <List> '+' <Item> | <Item>

So there's three terminals: "a", "b", and "+"
And two nonterminals: "<Item>" and "<List>", each with exactly two possible 
reductions.

So in dynamic-style, all of those are "class Token", and that's all that 
exists. But with the optional static-style, the following class hierarchy is 
effectively created:

class Token;
class Token_simple : Token;

class Token_simple!"a" : Token_simple;
class Token_simple!"b" : Token_simple;
class Token_simple!"+" : Token_simple;
class Token_simple!"<Item>" : Token_simple;
class Token_simple!"<List>" : Token_simple;

class Token_simple!("<Item>", "a") : Token_simple!"<Item>";
class Token_simple!("<Item>", "b") : Token_simple!"<Item>";

class Token_simple!("<List>", "<List>", "+", "<Item>") : 
Token_simple!"<List>";
class Token_simple!("<List>", "<Item>") : Token_simple!"<List>";

So rules inherit from the nonterminal they reduce to; terminals and 
nonterminals inherit from a dummy class dedicated specifically to the given 
grammar; and that inherits from plain old dynamic-style "Token". All of 
those template parameters are validated at compile-time.

(At some point I'd like to make it possible to specify the rule-based tokens 
as something like:
Token!("<List> ::= <List> '+' <Item>"), but I haven't gotten to it yet.)

There's one more trick: The plain old Token exposes a member "subX" which 
can be numerically indexed to obtain the sub-tokens (for terminals, 
subX.length==0):

void foo(Token token)
{
    if(token.matches("<List>", "<List>", "+", "<Item>"))
    {
        auto leftSide = token.subX[0];
        auto rightSide = token.subX[2];
        //auto dummy = token.subX[10]; // Run-time error

        static assert(is(typeof(leftSide) == Token));
        static assert(is(typeof(rightSide) == Token));
    }
}

Note that it's impossible for the "matches" function to verify at 
compile-time that its arguments are valid.

All of the static-style token types retain all of that for total 
compatibility with the dynamic-style. But the static-style nonterminals 
provide an additional member, "sub", that can be used like this:

void foo(Token_simple!("<List>", "<List>", "+", "<Item>") token)
{
    auto leftSide = token.sub!0;
    auto rightSide = token.sub!2;
    //auto dummy = token.sub!10; // Compile-time error

    static assert(is(typeof(leftSide) == Token_simple!"<List>"));
    static assert(is(typeof(rightSide) == Token_simple!"<Item>"));
}

As for whether or not this effect can be reasonably accomplished with 
structs: I have no idea, I haven't really looked into it.

Oct 26 2010

Walter Bright <newshound2 digitalmars.com> writes:

Nick Sabalausky wrote:
 As for whether or not this effect can be reasonably accomplished with 
 structs: I have no idea, I haven't really looked into it.

I use a tagged variant for the token struct.

This doesn't make any difference if one is parsing small pieces of code. But 
when you're trying to stuff millions of lines of code down its maw, avoiding an 
allocation per token is a big deal.

The indirect calls to the member functions of a class also perform poorly 
relative to tagged variants.
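
A rough sketch of the idea follows. It is not the actual dmd token layout; `TokKind`, the field names and the helper functions are hypothetical and chosen only for illustration.

```d
import std.conv : to;

// Hypothetical token kinds; a real lexer would have many more.
enum TokKind : ubyte { intLiteral, floatLiteral, identifier, plus, eof }

struct Token
{
    TokKind kind;          // the tag: says which union member is valid
    union
    {
        long   intValue;   // valid when kind == intLiteral
        double floatValue; // valid when kind == floatLiteral
        string ident;      // valid when kind == identifier (slice of the source)
    }
}

Token makeInt(long v)
{
    Token t;
    t.kind = TokKind.intLiteral;
    t.intValue = v;
    return t;
}

Token makeIdent(string s)
{
    Token t;
    t.kind = TokKind.identifier;
    t.ident = s;
    return t;
}

// Dispatch is a switch on the tag rather than a virtual call.
string describe(Token t)
{
    final switch (t.kind)
    {
        case TokKind.intLiteral:   return "int " ~ to!string(t.intValue);
        case TokKind.floatLiteral: return "float " ~ to!string(t.floatValue);
        case TokKind.identifier:   return "identifier " ~ t.ident;
        case TokKind.plus:         return "+";
        case TokKind.eof:          return "eof";
    }
}

unittest
{
    Token a = makeInt(7);
    Token b = a; // plain copy, no allocation
    assert(b.kind == TokKind.intLiteral && b.intValue == 7);
    assert(describe(makeIdent("foo")) == "identifier foo");
}
```

Because `Token` is a struct, the lexer can return it by value or keep one in a reusable buffer, so no allocation per token is needed.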

Oct 26 2010

retard <re tard.com.invalid> writes:

Tue, 26 Oct 2010 19:32:44 -0700, Walter Bright wrote:

 Nick Sabalausky wrote:
 "Walter Bright" <newshound2 digitalmars.com> wrote in message
 news:i9qd8q$1ls4$1 digitalmars.com...
 4. the tokens should be a value type, not a reference type

 
 I'm curious, is your reason for this purely to avoid allocations during
 lexing, or are there other reasons too?

 
 It's one big giant reason. Storage allocation gets unbelievably costly
 in a lexer. Another is it makes tokens easy to copy. Another one is that
 classes are for polymorphic behavior. What kind of polymorphic behavior
 would one want with tokens?

This is why the basic data structure in functional languages, algebraic 
data types, suits better for this purpose.

Oct 27 2010

Walter Bright <newshound2 digitalmars.com> writes:

retard wrote:
 This is why the basic data structure in functional languages, algebraic 
 data types, suits better for this purpose.

I think you recently demonstrated otherwise, as proven by the widespread use of 
Java :-)

Oct 27 2010

retard <re tard.com.invalid> writes:

Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:

 retard wrote:
 This is why the basic data structure in functional languages, algebraic
 data types, suits better for this purpose.

 
 I think you recently demonstrated otherwise, as proven by the widespread
 use of Java :-)

I don't understand your logic -- Widespread use of Java proves that 
algebraic data types aren't a better suited way for expressing compiler's 
data structures such as syntax trees?

Oct 27 2010

Walter Bright <newshound2 digitalmars.com> writes:

retard wrote:
 Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:
 
 retard wrote:
 This is why the basic data structure in functional languages, algebraic
 data types, suits better for this purpose.

 I think you recently demonstrated otherwise, as proven by the widespread
 use of Java :-)

 
 I don't understand your logic -- Widespread use of Java proves that 
 algebraic data types aren't a better suited way for expressing compiler's 
 data structures such as syntax trees?

You told me that widespread use of Java proved that nothing more complex than 
what Java provides is useful:

"Java is mostly used for general purpose programming so your claims about 
usefulness and the need for extreme performance look silly."

I'd be surprised if you seriously meant that, as it implies that Java is the 
pinnacle of computer language design, but I can't resist teasing you about it.
:-)

Oct 27 2010

retard <re tard.com.invalid> writes:

Wed, 27 Oct 2010 13:52:29 -0700, Walter Bright wrote:

 retard wrote:
 Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:
 
 retard wrote:
 This is why the basic data structure in functional languages,
 algebraic data types, suits better for this purpose.

 I think you recently demonstrated otherwise, as proven by the
 widespread use of Java :-)

 
 I don't understand your logic -- Widespread use of Java proves that
 algebraic data types aren't a better suited way for expressing
 compiler's data structures such as syntax trees?

 
 You told me that widespread use of Java proved that nothing more complex
 than what Java provides is useful:
 
 "Java is mostly used for general purpose programming so your claims
 about usefulness and the need for extreme performance look silly."
 
 I'd be surprised if you seriously meant that, as it implies that Java is
 the pinnacle of computer language design, but I can't resist teasing you
 about it. :-)

I only meant that the widespread adoption of Java shows how the public at 
large cares very little about the performance issues you mentioned. Java 
is one of the most widely used languages and it's also successful in many 
fields. Things could be better from programming language theory's point 
of view, but the business world is more interested in profits and the 
large pool of Java coders has given better benefits than more expressive 
languages. I don't think that says anything against my notes about 
algebraic data types.

Oct 27 2010

Walter Bright <newshound2 digitalmars.com> writes:

retard wrote:
 I only meant that the widespread adoption of Java shows how the public at 
 large cares very little about the performance issues you mentioned. Java
 is one of the most widely used languages and it's also successful in many 
 fields. Things could be better from programming language theory's point 
 of view, but the business world is more interested in profits, and the 
 large pool of Java coders has brought more benefit than more expressive 
 languages would. I don't think that says anything against my notes about 
 algebraic data types.


Choice of a language has numerous factors, so you cannot dismiss one factor 
because the other factors still make it an attractive choice.

For example:

"the widespread adoption of horses shows how the public at large cares very 
little about the cars you mentioned."

Oct 27 2010

retard <re tard.com.invalid> writes:

Wed, 27 Oct 2010 14:15:04 -0700, Walter Bright wrote:

 retard wrote:
 I only meant that the widespread adoption of Java shows how the public
 at large cares very little about the performance issues you mentioned.
 Java is one of the most widely used languages and it's also successful
 in many fields. Things could be better from programming language
 theory's point of view, but the business world is more interested in
 profits, and the large pool of Java coders has brought more benefit
 than more expressive languages would. I don't think that says anything
 against my notes about algebraic data types.

 
 
 Choice of a language has numerous factors

I know that.

, so you cannot dismiss one
 factor because the other factors still make it an attractive choice.

I don't think I said anything that contradicts that.

 For example:
 
 "the widespread adoption of horses shows how the public at large cares
 very little about the cars you mentioned."

I meant caring in a way that results in masses of programmers migrating 
their code from Java to a language with those performance issues solved 
(e.g. D). A layman can make general remarks from people switching from 
Java/C++/C to "new" languages such as Groovy, Javascript, Python, PHP, 
and Ruby. The people want "simpler" languages. For example Ruby has 
terrible performance, but the performance becomes a non-issue once the 
web service framework is built in a scalable way.

Oct 27 2010

"Nick Sabalausky" <a a.a> writes:

"retard" <re tard.com.invalid> wrote in message 
news:iaa44v$17sf$2 digitalmars.com...
 I only meant that the widespread adoption of Java shows how the public at
 large cares very little about the performance issues you mentioned.

The public at large is convinced that "Java is fast now, really!". So I'm 
not certain widespread adoption of Java necessarily indicates they don't 
care so much about performance. Of course, Java is quickly becoming a legacy 
language anyway (the next COBOL, IMO), so that throws another wrench into 
the works.

Oct 27 2010

"Todd D. VanderVeen" <tdv part.net> writes:

Legacy in the sense that C is perhaps.

http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html




Oct 27 2010

retard <re tard.com.invalid> writes:

Wed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:

 Legacy in the sense that C is perhaps.
 
 http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Probably the top 10 names are more or less correct there, but some funny 
notes:

33. D
36. Scratch
40. Haskell
42. JavaFX Script
49. Scala

Scratch is an educational tool. It isn't really suitable for any real 
world applications. It slows down considerably with too many expressions.

Both Haskell and Scala have several books about them, active mailing 
lists, and many active community projects. I haven't heard much about 
JavaFX outside Sun/Oracle. These statistics look really weird.

Oct 27 2010

Don <nospam nospam.com> writes:

retard wrote:
 Wed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:
 
 Legacy in the sense that C is perhaps.

 http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

 
 Probably the top 10 names are more or less correct there, but some funny 
 notes:
 
 33. D
 36. Scratch
 40. Haskell
 42. JavaFX Script
 49. Scala
 
 Scratch is an educational tool. It isn't really suitable for any real 
 world applications. It slows down considerably with too many expressions.
 
 There are several books about Haskell and Scala. Both have several books 
 on them, active mailing lists, and also very many active community 
 projects. Haven't heard much about JavaFX outside Sun/Oracle. These 
 statistics look really weird.

I reckon Fortran is the one to look at. If Tiobe's stats were 
sensible, the Fortran numbers would be solid as a rock.
And ADA ought to be pretty stable too. But look at this:

http://www.tiobe.com/index.php/paperinfo/tpci/Ada.html

Laughable.

Oct 28 2010

Matthias Pleh <sufu alter.de> writes:

Am 28.10.2010 16:46, schrieb Don:
 retard wrote:
 Wed, 27 Oct 2010 16:04:34 -0600, Todd D. VanderVeen wrote:

 Legacy in the sense that C is perhaps.

 http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

 Probably the top 10 names are more or less correct there, but some
 funny notes:

 33. D
 36. Scratch
 40. Haskell
 42. JavaFX Script
 49. Scala

 Scratch is an educational tool. It isn't really suitable for any real
 world applications. It slows down considerably with too many expressions.

 There are several books about Haskell and Scala. Both have several
 books on them, active mailing lists, and also very many active
 community projects. Haven't heard much about JavaFX outside
 Sun/Oracle. These statistics look really weird.

 I reckon Fortran is the one to look at it. If Tiobe's stats were
 sensible, the Fortran numbers would be solid as a rock.
 And ADA ought to be pretty stable too. But look at this:

 http://www.tiobe.com/index.php/paperinfo/tpci/Ada.html

 Laughable.

There was an article in the German c't magazine that took a closer 
look at these rankings.
For example:
- a search for 'C'    returns 3080 M hits
- a search for 'Java' returns  167 M hits

'Java' only competes with the island of Java
'C'    competes with C&A, c't-Magazin, C-Quadrat, C+C, char 'c', ...
        and many many more ....

So, to correct for this, only the first *100* (hundred) results are reviewed, 
and the resulting factor is applied to the total number of results.
But just look at results 1-100, then at 101-200, 201-300 and so on:
you get completely different factors.

So these numbers at Tiobe are really lying!

source:
http://www.heise.de/developer/artikel/Traue-keiner-Statistik-993137.html
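
The correction described above amounts to simple arithmetic, sketched here in Python. The function name estimated_hits is hypothetical, and the relevance counts in the example (90 and 30 out of 100) are invented purely to show how much the estimate swings with the chosen sample window; only the 3080 M figure comes from the numbers quoted above.

```python
def estimated_hits(total_hits: float, relevant_in_sample: int,
                   sample_size: int = 100) -> float:
    """Scale the raw hit count by the share of relevant results in a sample.

    total_hits         -- raw number of search results (here in millions)
    relevant_in_sample -- how many reviewed results were really about the language
    sample_size        -- how many results were reviewed (100 in the article)
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= relevant_in_sample <= sample_size:
        raise ValueError("relevant_in_sample must lie between 0 and sample_size")
    return total_hits * relevant_in_sample / sample_size


# Invented relevance counts for two different sample windows of the 'C' search:
from_first_window = estimated_hits(3080, 90)   # e.g. results 1-100
from_later_window = estimated_hits(3080, 30)   # e.g. results 201-300
```

With these made-up counts the same raw figure yields two estimates that differ by a factor of three, which is the weakness the article points at.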

greets
Matthias

Oct 28 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 27/10/2010 22:43, Nick Sabalausky wrote:
 "retard"<re tard.com.invalid>  wrote in message
 news:iaa44v$17sf$2 digitalmars.com...
 I only meant that the widespread adoption of Java shows how the public at
 large cares very little about the performance issues you mentioned.

 The public at large is convinced that "Java is fast now, really!". So I'm
 not certain widespread adoption of Java necessarily indicates they don't
 care so much about performance. Of course, Java is quickly becoming a legacy
 language anyway (the next COBOL, IMO), so that throws another wrench into
 the works.

Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
Just two years ago, the now hugely popular Android platform chose Java 
as its language of choice, and you think Java is becoming legacy?...

The development of the Java language itself has stagnated over the last 
6 years or so (especially due to corporate politics, which have now 
become even worse and more uncertain with all the shit Oracle is doing), but 
that's a completely different statement from saying Java is becoming 
legacy.
In fact, all the uproar and concern about the future of Java under 
Oracle, of the JVM, of the JCP (the body that regulates changes to 
Java), etc., is a testament to the huge popularity of Java. Otherwise 
people (and corporations) wouldn't care, they would just let it wither 
away with much less concern.


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

bearophile <bearophileHUGS lycos.com> writes:

Bruno Medeiros:

 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
 Just two years ago, the now hugely popular Android platform choose Java 
 as it's language of choice, and you think Java is becoming legacy?...

Java on Android is not going well; there is an Oracle->Google lawsuit in
progress. Google is interested in using a variant of NaCl on Android too.

Bye,
bearophile

Nov 19 2010

Andrew Wiley <debio264 gmail.com> writes:

On Fri, Nov 19, 2010 at 4:20 PM, bearophile <bearophileHUGS lycos.com>wrote:

 Bruno Medeiros:

 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
 Just two years ago, the now hugely popular Android platform choose Java
 as it's language of choice, and you think Java is becoming legacy?...

 Java on Adroid is not going well, there is a Oracle->Google lawsuit in
 progress. Google is interested in using a variant of NaCL on Android too.

I have to agree with Bruno here, Java isn't going anywhere soon. It has an
active community, corporations that very actively support it, and an open
source effort that's probably the largest of any language (check out the
Apache Foundation project lists). Toss in Clojure, Scala, Groovy, and their
friends that can build on top of Java libraries, and you wind up with a
package that isn't becoming obsolete any time soon.

Nov 19 2010

"Nick Sabalausky" <a a.a> writes:

"Andrew Wiley" <debio264 gmail.com> wrote in message 
news:mailman.501.1290205603.21107.digitalmars-d puremagic.com...
 On Fri, Nov 19, 2010 at 4:20 PM, bearophile 
 <bearophileHUGS lycos.com>wrote:

 Bruno Medeiros:

 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
 Just two years ago, the now hugely popular Android platform choose Java
 as it's language of choice, and you think Java is becoming legacy?...

 Java on Adroid is not going well, there is a Oracle->Google lawsuit in
 progress. Google is interested in using a variant of NaCL on Android too.

 I have to agree with Bruno here, Java isn't going anywhere soon. It has an
 active community, corporations that very actively support it, and an open
 source effort that's probably the largest of any language (check out the
 Apache Foundation project lists). Toss in Clojure, Scala, Groovy, and 
 their
 friends that can build on top of Java libraries, and you wind up with a
 package that isn't becoming obsolete any time soon.

To be clear, I meant Java the language, not Java the VM. But yea, you're 
right, I probably overstated my point.

What I have noticed though is, like Bruno said, a slowdown in Java language 
development and I can certainly imagine complications from the Oracle 
takeover of Sun. I've also been noticing decreasing interest in using Java 
(the language) for new projects (although, yes, not *completely* stalled 
interest) compared to 5-10 years ago, sharply increased awareness and 
recognition of Java's drawbacks compared to 5-10 years ago, and greatly 
increased interest and usage of other JVM languages besides Java.

Ten years from now, I wouldn't at all be surprised to see Java (the 
language) being used primarily for maintenance on existing software that had 
already been written in Java. In fact, I'd be surprised if that doesn't end 
up being the case at that point. But I do imagine seeing things like D, 
Nemerle, Scala and Python thriving at that point.

Nov 23 2010

Michael Stover <michael.r.stover gmail.com> writes:

As for D lexers and tokenizers, what would be nice is to
A) build an antlr grammar for D
B) build D targets for antlr so that antlr can generate lexers and parsers
in the D language.

For B) I found http://www.mbutscher.de/antlrd/index.html

For A) A good list of antlr grammars is at http://www.antlr.org/grammar/list,
but there isn't a D grammar.

These things wouldn't be an enormous amount of work to create and maintain,
and, if done, anyone could parse D code in many languages, including Java
and C which would make providing IDE features for D development easier in
those languages (eclipse for instance), and you could build lexers and
parsers in D using antlr grammars.

-Mike



On Fri, Nov 19, 2010 at 5:09 PM, Bruno Medeiros
<brunodomedeiros+spam com.gmail> wrote:

 On 27/10/2010 22:43, Nick Sabalausky wrote:

 "retard"<re tard.com.invalid>  wrote in message
 news:iaa44v$17sf$2 digitalmars.com...

 I only meant that the widespread adoption of Java shows how the public at
 large cares very little about the performance issues you mentioned.

 The public at large is convinced that "Java is fast now, really!". So I'm
 not certain widespread adoption of Java necessarily indicates they don't
 care so much about performance. Of course, Java is quickly becoming a
 legacy
 language anyway (the next COBOL, IMO), so that throws another wrench into
 the works.

 Java is quickly becoming a legacy language? the next COBOL? SRSLY?...
 Just two years ago, the now hugely popular Android platform choose Java as
 it's language of choice, and you think Java is becoming legacy?...

 The development of the Java language itself has stagnated over the last 6
 years or so (especially due to corporate politics, which now has become even
 worse and uncertain with all the shit Oracle is doing), but that's a
 completely different statement from saying Java is becoming legacy.
 In fact, all the uproar and concern about the future of Java under Oracle,
 of the JVM, of the JCP (the body that regulates changes to Java),etc., is a
 testament to the huge popularity of Java. Otherwise people (and
 corporations) wouldn't care, they would just let it wither away with much
 less concern.


 --
 Bruno Medeiros - Software Engineer

Nov 19 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 22:25, Michael Stover wrote:
 As for D lexers and tokenizers, what would be nice is to
 A) build an antlr grammar for D
 B) build D targets for antlr so that antlr can generate lexers and
 parsers in the D language.

 For B) I found http://www.mbutscher.de/antlrd/index.html

 For A) A good list of antlr grammars is at
 http://www.antlr.org/grammar/list, but there isn't a D grammar.

 These things wouldn't be an enormous amount of work to create and
 maintain, and, if done, anyone could parse D code in many languages,
 including Java and C which would make providing IDE features for D
 development easier in those languages (eclipse for instance), and you
 could build lexers and parsers in D using antlr grammars.

 -Mike

Yes, that would be much better. It would be directly and immediately 
useful for the DDT project:

"But better yet would be to start coding our own custom parser (using a 
parser generator like ANTLR for example), that could really be tailored 
for IDE needs. In the medium/long term, that's probably what needs to be 
done. "
in 
http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html

-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

Michael Stover <michael.r.stover gmail.com> writes:

so that was 4 months ago - how do things currently stand on that initiative?

-Mike

On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros
<brunodomedeiros+spam com.gmail> wrote:

 On 19/11/2010 22:25, Michael Stover wrote:

 As for D lexers and tokenizers, what would be nice is to
 A) build an antlr grammar for D
 B) build D targets for antlr so that antlr can generate lexers and
 parsers in the D language.

 For B) I found http://www.mbutscher.de/antlrd/index.html

 For A) A good list of antlr grammars is at
 http://www.antlr.org/grammar/list, but there isn't a D grammar.

 These things wouldn't be an enormous amount of work to create and
 maintain, and, if done, anyone could parse D code in many languages,
 including Java and C which would make providing IDE features for D
 development easier in those languages (eclipse for instance), and you
 could build lexers and parsers in D using antlr grammars.

 -Mike

 Yes, that would be much better. It would be directly and immediately useful
 for the DDT project:

 "But better yet would be to start coding our own custom parser (using a
 parser generator like ANTLR for example), that could really be tailored for
 IDE needs. In the medium/long term, that's probably what needs to be done. "
 in
 http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html

 --
 Bruno Medeiros - Software Engineer

Nov 19 2010

Matthias Pleh <gonzo web.at> writes:

Am 20.11.2010 00:56, schrieb Michael Stover:
 so that was 4 months ago - how do things currently stand on that initiative?

 -Mike

 On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

     On 19/11/2010 22:25, Michael Stover wrote:

         As for D lexers and tokenizers, what would be nice is to
         A) build an antlr grammar for D
         B) build D targets for antlr so that antlr can generate lexers and
         parsers in the D language.

         For B) I found http://www.mbutscher.de/antlrd/index.html

         For A) A good list of antlr grammars is at
         http://www.antlr.org/grammar/list, but there isn't a D grammar.

         These things wouldn't be an enormous amount of work to create and
         maintain, and, if done, anyone could parse D code in many languages,
         including Java and C which would make providing IDE features for D
         development easier in those languages (eclipse for instance),
         and you
         could build lexers and parsers in D using antlr grammars.

         -Mike


     Yes, that would be much better. It would be directly and immediately
     useful for the DDT project:

     "But better yet would be to start coding our own custom parser
     (using a parser generator like ANTLR for example), that could really
     be tailored for IDE needs. In the medium/long term, that's probably
     what needs to be done. "
     in
     http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html

     --
     Bruno Medeiros - Software Engineer

There is a project with an ANTLR grammar for D in progress:
http://code.google.com/p/vs-d-integration/

Maybe this can be finished?

matthias

Nov 20 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 19/11/2010 23:56, Michael Stover wrote:
 so that was 4 months ago - how do things currently stand on that initiative?

 -Mike

 On Fri, Nov 19, 2010 at 6:37 PM, Bruno Medeiros
 <brunodomedeiros+spam com.gmail> wrote:

 On 19/11/2010 22:25, Michael Stover wrote:

 As for D lexers and tokenizers, what would be nice is to
 A) build an antlr grammar for D
 B) build D targets for antlr so that antlr can generate lexers and
 parsers in the D language.

 For B) I found http://www.mbutscher.de/antlrd/index.html

 For A) A good list of antlr grammars is at
 http://www.antlr.org/grammar/list, but there isn't a D grammar.

 These things wouldn't be an enormous amount of work to create and
 maintain, and, if done, anyone could parse D code in many languages,
 including Java and C which would make providing IDE features for D
 development easier in those languages (eclipse for instance),
 and you
 could build lexers and parsers in D using antlr grammars.

 -Mike

 Yes, that would be much better. It would be directly and immediately
 useful for the DDT project:

 "But better yet would be to start coding our own custom parser
 (using a parser generator like ANTLR for example), that could really
 be tailored for IDE needs. In the medium/long term, that's probably
 what needs to be done. "
 in
 http://www.digitalmars.com/d/archives/digitalmars/D/ide/Future_of_Descent_and_D_Eclipse_IDE_635.html

 --
 Bruno Medeiros - Software Engineer

I don't know about Ellery, as you can see in that thread he/she(?)
mentioned interest in working on that, but I don't know anything more.

As for me, I didn't work on that, nor did I plan to.
Nor am I planning to anytime soon. DDT can handle things with the
current parser for now (bugs can be fixed on the current code, perhaps
some limitations can be resolved by merging some more code from DMD), so
I'll likely work on other more important features before I go there. For
example, I'll likely work on debugger integration and code completion
improvements before I would go on to write a new parser from scratch.
Plus, it gives more time for someone else to hopefully work on it. :P

Unlike Walter, I can't write a D parser in a weekend... :) Not even in a
week, especially since I've never done anything of this kind before.

--
Bruno Medeiros - Software Engineer

Nov 24 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 11/24/2010 09:13 AM, Bruno Medeiros wrote:
 I don't know about Ellery, as you can see in that thread he/she(?)
 mentioned interest in working on that, but I don't know anything more.

Normally I go by 'it'.

Been pretty busy this semester, so I haven't been doing much.

But the bottom line is, yes I have working antlr grammars for D1 and D2 
if you don't mind
1) they're slow
2) they're tied to a hacked-out version of the netbeans fork of ANTLR2
3) they're tied to some custom java code
4) I haven't been keeping the tree grammars so up to date

I've not released them for those reasons. Semester will be over in about 
3 weeks, though, and I'll have time then.

 As for me, I didn't work on that, nor did I plan to.
 Nor am I planning to anytime soon, DDT can handle things with the
 current parser for now (bugs can be fixed on the current code, perhaps
 some limitations can be resolved by merging some more code from DMD), so
 I'll likely work on other more important features before I go there. For
 example, I'll likely work on debugger integration, and code completion
 improvements before I would go on writing a new parser from scratch.
 Plus, it gives more time to hopefully someone else work on it. :P

 Unlike Walter, I can't write a D parser in a weekend... :) Not even on a
 week, especially since I never done anything of this kind before.

It took me like 3 months to read his parser to figure out what was going on.

Nov 24 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 24/11/2010 16:19, Ellery Newcomer wrote:
 On 11/24/2010 09:13 AM, Bruno Medeiros wrote:
 I don't know about Ellery, as you can see in that thread he/she(?)
 mentioned interest in working on that, but I don't know anything more.

 Normally I go by 'it'.

I didn't mean to offend or anything, I was just unsure of that. To me 
Ellery seems like a female name (but that can be a bias due to English 
not being my first language, or some other cultural thing). On the other 
hand, I would be surprised if a person of the female variety would be 
that interested in D, to the point of contributing in such way.

 Been pretty busy this semester, so I haven't been doing much.

 But the bottom line is, yes I have working antlr grammars for D1 and D2
 if you don't mind
 1) they're slow
 2) they're tied to a hacked-out version of the netbeans fork of ANTLR2
 3) they're tied to some custom java code
 4) I haven't been keeping the tree grammars so up to date

 I've not released them for those reasons. Semester will be over in about
 3 weeks, though, and I'll have time then.

Hum, doesn't sound like it might be suitable for DDT, but I wasn't 
counting on that either.

 As for me, I didn't work on that, nor did I plan to.
 Nor am I planning to anytime soon, DDT can handle things with the
 current parser for now (bugs can be fixed on the current code, perhaps
 some limitations can be resolved by merging some more code from DMD), so
 I'll likely work on other more important features before I go there. For
 example, I'll likely work on debugger integration, and code completion
 improvements before I would go on writing a new parser from scratch.
 Plus, it gives more time to hopefully someone else work on it. :P

 Unlike Walter, I can't write a D parser in a weekend... :) Not even on a
 week, especially since I never done anything of this kind before.

 It took me like 3 months to read his parser to figure out what was going
 on.

Not 3 man-months for sure!, right? (Man-month in the sense of someone 
working 40 hours per week during a month.)


-- 
Bruno Medeiros - Software Engineer

Nov 24 2010

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

On 11/24/2010 02:09 PM, Bruno Medeiros wrote:
 I didn't meant to offend or anything, I was just unsure of that.

None taken; I'm just laughing at you. As I understand it, though, 
'Ellery' is a unisex name, so it is entirely ambiguous.

 It took me like 3 months to read his parser to figure out what was going
 on.

 Not 3 man-months for sure!, right? (Man-month in the sense of someone
 working 40 hours per week during a month.)

Probably not

Nov 24 2010

bearophile <bearophileHUGS lycos.com> writes:

Bruno Medeiros:

 On the other hand, I would be surprised if a person of the female variety
 would be that interested in D, to the point of contributing in such way.

In Python newsgroups I have seen a few women, now and then, but in the D
newsgroup so far... not many. So far D seems a male thing. I don't know why. In
the Computer Science course at the university there is a good enough number of
female students (and a few female teachers too).

Bye,
bearophile

Nov 24 2010

Daniel Gibson <metalcaedes gmail.com> writes:

bearophile schrieb:
 Bruno Medeiros:
 
 On the other hand, I would be surprised if a person of the female variety
 would be that interested in D, to the point of contributing in such way.

 
 In Python newsgroups I have seen few women, now and then, but in the D
 newsgroup so far... not many. So far D seems a male thing. I don't know why. At
 the university at the Computer Science course there are a good enough number of
 female students (and few female teachers too).
 
 Bye,
 bearophile

At my university there are *very* few women studying computer science.
Most women sitting in CS lectures here are studying maths and have to do some 
basic CS lectures (I don't think they're the kind that would try D voluntarily).
We have two female professors though.

Nov 24 2010

"Nick Sabalausky" <a a.a> writes:

"Daniel Gibson" <metalcaedes gmail.com> wrote in message 
news:icjv6l$p1r$2 digitalmars.com...
 bearophile schrieb:
 Bruno Medeiros:

 On the other hand, I would be surprised if a person of the female 
 variety
 would be that interested in D, to the point of contributing in such way.

 In Python newsgroups I have seen few women, now and then, but in the D 
 newsgroup so far... not many. So far D seems a male thing. I don't know 
 why. At the university at the Computer Science course there are a good 
 enough number of female students (and few female teachers too).

 Bye,
 bearophile

 At my university there are *very* few woman studying computer science.
 Most women sitting in CS lectures here are studying maths and have to do 
 some basic CS lectures (I don't think they're the kind that would try D 
 voluntarily).
 We have two female professors though.


fest.

Nov 25 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 24/11/2010 21:12, Daniel Gibson wrote:
 bearophile schrieb:
 Bruno Medeiros:

 On the other hand, I would be surprised if a person of the female
 variety
 would be that interested in D, to the point of contributing in such way.

 In Python newsgroups I have seen few women, now and then, but in the D
 newsgroup so far... not many. So far D seems a male thing. I don't
 know why. At the university at the Computer Science course there are a
 good enough number of female students (and few female teachers too).

 Bye,
 bearophile

 At my university there are *very* few woman studying computer science.
 Most women sitting in CS lectures here are studying maths and have to do
 some basic CS lectures (I don't think they're the kind that would try D
 voluntarily).
 We have two female professors though.

It is well known that there is a big gender gap in CS with regards to 
students and professionals. Something like 5-20% I guess, depending on 
university, company, etc..

But the interesting thing (although also quite unfortunate), is that 
this gap takes an even greater dip downwards when you consider the 
communities of FOSS developers/contributors. It must be well below 1%!
(note that I'm not talking about *users* of FOSS software, but only 
people who actually contribute code, whether for FOSS projects, or for 
their own indie/toy projects)


-- 
Bruno Medeiros - Software Engineer

Nov 26 2010

Bruno Medeiros <brunodomedeiros+spam com.gmail> writes:

On 27/10/2010 22:04, retard wrote:
 Wed, 27 Oct 2010 13:52:29 -0700, Walter Bright wrote:

 retard wrote:
 Wed, 27 Oct 2010 12:08:19 -0700, Walter Bright wrote:

 retard wrote:
 This is why the basic data structure in functional languages,
 algebraic data types, suits better for this purpose.

 I think you recently demonstrated otherwise, as proven by the
 widespread use of Java :-)

 I don't understand your logic -- Widespread use of Java proves that
 algebraic data types aren't a better suited way for expressing
 compiler's data structures such as syntax trees?

 You told me that widespread use of Java proved that nothing more complex
 than what Java provides is useful:

 "Java is mostly used for general purpose programming so your claims
 about usefulness and the need for extreme performance look silly."

 I'd be surprised if you seriously meant that, as it implies that Java is
 the pinnacle of computer language design, but I can't resist teasing you
 about it. :-)

 I only meant that the widespread adoption of Java shows how the public at
 large cares very little about the performance issues you mentioned. Java
 is one of the most widely used languages and it's also successful in many
 fields. Things could be better from programming language theory's point
 of view, but the business world is more interested in profits, and the
 large pool of Java coders has brought more benefit than more expressive
 languages would. I don't think that says anything against my notes about
 algebraic data types.

"the widespread adoption of Java shows how the public at large cares very 
little about the performance issues you mentioned"

WTF? The widespread adoption of Java means that _Java developers_ at 
large don't care about those performance issues (mostly because they 
work on stuff where they don't need to). But it's no statement about the 
whole pool of developers. Java is hugely popular, but not in a "it's 
practically the only language people use" way. It's not like Windows on 
the desktop.


-- 
Bruno Medeiros - Software Engineer

Nov 19 2010

dolive <dolive89 sina.com> writes:

Walter Bright wrote:

 As we all know, tool support is important for D's success. Making tools easier 
 to build will help with that.
 
 To that end, I think we need a lexer for the standard library - std.lang.d.lex. 
 It would be helpful in writing color syntax highlighting filters, pretty 
 printers, repl, doc generators, static analyzers, and even D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its output
 2. optionally not generate lexical errors, but just try to recover and continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained in tandem
 
 It can also serve as the basis for creating a javascript implementation that can 
 be embedded into web pages for syntax highlighting, and eventually an 
 std.lang.d.parse.
 
 Anyone want to own this?

intense support! Someone to do it?
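
The five requirements in the quoted post can be sketched in D roughly as follows. This is only a minimal illustration under stated assumptions, not the dmd lexer and not any Phobos API: the names `TokKind`, `Token`, `Lexer` and `keepComments` are hypothetical, only a handful of token kinds are recognized, the input is a plain `string` rather than an arbitrary range, and only the "recover and continue" error mode is modelled.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isPunctuation, isWhite;
import std.range.primitives : isInputRange;

// Hypothetical token kinds; a real D lexer would need many more.
enum TokKind { identifier, number, comment, symbol, error }

// Requirement 4: tokens are a value type (a struct, not a class).
struct Token
{
    TokKind kind;
    string text;
}

// Requirement 1 (output side): the lexer is itself an input range of Tokens.
// For brevity the input is a string, not a generic range of characters.
struct Lexer
{
    private string src;
    private bool keepComments; // requirement 3: comments as tokens, optionally
    private Token cur;
    private bool done;

    this(string src, bool keepComments = false)
    {
        this.src = src;
        this.keepComments = keepComments;
        popFront(); // prime the first token
    }

    @property bool empty() const { return done; }
    @property Token front() { return cur; }

    void popFront()
    {
        while (true)
        {
            // Skip whitespace between tokens.
            size_t i = 0;
            while (i < src.length && isWhite(src[i])) ++i;
            src = src[i .. $];
            if (src.length == 0) { done = true; return; }

            size_t n = 1; // length of the token being scanned
            TokKind kind = TokKind.symbol;
            immutable char c = src[0];
            if (isAlpha(c) || c == '_')
            {
                kind = TokKind.identifier;
                while (n < src.length && (isAlphaNum(src[n]) || src[n] == '_')) ++n;
            }
            else if (isDigit(c))
            {
                kind = TokKind.number;
                while (n < src.length && isDigit(src[n])) ++n;
            }
            else if (c == '/' && src.length > 1 && src[1] == '/')
            {
                kind = TokKind.comment;
                while (n < src.length && src[n] != '\n') ++n;
            }
            else if (!isPunctuation(c))
            {
                // Requirement 2: do not throw; emit an error token for the
                // single offending byte and keep lexing after it.
                kind = TokKind.error;
            }

            cur = Token(kind, src[0 .. n]);
            src = src[n .. $];
            if (kind == TokKind.comment && !keepComments) continue;
            return;
        }
    }
}

unittest
{
    import std.algorithm.iteration : map;
    import std.array : array;

    static assert(isInputRange!Lexer);
    auto kinds = Lexer("x = 42 // hi", true).map!(t => t.kind).array;
    assert(kinds == [TokKind.identifier, TokKind.symbol,
                     TokKind.number, TokKind.comment]);
}
```

Because `Lexer` satisfies `isInputRange`, it can be fed straight into `std.algorithm` pipelines, which is the main practical payoff of requirement 1. Requirement 5 (tracking the C++ lexer) is a maintenance policy and is not something a sketch like this can show.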

Feb 26 2011

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Saturday 26 February 2011 02:06:18 dolive wrote:
 Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.
 
 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its
    output
 2. optionally not generate lexical errors, but just try to recover and
    continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained
    in tandem
 
 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.
 
 Anyone want to own this?
 
 intense support! Someone to do it?

I'm working on it, but I have enough else going on right now that I'm not being
very quick about it. I don't know when it will be done.

- Jonathan M Davis

Feb 26 2011

dolive <dolive89 sina.com> writes:

Jonathan M Davis wrote:

 On Saturday 26 February 2011 02:06:18 dolive wrote:
 Walter Bright wrote:
 As we all know, tool support is important for D's success. Making tools
 easier to build will help with that.
 
 To that end, I think we need a lexer for the standard library -
 std.lang.d.lex. It would be helpful in writing color syntax highlighting
 filters, pretty printers, repl, doc generators, static analyzers, and
 even D compilers.
 
 It should:
 
 1. support a range interface for its input, and a range interface for its
    output
 2. optionally not generate lexical errors, but just try to recover and
    continue
 3. optionally return comments and ddoc comments as tokens
 4. the tokens should be a value type, not a reference type
 5. generally follow along with the C++ one so that they can be maintained
    in tandem
 
 It can also serve as the basis for creating a javascript implementation
 that can be embedded into web pages for syntax highlighting, and
 eventually an std.lang.d.parse.
 
 Anyone want to own this?

 
 intense support! Someone to do it?

 
 I'm working on it, but I have enough else going on right now that I'm not being
 very quick about it. I don't know when it will be done.
 
 - Jonathan M Davis

Thanks, make an all-out effort!

Feb 26 2011
