digitalmars.D - Lexers (again)

reply "Brian Schott" <briancschott gmail.com> writes:
I've been working on the next attempt at a std.lexer / 
std.d.lexer recently. You can follow the progress on Github here: 
https://github.com/Hackerpilot/lexer-work
Dec 13 2013
next sibling parent "Rikki Cattermole" <alphaglosined gmail.com> writes:
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / 
 std.d.lexer recently. You can follow the progress on Github 
 here: https://github.com/Hackerpilot/lexer-work
A problem I noticed was that you're using ubyte[], at least in the runlexer. Does it work with string and wstring, though? Also, why is it required to pass the lexer the type of the code to parse? Is there another way to make it easier to use, or is the only way to wrap the constructor in a templated function? There also seem to be a lot of generic method implementations in DLexer that I would expect to be done inside the Lexer super (well, template, I suppose). All in all, it looks promising.
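
To illustrate the string question, here is a minimal sketch of feeding a string to a byte-oriented entry point. It assumes the lexer only accepts ubyte[] input, as the runlexer suggests; countNewlines is a hypothetical stand-in and is not part of lexer-work. std.string.representation reinterprets the string's UTF-8 code units as bytes without copying or decoding.

```d
import std.string : representation;

// Hypothetical stand-in for a lexer entry point that only accepts bytes.
// It is not taken from the lexer-work repository.
size_t countNewlines(const(ubyte)[] source) pure nothrow @safe @nogc
{
    size_t n;
    foreach (b; source)
        if (b == '\n')
            ++n;
    return n;
}

unittest
{
    string code = "module a;\nclass C {}\n";
    // View the string's UTF-8 code units as bytes: no copy, no decoding.
    immutable(ubyte)[] bytes = code.representation;
    assert(bytes.length == code.length);
    assert(countNewlines(bytes) == 2);
}
```

A wstring holds UTF-16 code units, so under the same assumption it would first need transcoding, for example with std.utf.toUTF8, before it could be passed along this way.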
Dec 13 2013
prev sibling next sibling parent Martin Nowak <code dawg.eu> writes:
On 12/13/2013 11:17 AM, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / std.d.lexer
 recently. You can follow the progress on Github here:
 https://github.com/Hackerpilot/lexer-work
Looks promising. I hope that I find some time to work on a completely generic DFA lexer generator (regex based). I found a few papers/had some ideas on how to vectorize the DFA processing to make it fast enough.
Dec 13 2013
prev sibling next sibling parent reply "Brian Schott" <briancschott gmail.com> writes:
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / 
 std.d.lexer recently. You can follow the progress on Github 
 here: https://github.com/Hackerpilot/lexer-work
I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer.

One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with

case tok!"class":

it does have a problem with

goto case tok!"class":

I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
Dec 15 2013
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/15/2013 12:12 PM, Brian Schott wrote:
 On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / std.d.lexer
 recently. You can follow the progress on Github here:
 https://github.com/Hackerpilot/lexer-work
I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer. One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with case tok!"class": it does have a problem with goto case tok!"class": I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
I cannot reproduce your problem. If this does not work, it is a bug.
Dec 15 2013
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/15/13 3:45 AM, Timon Gehr wrote:
 On 12/15/2013 12:12 PM, Brian Schott wrote:
 On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / std.d.lexer
 recently. You can follow the progress on Github here:
 https://github.com/Hackerpilot/lexer-work
I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer. One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with case tok!"class": it does have a problem with goto case tok!"class": I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
I cannot reproduce your problem. If this does not work, it is a bug.
The problem is that tok is a dynamic value. It should be a static value. Current code:

static @property IDType tok(string symbol)() { ... }

It should be:

template IDType tok(string symbol)() { alias tok = ...; }

This is important: if the compiler thinks tok is a dynamic value, it'll generate crappy switch statements.

BTW Brian - I didn't look at this in depth yet but it's very promising work. Thanks!

Andrei
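
A minimal sketch of the static form, written as an eponymous template that yields an enum. The ushort IDType, the three-entry token table, and the tokenIndex and describe helpers are all assumptions made for illustration; none of them is taken from the lexer-work repository. Because tok!"class" is a manifest constant here, both case and goto case accept it.

```d
alias IDType = ushort; // assumption: the real lexer may use another type

// Hypothetical token table, for illustration only.
static immutable string[] tokenNames = ["class", "struct", "identifier"];

// Hypothetical helper, evaluated at compile time (CTFE) by the template below.
// Returns the 1-based position of the symbol, or 0 if it is unknown.
private IDType tokenIndex(string symbol) pure nothrow @safe
{
    foreach (i, name; tokenNames)
        if (name == symbol)
            return cast(IDType)(i + 1);
    return 0;
}

// Eponymous template: tok!"class" names an enum, i.e. a compile-time constant.
template tok(string symbol)
{
    static assert(tokenIndex(symbol) != 0, "unknown token: " ~ symbol);
    enum IDType tok = tokenIndex(symbol);
}

string describe(IDType t)
{
    switch (t)
    {
    case tok!"struct":
        goto case tok!"class"; // jumps to the case below
    case tok!"class":
        return "aggregate";
    default:
        return "other";
    }
}

unittest
{
    static assert(tok!"class" == 1); // usable in compile-time contexts
    assert(describe(tok!"struct") == "aggregate");
    assert(describe(tok!"identifier") == "other");
}
```

With dmd this can be tried with something like dmd -unittest -main -run on the file. Misspelled token names are rejected at compile time by the static assert rather than producing a runtime value.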
Dec 15 2013
next sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 12/15/2013 05:38 PM, Andrei Alexandrescu wrote:
 One limitation I've noticed with the new tok!"tokenName" approach is
 that while dmd has no problem with

 case tok!"class":

 it does have a problem with

 goto case tok!"class":

 I managed to work around this by adding new labels and "goto"-ing them
 instead. Is this a bug or intentional?
I cannot reproduce your problem. If this does not work, it is a bug.
The problem is that tok is a dynamic value. It should be a static value.
Note that the spec has this to say: http://dlang.org/statement.html#SwitchStatement

"Expression is evaluated. The result type T must be of integral type or char[], wchar[] or dchar[]. The result is compared against each of the case expressions. If there is a match, the corresponding case statement is transferred to. The case expressions must all evaluate to a constant value or array, or a runtime initialized const or immutable variable of integral type. They must be implicitly convertible to the type of the switch Expression. Case expressions must all evaluate to distinct values. Const or immutable variables must all have different names. If they share a value, the first case statement with that value gets control. There must be exactly one default statement."

Arguably, this is a questionable language design decision that should IMO be revisited anyway, but DMD clearly does not follow the spec here.

Also, there is this:

"The fourth form, goto case Expression;, transfers to the CaseStatement of the innermost enclosing SwitchStatement with a matching Expression."

It does not say anything about what kind of expression is required.
Dec 15 2013
prev sibling parent "Brian Schott" <briancschott gmail.com> writes:
On Sunday, 15 December 2013 at 16:38:15 UTC, Andrei Alexandrescu 
wrote:
 The problem is that tok is a dynamic value. It should be a 
 static value. Current code:
This seems to have fixed the case/goto issues.
 This is important - if the compiler thinks tok is a dynamic 
 value, it'll generate crappy switch statements.
It seems it's hard to keep dmd from generating crappy code even with this fix. I tried it with both LDC and DMD. The code from DMD takes 3.5 times as long to execute.
 BTW  Brian - I didn't look at this in depth yet but it's very 
 promising work. Thanks!
It's based off of the gist you posted a while back. I'll have to compare this to what you(r team) came up with for Facebook's C++ analyzer.
Dec 16 2013
prev sibling parent "Jonas Drewsen" <nospam4321 hotmail.com > writes:
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
 I've been working on the next attempt at a std.lexer / 
 std.d.lexer recently. You can follow the progress on Github 
 here: https://github.com/Hackerpilot/lexer-work
Nitpicking... but shouldn't:

size_t line() pure nothrow const @property { return _line; }

be more like:

@property size_t line() pure nothrow const { return _line; }

to be consistent with Phobos coding style?

/Jonas
Dec 16 2013