digitalmars.D.bugs - [Issue 1466] New: Spec claims maximal munch technique always works: not for "1..3"
- d-bugmail puremagic.com (45/45) Sep 01 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1466
- BCS (2/55) Sep 01 2007 or make it "DecimalDigits . [^.]" where the ^ production is non consumin...
- Jascha Wetzel (6/63) Sep 01 2007 it is possible to parse D using a maximal munch lexer - see the seatd
- BCS (6/10) Sep 02 2007 another case:
- Chris Nicholson-Sauls (6/22) Sep 02 2007 I might be wrong, but my guess is that 'is' is always treated as its own...
- BCS (3/23) Sep 02 2007 For that to work the lexer has to keep track of whitespace. :-b
- Jascha Wetzel (3/4) Sep 03 2007 you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for
- BCS (3/10) Sep 03 2007 That's what I'm hoping to do sooner or later. I already do somthing like...
- d-bugmail puremagic.com (13/113) Sep 03 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1466
- d-bugmail puremagic.com (15/15) Sep 09 2007 http://d.puremagic.com/issues/show_bug.cgi?id=1466
- Jascha Wetzel (5/8) Sep 09 2007 this *is* maximal munch taking place. because of the ".." lexeme, float
- BCS (3/13) Sep 09 2007 But is it the the correct way to do it? (Not is is doing what the spec s...
- Jascha Wetzel (5/20) Sep 09 2007 i think "ptr ! is null" shouldn't be allowed, because it suggests that
- Jascha Wetzel (3/12) Sep 09 2007 this was formulated poorly. float literals *may* be considered context
- Matti Niemenmaa (14/27) Sep 10 2007 Exactly. But the way I read the spec, float literals are considered toke...
- Jascha Wetzel (6/31) Sep 10 2007 i agree that it's not clear. one can argue, though, that if you consider...
- Matti Niemenmaa (5/6) Sep 10 2007 Exactly, which is what the bug is about. :-)
- d-bugmail puremagic.com (12/12) Nov 09 2010 http://d.puremagic.com/issues/show_bug.cgi?id=1466
http://d.puremagic.com/issues/show_bug.cgi?id=1466

           Summary: Spec claims maximal munch technique always works: not
                    for "1..3"
           Product: D
           Version: 1.020
          Platform: All
               URL: http://digitalmars.com/d/1.0/lex.html
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: minor
          Priority: P3
         Component: www.digitalmars.com
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: deewiant gmail.com

A snippet from http://digitalmars.com/d/1.0/lex.html:

"The source text is split into tokens using the maximal munch technique,
i.e., the lexical analyzer tries to make the longest token it can."

Relevant parts of the grammar:

Token:
    FloatLiteral
    ..

FloatLiteral:
    Float

Float:
    DecimalFloat

DecimalFloat:
    DecimalDigits .
    . Decimal

DecimalDigits:
    DecimalDigit

DecimalDigit:
    NonZeroDigit

Decimal:
    NonZeroDigit

Based on the above, if a lexer encounters "1..3", for instance in a slice:
"foo[1..3]", it should, using the maximal munch technique, make the longest
possible token from "1..3": this is the Float "1.". Next, it should come up
with the Float ".3".

Of course, this isn't currently happening, and would be problematic if it did.
But, according to the grammar, that's what should happen, unless I'm missing
something. Either some exception needs to be made or remove the
"DecimalDigits ." possibility from the grammar and the compiler.

--
Sep 01 2007
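To make the report concrete, here is a minimal toy lexer in D (purely
illustrative -- the function name naiveLex is made up, identifiers and
operators are left as single characters for brevity, and this is not DMD's
lexer) that applies maximal munch to the grammar exactly as quoted, greedily
taking the "DecimalDigits ." form:

import std.ascii : isDigit;
import std.stdio : writeln;

// Toy lexer following the quoted grammar with maximal munch:
// a float may be "DecimalDigits ." or ". Decimal".
string[] naiveLex(string src)
{
    string[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        size_t start = i;
        if (isDigit(src[i]))
        {
            while (i < src.length && isDigit(src[i])) ++i;
            // greedy: "DecimalDigits ." is longer than DecimalDigits alone
            if (i < src.length && src[i] == '.') ++i;
        }
        else if (src[i] == '.' && i + 1 < src.length && isDigit(src[i + 1]))
        {
            ++i;                                      // the ". Decimal" form
            while (i < src.length && isDigit(src[i])) ++i;
        }
        else
        {
            ++i;                  // anything else: one character per token
        }
        tokens ~= src[start .. i];
    }
    return tokens;
}

void main()
{
    // Prints ["f", "o", "o", "[", "1.", ".3", "]"] -- the floats "1." and
    // ".3" swallow the "..", exactly the problem described above.
    writeln(naiveLex("foo[1..3]"));
}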
Reply to d-bugmail puremagic.com,

> Either some exception needs to be made or remove the "DecimalDigits ."
> possibility from the grammar and the compiler.

or make it "DecimalDigits . [^.]" where the ^ production is non consuming.
Sep 01 2007
BCS wrote:
> or make it "DecimalDigits . [^.]" where the ^ production is non consuming.

it is possible to parse D using a maximal munch lexer - see the seatd grammar
for an example. it's a matter of what lexemes exactly you choose. in this
particular case, the float lexemes need to be split, such that those floats
with a trailing dot are not matched by a single lexeme.
Sep 01 2007
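A minimal sketch of the fix described in the two replies above -- take the
trailing "." of a float only when a one-character, non-consuming lookahead
shows it is not followed by another "." (equivalently: split the float
lexemes so trailing-dot floats are not one lexeme). Purely illustrative; the
function name lexWithLookahead and the reduced token set are assumptions, not
seatd's or DMD's actual code:

import std.ascii : isDigit;
import std.stdio : writeln;

string[] lexWithLookahead(string src)
{
    string[] tokens;
    size_t i = 0;
    while (i < src.length)
    {
        size_t start = i;
        if (isDigit(src[i]))
        {
            while (i < src.length && isDigit(src[i])) ++i;
            // non-consuming lookahead: take the '.' only if the character
            // after it is not another '.', so "1..3" leaves ".." intact
            if (i < src.length && src[i] == '.'
                && (i + 1 >= src.length || src[i + 1] != '.'))
            {
                ++i;
                while (i < src.length && isDigit(src[i])) ++i;
            }
        }
        else if (src[i] == '.' && i + 1 < src.length && src[i + 1] == '.')
        {
            i += 2;                                   // the ".." token
        }
        else
        {
            ++i;                  // anything else: one character per token
        }
        tokens ~= src[start .. i];
    }
    return tokens;
}

void main()
{
    writeln(lexWithLookahead("foo[1..3]")); // ["f", "o", "o", "[", "1", "..", "3", "]"]
    writeln(lexWithLookahead("x[1.5..3]")); // ["x", "[", "1.5", "..", "3", "]"]
}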
Reply to d-bugmail puremagic.com,

> "The source text is split into tokens using the maximal munch technique,
> i.e., the lexical analyzer tries to make the longest token it can."

another case:

actual    !isGood -> ! isGood
MaxMunch  !isGood -> !is Good
Sep 02 2007
BCS wrote:
> another case:
>
> actual    !isGood -> ! isGood
> MaxMunch  !isGood -> !is Good

I might be wrong, but my guess is that 'is' is always treated as its own
entity, so that '!is' is really ('!' 'is'). It's not a bad practice when one
has keyword-operators to do this, to avoid MM screwing up users' identifiers.
But, as I haven't taken any trips through the DMD frontend source, I might be
completely off.

-- Chris Nicholson-Sauls
Sep 02 2007
Reply to Chris Nicholson-Sauls,

> I might be wrong, but my guess is that 'is' is always treated as its own
> entity, so that '!is' is really ('!' 'is').

For that to work the lexer has to keep track of whitespace. :-b

> It's not a bad practice when one has keyword-operators to do this, to avoid
> MM screwing up users' identifiers. But, as I haven't taken any trips
> through the DMD frontend source, I might be completely off.

That's how I spotted it in the first place.
Sep 02 2007
BCS wrote:
> For that to work the lexer has to keep track of whitespace. :-b

you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for the
submatch. or use a single-character lookahead.
Sep 03 2007
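A rough sketch of that single-character lookahead for "!is" (illustrative
only; the helper names lexBang and isIdentChar are made up and the identifier
check is ASCII-only): "!is" is one token only when the character after "is"
cannot continue an identifier, otherwise the "!" stands alone.

import std.ascii : isAlphaNum;
import std.stdio : writeln;

// may c continue an identifier? (ASCII-only toy check)
bool isIdentChar(dchar c) { return isAlphaNum(c) || c == '_'; }

// Tokenize only the leading "!" of the input; the rest is returned
// untouched as a second element, just to show where the split lands.
string[] lexBang(string src)
{
    if (src.length >= 3 && src[0 .. 3] == "!is"
        && (src.length == 3 || !isIdentChar(src[3])))
        return ["!is", src[3 .. $]];
    return ["!", src[1 .. $]];
}

void main()
{
    writeln(lexBang("!is null")); // ["!is", " null"]
    writeln(lexBang("!isGood"));  // ["!", "isGood"] -- not "!is" "Good"
}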
Reply to Jascha,

> you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for the
> submatch. or use a single-character lookahead.

That's what I'm hoping to do sooner or later. I already do something like
that for ".." vs "."
Sep 03 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466

jascha mainia.de changed:

           What    |Removed    |Added
------------------------------------------------
                 CC|           |jascha mainia.de

------- Comment #5 from jascha mainia.de  2007-09-03 06:08 -------
(In reply to comment #0)
> Either some exception needs to be made or remove the "DecimalDigits ."
> possibility from the grammar and the compiler.

(In reply to comment #1)
> or make it "DecimalDigits . [^.]" where the ^ production is non consuming.

it is possible to parse D using a maximal munch lexer - see the seatd grammar
for an example. it's a matter of what lexemes exactly you choose. in this
particular case, the float lexemes need to be split, such that those floats
with a trailing dot are not matched by a single lexeme.

--
Sep 03 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466

------- Comment #7 from matti.niemenmaa+dbugzilla iki.fi  2007-09-09 12:26 -------
Here's some example code underlining the issue:

class Foo {
    static int opSlice(double a, double b) { return 0; }
}

void main() {
    // works
    assert (Foo[0. .. 1] == 0);

    // thinks it's [0 ... 1], no maximal munch taking place
    assert (Foo[0... 1] == 0);
}

--
Sep 09 2007
d-bugmail puremagic.com wrote:
>     // thinks it's [0 ... 1], no maximal munch taking place
>     assert (Foo[0... 1] == 0);
> }

this *is* maximal munch taking place. because of the ".." lexeme, float
literals are not lexemes. they are context free production rules consisting
of multiple lexemes. therefore "0." consists of two lexemes and "..." wins
the max munch over ".".
Sep 09 2007
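A tiny sketch of that reading -- maximal munch applied to the dot lexemes
".", "..", "..." alone, with float literals assembled later by the parser
(illustrative; the helper name munchDots is made up):

import std.stdio : writeln;

// At position i, which of the dot lexemes wins the maximal munch?
string munchDots(string src, size_t i)
{
    foreach (op; ["...", "..", "."])    // longest first
        if (src.length - i >= op.length && src[i .. i + op.length] == op)
            return op;
    return null;
}

void main()
{
    writeln(munchDots("0... 1", 1));  // "..." -> tokens "0", "...", "1"
    writeln(munchDots("0. .. 1", 1)); // "."   -> "0", ".", "..", "1"; the
                                      // parser then rebuilds "0" "." into
                                      // the float literal
}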
Reply to Jascha,

> this *is* maximal munch taking place. because of the ".." lexeme, float
> literals are not lexemes. they are context free production rules consisting
> of multiple lexemes. therefore "0." consists of two lexemes and "..." wins
> the max munch over ".".

But is it the correct way to do it? (Not "is it doing what the spec says",
but "is it doing what it should be designed to do")
Sep 09 2007
BCS wrote:
> But is it the correct way to do it? (Not "is it doing what the spec says",
> but "is it doing what it should be designed to do")

i think "ptr ! is null" shouldn't be allowed, because it suggests that "!"
and "is" are separate operators and "is null" is a unary expression.

if your lexer supports lookaheads (a single character is enough), you can
match float literals as lexemes. this is also what DMD does.
Sep 09 2007
Jascha Wetzel wrote:
> because of the ".." lexeme, float literals are not lexemes. they are
> context free production rules consisting of multiple lexemes.

this was formulated poorly. float literals *may* be considered context free
to solve this problem...
Sep 09 2007
Jascha Wetzel wrote:
> this was formulated poorly. float literals *may* be considered context free
> to solve this problem...

Exactly. But the way I read the spec, float literals are considered tokens in
and of themselves. Maybe I misunderstand, but it could use some clarification
in that case: lex.html specifically says that the lexer splits the code into
tokens, one of which is "0.", with maximal munch.

This isn't a /problem/ per se. In the extreme case, of course it is possible
to parse D with maximal munch by considering every character a lexeme of its
own and figuring everything else out thereafter. I'm just saying that the
spec seems to contradict itself in saying that maximal munch should be used,
and that it should match (among other things) "0." as one token. If you do
things that way, it doesn't work the way DMD currently does it. If you match
"0." as "0" and "." and construct a float literal from that later, it works.

--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 10 2007
Matti Niemenmaa wrote:
> I'm just saying that the spec seems to contradict itself in saying that
> maximal munch should be used, and that it should match (among other things)
> "0." as one token.

i agree that it's not clear. one can argue, though, that if you consider
lookaheads as part of the lexeme specification, the maximal munch property
remains intact. then "0." can only match if not followed by another "." - no
contradiction to the max munch. but that should be stated in the specs, of
course.
Sep 10 2007
Jascha Wetzel wrote:
<snip>
> but that should be stated in the specs, of course.

Exactly, which is what the bug is about. :-)

--
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 10 2007
http://d.puremagic.com/issues/show_bug.cgi?id=1466

Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed    |Added
------------------------------------------------
             Status|NEW        |RESOLVED
                 CC|           |bugzilla digitalmars.com
         Resolution|           |FIXED

--- Comment #9 from Walter Bright <bugzilla digitalmars.com> 2010-11-09 19:43:04 PST ---
http://www.dsource.org/projects/phobos/changeset/2148

--
Nov 09 2010