
digitalmars.D.bugs - [Issue 1466] New: Spec claims maximal munch technique always works: not for "1..3"

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1466

           Summary: Spec claims maximal munch technique always works: not
                    for "1..3"
           Product: D
           Version: 1.020
          Platform: All
               URL: http://digitalmars.com/d/1.0/lex.html
        OS/Version: All
            Status: NEW
          Keywords: spec
          Severity: minor
          Priority: P3
         Component: www.digitalmars.com
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: deewiant gmail.com


A snippet from http://digitalmars.com/d/1.0/lex.html:

"The source text is split into tokens using the maximal munch technique, i.e.,
the lexical analyzer tries to make the longest token it can."

Relevant parts of the grammar:

Token:
        FloatLiteral
        ..

FloatLiteral:
        Float

Float:
        DecimalFloat

DecimalFloat:
        DecimalDigits .
        . Decimal

DecimalDigits:
        DecimalDigit

DecimalDigit:
        NonZeroDigit

Decimal:
        NonZeroDigit

Based on the above, if a lexer encounters "1..3", for instance in a slice:
"foo[1..3]", it should, using the maximal munch technique, make the longest
possible token from "1..3": this is the Float "1.". Next, it should come up
with the Float ".3".

Of course, this isn't currently happening, and would be problematic if it did.
But, according to the grammar, that's what should happen, unless I'm missing
something.

Either some exception needs to be made, or the "DecimalDigits ."
possibility needs to be removed from the grammar and the compiler.
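For illustration, here is a rough sketch (not DMD's lexer, just the quoted grammar plus maximal munch taken literally; the name munchFloat is made up) of the tokenization described above:

import std.ascii : isDigit;
import std.stdio : writeln;

// Greedily match the longest FloatLiteral prefix the quoted grammar allows
// ("DecimalDigits ." or ". Decimal"), with no lookahead at all.
string munchFloat(string s)
{
    size_t i = 0;
    while (i < s.length && isDigit(s[i])) ++i;           // DecimalDigits
    if (i >= s.length || s[i] != '.') return null;        // the '.' is required
    size_t dot = i++;
    while (i < s.length && isDigit(s[i])) ++i;            // optional fraction
    return (dot > 0 || i > dot + 1) ? s[0 .. i] : null;   // "1." or ".3", not "."
}

void main()
{
    writeln(munchFloat("1..3"));   // prints "1." - the longest possible munch
    writeln(munchFloat(".3]"));    // prints ".3" - the next munch
}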


-- 
Sep 01 2007
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to d-bugmail puremagic.com,

 http://d.puremagic.com/issues/show_bug.cgi?id=1466
 
 <snip: quoted Bugzilla fields>
 A snippet from http://digitalmars.com/d/1.0/lex.html:
 
 "The source text is split into tokens using the maximal munch
 technique, i.e., the lexical analyzer tries to make the longest token
 it can."
 
 Relevant parts of the grammar:
 
 Token:
 FloatLiteral
 ..
 FloatLiteral:
 Float
 Float:
 DecimalFloat
 DecimalFloat:
 DecimalDigits .
 . Decimal
 DecimalDigits:
 DecimalDigit
 DecimalDigit:
 NonZeroDigit
 Decimal:
 NonZeroDigit
 Based on the above, if a lexer encounters "1..3", for instance in a
 slice: "foo[1..3]", it should, using the maximal munch technique, make
 the longest possible token from "1..3": this is the Float "1.". Next,
 it should come up with the Float ".3".
 
 Of course, this isn't currently happening, and would be problematic if
 it did. But, according to the grammar, that's what should happen,
 unless I'm missing something.
 
 Either some exception needs to be made, or the "DecimalDigits ."
 possibility needs to be removed from the grammar and the compiler.
 
or make it "DecimalDigits . [^.]" where the [^.] production is non-consuming.
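That non-consuming "[^.]" is essentially a negative lookahead. A rough sketch, assuming a regex engine with negative lookahead (std.regex has one) and ignoring exponents and suffixes:

import std.regex;
import std.stdio : writeln;

void main()
{
    // digits '.' counts as a float only if the '.' is not followed by another '.'
    auto floatTok = regex(`^[0-9]+\.(?!\.)[0-9]*`);

    writeln(matchFirst("1.5]", floatTok).empty);   // false - "1.5" munched as a float
    writeln(matchFirst("1..3", floatTok).empty);   // true  - no float, "1" stays an integer
}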
Sep 01 2007
parent Jascha Wetzel <"[firstname]" mainia.de> writes:
BCS wrote:
 Reply to d-bugmail puremagic.com,
 
 http://d.puremagic.com/issues/show_bug.cgi?id=1466

 <snip: quoted bug report>
or make it "DecimalDigits . [^.]" where the [^.] production is non-consuming.
it is possible to parse D using a maximal munch lexer - see the seatd grammar for an example. it's a matter of what lexemes exactly you choose. in this particular case, the float lexemes need to be split, such that those floats with a trailing dot are not matched by a single lexeme.
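A loose sketch of that split (made-up token set, not the actual seatd grammar): the lexer below never produces a trailing-dot float, so "1..3" comes out as Integer ".." Integer, and a context-free rule can later fold Integer "." Integer back into a float literal.

import std.ascii : isAlpha, isDigit;
import std.stdio : writeln;

// The lexer only knows Integer, ".", "..", identifiers and single characters;
// a float with a trailing dot is never a single lexeme here.
string[] lex(string s)
{
    string[] toks;
    size_t i = 0;
    while (i < s.length)
    {
        size_t start = i;
        if (isDigit(s[i]))
        {
            while (i < s.length && isDigit(s[i])) ++i;              // Integer
        }
        else if (s[i] == '.')
        {
            i += (i + 1 < s.length && s[i + 1] == '.') ? 2 : 1;     // ".." or "."
        }
        else if (isAlpha(s[i]))
        {
            while (i < s.length && isAlpha(s[i])) ++i;              // Identifier
        }
        else
        {
            ++i;                                                    // '[', ']', ...
        }
        toks ~= s[start .. i];
    }
    return toks;
}

void main()
{
    writeln(lex("foo[1..3]"));   // ["foo", "[", "1", "..", "3", "]"]
    writeln(lex("x[1.5]"));      // ["x", "[", "1", ".", "5", "]"] - a parser
                                 // rule can later fold these into a float
}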
Sep 01 2007
prev sibling next sibling parent reply BCS <ao pathlink.com> writes:
Reply to d-bugmail puremagic.com,


 "The source text is split into tokens using the maximal munch
 technique, i.e., the lexical analyzer tries to make the longest token
 it can."
 
another case:

actual    !isGood -> ! isGood
MaxMunch  !isGood -> !is Good
Sep 02 2007
parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
BCS wrote:
 Reply to d-bugmail puremagic.com,
 
 
 "The source text is split into tokens using the maximal munch
 technique, i.e., the lexical analyzer tries to make the longest token
 it can."
 another case:
 actual    !isGood -> ! isGood
 MaxMunch  !isGood -> !is Good
I might be wrong, but my guess is that 'is' is always treated as its own entity, so that '!is' is really ('!' 'is'). It's not a bad practice, when one has keyword-operators, to do this in order to avoid MM screwing up users' identifiers. But, as I haven't taken any trips through the DMD frontend source, I might be completely off.

-- 
Chris Nicholson-Sauls
Sep 02 2007
parent reply BCS <ao pathlink.com> writes:
Reply to Chris Nicholson-Sauls,

 BCS wrote:
 
 Reply to d-bugmail puremagic.com,
 
 "The source text is split into tokens using the maximal munch
 technique, i.e., the lexical analyzer tries to make the longest
 token it can."
 
 another case:
 actual    !isGood -> ! isGood
 MaxMunch  !isGood -> !is Good
 I might be wrong, but my guess is that 'is' is always treated as its own entity, so that '!is' is really ('!' 'is'). It's not a bad
That's how I spotted it in the first place.
 practice when one has keyword-operators to do this, to avoid MM
 screwing up user's identifiers.  But, as I haven't taken any trips
 through the DMD frontend source, I might be completely off.
 
For that to work the lexer has to keep track of whitespace. :-b
Sep 02 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
BCS wrote:
 For that to work the lexer has to keep track of whitespace. :-b
you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for the submatch. or use a single-character lookahead.
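Roughly like this as a single-character lookahead (just a sketch; lexBang and the exact identifier test are assumptions, and real identifiers also allow Unicode):

import std.ascii : isAlphaNum;
import std.stdio : writeln;

// s is assumed to start with '!'; emit "!is" only if the "is" cannot be the
// start of an identifier, otherwise emit a plain "!".
string lexBang(string s)
{
    if (s.length >= 3 && s[1 .. 3] == "is"
        && (s.length == 3 || (!isAlphaNum(s[3]) && s[3] != '_')))
        return "!is";
    return "!";
}

void main()
{
    writeln(lexBang("!is null"));   // "!is"
    writeln(lexBang("!isGood"));    // "!"  - "isGood" stays an identifier
}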
Sep 03 2007
parent BCS <ao pathlink.com> writes:
Reply to Jascha,

 BCS wrote:
 
 For that to work the lexer has to keep track of whitespace. :-b
 
you can also match "(!is)[^_a-zA-Z0-9]", advancing the input only for the submatch. or use a single-character lookahead.
That's what I'm hoping to do sooner or later. I already do something like that for ".." vs "."
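For the ".." vs "." case that presumably looks something like this (sketch only, the name lexNumber is made up; exponents and suffixes are left out):

import std.ascii : isDigit;
import std.stdio : writeln;

// s is assumed to start with a digit; the one-character lookahead after the
// '.' keeps ".." intact as its own token.
string lexNumber(string s)
{
    size_t i = 0;
    while (i < s.length && isDigit(s[i])) ++i;
    if (i < s.length && s[i] == '.'
        && !(i + 1 < s.length && s[i + 1] == '.'))   // one-char lookahead
    {
        ++i;
        while (i < s.length && isDigit(s[i])) ++i;
        return s[0 .. i];            // float literal, possibly "0."
    }
    return s[0 .. i];                // integer literal
}

void main()
{
    writeln(lexNumber("1..3]"));     // "1"   - ".." is left for the next token
    writeln(lexNumber("1.5]"));      // "1.5"
    writeln(lexNumber("0. x"));      // "0."
}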
Sep 03 2007
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1466


jascha mainia.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jascha mainia.de






 <snip: re-quoted thread>
Sep 03 2007
prev sibling next sibling parent reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1466






Here's some example code underlining the issue:

class Foo {
        static int opSlice(double a, double b) {
                return 0;
        }
}

void main() {
        // works
        assert (Foo[0. .. 1] == 0);
        // thinks it's [0 ... 1], no maximal munch taking place
        assert (Foo[0... 1] == 0);
}


-- 
Sep 09 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
d-bugmail puremagic.com wrote:
         // thinks it's [0 ... 1], no maximal munch taking place
         assert (Foo[0... 1] == 0);
 }
this *is* maximal munch taking place. because of the ".." lexeme, float literals are not lexemes. they are context free production rules consisting of multiple lexemes. therefore "0." consists of two lexemes and "..." wins the max munch over ".".
Sep 09 2007
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to Jascha,

 d-bugmail puremagic.com wrote:
 
 // thinks it's [0 ... 1], no maximal munch taking place
 assert (Foo[0... 1] == 0);
 }
this *is* maximal munch taking place. because of the ".." lexeme, float literals are not lexemes. they are context free production rules consisting of multiple lexemes. therefore "0." consists of two lexemes and "..." wins the max munch over ".".
But is it the correct way to do it? (Not: is it doing what the spec says, but: is it doing what it should be designed to do?)
Sep 09 2007
parent Jascha Wetzel <"[firstname]" mainia.de> writes:
BCS wrote:
 Reply to Jascha,
 
 d-bugmail puremagic.com wrote:

 // thinks it's [0 ... 1], no maximal munch taking place
 assert (Foo[0... 1] == 0);
 }
this *is* maximal munch taking place. because of the ".." lexeme, float literals are not lexemes. they are context free production rules consisting of multiple lexemes. therefore "0." consists of two lexemes and "..." wins the max munch over ".".
But is it the correct way to do it? (Not: is it doing what the spec says, but: is it doing what it should be designed to do?)
i think "ptr ! is null" shouldn't be allowed, because it suggests that "!" and "is" are separate operators and "is null" is an unary expression. if your lexer supports lookaheads (a single character is enough), you can match float literals as lexemes. this is also what DMD does.
Sep 09 2007
prev sibling parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Jascha Wetzel wrote:
 d-bugmail puremagic.com wrote:
         // thinks it's [0 ... 1], no maximal munch taking place
         assert (Foo[0... 1] == 0);
 }
this *is* maximal munch taking place. because of the ".." lexeme, float literals are not lexemes. they are context free production rules consisting of multiple lexemes. therefore "0." consists of two lexemes and "..." wins the max munch over ".".
this was formulated poorly. float literals *may* be considered context free to solve this problem...
Sep 09 2007
parent reply Matti Niemenmaa <see_signature for.real.address> writes:
Jascha Wetzel wrote:
 Jascha Wetzel wrote:
 d-bugmail puremagic.com wrote:
         // thinks it's [0 ... 1], no maximal munch taking place
         assert (Foo[0... 1] == 0);
 }
this *is* maximal munch taking place. because of the ".." lexeme, float literals are not lexemes. they are context free production rules consisting of multiple lexemes. therefore "0." consists of two lexemes and "..." wins the max munch over ".".
this was formulated poorly. float literals *may* be considered context free to solve this problem...
Exactly. But the way I read the spec, float literals are considered tokens in and of themselves. Maybe I misunderstand, but it could use some clarification in that case: lex.html specifically says that the lexer splits the code into tokens, one of which is "0.", with maximal munch.

This isn't a /problem/ per se. In the extreme case, of course it is possible to parse D with maximal munch by considering every character a lexeme of its own and figuring everything else out thereafter. I'm just saying that the spec seems to contradict itself in saying that maximal munch should be used, and that it should match (among other things) "0." as one token. If you do things that way, it doesn't work the way DMD currently does it. If you match "0." as "0" and "." and construct a float literal from that later, it works.

-- 
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 10 2007
parent reply Jascha Wetzel <"[firstname]" mainia.de> writes:
Matti Niemenmaa wrote:
 <snip>
 I'm just saying that the spec seems to contradict itself in saying that maximal munch should be used, and that it should match (among other things) "0." as one token. If you do things that way, it doesn't work the way DMD currently does it. If you match "0." as "0" and "." and construct a float literal from that later, it works.
i agree that it's not clear. one can argue, though, that if you consider lookaheads as part of the lexeme specification, the maximal munch property remains intact. then "0." can only match if not followed by another "." - no contradiction to the max munch. but that should be stated in the specs, of course.
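in the spec's own grammar notation that could read roughly like this (just a sketch of possible wording, not actual spec text):

DecimalFloat:
        DecimalDigits . [lookahead: the next character is not .]
        . Decimal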
Sep 10 2007
parent Matti Niemenmaa <see_signature for.real.address> writes:
Jascha Wetzel wrote:
<snip>
 but that should be stated in the specs, of course.
Exactly, which is what the bug is about. :-)

-- 
E-mail address: matti.niemenmaa+news, domain is iki (DOT) fi
Sep 10 2007
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1466


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla digitalmars.com
         Resolution|                            |FIXED



http://www.dsource.org/projects/phobos/changeset/2148

-- 
Nov 09 2010