D - [BUG] dmd does not implement LR analysis

Manfred Nowak <svv1999 hotmail.com> writes:

Although not explicitly specified, the usual left-to-right lexical analysis
and parsing of the grammar of D is currently not implemented in dmd.

Currently `2.' and `.4' are legal real numbers. Therefore the look alike
range `[cast(int)2..4]' is not a range but should be analysed as two
consecutive real numbers, as if it is written as `[cast(int)2. .4]', and
therefore should yield something like:

| found '0.4' when expecting ']'

In the lexical analysis phase of dmd some trickery has been done to
prevent this, i.e. looking ahead and backing up.

On the other hand, this trickery now prevents the legal range
expression `[cast(int)2...4]', which could be written as `[cast(int)2. ..
4]', from being correctly identified by dmd. dmd yields:

| found '...' when expecting ']'

So long.

Mar 12 2004
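
The claim in this post can be reproduced with a minimal Python sketch of a strict longest-match lexer (what the D spec, quoted later in the thread, calls "maximal munch"). The token set and the names TOKEN_PATTERNS and munch are assumptions made for illustration; this is not dmd's lexer.

```python
import re

# Hypothetical, simplified token set: just enough of D's lexical grammar
# to reproduce the case discussed in this post.
TOKEN_PATTERNS = [
    ("real", re.compile(r"\d+\.\d*|\.\d+")),        # 2.   .4   2.4
    ("int", re.compile(r"\d+")),
    ("op", re.compile(r"\.\.\.|\.\.|\.|\[|\]|\(|\)")),
    ("word", re.compile(r"[A-Za-z_]\w*")),
]


def munch(source):
    """Tokenize left to right, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        best = None  # (kind, end position) of the longest match so far
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(source, pos)
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        tokens.append((best[0], source[pos:best[1]]))
        pos = best[1]
    return tokens


print(munch("[cast(int)2..4]"))
print(munch("[cast(int)2...4]"))
```

Under these assumptions `2..4` comes out as the two reals `2.` and `.4`, and `2...4` comes out as the real `2.`, the operator `..` and the integer `4`, which are the two readings described above.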

"Walter" <walter digitalmars.com> writes:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:c2uekl$1995$1 digitaldaemon.com...
 Although not explicitly specified, the usual left-to-right lexical analysis
 and parsing of the grammar of D is currently not implemented in dmd.

 Currently `2.' and `.4' are legal real numbers. Therefore the look alike
 range `[cast(int)2..4]' is not a range but should be analysed as two
 consecutive real numbers, as if it is written as `[cast(int)2. .4]', and
 therefore should yield something like:

 | found '0.4' when expecting ']'

 In the lexical analysis phase of dmd some trickery has been done to
 prevent this, i.e. looking ahead and backing up.

 On the other hand, this trickery now prevents the legal range
 expression `[cast(int)2...4]', which could be written as `[cast(int)2. ..
 4]', from being correctly identified by dmd. dmd yields:

 | found '...' when expecting ']'

 So long.

... is a valid token. You'll need to put the space after the first . to get
the meaning you wish. True, the lexer does a bit of lookahead, but why not?

Mar 13 2004
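
The lookahead mentioned here might look roughly like the following Python sketch. It is a guess at the general technique, not a transcription of dmd's source; scan_number, lex_with_lookahead and OPERATORS are hypothetical names.

```python
OPERATORS = ("...", "..", ".", "[", "]", "(", ")")  # longest first


def scan_number(source, pos):
    """Scan a numeric literal starting at a digit; return (kind, text, end)."""
    end = pos
    while end < len(source) and source[end].isdigit():
        end += 1
    # Lookahead: a dot belongs to the number only if it is not
    # immediately followed by a second dot.
    if source.startswith(".", end) and not source.startswith("..", end):
        end += 1
        while end < len(source) and source[end].isdigit():
            end += 1
        return "real", source[pos:end], end
    return "int", source[pos:end], end


def lex_with_lookahead(source):
    tokens, pos = [], 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
        elif ch.isdigit():
            kind, text, pos = scan_number(source, pos)
            tokens.append((kind, text))
        elif ch == "." and source[pos + 1:pos + 2].isdigit():
            # Real with an empty integer part, such as .4
            end = pos + 1
            while end < len(source) and source[end].isdigit():
                end += 1
            tokens.append(("real", source[pos:end]))
            pos = end
        else:
            for op in OPERATORS:
                if source.startswith(op, pos):
                    tokens.append(("op", op))
                    pos += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r}")
    return tokens


print(lex_with_lookahead("2..4"))     # range of two integers
print(lex_with_lookahead("2...4"))    # the '...' token appears
print(lex_with_lookahead("2. .. 4"))  # space after the first dot
```

With this rule `2..4` lexes as `2`, `..`, `4`, while `2...4` yields the single token `...` between two integers, consistent with the error message reported in the first post. Writing `2. .. 4` gives the real `2.` followed by the range operator.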

Manfred Nowak <svv1999 hotmail.com> writes:

Walter wrote:

 ... is a valid token. You'll need to put the space after the first . to
 get the meaning you wish.

I am not talking about meanings I wish. I noticed this departure from the
norm because the publicly available syntax highlighting extension for D for
vim showed me `[2..4]' as two consecutive reals, thereby pointing out to me
that my own syntax highlighting extension is wrong, because I had thought
that it is illegal to have an empty integer or fractional part in a real.

Then: following the usual left-to-right analysis it is correct to analyze
the construct in question as two consecutive reals, and furthermore there
is no way to build an LR highlighter that is able to highlight the
construct in question as two integer numbers separated by the range operator
`..'.

Even the `d2html' example highlights the construct in question as the
real `2.', followed by a `.', followed by the integer `4'.

I do not believe that any syntax highlighter currently out there is able
to highlight the construct in question correctly.

  
 True, the lexer does a bit of lookahead, but why not?

That depends on what DigitalMars has in mind with the language D and the
de facto reference compiler dmd.

If the intention of DigitalMars is to tempt a certain amount of computer
nerds to the language D by promising an open standard and at the same time
bind them to a proprietary implementation not fully consistent with the
proposed standard and its somehow natural interpretation, then it is quite
okay to make even more departures than the two I have detected:

- the one which is the matter of this thread, and
- the `cast' operator being optional in dmd.

If the intention of DigitalMars is to keep the language D and the de facto
reference compiler dmd in a homogeneous state, then the existence of
both exposed deviations is not okay.

There might be more intentions of DigitalMars, which I am unable to
recognize.

So long.

Mar 13 2004

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak wrote:
<snip>
 Even the `d2html' example highlights the construct in question as the
 real `2.', followed by a `.', followed by the integer `4'.
 
 I do not believe that any syntax highlighter currently out there is able
 to highlight the construct in question correctly.

You're right, that syntax highlighters that are strictly LR have trouble 
with syntaxes that aren't strictly LR.  But see below....

 True, the lexer does a bit of lookahead, but why not?


Depends on whether the lexicality is supposed to be strictly LR.  But I 
did just notice this in the spec:

"There are no digraphs or trigraphs in D. The source text is split into 
tokens using the maximal munch technique, i.e., the lexical analyzer 
tries to make the longest token it can. For example >> is a right shift 
token, not two greater than tokens."

But if that's exactly true, then from the way string literals are 
specified, surely in

	qwert("yuiop", "asdfg")

a single, 14-character string is being passed?

 That depends on what DigitalMars has in mind with the language D and the
 de facto reference compiler dmd.

I think what it should have in mind is making the spec clearer.  You're 
right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2. 
.4 or even any of the three other possibilities.

Of course it isn't difficult to write a lexer that looks ahead two or 
three characters.  The only trouble is that it's doing it for what's not 
clearly specified.

 If the intention of DigitalMars is to tempt a certain amount of computer
 nerds to the language D by promising an open standard and at the same time
 bind them to a proprietary implementation not fully consistent with the
 proposed standard and its somehow natural interpretation, then it is quite
 okay to make even more departures than the two I have detected:
 
 - the one which is the matter of this thread, and
 - the `cast' operator being optional in dmd.

<snip>

You're right, that's just what I've been thinking for a while.  There 
does seem to be both an inconsistency and a deviation from CFG with casts.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Mar 16 2004

"Matthew" <matthew stlsoft.org> writes:

 If the intention of DigitalMars is to tempt a certain amount of computer
 nerds to the language D by promising an open standard and at the same time
 bind them to a proprietary implementation not fully consistent with the
 proposed standard and its somehow natural interpretation, then it is quite
 okay to make even more departures than the two I have detected:

 - the one which is the matter of this thread, and
 - the `cast' operator being optional in dmd.

 <snip>

 You're right, that's just what I've been thinking for a while.  There
 does seem to be both an inconsistency and a deviation from CFG with casts.

I think the cast operator should be mandatory

Mar 16 2004

J C Calvarese <jcc7 cox.net> writes:

Matthew wrote:
 If the intention of DigitalMars is to tempt a certain amount of computer
 nerds to the language D by promising an open standard and at the same time
 bind them to a proprietary implementation not fully consistent with the
 proposed standard and its somehow natural interpretation, then it is quite
 okay to make even more departures than the two I have detected:

 - the one which is the matter of this thread, and
 - the `cast' operator being optional in dmd.

 <snip>

 You're right, that's just what I've been thinking for a while.  There
 does seem to be both an inconsistency and a deviation from CFG with casts.

 I think the cast operator should be mandatory

I absolutely agree. It has to be now. Before D 1.0 is set and we have a 
bunch of legacy code with C-style casts hanging around.

-- 
Justin
http://jcc_7.tripod.com/d/

Mar 16 2004

"Matthew" <matthew stlsoft.org> writes:

"J C Calvarese" <jcc7 cox.net> wrote in message
news:c38i7b$un$1 digitaldaemon.com...
 Matthew wrote:
 If the intention of DigitalMars is to tempt a certain amount of computer
 nerds to the language D by promising an open standard and at the same time
 bind them to a proprietary implementation not fully consistent with the
 proposed standard and its somehow natural interpretation, then it is quite
 okay to make even more departures than the two I have detected:

 - the one which is the matter of this thread, and
 - the `cast' operator being optional in dmd.

 <snip>

 You're right, that's just what I've been thinking for a while.  There
 does seem to be both an inconsistency and a deviation from CFG with casts.

 I think the cast operator should be mandatory

 I absolutely agree. It has to be now. Before D 1.0 is set and we have a
 bunch of legacy code with C-style casts hanging around.

Quite right. Let me presumptuously institute a vote.

Mar 17 2004

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon wrote:

[...]
 "There are no digraphs or trigraphs in D. The source text is split into
 tokens using the maximal munch technique, i.e., the lexical analyzer
 tries to make the longest token it can. For example >> is a right shift
 token, not two greater than tokens."

Thanks for this link.


 But if that's exactly true, then from the way string literals are
 specified, surely in
 
 	qwert("yuiop", "asdfg")
 
 a single, 14-character string is being passed?

Right. It should be specified, that allowed characters do not include the
delimiting `"' or ``'.


 I think what it should have in mind is making the spec clearer.  You're
 right, there's nothing suggesting that 2..4 should be 2 .. 4 and not 2.
 .4 or even any of the three other possibilities.

I see five, but only when not using longest match.


[...]

So long!

Mar 17 2004
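
The string-literal point can be illustrated with two regular expressions in Python. This is only a sketch: the pattern names are made up here, and escape sequences inside strings are ignored.

```python
import re

call = 'qwert("yuiop", "asdfg")'

# Literal reading of "longest token": any characters between two quotes.
greedy = re.compile(r'"(.*)"')
# The delimiting quote is excluded from the allowed characters.
delimited = re.compile(r'"([^"]*)"')

longest = greedy.search(call).group(1)
arguments = delimited.findall(call)

print(repr(longest), len(longest))
print(arguments)
```

The greedy pattern captures one 14-character string, `yuiop", "asdfg`, whereas the pattern that excludes the delimiter finds the two intended arguments.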

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak wrote:

<snip>
 I think what it should have in mind is making the spec clearer.
 You're right, there's nothing suggesting that 2..4 should be 2 .. 4
 and not 2. .4 or even any of the three other possibilities.

 
 I see five, but only when not using longest match.

2. . 4
2 . .4
2 . . 4

Of course, the character sequence could be split up as

2.. 4
2 ..4

but these involve what aren't valid D tokens.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the
unfortunate victim of intensive mail-bombing at the moment.  Please keep
replies on the 'group where everyone may benefit.

Mar 17 2004
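
The count can be checked mechanically. The Python sketch below enumerates every way to split `2..4` into valid tokens, assuming a token set of integers, reals with an optional integer or fractional part, and the operators `.`, `..` and `...`; VALID_TOKEN and segmentations are names invented for this example.

```python
import re

# Hypothetical token classes, limited to what the thread discusses.
VALID_TOKEN = re.compile(r"\d+|\d+\.\d*|\.\d+|\.\.\.|\.\.|\.")


def segmentations(text):
    """Return every way to split text into a sequence of valid tokens."""
    if not text:
        return [[]]
    results = []
    for cut in range(1, len(text) + 1):
        head = text[:cut]
        if VALID_TOKEN.fullmatch(head):
            for rest in segmentations(text[cut:]):
                results.append([head] + rest)
    return results


for split in segmentations("2..4"):
    print(" ".join(split))
```

Under these assumptions there are five splits: `2 .. 4`, `2. .4`, and the three listed in this post. The splits `2.. 4` and `2 ..4` never appear because `2..` and `..4` are not tokens.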

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon wrote:

[...]
 but these involve what aren't valid D tokens.

Agreed. I did not think of this argument.

So long!

Mar 17 2004

Ben Hinkle <bhinkle4 juno.com> writes:

On Sat, 13 Mar 2004 14:28:35 -0800, "Walter" <walter digitalmars.com>
wrote:

"Manfred Nowak" <svv1999 hotmail.com> wrote in message
news:c2uekl$1995$1 digitaldaemon.com...
 Although not explicitly specified, the usual left-to-right lexical analysis
 and parsing of the grammar of D is currently not implemented in dmd.

 Currently `2.' and `.4' are legal real numbers. Therefore the look alike
 range `[cast(int)2..4]' is not a range but should be analysed as two
 consecutive real numbers, as if it is written as `[cast(int)2. .4]', and
 therefore should yield something like:

 | found '0.4' when expecting ']'

 In the lexical analysis phase of dmd some trickery has been done to
 prevent this, i.e. looking ahead and backing up.

 On the other hand, this trickery now prevents the legal range
 expression `[cast(int)2...4]', which could be written as `[cast(int)2. ..
 4]', from being correctly identified by dmd. dmd yields:

 | found '...' when expecting ']'

 So long.

... is a valid token. You'll need to put the space after the first . to get
the meaning you wish. True, the lexer does a bit of lookahead, but why not?

Fortran, MATLAB and Python use : for slicing instead of ..
I don't know the history of why but maybe this parsing issue factored
into it. The .. reminds me more of Pascal.

-Ben

Mar 13 2004

"C. Sauls" <ibisbasenji yahoo.com> writes:

MOO uses '..' as well, and having recently written a MOO
parser/compiler/driver I can say it's doable.  Of course, MOO requires
that floating-point numbers contain both integer and fraction, even if
one is equal to 0, so maybe that makes all the difference.

-C. Sauls
-Invironz

Ben Hinkle wrote:
 Fortran, MATLAB and Python use : for slicing instead of ..
 I don't know the history of why but maybe this parsing issue factored
 into it. The .. reminds me more of Pascal.

Mar 14 2004
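
Why requiring both parts of a floating-point literal might make the difference can be seen in a small Python sketch. The two patterns are assumptions for illustration: LENIENT_REAL follows the D behaviour described in this thread, and STRICT_REAL follows the MOO rule as described in this post.

```python
import re

# Integer or fractional part may be empty (D, as discussed in the thread).
LENIENT_REAL = re.compile(r"\d+\.\d*|\.\d+")
# Both parts required (MOO, as described in this post).
STRICT_REAL = re.compile(r"\d+\.\d+")

source = "2..4"

lenient = LENIENT_REAL.match(source)
strict = STRICT_REAL.match(source)

print(lenient.group() if lenient else None)
print(strict.group() if strict else None)

samples = [".5", "4.", "0.5", "4.0"]
accepted = {s: bool(STRICT_REAL.fullmatch(s)) for s in samples}
print(accepted)
```

With the lenient rule a real `2.` can be matched at the start of `2..4`, which is what competes with the range operator. With the strict rule no real matches there, so a plain longest-match lexer reads `2`, `..`, `4` without any lookahead.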

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak wrote:
<snip>
 Currently `2.' and `.4' are legal real numbers. Therefore the look alike
 range `[cast(int)2..4]' is not a range but should be analysed as two
 consecutive real numbers, as if it is written as `[cast(int)2. .4]', and
 therefore should yield something like:

<snip>

That's news to me.  I'd imagined the tokenisation of D was supposed to 
be context-free.

Stewart.

-- 
My e-mail is valid but not my primary mailbox, aside from its being the 
unfortunate victim of intensive mail-bombing at the moment.  Please keep 
replies on the 'group where everyone may benefit.

Mar 15 2004

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon wrote:

[...]
 I'd imagined the tokenisation of D was supposed to be context-free.

Context-free is an attribute that belongs to grammars. If you will, dmd
does not have a context-free lexical analysis, because the case "natural
number followed by a point" is treated in a special way.

Lexical analysis usually is carried out by left-to-right finding the next
_longest_ part of the remaining source that belongs to a token. This is
called LR analysis.

I.e. `return2;' is the identifier `return2', not the keyword `return'
followed by the integer number `2', followed by a `;'.

Not having an LR lexical analysis does not change the attribute context-free
for the grammar, although it is a convention to have LR lexical analysis
with a context-free grammar.

If D breaks this convention it should be explicitly mentioned in the
specification.

If the non-LR analysis stays, then the door is open for more implicit
deviations from the conventions, like the one I mentioned with the
`return2'.

Even the suggestion of an operator that overrides the usual LR
lexical analysis may arise. I would like `�$� ' to be supported then :-)

So long!

Mar 15 2004
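
The `return2' example can be shown with a few lines of Python. This is a sketch with a made-up keyword set and token pattern; characters the pattern does not cover, such as whitespace, are simply skipped.

```python
import re

KEYWORDS = {"return"}  # hypothetical subset of D's keywords
# Identifiers and keywords, integers, and the semicolon.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|;")


def classify(source):
    """Split source into (kind, text) pairs using longest match."""
    tokens = []
    for match in TOKEN.finditer(source):
        text = match.group()
        if text in KEYWORDS:
            kind = "keyword"
        elif text[0].isdigit():
            kind = "int"
        elif text == ";":
            kind = "punct"
        else:
            kind = "identifier"
        tokens.append((kind, text))
    return tokens


print(classify("return2;"))
print(classify("return 2;"))
```

Because the identifier pattern consumes as many word characters as it can, `return2;` yields the identifier `return2`, and only `return 2;` yields the keyword followed by the integer.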

larry cowan <larry_member pathlink.com> writes:

In article <c348v1$1o1i$1 digitaldaemon.com>, Stewart Gordon says...
Manfred Nowak wrote:
<snip>
 Currently `2.' and `.4' are legal real numbers. Therefore the look alike
 range `[cast(int)2..4]' is not a range but should be analysed as two
 consecutive real numbers, as if it is written as `[cast(int)2. .4]', and
 therefore should yield something like:

<snip>

That's news to me.  I'd imagined the tokenisation of D was supposed to 
be context-free.

Stewart.

For what it's worth, .5+4. and 4.+.5 both work as expected, equaling 4.5,
but I would rather have leading and trailing 0's required for literal
floats, doubles, and reals. -(.5-4.), 4.-.5 , 4.*-8. , 4./.2 , .1/16. , and
04*20. all look pretty strange at first glance.  I think FP literals should be
more obviously differentiated from integer literals.

Mar 15 2004
