digitalmars.D - DIP 1026---Deprecate Context-Sensitive String Literals---Community

Mike Parker (16/16) Dec 03 2019 This is the feedback thread for the first round of Community

Andrea Fontana (7/23) Dec 03 2019 I think there's a problem with this analysis.

Dennis (4/10) Dec 03 2019 I agree, I definitely want to expand my collection of open source

John Colvin (6/22) Dec 03 2019 Is there much point being almost context-free? Seems like we

Dennis (10/15) Dec 03 2019 I _think_ this is the only thing in the lexical grammar that is

Andrei Alexandrescu (5/9) Dec 03 2019 This DIP is a non-starter. Here documents are easily and effectively

Dennis (40/42) Dec 03 2019 I consider this low-hanging fruit: just deprecating a token takes

Adam D. Ruppe (18/20) Dec 03 2019 The identifier ones are trivial, they are a simple regex. Heck,

Andrei Alexandrescu (24/34) Dec 03 2019 These can never be the primary reasons for removing a feature. One

Dennis (37/56) Dec 03 2019 The DIP mentions:

Andrei Alexandrescu (11/20) Dec 03 2019 It was great primarily because it was a built-in feature made

Dennis (12/14) Dec 03 2019 If you truly wanted to convey that, you did a good job. But I do

mipri (69/71) Dec 03 2019 Bad motivation and bad construction. The bad construction is

FeepingCreature (15/23) Dec 04 2019 I think this is a really questionable argument, because it
Dennis (12/26) Dec 04 2019 That's the nature of deprecation: a short term cost for a long

Ola Fosheim Grøstad (4/6) Dec 04 2019 Suggesting a workable alternative usually is easier. Like:

mipri (7/13) Dec 04 2019 Or specify that q"<<< (three chars exactly) can only be matched

Andrei Alexandrescu (40/43) Dec 04 2019 That got me thinking. Here's what I'd opine.

Ola Fosheim Grøstad (11/18) Dec 04 2019 That will prevent qualitative incremental improvements. You

Walter Bright (7/13) Dec 04 2019 This would be a good opening for a separate thread.

Kagamin (7/11) Dec 05 2019 If those other literals are bad. For python it's the opposite:

Ola Fosheim Grøstad (8/13) Dec 05 2019 Yes, that usage you link to was for docs-strings though (more

Kagamin (4/6) Dec 05 2019 D can embed files with import expression

Ola Fosheim Grøstad (12/18) Dec 05 2019 That is a nice alternative for long text, but when building

mipri (2/12) Dec 05 2019 Python doesn't have delimited strings.

Kagamin (4/7) Dec 04 2019 Alternative can be any other type of string or an import

Exil (16/56) Dec 03 2019 C++ removed features that were almost never used. So much so I

H. S. Teoh (47/83) Dec 03 2019 Agreed, but that can't be the only criterion for removing a feature. By

Elronnd (11/27) Dec 03 2019 That's clearly not a fair comparison. Heredocs can be reduced to

H. S. Teoh (17/28) Dec 03 2019 This is a valid consideration *before* the language is implemented. The

WebFreak001 (4/18) Dec 03 2019 actually with textmate based grammars this is pretty easy to

H. S. Teoh (41/49) Dec 03 2019 [...]

Paul Backus (8/23) Dec 03 2019 By definition, a context-free grammar is defined in terms of a

H. S. Teoh (20/29) Dec 03 2019 [...]

Andrei Alexandrescu (6/8) Dec 03 2019 I feared that would happen. When I drafted the initial answer, I had

H. S. Teoh (8/17) Dec 03 2019 Yes, sigh, I can see it already: this thread is going to be another of

Walter Bright (3/7) Dec 06 2019 It's a well-known effect that the less technical a proposal is, the more...

Ola Fosheim Grøstad (9/14) Dec 03 2019 Just change the syntax to q"delimiter .... retimiled" and I

Ola Fosheim Grøstad (3/11) Dec 03 2019 That was a joke! Don't argue it...

Dennis (43/53) Dec 03 2019 I don't think you use the same terminology as the DIP so I might

Ola Fosheim Grøstad (2/4) Dec 03 2019 Can't you use a lexer with a PEG parser?

H. S. Teoh (100/142) Dec 03 2019 Walter has admitted that having 3 encodings, with the corresponding 3

mipri (5/7) Dec 03 2019 Python actually doesn't have HERE docs. When it's included in

Adam D. Ruppe (20/26) Dec 03 2019 VERY useful and helps make D on Windows feel first class, so it

Dennis (56/87) Dec 04 2019 https://rosettacode.org/wiki/Here_document#Python

Timon Gehr (10/17) Dec 04 2019 A small fix for this small problem is to just say in the specification

Walter Bright (10/27) Dec 04 2019 Another case of my lack of academic CS training showing. I would appreci...

Ola Fosheim Grøstad (17/20) Dec 04 2019 I don't think a spec has to use a lot of CS terms, probably

Adam D. Ruppe (4/6) Dec 04 2019 In that context, if you replace "covariant with" with "can act as

Ola Fosheim Grøstad (4/10) Dec 04 2019 That is much easier to understand, for sure. I think the best

mipri (65/88) Dec 04 2019 The big (and only) advantage of HERE docs is that you so rarely

Kagamin (35/38) Dec 04 2019 In a compiler.

Guillaume Piolat (7/11) Dec 03 2019 YES

Jonathan M Davis (15/32) Dec 03 2019 There are definitely people who use token strings in their code when wri...

Adam D. Ruppe (4/6) Dec 03 2019 Token strings are q{ }, this is about the delimited strings like

Dennis (12/16) Dec 03 2019 I don't propose deprecating token strings, only the identifier

Jonathan M Davis (8/24) Dec 03 2019 Ah. Clearly, I glanced over it all too quickly. I confess that that

H. S. Teoh (28/40) Dec 03 2019 The problem is that token strings require the contents to be *D tokens*.

Elronnd (4/8) Dec 03 2019 Bracket-delimited string (q"[text]", allowing <>, [], (), and {}

H. S. Teoh (6/14) Dec 03 2019 They still need to nest properly, though. Generating BF snippets, for

Kagamin (9/13) Dec 04 2019 It requires efficient memory management. Wait, it requires memory

Les De Ridder (4/11) Dec 03 2019 This DIP explicitly doesn't deprecate token strings, only

aliak (7/23) Dec 03 2019 1) Are there any examples of strings that don't have an in-source

Dennis (25/29) Dec 03 2019 Considering escape sequences such as "\x0B" and string

Arun Chandrasekaran (3/7) Dec 03 2019 We use this feature. We can fix the code, but the DIP doesn't

Walter Bright (2/2) Dec 04 2019 There are a lot of DIPs in the pipeline, and this looks highly unlikely ...

Mike Parker <aldacron gmail.com> writes:

This is the feedback thread for the first round of Community 
Review for DIP 1026, "Deprecate Context-Sensitive String 
Literals":

https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

All review-related feedback on and discussion of the DIP should 
occur in this thread. The review period will end at 11:59 PM ET 
on December 17, or when I make a post declaring it complete.

At the end of Round 1, if further review is deemed necessary, the 
DIP will be scheduled for another round of Community Review. 
Otherwise, it will be queued for the Final Review and Formal 
Assessment.

Anyone intending to post feedback in this thread is expected to 
be familiar with the reviewer guidelines:

https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

*Please stay on topic!*

Thanks in advance to all who participate.

Dec 03 2019

Andrea Fontana <nospam example.com> writes:

On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":

 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

 All review-related feedback on and discussion of the DIP should 
 occur in this thread. The review period will end at 11:59 PM ET 
 on December 17, or when I make a post declaring it complete.

 At the end of Round 1, if further review is deemed necessary, 
 the DIP will be scheduled for another round of Community 
 Review. Otherwise, it will be queued for the Final Review and 
 Formal Assessment.

 Anyone intending to post feedback in this thread is expected to 
 be familiar with the reviewer guidelines:

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 *Please stay on topic!*

 Thanks in advance to all who participate.

I think there's a problem with this analysis.

The package registry contains a lot of libraries and just a few 
other projects.
I wonder if libraries represent the real usage by "final" users.

Maybe those stats should be run over github D projects, at least.

Andrea

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 09:27:37 UTC, Andrea Fontana wrote:
 I think there's a problem with this analysis.

 The package registry contains a lot of libraries and just a few 
 other projects.
 I wonder if libraries represent the real usage by "final" users.

 Maybe those stats should be run over github D projects, at 
 least.

I agree, I definitely want to expand my collection of open source 
D code to be more representative. If I have time I may do this 
before the next review round.

Dec 03 2019

John Colvin <john.loughran.colvin gmail.com> writes:

On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":

 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

 All review-related feedback on and discussion of the DIP should 
 occur in this thread. The review period will end at 11:59 PM ET 
 on December 17, or when I make a post declaring it complete.

 At the end of Round 1, if further review is deemed necessary, 
 the DIP will be scheduled for another round of Community 
 Review. Otherwise, it will be queued for the Final Review and 
 Formal Assessment.

 Anyone intending to post feedback in this thread is expected to 
 be familiar with the reviewer guidelines:

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 *Please stay on topic!*

 Thanks in advance to all who participate.

Is there much point being almost context-free? Seems like we 
should know for sure whether there are any other parts of the 
grammar that are context dependent before we use it as 
motivation. The DIP is somewhat vague about whether this has been 
properly established.

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 12:24:09 UTC, John Colvin wrote:
 Is there much point being almost context-free? Seems like we 
 should know for sure whether there are any other parts of the 
 grammar that are context dependent before we use it as 
 motivation. The DIP is somewhat vague about whether this has 
 been properly established

I _think_ this is the only thing in the lexical grammar that is 
not context-free but I haven't verified this so I am 
intentionally a bit vague about that.
This DIP is obviously necessary for being context-free, but not 
sufficient per se.

I can spend some time on this if it helps, but didn't want to put 
too much time into this before the first review in case it got 
shut down immediately. (And considering Andrei's post, it is at 
risk of that.)

Dec 03 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/3/19 4:03 AM, Mike Parker wrote:
 This is the feedback thread for the first round of Community Review for 
 DIP 1026, "Deprecate Context-Sensitive String Literals":
 
 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

This DIP is a non-starter. Here documents are easily and effectively 
handled during lexing and have no impact on the language grammar.

Waste of labor is sadly a common theme in our community. We should have 
a mechanism to direct such investment of work toward productive outcomes.
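To make the "handled during lexing" point concrete, here is a minimal D sketch of how a hand-written lexer might scan an identifier-delimited string. `lexHeredoc` is a hypothetical helper, not dmd's actual lexer, and it assumes ASCII identifiers and `\n` line endings with no error recovery. Beyond an ordinary string scan, the only extra state it keeps is the opening identifier, which it then looks for at the start of each later line.

```d
import std.algorithm.searching : startsWith;
import std.ascii : isAlpha, isAlphaNum;
import std.string : indexOf;
import std.typecons : Tuple, tuple;

/// Hypothetical helper, not dmd's implementation: lexes an
/// identifier-delimited string that starts with `q"` at src[0].
/// Returns the string contents and the number of characters consumed.
/// Simplified: ASCII identifiers, '\n' line endings, no error recovery.
Tuple!(string, size_t) lexHeredoc(string src)
{
    if (!src.startsWith(`q"`))
        throw new Exception(`expected q"`);
    size_t i = 2;

    // 1. Read the opening identifier.
    immutable idStart = i;
    if (i >= src.length || !(isAlpha(src[i]) || src[i] == '_'))
        throw new Exception(`expected an identifier after q"`);
    while (i < src.length && (isAlphaNum(src[i]) || src[i] == '_'))
        i++;
    string id = src[idStart .. i];

    // 2. A newline must follow the identifier; it is not part of the string.
    if (i >= src.length || src[i] != '\n')
        throw new Exception("expected a newline after the opening identifier");
    immutable bodyStart = i + 1;

    // 3. Find the first line that begins with `id"`. This is the
    //    context-sensitive step: the terminator is text read in step 1.
    string terminator = id ~ `"`;
    size_t lineStart = bodyStart;
    while (true)
    {
        if (src[lineStart .. $].startsWith(terminator))
            return tuple(src[bodyStart .. lineStart], lineStart + terminator.length);
        immutable nl = src[lineStart .. $].indexOf('\n');
        if (nl < 0)
            throw new Exception("unterminated identifier-delimited string");
        lineStart += cast(size_t) nl + 1;
    }
}

void main()
{
    enum src = "q\"EOS\nline one\nline two\nEOS\"; rest";
    auto r = lexHeredoc(src);
    assert(r[0] == "line one\nline two\n");
    assert(src[r[1] .. $] == "; rest");

    // Cross-check against the compiler's own lexer.
    enum compiled = q"EOS
line one
line two
EOS";
    assert(r[0] == compiled);
}
```

Step 3 is the part that is awkward for some syntax-highlighting DSLs: matching a terminator that was read earlier needs something like a backreference (vim's `\z(...\)`, for example), which not every highlighter grammar offers.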

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu 
wrote:
 Waste of labor is sadly a common theme in our community.

I consider this low-hanging fruit: just deprecating a token takes 
little implementation effort, and reduction in language 
complexity is (as far as I know) always welcome for the usual 
reasons:
- less code in dmd
- less specification text
- less didactic material / stuff to learn for new D programmers
- less bug/enhancement reports
- any tool that re-implements some part of the compiler is easier 
to make

In this case, such tools would be syntax highlighters. There are 
lots of syntax highlighting implementations for D; here are just a few off 
the top of my head:
- GitHub
- Code-d
- Kate
- Atom
- Sublime
- Chroma
- Vim
- Emacs
- Notepad++
- ...

They all tend to use their own domain specific language, and I'm 
pretty sure most of them are not powerful enough to express 
identifier-delimited strings. Here's an example of one if you're 
curious what they look like:

https://github.com/alecthomas/chroma/blob/master/lexers/d/d.go

Notice the:

 // TODO support delimited strings

If we don't want D support in syntax highlighters to be 
half-baked everywhere, keeping the lexical grammar simple is a 
good cause.

I can improve the rationale for this DIP with examples like in 
this post, though if you're absolutely adamant that this is a 
waste of effort then that won't help obviously.

Maybe you don't care about syntax highlighting, but please judge 
this DIP by its own merits and not compared to potential other 
DIPs that you care more about.

Dec 03 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Tuesday, 3 December 2019 at 14:45:31 UTC, Dennis wrote:
 I'm pretty sure most of them are not powerful enough to express 
 identifier-delimited strings.

The identifier ones are trivial, they are a simple regex. Heck, 
my vim syntax highlight file not only supports them, but uses the 
opening as a hint as to what language is embedded:

q"html
    <!-- highlights this as html! -->
html";


That said, though, I don't love them because they must end on a 
new line, without indentation. But still, it was easy to 
implement.

syn region dHTML keepend matchgroup=string start="q\"html$" end="^html\"" contains=@html


And the generic fallback for other identifiers of course is just

syn region dDelimString start=+q"\z(.\)+ end=+\z1"+ contains=@Spell
syn region dHereString  start=+q"\z(\I\i*\)\n+ end=+^\z1"+ contains=@Spell

vim manages to do it all pretty well....
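The rules above have to reproduce the following behaviour. A minimal D sketch, assuming the semantics given in the D lexical specification for delimited strings; `page` is just an illustrative name.

```d
void main()
{
    // Identifier-delimited form, as in the q"html example: the newline after
    // the opening identifier is dropped, the newline before the closing
    // identifier is kept, and the closing identifier starts its own line.
    enum page = q"html
    <p>hello</p>
html";
    static assert(page == "    <p>hello</p>\n");

    // Single-character delimiter, the case the generic dDelimString rule covers.
    static assert(q"/foo]/" == "foo]");
}
```

Indentation inside the body is kept verbatim; only the closing identifier is tied to the start of a line, which is the limitation mentioned above.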

Dec 03 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:

On 12/3/19 9:45 AM, Dennis wrote:
 On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu wrote:
 Waste of labor is sadly a common theme in our community.

 
 I consider this low-hanging fruit: just deprecating a token takes little 
 implementation effort, and reduction in language complexity is (as far 
 as I know) always welcome

These can never be the primary reasons for removing a feature. One 
doesn't remove a feature because it's easy to remove. One removes a 
feature because there are good reasons to remove it, and as perks we get 
simplification of the language and maybe it's easy to remove.

 In this case, such tools would be syntax highlighters.

The entire narrative of the DIP puts CFG front and center. The reader's 
first thought is, "wait, the author is confused about what a CFG is."

FIRST sentence in the abstract: "D is intended to have a context-free 
grammar..."

FIRST paragraph in the rationale: "Regarding language design, Walter 
Bright has stated: [... CFG stuff ...]"

Even the "Grammar Changes" section should be a give-away: the diff 
proposed is in the LEXICAL definition (https://dlang.org/spec/lex.html), 
not in the GRAMMAR definition (https://dlang.org/spec/grammar.html).

If syntax highlighters are the primary reason for the DIP, it should be 
the primary reason in the DIP. The entire rationale needs to be redone. 
There should be an enumeration of syntax highlighters along with their 
success/failure of implementing heredocs. (I didn't test them all, but as far 
as I can tell, I've never heard of difficulties with implementing heredocs 
for bash, perl and the like.)

 Maybe you don't care about syntax highlighting, but please judge this 
 DIP by its own merits and not compared to potential other DIPs that you 
 care more about.

A DIP ought to be judged by reading the DIP. This DIP is ill-informed 
because it is built around the CFG argument, a nonexistent issue. If 
the DIP requires a forum post explaining how it needs to be judged, 
that's a problem with the DIP, not the reader.

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 19:42:12 UTC, Andrei Alexandrescu 
wrote:
 These can never be the primary reasons for removing a feature. 
 One doesn't remove a feature because it's easy to remove. One 
 removes a feature because there are good reasons to remove it, 
 and as perks we get simplification of the language and maybe 
 it's easy to remove.

The DIP mentions:
- D's flagship parser generator Pegged can't express the D 
grammar (without user defined parser functions)
- Syntax highlighters such as the one on Rosetta code have 
trouble with it
- there is precedent of deprecating hexstring literals

I'll admit that the rationale section is not clear in the 
"primary reasons" to remove it, but I considered reducing 
language complexity an obvious win.

Every feature is a trade off between what it brings to the table 
and what it costs, and when it turns out the benefit of a feature 
is low it gets removed, even when it's not inherently 
problematic. That's what happened with .sort, .reverse, the 
floating-point NCEG operators, octal literals, hexstring literals, and 
escape string literals.

Please answer this: Do you think there were good reasons to 
deprecate hexstring literals, or do you consider that a mistake / 
unnecessary?

 FIRST paragraph in the rationale: "Regarding language design, 
 Walter Bright has stated: [... CFG stuff ...]"

 Even the "Grammar Changes" section should be a give-away: the 
 diff proposed is in the LEXICAL definition 
 (https://dlang.org/spec/lex.html), not in the GRAMMAR 
 definition (https://dlang.org/spec/grammar.html).

And the very first thing on the grammar page is:

 3.1 Lexical Syntax

With a link to the lexical grammar page. I consider lexical 
grammar part of "the grammar of D", even when the lexer and 
parser are separate stages in the compiler. You might say Walter 
was exclusively talking about parsing grammar and not lexing 
grammar, but considering this part of the quote:

 A context free grammar, besides making things a lot simpler, 
 means that IDEs can do syntax highlighting without integrating 
 in most of a compiler front end

It mentions syntax highlighting which does not require parsing.

 If syntax highlighters are the primary reason for the DIP, it 
 should be the primary reason in the DIP.

I don't want to commit to it as 'the primary reason', but I will 
put more emphasis on it in the next iteration.

 If the DIP requires a forum post explaining how it needs to be 
 judged, that's a problem with the DIP, not the reader.

Your first reply came across as "this is useless, please work on 
something else".
That felt like a destructive comment. This reply actually has 
constructive feedback, which helps. Thanks for that.

I will be more specific when talking about 'the grammar', give 
some more focus on syntax highlighters and maybe dive more into 
the precedent of reducing language complexity by removing 
features.

Dec 03 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:

On 12/3/19 3:51 PM, Dennis wrote:
 Please answer this: Do you think there were good reasons to deprecate 
 hexstring literals, or do you consider that a mistake / unnecessary?

It was great primarily because it was a built-in feature made 
unnecessary by improvements to the language.

It would be a mistake to presuppose that hex string literals are a good 
precedent, however. Heredocs have no library alternative. The DIP would 
not be helped by attempting a parallel.
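For comparison, the library route for hex strings is assumed here to be Phobos' `std.conv.hexString` template, which turns each pair of hex digits into one character at compile time. A minimal sketch with an illustrative `greeting` value:

```d
import std.conv : hexString;

void main()
{
    // Library replacement for an x"..." literal: each pair of hex digits
    // becomes one character; whitespace is allowed for readability.
    enum greeting = hexString!"48 65 6c 6c 6f";
    static assert(greeting == "Hello");
}
```

No comparable library template can take over the job of an identifier-delimited string, which is the asymmetry being pointed out.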

 Your first reply came across as "this is useless, please work on
 something else". That felt like a destructive comment. This reply
 actually has constructive feedback, which helps. Thanks for that.
 
 I will be more specific when talking about 'the grammar', give some
 more focus on syntax highlighters and maybe dive more into the
 precedent of reducing language complexity by removing features.

The destructive comment was actually more useful than one that prompts 
improvements to this DIP. Even if executed to perfection the impact 
would be null.

Let me ask this question: what would be a nice way to convey "this is 
useless, please work on something else"?

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 21:11:49 UTC, Andrei Alexandrescu 
wrote:
 Let me ask this question: what would be a nice way to convey 
 "this is useless, please work on something else"?

If you truly wanted to convey that, you did a good job. But I do 
wonder how you expected me to take that. I would not reply "Got 
it, be right back, I'll e-mail Mike immediately and cancel this 
DIP and terminate all my effort so far right here.". Not after 
three comments in review round 1.

Even if this DIP is a failure, we could at least try to salvage 
some lessons from it. Why is it a bad DIP? What criteria should a 
language feature have to be candidate for removal, and why don't 
context-sensitive string literals fit those criteria? What 
sources of language complexity can be removed instead?

Dec 03 2019

mipri <mipri minimaltype.com> writes:

On Tuesday, 3 December 2019 at 22:11:22 UTC, Dennis wrote:
 Even if this DIP is a failure, we could at least try to salvage
 some lessons from it. Why is it a bad DIP?

Bad motivation and bad construction. The bad construction is
apparently that HERE docs do not actually conflict with context-free
grammars, and that the entire point of the DIP is moot.
That wasn't obvious to me; I was mainly thinking "I guess it's
assumed that dmd will compile faster with this?"

I think the bad motivation is more interesting, even though a
lot of this is how I received your DIP rather than how you may
have definitely meant it:

1. "Less is always better." Not stated in the DIP, but in your
defense of it here:

   reduction in language complexity is (as far as I know) always
   welcome for the usual reasons:
   - less code in dmd
   - less specification text
   - less didactic material / stuff to learn for new D programmers
   - less bug/enhancement reports
   - any tool that re-implements some part of the compiler is
     easier to make

Less should have a *point*, though. Much code, specification,
and most importantly didactic material is already written. I
have physical bound books within arms reach of me that discuss
these features. Removing the feature doesn't make these books
easier to write, it just makes it more annoying for people to
read them, as they're introduced to deprecated features. It
makes "other people's code" slightly more annoying to consider,
as you may have to update that code to remove since-deprecated
features.

Removing HERE docs doesn't create a python2/python3 or a
perl5/perl6 situation, but it still forks the language and the
old language still does not simply or automatically disappear.
I really dislike this about C++: that no matter how modern it
gets, there will be these huge carbon-dated layers of code out
there that are pre-modern and that can hardly be understood
without also learning the stuff that the modern features are
supposed to have replaced.

If a feature were to be judged a mistake, it can still be a
mistake to remove the feature later on. Less is not always
better.

2. D's problem is "too many features" -> let's remove any
that looks relatively easy to remove.

How much agreement do you think there is on the first point?

Consider the "remove ~= from arrays" DIP. It removed a
feature, and removing the feature arguably materially improved
D's options to evolve as a language, and it got a really
incensed negative response.

A human engineer can improve a machine by shutting it down,
tearing it apart, making an improvement, and putting it back
together again. This interruptability of the engineered system
is one of the characteristics of human engineering, along with
"use dry materials" and "use stiff materials", that
distinguishes it from what you might call engineering by Mother
Nature, who uses wet materials, and flexible materials, and
whose works (even if they pull some tricks like molting or
entering a cocoon) must continue to stay alive even as they
undergo radical changes in form.

A DIP can't kill D, take it apart, make an improvement, and
then put it back together again, because then all the users
will be gone. Language design is more like natural engineering
in this way.

If part of D's problem is that it has a lot of features, the
best way forward may still not be to remove them.

3. "Walter said a thing about D, but a StackOverflow comment
refuted that, so the language should change so that this
criticism is no longer true."

https://stackoverflow.com/a/7083615

Geez. Someone who thinks D has "an obnoxious amount of ambiguity"
is definitely still going to think that after HERE docs are gone.

Dec 03 2019

FeepingCreature <feepingcreature gmail.com> writes:

On Tuesday, 3 December 2019 at 23:35:21 UTC, mipri wrote:
 2. D's problem is "too many features" -> let's remove any
 that looks relatively easy to remove.

 How much agreement do you think there is on the first point?

 Consider the "remove ~= from arrays" DIP. It removed a
 feature, and removing the feature arguably materially improved
 D's options to evolve as a language, and it got a really
 incensed negative response.

I think this is a really questionable argument, because it 
implicitly presumes that all features are worth the same. The 
"remove ~= from arrays" DIP got, as far as I could see, basically 
no feedback along the lines of "whatever, we use it but we could 
replace it easily" or "I think D doesn't need to reduce its 
feature set in general." The feedback it got was, as far as I 
could tell, overwhelmingly "this feature is a core component of 
the usefulness of the D language and definitely the *wrong place* 
to start removing things."

Logically speaking, the more people think it is the wrong place 
to start removing features, the less that debate says about 
removing features as a whole, because people were more motivated 
by the specific feature rather than the general state of the 
language.

Dec 04 2019

Dennis <dkorpel gmail.com> writes:

Thanks for your detailed breakdown.

On Tuesday, 3 December 2019 at 23:35:21 UTC, mipri wrote:
 It makes "other people's code" slightly more annoying to 
 consider,
 as you may have to update that code to remove since-deprecated
 features.

That's the nature of deprecation: a short term cost for a long 
term improvement.

 If a feature were to be judged a mistake, it can still be a
 mistake to remove the feature later on. Less is not always
 better.

That's true.

 2. D's problem is "too many features" -> let's remove any
 that looks relatively easy to remove.

 How much agreement do you think there is on the first point?

I don't know how much explicit agreement there is with the 
sentiment that D has too many features, but I do know that at least 
Walter is always interested in reducing language complexity, and 
many non-actionable user complaints (such as "D is difficult to 
learn") are rooted in things like this.

 3. "Walter said a thing about D, but a StackOverflow comment
 refuted that, so the language should change so that this
 criticism is no longer true."

That is only there for the narrative / background, correcting 
criticism is not a goal of this DIP.

Dec 04 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 4 December 2019 at 09:42:32 UTC, Dennis wrote:
 That is only there for the narrative / background, correcting 
 criticism is not a goal of this DIP.

Suggesting a workable alternative usually is easier. Like:

replace: q"delimiter...

with Python like: """

Dec 04 2019

mipri <mipri minimaltype.com> writes:

On Wednesday, 4 December 2019 at 10:10:09 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 4 December 2019 at 09:42:32 UTC, Dennis wrote:
 That is only there for the narrative / background, correcting 
 criticism is not a goal of this DIP.

 Suggesting a workable alternative usually is easier. Like:

 replace: q"delimiter...

 with Python like: """

Or specify that q"<<< (three chars exactly) can only be matched
with >>>", along with the other matching delimiters. This is a
breaking change though since the current behavior is:

   $ rdmd --eval 'writeln(q"<<< hello >>>")'
   << hello >>
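The existing nesting rule, which is why that proposal would break code, can be checked at compile time. A minimal D sketch; the first `static assert` restates the `rdmd` output shown above.

```d
import std.stdio : writeln;

void main()
{
    // Bracket delimiters nest: in q"<<< hello >>>" the first '<' opens the
    // literal, the inner "<<" and ">>" are ordinary content, and the last
    // '>' (followed by the quote) closes it.
    enum s = q"<<< hello >>>";
    static assert(s == "<< hello >>");
    writeln(s); // prints: << hello >>

    // The same nesting rule applies to (), [] and {}.
    static assert(q"(foo(xxx))" == "foo(xxx)");
}
```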

Dec 04 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 12/3/19 5:11 PM, Dennis wrote:
 What criteria should a language feature have to be candidate for 
 removal, and why don't context-sensitive string literals fit those 
 criteria? What sources of language complexity can be removed instead?

That got me thinking. Here's what I'd opine.

A good DIP creates a scientific argument. It would have the general 
attitude of building, through a series of factual statements, a 
hypothesis that is convincing. A neutral person with the proper 
background would read the facts and reach the conclusion as much as the 
author. (In contrast, a DIP that is not scientific would attempt to use 
qualitative arguments and rhetoric in an attempt to create an opinion 
trend.)

Consider someone reads a DIP proposing the removal of here docs 
containing facts such as these:

* "We have analyzed x languages and of these, we found y historical 
issues related to mistaken or poor performance implementation of 
heredocs. [... details ...]"

* "Across x editors, we discovered that x1 do not implement here docs 
for any of their supported languages, x2 do not implement them for D, 
and x3 implement them with severe performance bottlenecks. [... details 
...]"

* "In the D compiler issue tracker, we found x bug reports filed over y years. 
They took z days on average to fix. x1 issues are still open. [... 
details ...]"

* "The code dedicated to heredocs in the D reference parser is y lines 
long, which constitutes z% of the entire lexer. Lexing of heredocs is t% 
slower than any other equivalent strings, revealing a serious 
performance bottleneck. [... details ...]"

With such arguments at hand, a proposal would build a powerful argument 
that anyone can easily verify and take into consideration. No need for 
argumentation, explanations, etc. Conversely, if one does such an 
investigation and gets no meaningful results, the conclusion that 
heredocs are okay as they are would also be immediate.

Now it may be argued that all of this is hard work, and of high risk - 
even if the DIP is well-argued, it could be rejected. Also, is the 
result of the work (a small language simplification) worth the effort?

Sadly I know of no solution to this. What I can say is that it's the 
main dilemma tormenting graduate students doing research. A colleague of 
mine in the PhD program said he has any number of ideas to research, but 
the cognitive load of putting work into something that may not pan out 
is paralyzing him, so he ends up doing nothing for long periods of time. 
He ended up not finishing his degree. For all I know he was smarter and 
better than many who did graduate.

Dec 04 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 4 December 2019 at 21:57:00 UTC, Andrei 
Alexandrescu wrote:
 A good DIP creates a scientific argument. It would have the 
 general attitude of building, through a series of factual 
 statements, a hypothesis that is convincing. A neutral person 
 with the proper background would read the facts and reach the 
 conclusion as much as the author. (In contrast, a DIP that is 
 not scientific would attempt to use qualitative arguments and 
 rhetoric in an attempt to create an opinion trend.)

That will prevent qualitative incremental improvements. You 
cannot make quantitative arguments without very large amounts of 
data... and there is no such dataset, only GitHub.

If the DIP had argued for an alternative here-document syntax 
that was easier to parse, there would probably have been few 
objections to it. The conversion could have been automated.

There is really no use in pretending that language changes are 
apolitical. They are usually inherently political.

Dec 04 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 12/3/2019 2:11 PM, Dennis wrote:
 Why is it a bad DIP?

I think Andrei covered that fairly well.

 What criteria should a language feature have to be 
 candidate for removal,

This would be a good opening for a separate thread.

 and why don't context-sensitive string literals fit those 
 criteria?

The only real cost identified is poor support for syntax highlighting in some 
text editors. On the other hand, heredocs are a common language feature, and 
other methods of doing it are so clumsy people rarely have the stomach to do it.

 What sources of language complexity can be removed instead?

This would be a good opening for a separate thread.

Dec 04 2019

Kagamin <spam here.lot> writes:

On Wednesday, 4 December 2019 at 22:37:24 UTC, Walter Bright 
wrote:
 The only real cost identified is poor support for syntax 
 highlighting in some text editors. On the other hand, heredocs 
 are a common language feature, and other methods of doing it 
 are so clumsy people rarely have the stomach to do it.

Only if those other literals are bad. In Python it's the opposite: 
given triple-quoted strings, people can't stand delimited strings 
and use triple-quoted strings predominantly instead. See it in 
action: 
https://github.com/django/django/blob/master/django/core/signing.py (it's 
the first random Python code I found on GitHub).

Dec 05 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Thursday, 5 December 2019 at 18:07:08 UTC, Kagamin wrote:
 Only if those other literals are bad. In Python it's the opposite: 
 given triple-quoted strings, people can't stand delimited 
 strings and use triple-quoted strings predominantly instead. See 
 it in action: 
 https://github.com/django/django/blob/master/django/core/signing.py (it's 
 the first random Python code I found on GitHub).

Yes, though the usage you link to is for docstrings (which are more 
like comments). Still, I use Python triple-quoted strings all the 
time, and I have never actually run into a clash with """. It looks 
like too simple a solution, but it works very well in practice.

Another point is that here-documents may be important in 
WebAssembly for embedding "files".

Dec 05 2019

Kagamin <spam here.lot> writes:

On Thursday, 5 December 2019 at 18:23:10 UTC, Ola Fosheim Grøstad 
wrote:
 Another point is that here-documents may be important in 
 WebAssembly for embedding "files".

D can embed files with an import expression: 
https://dlang.org/spec/expression.html#import_expressions

Dec 05 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Thursday, 5 December 2019 at 18:34:12 UTC, Kagamin wrote:
 On Thursday, 5 December 2019 at 18:23:10 UTC, Ola Fosheim 
 Grøstad wrote:
 Another point is that here-documents may be important in 
 WebAssembly for embedding "files".

 D can embed files with an import expression: 
 https://dlang.org/spec/expression.html#import_expressions

That is a nice alternative for long text, but when building 
websites you often deal with many shorter blocks of text.

Anyway. Although I prefer """ as it is visually cleaner, C++ 
actually has something similar to D:

const char* s1 = R"foo(
Hello
World
)foo";

https://en.cppreference.com/w/cpp/language/string_literal

So, highlighters need to support that if they want to support 
C++...
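Highlighters can support this if their pattern language allows backreferences. A minimal Python sketch, using Python's `re` module as a stand-in for a highlighter's regex engine; the delimiter rule below is an approximation of the C++ one (up to 16 characters, excluding parentheses, backslashes, quotes, and whitespace):

```python
import re

# Approximate C++ raw string literal R"delim( ... )delim".
# Group 1 captures the delimiter; \1 requires the same delimiter at the end.
# Simplification: delimiter is 0-16 chars, none of ( ) \ " or whitespace.
RAW_STRING = re.compile(r'R"([^()\\\s"]{0,16})\((.*?)\)\1"', re.DOTALL)


def find_raw_strings(source):
    """Return the contents of every raw string literal in `source`."""
    return [m.group(2) for m in RAW_STRING.finditer(source)]


if __name__ == "__main__":
    src = 'const char* s1 = R"foo(\nHello\nWorld\n)foo";'
    print(find_raw_strings(src))  # ['\nHello\nWorld\n']
```

The backreference is what lets `)"` appear inside the literal without ending it: only `)` followed by the same delimiter and `"` closes the string.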

Dec 05 2019

mipri <mipri minimaltype.com> writes:

On Thursday, 5 December 2019 at 18:07:08 UTC, Kagamin wrote:
 On Wednesday, 4 December 2019 at 22:37:24 UTC, Walter Bright
 wrote:
 The only real cost identified is poor support for syntax 
 highlighting in some text editors. On the other hand, heredocs
 are a common language feature, and other methods of doing it
 are so clumsy people rarely have the stomach to do it.

 Only if those other literals are bad. In Python it's the opposite: 
 given triple-quoted strings, people can't stand delimited 
 strings and use triple-quoted strings predominantly instead. See 
 it in action:

Python doesn't have delimited strings.

Dec 05 2019

Kagamin <spam here.lot> writes:

On Tuesday, 3 December 2019 at 21:11:49 UTC, Andrei Alexandrescu 
wrote:
 It would be a mistake to presuppose that hex string literals 
 are a good precedent, however. Heredocs have no library 
 alternative.

The alternative can be any other kind of string literal, or an 
import expression.

Dec 04 2019

Exil <Exil gmall.com> writes:

On Tuesday, 3 December 2019 at 19:42:12 UTC, Andrei Alexandrescu 
wrote:
 On 12/3/19 9:45 AM, Dennis wrote:
 On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei 
 Alexandrescu wrote:
 Waste of labor is sadly a common theme in our community.

 
 I consider this low-hanging fruit: just deprecating a token 
 takes little implementation effort, and reduction in language 
 complexity is (as far as I know) always welcome

 These can never be the primary reasons for removing a feature. 
 One doesn't remove a feature because it's easy to remove. One 
 removes a feature because there are good reasons to remove it, 
 and as perks we get simplification of the language and maybe 
 it's easy to remove.

C++ has removed features that were almost never used, so much so 
that I don't even remember what they were called. This is a D 
feature I never knew existed. Removing it does make the language 
simpler, and I'd argue for removing it entirely rather than adding 
replacements for it.

 In this case, such tools would be syntax highlighters.

 The entire narrative of the DIP puts CFG front and center. 
 Reader's first thought is, "wait, the author is confused about 
 what a CFG is."

 FIRST sentence in the abstract: "D is intended to have a 
 context-free grammar..."

 FIRST paragraph in the rationale: "Regarding language design, 
 Walter Bright has stated: [... CFG stuff ...]"

 Even the "Grammar Changes" section should be a give-away: the 
 diff proposed is in the LEXICAL definition 
 (https://dlang.org/spec/lex.html), not in the GRAMMAR 
 definition (https://dlang.org/spec/grammar.html).

 If syntax highlighters are the primary reason for the DIP, it 
 should be the primary reason in the DIP. The entire rationale 
 needs to be redone. There should be an enumeration of syntax 
 highlighters along with their success/failure of implementing 
 heredocs. (Didn't test all, but as far as I can tell, I've never 
 heard of difficulties with implementing heredocs for bash, Perl 
 and the like.)

As for IDE tooling, I'd argue autocomplete is probably the most 
useful tool an IDE has. You can't implement it without essentially 
having the entire compiler front end, because of CTFE. It's so 
complicated, in fact, that there are no tools for D that support 
it. I've seen some incorrect syntax highlighting for D, but I 
think it was specifically caused by q{}, which this DIP doesn't 
remove anyway.

 Maybe you don't care about syntax highlighting, but please 
 judge this DIP by its own merits and not compared to potential 
 other DIPs that you care more about.

 A DIP ought to be judged by reading the DIP. This DIP is ill 
 informed because it is built around the CFG argument, a 
 non-existing issue. If the DIP requires a forum post explaining 
 how it needs to be judged, that's a problem with the DIP, not 
 the reader.

Case in point: DIP1021. If the D federation leadership holds itself to that kind 
of standard, I don't see why anyone should expect them to hold 
someone else to a standard above and beyond their own.

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 02:45:31PM +0000, Dennis via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu wrote:
 Waste of labor is sadly a common theme in our community.


That's a bit uncalled for.


 I consider this low-hanging fruit: just deprecating a token takes little
 implementation effort, and reduction in language complexity is (as far as I
 know) always welcome for the usual reasons:
 - less code in dmd
 - less specification text
 - less didactic material / stuff to learn for new D programmers
 - less bug/enhancement reports
 - any tool that re-implements some part of the compiler is easier to make

Agreed, but that can't be the only criterion for removing a feature. By
the same argument, one could make the case for removing templates from
D. Bingo, the language instantly becomes so much easier to parse! And it
greatly simplifies the compiler -- we can delete large sections of it,
in fact! The spec becomes simpler, D newbies don't need to learn this
hard template stuff anymore, and we can close all template-related
bugs, and tools become greatly simplified.


 In this case, such tools would be syntax highlighters. There are lots
 of syntax highlighting implementations for D, just a few off the top
 of my head:
 - GitHub
 - Code-d
 - Kate
 - Atom
 - Sublime
 - Chroma
 - Vim
 - Emacs
 - Notepad++
 - ...
 
 They all tend to use their own domain specific language, and I'm
 pretty sure most of them are not powerful enough to express
 identifier-delimited strings.

Are you sure? Adam just gave an example of correct heredoc highlighting
in Vim.  It may not be *trivial*, but it's possible.  And users don't
have to worry about it: somebody writes the snippet once and for all, and
everyone else can just reuse it.


[...]
 If we don't want D support in syntax highlighters to be half-baked
 everywhere, keeping the lexical grammar simple is a good cause.

IOW, implementers aren't competent enough to implement something up to
spec, therefore we should dumb down the spec for their sake? Sounds like
a backwards reason for doing something.


 I can improve the rationale for this DIP with examples like in this
 post, though if you're absolutely adamant that this is a waste of
 effort then that won't help obviously.
 
 Maybe you don't care about syntax highlighting, but please judge this
 DIP by its own merits and not compared to potential other DIPs that
 you care more about.

The problem with this DIP is that it removes a marginal feature for no
good rationale, breaking a pretty long list of existing D code projects
that depend on said feature, while offering very little in return
(nothing that can't be fixed another way, e.g., fix broken syntax
highlighters so that they work properly(!)). And it does so without
considering why this feature might have been added in the first place,
what kind of problems it solves, and how said problems can be mitigated
if the feature were removed.

As I've already said, I work a lot with code generators and other code
that embed long-ish text passages in code.  Heredoc syntax is ideal for
this sort of code, allowing you to temporarily "escape" from D syntax
and write code snippets as-is, rather than require onerous escaping
which makes said text less readable. E.g., if I want to embed a mini
Perl script inside a function, I couldn't write it as a token string
(some Perl tokens are not D tokens), and writing it as a quoted string
induces Leaning Toothpick Syndrome, making it hard to edit the script.
The script itself is short enough it doesn't seem worth creating it as a
separate file (and then needing to fight with paths to find it in the
right place).  Heredoc syntax lets me just write the danged script in
situ and move on already, instead of fighting with Leaning Toothpick
Syndrome or heaping on yet another layer of pathname resolution code
just to find a miserable 5-line script file.

The same goes for embedding long-ish text (no need to type ""~ all over
the place), etc.

It's marginal, yes, but heredocs are quite useful for the use cases they
were intended for, and I really don't see why they should be singled out
among so many other things in D that could stand to be improved.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. --
Christopher

Dec 03 2019

Elronnd <elronnd elronnd.net> writes:

On Tuesday, 3 December 2019 at 20:53:07 UTC, H. S. Teoh wrote:
 *snipped various arguments to do with simplicity*

 Agreed, but that can't be the only criterion for removing a 
 feature. By the same argument, one could make the case for 
 removing templates from D. Bingo, the language instantly 
 becomes so much easier to parse! And it greatly simplifies the 
 compiler -- we can delete large sections of it, in fact! The 
 spec becomes simpler, D newbies don't need to learn this hard 
 template stuff anymore, and we can close all template-related 
 bugs, and tools become greatly simplified.

That's clearly not a fair comparison.  Heredocs can be reduced to 
a set of local transformations, while templates cannot.  This 
means: code using heredocs can be mechanically changed to not use 
them, and heredocs do not make the language more expressive.
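The "local transformation" can be sketched mechanically. A minimal Python sketch, assuming `\n` line endings; it works on raw source text with a regex, so unlike a real tool operating on the lexer's token stream, it would also rewrite heredoc-like text inside comments or other strings:

```python
import re

# q"IDENT<newline> ... <line starting with>IDENT"
# The content is everything between the opening line and the closing identifier.
HEREDOC = re.compile(r'q"([^\W\d]\w*)\n(.*?)^\1"', re.DOTALL | re.MULTILINE)

# Characters that need escaping inside a D double-quoted string literal.
ESCAPES = {'\\': '\\\\', '"': '\\"', '\n': '\\n', '\t': '\\t', '\r': '\\r'}


def escape_d_string(text):
    """Escape text so it can appear inside a D double-quoted string."""
    return ''.join(ESCAPES.get(ch, ch) for ch in text)


def desugar_heredocs(source):
    """Rewrite each identifier-delimited string as an equivalent "..." literal."""
    return HEREDOC.sub(lambda m: '"' + escape_d_string(m.group(2)) + '"', source)


if __name__ == "__main__":
    print(desugar_heredocs('auto s = q"EOS\nHe said "hi"\nEOS";'))
    # auto s = "He said \"hi\"\n";
```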

 If we don't want D support in syntax highlighters to be 
 half-baked everywhere, keeping the lexical grammar simple is a 
 good cause.

 IOW, implementers aren't competent enough to implement 
 something up to spec, therefore we should dumb down the spec 
 for their sake? Sounds like a backwards reason for doing 
 something.

The easier the language is to implement, the more implementors 
there will be.  If there are compelling reasons to include a 
language feature, and it makes implementation more difficult, it 
should be included regardless.  But that doesn't mean that ease 
of implementation should be completely ignored when considering 
language features.

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 09:38:28PM +0000, Elronnd via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 20:53:07 UTC, H. S. Teoh wrote:

[...]
 IOW, implementers aren't competent enough to implement something
 up to spec, therefore we should dumb down the spec for their sake?
 Sounds like a backwards reason for doing something.

 
 The easier the language is to implement, the more implementors there
 will be.  If there are compelling reasons to include a language
 feature, and it makes implementation more difficult, it should be
 included regardless.  But that doesn't mean that ease of
 implementation should be completely ignored when considering language
 features.

This is a valid consideration *before* the language is implemented. The
current situation is:

1) Heredocs are *already* implemented, have been for a long time, and
have been working very well, except for the wrinkle of some poor syntax
highlighter implementations that fail to parse them correctly.

2) Parsing heredocs is actually not *that* hard, as proven by already
(at least) two examples given in this very thread of syntax highlighting
code that actually parses them correctly. We aren't talking about
solving NP complete problems here, that might be considered reasonable
cause for simplifying something.

It does not take a day's work to write a parser that understands
heredocs, and we're debating about implementation *difficulty*? Whoa.


T

-- 
My program has no bugs! Only undocumented features...

Dec 03 2019

WebFreak001 <d.forum webfreak.org> writes:

On Tuesday, 3 December 2019 at 14:45:31 UTC, Dennis wrote:
 On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei 
 Alexandrescu wrote:
 [...]

 I consider this low-hanging fruit: just deprecating a token 
 takes little implementation effort, and reduction in language 
 complexity is (as far as I know) always welcome for the usual 
 reasons:
 - less code in dmd
 - less specification text
 - less didactic material / stuff to learn for new D programmers
 - less bug/enhancement reports
 - any tool that re-implements some part of the compiler is 
 easier to make

 [...]

Actually, with TextMate-based grammars this is pretty easy to 
implement: 
https://github.com/Pure-D/code-d/blob/master/syntaxes/d.json#L2190-L2200
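Grammars like this can handle identifier-delimited strings because, as far as I know, a TextMate `end` pattern may back-reference text captured by the `begin` pattern. The same idea expressed with Python's `re` module, as a minimal sketch (not the code-d rule itself; it assumes `\n` line endings):

```python
import re

# Opening: q" + identifier + newline. Closing: the same identifier at the
# start of a line, immediately followed by ". \1 is the backreference that
# ties the two together; re.MULTILINE makes ^ match at each line start.
HEREDOC = re.compile(r'q"([^\W\d]\w*)\n(.*?)^\1"', re.DOTALL | re.MULTILINE)


def find_heredocs(source):
    """Return (delimiter, contents) for each identifier-delimited string."""
    return [(m.group(1), m.group(2)) for m in HEREDOC.finditer(source)]


if __name__ == "__main__":
    src = 'auto s = q"EOS\nThis is a multi-line\nheredoc string\nEOS";'
    print(find_heredocs(src))
    # [('EOS', 'This is a multi-line\nheredoc string\n')]
```

Note that `EOS"` appearing mid-line does not end the literal; only a line that starts with it does.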

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 07:38:29AM -0500, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 12/3/19 4:03 AM, Mike Parker wrote:
 This is the feedback thread for the first round of Community Review
 for DIP 1026, "Deprecate Context-Sensitive String Literals":
 
 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md
 
 This DIP is a non-starter. Here documents are easily and effectively
 handled during lexing and have no impact on the language grammar.

[...]

When I read the title "context-sensitive string literals" I was
wondering what part of D actually has strings whose interpretation
changes depending on context. I was shocked to discover that it was
referring to heredoc strings.

Please don't get rid of heredoc strings. I use them quite a bit, because
I work a lot with code generators. They are a refreshing change from
C/C++ where trying to quote a piece of code as a string requires Leaning
Toothpick Syndrome (i.e., \'s all over the place to escape quoted string
metacharacters). I do *not* want to return to that nastiness, thank you
very much.

As Andrei said, heredoc strings are trivial to parse because they are
essentially a single big token. This should not pose any problem for
the parser at all. The argument in the DIP is flawed because, at the
level of a lexer/parser, a heredoc string is no different from a
delimited string: it starts with a sequence of one or more characters
(the opening delimiter), spans some arbitrary number of characters (the
string content) until another sequence of one or more characters (the
closing delimiter). Nothing stops someone from writing a
50,000-character double-quoted string, for example, and the lexer/parser
will handle it just fine. So why the hate against heredoc strings?
Arguably, heredoc strings are exactly what *solves* the problem of
50,000-character strings being essentially unreadable to a human reader
because of poor formatting.
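To make the "single big token" point concrete, here is a minimal Python sketch of a hand-written scanner for D's identifier-delimited strings (an illustration, not dmd's lexer; it assumes `\n` line endings and accepts any identifier-like delimiter). It needs no information from the parser: the lexer consumes the whole literal and hands it on as one string token.

```python
def scan_identifier_string(src, pos):
    """Scan a D identifier-delimited string that starts at src[pos].

    Returns (contents, end), where end is the index just past the closing
    quote. Raises ValueError for a malformed or unterminated literal.
    """
    if not src.startswith('q"', pos):
        raise ValueError('not an identifier-delimited string')
    # Opening delimiter: an identifier directly after q"
    i = start = pos + 2
    while i < len(src) and (src[i].isalnum() or src[i] == '_'):
        i += 1
    ident = src[start:i]
    if not ident or ident[0].isdigit():
        raise ValueError('missing delimiter identifier')
    if i >= len(src) or src[i] != '\n':
        raise ValueError('delimiter must be followed by a newline')
    body_start = i + 1
    closing = ident + '"'
    # Check each line start for the closing identifier followed by ".
    line_start = body_start
    while True:
        if src.startswith(closing, line_start):
            return src[body_start:line_start], line_start + len(closing)
        newline = src.find('\n', line_start)
        if newline == -1:
            raise ValueError('unterminated string literal')
        line_start = newline + 1


if __name__ == "__main__":
    src = 'q"EOS\nline one\nline two\nEOS" ~ x;'
    contents, end = scan_identifier_string(src, 0)
    print(repr(contents), repr(src[end:]))  # 'line one\nline two\n' ' ~ x;'
```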

As for poor syntax highlighting as mentioned in the DIP, how is that
even a problem with the language?! It's a strawman argument based on
skewed data obtained from badly-written lexers that don't actually lex D
code correctly. It's the syntax highlighter that should be fixed, rather
than deprecating an actually useful feature in the language.

Not to mention, the long list of projects at the end that will need to
be updated, which includes dmd itself BTW, looks like strong evidence of
good use of such string literals, rather than marginal use that might be
construed to be a reason for deprecation.

And most importantly of all: string literals are *single tokens* in the
language. They are lexical units, and therefore have nothing whatsoever
to do with the grammar being context-free or not. We're shooting at the
wrong target here.

--
Famous last words: I wonder what will happen if I do *this*...

Dec 03 2019

Paul Backus <snarwin gmail.com> writes:

On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
 On Tue, Dec 03, 2019 at 07:38:29AM -0500, Andrei Alexandrescu 
 via Digitalmars-d wrote:
 On 12/3/19 4:03 AM, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":
 
 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

 
 This DIP is a non-starter. Here documents are easily and 
 effectively handled during lexing and have no impact on the 
 language grammar.


[...]
 As Andrei said, heredoc strings are trivial to parse because 
 they are essentially a single big token.  This should not pose 
 any problem for the parser at all.

By definition, a context-free grammar is defined in terms of a 
finite set of non-terminal symbols (i.e., tokens). [1] The set of 
all string literals is infinite. Therefore, either string 
literals are not tokens, or D's grammar is not context-free.

[1] 
https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 08:40:14PM +0000, Paul Backus via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:

[...]
 As Andrei said, heredoc strings are trivial to parse because they are
 essentially a single big token.  This should not pose any problem
 for the parser at all.

 
 By definition, a context-free grammar is defined in terms of a finite
 set of non-terminal symbols (i.e., tokens). [1] The set of all string
 literals is infinite. Therefore, either string literals are not
 tokens, or D's grammar is not context-free.

[...]

I think you're imposing a needlessly literal(!) interpretation of
context-free grammars.  For example, integer literals are also unbounded
(there is no largest integer, therefore the set of integer literals is
infinite). Does that mean that a calculator program that includes
integer literals in its grammar is not context-free?  I think that's a
preposterous application of the definitions.

As far as the grammar is concerned, all integer literals are the same
terminal symbol, because the grammar does not (need to) distinguish
between them.
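A minimal Python sketch of that point for a toy calculator language (the token names are illustrative): the lexer maps the unbounded set of integer lexemes onto one terminal, `INT`, so the parser only ever works over a finite alphabet of token kinds.

```python
import re

# Each lexeme class becomes one terminal kind; the grammar sees only kinds.
TOKEN_SPEC = [
    ('INT', r'\d+'),
    ('PLUS', r'\+'),
    ('TIMES', r'\*'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('SKIP', r'\s+'),
]
TOKEN_RE = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, lexeme) pairs, discarding whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f'unexpected character {text[pos]!r} at {pos}')
        if m.lastgroup != 'SKIP':
            yield m.lastgroup, m.group()
        pos = m.end()


def terminals(text):
    """The sequence of terminal symbols the parser actually consumes."""
    return [kind for kind, _ in tokenize(text)]


if __name__ == "__main__":
    print(terminals('1 + 2'))          # ['INT', 'PLUS', 'INT']
    print(terminals('123456789 + 0'))  # ['INT', 'PLUS', 'INT']
```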

Treating string (or any other) literals as non-tokens makes no sense
because they are not symmetric with non-string (or other) tokens, e.g.,
D tokens allow arbitrary whitespace between them, yet you cannot
arbitrarily insert whitespace into a string literal without changing its
semantics.


T

-- 
Time flies like an arrow. Fruit flies like a banana.

Dec 03 2019

Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:

On 12/3/19 4:04 PM, H. S. Teoh wrote:
 I think you're imposing a needlessly literal(!) interpretation of
 context-free grammars.

I feared that would happen. When I drafted the initial answer, I had 
this text: "Subject to the way the grammar is defined across lexical 
tokens and higher-level constructs, yes, one could build a theoretical 
argument that heredocs are a context-dependent construct." Then I 
removed it to avoid divagating. Now, here we are.

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 04:14:47PM -0500, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 12/3/19 4:04 PM, H. S. Teoh wrote:
 I think you're imposing a needlessly literal(!) interpretation of
 context-free grammars.

 
 I feared that would happen. When I drafted the initial answer, I had
 this text: "Subject to the way the grammar is defined across lexical
 tokens and higher-level constructs, yes, one could build a theoretical
 argument that heredocs are a context-dependent construct." Then I
 removed it to avoid divagating. Now, here we are.

Yes, sigh, I can see it already: this thread is going to be another of
those interminably-long debates and nitpicking over technicalities, and
at the end of it all, this DIP will fall by the wayside and we will have
accomplished nothing.


T

-- 
Ph.D. = Permanent head Damage

Dec 03 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 12/3/2019 1:27 PM, H. S. Teoh wrote:
 Yes, sigh, I can see it already: this thread is going to be another of
 those interminably-long debates and nitpicking over technicalities, and
 at the end of it all, this DIP will fall by the wayside and we will have
 accomplished nothing.

It's a well-known effect that the less technical a proposal is, the more debate 
will follow.

Dec 06 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 3 December 2019 at 21:04:52 UTC, H. S. Teoh wrote:
 Treating string (or any other) literals as non-tokens makes no 
 sense because they are not symmetric with non-string (or other) 
 tokens, e.g., D tokens allow arbitrary whitespace between them, 
 yet you cannot arbitrarily insert whitespace into a string 
 literal without changing its semantics.

Just change the syntax to q"delimiter .... retimiled" and I 
believe it will be context free... IIRC.

So yeah, I agree. CFG is not the right argument. I never 
understood why people are so enamoured of them; parsers are far 
more powerful today than they used to be. The human should be the 
important factor when designing syntax, not the parser...

Also, not sure if it is context free if you include comments... 
But I could be wrong, and again I don't think it should matter...

Dec 03 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 3 December 2019 at 21:21:30 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 3 December 2019 at 21:04:52 UTC, H. S. Teoh wrote:
 Treating string (or any other) literals as non-tokens makes no 
 sense because they are not symmetric with non-string (or 
 other) tokens, e.g., D tokens allow arbitrary whitespace 
 between them, yet you cannot arbitrarily insert whitespace 
 into a string literal without changing its semantics.

 Just change the syntax to q"delimiter .... retimiled" and I 
 believe it will be context free... IIRC.

That was a joke! Don't argue it...

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:
 So why the hate against heredoc strings?

I don't think you're using the same terminology as the DIP, so I might 
be misinterpreting this, but I have nothing against here documents. I'm 
glad D provides plenty of useful string literals for including 
text in source code, it's just that some of them are rarely used 
and bump up the complexity class of D's lexical grammar.

D has 6 types of string literals ("double quote", `back tick`, r"r 
string", q{tokens}, q"<brackets>", q"EOS ident EOS") with 3 
encoding options (char, wchar, dchar).

There is a DIP for adding interpolated strings to D.

People are mentioning how D keeps adding features and is 
on a road towards C++ complexity. There is precedent for removing 
barely used features (see e.g. octal, escape or hexstring 
literals on https://dlang.org/deprecate.html).

And of course there are always users who lament the removal of 
their favorite feature, but in the long run everyone benefits 
from a simpler language.

As for your use case of code generation, I'm having trouble 
relating to it. I happened to write some code generation 
algorithms myself recently, and could do fine with q{} strings 
for large templates and regular "" or `` strings for small token 
parts like "switch(".

- Do you truly have 50,000 character string literals in your code 
base?
- Can't you use bracket delimited strings instead, q"<like this?>"
- If accidental early termination in huge string literals is a 
concern, even an identifier-delimited string isn't always safe. 
Can't you use an `import()` statement on an external text file?
- If those 50,000 characters are code and you value readability 
of it, isn't it a problem that there is no syntax highlighting in 
a q"EOS EOS" string?
- Can you maybe post an example of some of your q"EOS EOS" 
strings used for code generation?

 As for poor syntax highlighting as mentioned in the DIP, how is 
 that even a problem with the language?! It's a strawman 
 argument based on skewed data obtained from badly-written 
 lexers that don't actually lex D code correctly. It's the 
 syntax highlighter that should be fixed, rather than 
 deprecating an actually useful feature in the language.

The thing is, these string literals simply can't be expressed in 
e.g. a PEG grammar. D's grammar is one complexity class 
higher than needed just for this one relatively obscure string 
literal. Sure you can say "not our problem, those tooling authors 
just need to account for D's complexity", but I don't think that 
is useful for D's tooling ecosystem.

 Not to mention, the long list of projects at the end that will 
 need to be updated, which includes dmd itself BTW, looks like 
 strong evidence of good use of such string literals

dmd only uses them in the test-suite, same as libdparse.
I can spend some more time in the DIP exploring how other 
packages use them however.

Dec 03 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 3 December 2019 at 21:34:26 UTC, Dennis wrote:
 The thing is, these string literals simply can't be expressed 
 in e.g. a PEG grammar.

Can't you use a lexer with a PEG parser?

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 09:34:26PM +0000, Dennis via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 18:34:22 UTC, H. S. Teoh wrote:

[...]
 D has 6 types of string literals ("double quote", `back tick`, r"r
 string", q{tokens}, q"<brackets>", q"EOS ident EOS") with 3 encoding
 options (char, wchar, dchar).

Walter has admitted that having 3 encodings, with the corresponding 3
string types, was a "miss" in D's design, and that he should have just
stuck with UTF-8. UTF-16 is occasionally useful for interfacing with
Windows APIs, but that's pretty narrow and contained, and nobody uses
UTF-32 strings in practice.  I've not seen many examples of non-UTF-8
strings in D code.

I admit D having 6 types of string literals is excessive, but as
somebody has already said, even if something was a mistake in
retrospect, that doesn't necessarily mean that removing it isn't also a
mistake, because now you have the weight of existing code weighing
against removing it.

And just for a bit more perspective, Python also has heredoc syntax, as
do Perl, PHP, bash, and probably many others. If heredocs were really
such a bad idea, why are people putting them into so many languages,
over and over again? Perhaps, just perhaps, there are use cases for them
that this DIP has overlooked / underrepresented?  I don't hear people
clamoring for removing heredocs from Python, for example, so I'm really
having a hard time understanding why we're having this debacle right
now.



 There is a DIP for adding interpolated strings to D.

That DIP seems dead in the water though. The author has vanished and
nobody has taken up the reins.


 People are mentioning how D keeps adding features and is on a
 road towards C++ complexity. There is precedent for removing barely
 used features (see e.g. octal, escape or hexstring literals on
 https://dlang.org/deprecate.html).

Actually, I was a bit disappointed with the removal of hexstring
literals, but the issue is somewhat more complex. The problem with
hexstring literals was that they were a kind of half-hearted attempt at
supporting literal hexadecimal data, because they coerced the result into
string rather than ubyte[]. The hexstring *syntax* was ideal for
entering hex data, but then having the result coerced into string seemed
to me like a backwards misfit. If it had produced a ubyte[] then there
would have been much more reason to keep it in the language, since
occasionally it's very useful to be able to enter blocks of binary data
in hex.  As to why the original design produced a string rather than a
ubyte[], I can only speculate. Perhaps it was meant as a poor man's way
of writing a Unicode string without a Unicode-aware keyboard / input
method?  Who knows.  In any case, *that* use case is rendered completely
moot by the \u.... and \U........ escape sequences in regular
double-quoted strings.  The ubyte[] use case is arguably implementable in
a CTFE parser the same way octal literals are, and so hexstrings went
the way of the dodo.
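The ubyte[] interpretation amounts to decoding pairs of hex digits into raw bytes. A minimal Python sketch, with Python's `bytes` standing in for D's `ubyte[]` (it handles only the basic x"..." form and ignores any string postfix):

```python
def parse_hex_string_literal(literal):
    """Decode a hex string literal such as x"DE AD BE EF" into raw bytes.

    Whitespace (including newlines) between hex digits is ignored.
    """
    if len(literal) < 3 or not (literal.startswith('x"') and literal.endswith('"')):
        raise ValueError('expected a literal of the form x"..."')
    digits = ''.join(literal[2:-1].split())
    if len(digits) % 2:
        raise ValueError('odd number of hex digits')
    return bytes.fromhex(digits)  # raises ValueError on non-hex characters


if __name__ == "__main__":
    print(parse_hex_string_literal('x"DE AD\nBE EF"'))  # b'\xde\xad\xbe\xef'
```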


 And of course there are always users who lament the removal of their
 favorite feature, but in the long run everyone benefits from a simpler
 language.
 
 As for your use case of code generation, I'm having trouble relating
 to it.  I happened to write some code generation algorithms myself
 recently, and could do fine with q{} strings for large templates and
 regular "" or `` strings for small token parts like "switch(".

q{} works well for emitting *D code*.  Not so well for non-D code.


 - Do you truly have 50,000 character string literals in your code base?

No, but I do have a number of large multi-line string literals that
simply look best / are most maintainable in heredoc format.


 - Can't you use bracket delimited strings instead, q"<like this?>"

Heredoc syntax is better because the ending delimiter is obvious. When
the string literal spans multiple lines, single-character terminating
delimiters just aren't the best way to do it.


 - If accidental early termination in huge string literals is a
 concern, even an identifier-delimited string isn't always safe. Can't
 you use an `import()` statement on an external text file?

An identifier-delimited string is safe because the literal is typed in
directly as code, so you already know beforehand what words might appear
or not appear in it, and you already know what will *never* appear in
the string.  It isn't as though I'm copy-n-pasting arbitrary text from
arbitrary input files into my code just for fun.

String imports require creating an extra file to contain the string, and
require running the compiler with -J + the right path(s), all of which
are extra hurdles to jump through.  It's the same thing with external
unittests vs. unittest blocks that you can just write inline. It's
*possible*, but inconvenient and liable to go out-of-sync as you modify
the code.


 - If those 50,000 characters are code and you value readability of it,
 isn't it a problem that there is no syntax highlighting in a q"EOS
 EOS" string?

As I said, I don't use a syntax highlighter.  Also, any attempt to
highlight is moot if the string contains code of a different language
(see below for my use cases).


 - Can you maybe post an example of some of your q"EOS EOS" strings
 used for code generation?

I feel a single example will not adequately convey my point. Here's a
list of use cases I use heredocs for (in no particular order):

1) Generating HTML snippets
2) Generating PovRay scene description snippets
3) Generating D code snippets
4) Generating snippets of a DSL I use for generating geometric models
5) Generating boilerplate for input data to an external convex hull
   solver (has its own peculiar syntax)
6) Generating GLSL shader code snippets
7) Generating Java code snippets
8) Command line usage descriptions

Some of this code is somewhat old but is actively used as infrastructure
for my current projects, and having to go back to rewrite heredocs just
because of some ivory tower ideal of "cleaning up useless literals in D"
is rather distasteful to me, you understand, esp. since I don't even use
syntax highlighting in the first place, so this is just pure work for
zero benefit.  If we were still in the early stages of D development,
then sure, go ahead and nuke heredocs if you have very good reasons for
it, but I'm not about to go rewriting code for (1) to (8) now, not when
there's basically zero benefit in doing so.


 As for poor syntax highlighting as mentioned in the DIP, how is that
 even a problem with the language?! It's a strawman argument based on
 skewed data obtained from badly-written lexers that don't actually
 lex D code correctly. It's the syntax highlighter that should be
 fixed, rather than deprecating an actually useful feature in the
 language.

 
 The thing is, these string literals simply can't be expressed in e.g.
 a PEG grammar.

?!  Can't you just use a custom lexer with your PEG grammar?


 D's grammar is one complexity class higher than needed just for
 this one relatively obscure string literal. Sure you can say "not our
 problem, those tooling authors just need to account for D's
 complexity", but I don't think that is useful for D's tooling
 ecosystem.

[...]

Then isn't the solution simply to write a self-contained heredoc parsing
function, put it in a dub package, and let everyone reuse it? Then
nobody will have to write it for themselves again. Problem solved.

(If it's even that complex to begin with. As I said, we already have 2
working examples of syntax highlighter code that work fine with
heredocs. It's not as though D invented heredocs; they have been around
since the early days of the Unix shell, and people have been writing
parsing code for it for a long time. Its supposed "complexity" is really
blown out of proportion here.)

This whole debacle feels like heredocs are being singled out as a
scapegoat in a misguided quest to "simplify the language".  Like we're
grasping at straws because we're unable to tackle the bigger issues, so
here's a convenient simple target we can shoot and kill and feel good
about ourselves that we're finally making progress.  Talk about
straining out the gnat and swallowing the camel.


T

-- 
"I'm not childish; I'm just in touch with the child within!" - RL

Dec 03 2019

mipri <mipri minimaltype.com> writes:

On Wednesday, 4 December 2019 at 01:26:24 UTC, H. S. Teoh wrote:
 And just for a bit more perspective, Python also has heredoc 
 syntax, so does Perl, PHP, bash, and probably many others.

Python actually doesn't have HERE docs. When it's included in
lists of "languages with HERE docs", it's just to show what a
Python programmer would use in their stead.

Please accept Ruby as a replacement example.

Dec 03 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Wednesday, 4 December 2019 at 01:26:24 UTC, H. S. Teoh wrote:
 UTF-16 is  <snip>

VERY useful and helps make D on Windows feel first class, so it 
is easy to do things right.

utf-32 doesn't matter, but "string"w is very, very nice for 
working with Windows, .net, java, etc. easily, efficiently, and 
correctly.


 That DIP seems dead in the water though. The author has 
 vanished and nobody has taken up the reins.

The string interpolation thing is cool. I wrote up my proposal; 
I'm just not likely to bother with the burden of DIP bureaucracy. 
Even javascript has some stuff that beats us now.

 As I said, I don't use a syntax highlighter.  Also, any attempt 
 to highlight is moot if the string contains code of a different 
 language (see below for my use cases).

And I use the heredoc strings BECAUSE of how well they can be 
highlighted - again my vim happens to treat q"html and q"sql and 
q"css and others specially knowing they are embedded.

I could do that with something like css!" " too - a template 
instead and the type information could even be improved but still 
the heredoc is kinda cool for syntax highlighting.




BTW if heredoc strings were to be removed.... tbh I can live with 
it. It bugs me that they must end at the beginning of a line. I 
wish it would let you indent it. Seriously bugs me and is a 
reason why I don't use them more.

But still, since they are there, I use them.

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Wednesday, 4 December 2019 at 01:26:24 UTC, H. S. Teoh wrote:
 And just for a bit more perspective, Python also has heredoc 
 syntax, so does Perl, PHP, bash, and probably many others. If 
 heredocs were really such a bad idea, why are people putting 
 them into so many languages, over and over again?

To me the opposite seems true. First of all:

 Python does not have here-docs. It does however have 
 triple-quoted strings which can be used similarly.

https://rosettacode.org/wiki/Here_document#Python

Then considering which notable languages have context-sensitive 
string literals:
1987: Perl
1989: Bash
1995: PHP
2001: D
2011: C++11

If you know any other examples, please tell. I don't think 
context-sensitive string literals were ever put in a notable 
language created after 2001. (C++ has the most recent addition, 
but parsing that is already so complex they have nothing to lose)
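For reference, the C++11 entry in that list is the raw string literal: the programmer picks a delimiter (here `EOS`), and the literal ends only at `)EOS"`, so the lexer has to remember the delimiter. That is the same kind of context-sensitivity the DIP is about. A small sketch with made-up contents, compared against the equivalent escaped literal:

```cpp
#include <cassert>
#include <string>

// Raw string literal with a user-chosen delimiter: quotes, backslashes and
// unbalanced brackets inside need no escaping.
const std::string snippet = R"EOS(<a href="x">:-) [unbalanced</a>
C:\path\n stays literal)EOS";

// The same text as an ordinary literal ("Leaning Toothpick Syndrome").
const std::string escaped =
    "<a href=\"x\">:-) [unbalanced</a>\nC:\\path\\n stays literal";
```

Both strings hold identical text; only the raw form can be read and edited without counting backslashes.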

 That DIP seems dead in the water though. The author has 
 vanished and nobody has taken up the reins.

I was referring to Walter Bright's one:
https://github.com/dlang/DIPs/pull/165

 1) Generating HTML snippets
 2) Generating PovRay scene description snippets
 3) Generating D code snippets
 4) Generating snippets of a DSL I use for generating geometric 
 models
 5) Generating boilerplate for input data to an external convex 
 hull
    solver (has its own peculiar syntax)
 6) Generating GLSL shader code snippets
 7) Generating Java code snippets
 8) Command line usage descriptions

I do believe for most of these you can use ``, q{} and q"<>" with 
little problems, but I understand that you prefer the q"EOS EOS" 
ones and would not want to rewrite your old code.

 ?!  Can't you just use a custom lexer with your PEG grammar?

 Then isn't the solution simply to write a self-contained 
 heredoc parsing function, put it in a dub package, and let 
 everyone reuse it? Then nobody will have to write it for 
 themselves again. Problem solved.

Of course you can make it work. I'm not saying that 
context-sensitive string literals make or break all D lexers, 
it's just a little source of complexity that may not bear its 
weight.

Also, several syntax highlighters support many different
languages while being implemented in a single one; take, for
example, this one written in Go:

https://github.com/alecthomas/chroma/blob/master/lexers/d/d.go

I wouldn't expect them to pull in a dub package for D, a cargo
package for Rust, an npm package for JavaScript, etc.

 This whole debacle feels like heredocs are being singled out as 
 a scapegoat in a misguided quest to "simplify the language".  
 Like we're grasping at straws because we're unable to tackle 
 the bigger issues, so here's a convenient simple target we can 
 shoot and kill and feel good about ourselves that we're finally 
 making progress.  Talk about straining out the gnat and
 swallowing the camel.

It seems to me D has this history of removing small features with 
a small problem:

- Small feature: escape string literals
   Small problem: doesn't have much use
- Small feature: octal literals
   Small problem: can be confused with a decimal literal, and can be
   made a library feature
- Small feature: hexstring literals
   Small problem: can be better represented in a library function

Now my proposed next one is:

- Small feature: context-sensitive string literals
   Small problem: accidentally bumps the complexity class of D's
   lexical grammar.

Now I understand that reviewers are debating whether it is a 
small feature ("I actually use these a lot") and whether the 
small problem isn't too small ("making D lexers still isn't 
hard"). That's what I like to see in the review, thanks 
especially to WebFreak and Adam D. Ruppe for their input on their 
VSCode and Vim highlighters, and thanks to you for your use cases.

What I don't get is why this is called a "non-starter" by Andrei 
and a "debacle" / "misguided quest" by you. Is it such a 
ludicrous idea to deprecate this particular part of the language?

I admit that I misjudged the amount of use, breakage and
complexity of this feature before writing this DIP. If this
trend continues, then this DIP is dead; I'm not going to push it
hard or anything. But I am at least still interested in Walter's
and Atila's opinions.

Dec 04 2019

Timon Gehr <timon.gehr gmx.ch> writes:

On 04.12.19 12:10, Dennis wrote:
 
 
 Now my proposed next one is:
 
 - Small feature: context-sensitive string literals
    Small problem: accidentally bumps the complexity class of D's lexical 
 grammar.

A small fix for this small problem is to just say in the specification 
that heredoc identifiers may not exceed 1e100 characters. ;)

Another fix could be to just go over the language specification and 
replace all wrongly applied CS terms by a short explanation of what is 
actually going on. (In practice, when Walter says D's grammar is 
context-free, what he means is that parsing does not depend on semantic 
analysis on a prefix of the code, a property that C++ has which implies 
context-sensitivity and is usually abbreviated this way, and Walter's 
aim was to contrast D to this.)

Dec 04 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 12/4/2019 5:35 AM, Timon Gehr wrote:
 On 04.12.19 12:10, Dennis wrote:
 Now my proposed next one is:

 - Small feature: context-sensitive string literals
    Small problem: accidentally bumps the complexity class of D's lexical
 grammar.

 
 A small fix for this small problem is to just say in the specification that 
 heredoc identifiers may not exceed 1e100 characters. ;)
 
 Another fix could be to just go over the language specification and replace
 all wrongly applied CS terms by a short explanation of what is actually going on.

Another case of my lack of academic CS training showing. I would appreciate it
if qualified people would indeed go through the D spec and correct misuse of
the terms.

I know Timon likes to excoriate my conflation of "assert" and "assume", which 
have precise CS definitions. I'm sure there's plenty more in the spec.

 (In practice, when Walter says D's grammar is context-free, what he means is
 that parsing does not depend on semantic analysis on a prefix of the code, a
 property that C++ has which implies context-sensitivity and is usually
 abbreviated this way, and Walter's aim was to contrast D to this.)

That's right. I often express it in even simpler (but less precise) terms - a
symbol table is not required to parse it. Yes, I know the pedant will point out
that heredoc has a symbol table with exactly one symbol in it, but please,
allow me to concede that in advance and spare us :-)

Dec 04 2019

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 4 December 2019 at 22:57:21 UTC, Walter Bright 
wrote:
 Another case of my lack of academic CS training showing. I 
 would appreciate it if qualified people would indeed go through 
 the D spec and correct misuse of the terms.

I don't think a spec has to use a lot of CS terms; it's probably
better to describe things in language that most users can understand.

Like, the other day I got confused by the usage of the term 
"covariant" in
https://dlang.org/spec/function.html

It says stuff like "a pure function … is covariant with an impure 
function", "Nothrow functions are covariant with throwing ones.", 
"Safe functions are covariant with trusted or system functions." 
and "System functions are not covariant with trusted or safe 
functions."

This doesn't tell me anything, even if I happened to remember what
the term means. My understanding is that covariant means that if
T(A) is related to T'(A') then T<:T' and A<:A', whereas contravariant
means that one of the subtyping relations points the other way.

I cannot fix it either, since I don't know what was meant...

Dec 04 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Wednesday, 4 December 2019 at 23:35:09 UTC, Ola Fosheim 
Grøstad wrote:
 Like, the other day I got confused by the usage of the term 
 "covariant" in

In that context, if you replace "covariant with" with "can act as 
a substitute for" it would work pretty well.
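A C++17 analogue (C++ only, not a description of how D implements the rule) makes this reading concrete for the "nothrow functions are covariant with throwing ones" sentence: a pointer to a `noexcept` function converts implicitly to a pointer to a possibly-throwing function, so it can stand in wherever the latter is expected, but the reverse conversion is rejected. The names `MayThrow`, `NoThrow`, `quiet` and `callTwice` are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

using MayThrow = void (*)();
using NoThrow = void (*)() noexcept;

// A noexcept function "can act as a substitute for" a possibly-throwing one...
static_assert(std::is_convertible_v<NoThrow, MayThrow>,
              "noexcept -> may-throw is an implicit conversion");
// ...but not the other way round.
static_assert(!std::is_convertible_v<MayThrow, NoThrow>,
              "may-throw -> noexcept is rejected");

int calls = 0;
void quiet() noexcept { ++calls; }

// Accepts any function that might throw; a noexcept one is accepted too.
void callTwice(MayThrow f) {
    f();
    f();
}
```

Calling `callTwice(quiet)` compiles and runs `quiet` twice, while passing a plain `void()` function where a `NoThrow` is required would not compile.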

Dec 04 2019

Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 4 December 2019 at 23:52:54 UTC, Adam D. Ruppe 
wrote:
 On Wednesday, 4 December 2019 at 23:35:09 UTC, Ola Fosheim 
 Grøstad wrote:
 Like, the other day I got confused by the usage of the term 
 "covariant" in

 In that context, if you replace "covariant with" with "can act 
 as a substitute for" it would work pretty well.

That is much easier to understand, for sure. I think the best 
parts of the documentation are where examples are provided.

Dec 04 2019

mipri <mipri minimaltype.com> writes:

On Wednesday, 4 December 2019 at 11:10:45 UTC, Dennis wrote:
 I do believe for most of these you can use ``, q{} and q"<>"
 with little problems, but I understand that you prefer the
 q"EOS EOS" ones and would not want to rewrite your old code.

The big (and only) advantage of HERE docs is that you so rarely
have to think about them or revise them that this is not a
concern. "Check and see if you've broken the string literal" is
not a step that you go through every single time you have to
touch the content of the string. The most annoying part of HERE
docs for code presentation, that the ending delimiter has such
strict requirements, is precisely what makes them not annoying
at all for holding random snippets of HTML or whatever. You
just don't get collisions, or they are very obvious. The
reader's ease is repeated in the ease that tools have with
them: they don't need a stack; they can just read lines and
throw them away until they find a line that (classically) has
some exact contents, or (in D) starts with some exact prefix.
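To make the "no stack" point concrete, here is a minimal C++ sketch of that line-by-line scan for D-style identifier-delimited strings. It assumes the caller has already consumed `q"EOS` and the newline after it, and that the literal ends at a line starting with the identifier immediately followed by a double quote; `scanHeredocBody` is a hypothetical name. The only state carried is the delimiter itself.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan the body of an identifier-delimited string.
// Assumes the caller already consumed q"DELIM and the newline after it, and
// that the literal ends at a line starting with DELIM followed by '"'.
// Returns the contents, or std::nullopt if input ends without a terminator.
std::optional<std::string> scanHeredocBody(const std::string& body,
                                           const std::string& delim) {
    const std::string terminator = delim + "\"";  // e.g. EOS"
    std::istringstream in(body);
    std::string line;
    std::string contents;
    // No stack and no nesting: each line either starts with the terminator
    // or is copied into the contents and forgotten.
    while (std::getline(in, line)) {
        if (line.compare(0, terminator.size(), terminator) == 0) {
            return contents;
        }
        contents += line;
        contents += '\n';  // this sketch keeps the newline that ended each line
    }
    return std::nullopt;
}
```

Brackets and quotes in the body never matter here; a line that merely mentions `EOS` without the trailing quote is treated as ordinary content.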

With only matching nested delimiter strings, accidental
collisions will happen. Not often. But neither " nor ])>} are
rare characters to find in a string, and the
first time you have to change both ends of a q"( string to make
it a q"[ string because you added a URL that ended in
parentheses to some embedded HTML, you'll think: man, I should
just take all these snippets and stuff them under __EOF__ ,
then read that statically, and stuff them into a map on module
load.
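By contrast, a nesting-delimited string such as `q"( ... )"` needs a depth counter, and a single stray closing bracket in the content ends the literal early. A minimal C++ sketch, assuming the caller has already consumed `q"(` and that the final closer must be immediately followed by a double quote (`findNestedEnd` is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: find the closing bracket of a nesting-delimited string
// such as q"( ... )". Assumes the caller already consumed q"( and that the
// final closer must be immediately followed by '"'. Returns the index of that
// closer, or std::string::npos if the literal is malformed or unterminated.
std::size_t findNestedEnd(const std::string& src, char open, char close) {
    int depth = 1;  // already inside the opening bracket
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == open) {
            ++depth;
        } else if (src[i] == close && --depth == 0) {
            // Depth hit zero: this closer ends the literal, wanted or not.
            const bool quoteFollows = i + 1 < src.size() && src[i + 1] == '"';
            return quoteFollows ? i : std::string::npos;
        }
    }
    return std::string::npos;
}
```

Balanced content such as `a(b)c` works, but content like `smile :-) here` closes the literal at the smiley, which is the kind of collision that forces switching to a different bracket pair.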

And *then* you'll think: wait, people hardly ever use __EOF__
in D, so someone's definitely going to come along and deprecate
*that* code, too!

The world isn't divided only between good practices and bad
practices. Across from the Scylla of legacy-code-is-sacred
languages that never remove anything, even obviously bad
features that nobody likes (' as a module separator in Perl, or
octal literals that start with 0), there's a Charybdis of
code-is-always-bitrotting languages that jerk you around with
pointless deprecations.

 It seems to me D has this history of removing small features
 with a small problem:

 - Small feature: escape string literals
   Small problem: doesn't have much use

I was surprised when \e didn't work. So it was removed for such
a reason.

 - Small feature: octal literals
   Small problem: can be confused with a decimal literal, and can
   be made a library feature

This is actually a significant problem. The *only* reason
languages have C-style octal literals is that they can't
remove them anymore. It's not "octal literals" in general that
are the problem: 0o123 is an octal literal that doesn't get
confused with a nice decimal number like 0123.
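Spelled out, the C-style literal 0123 means 1*64 + 2*8 + 3 = 83, not 123. A minimal C++ sketch of the library-function approach (a hypothetical `octal` helper in the spirit of D's `std.conv.octal`, not its actual implementation) shows the arithmetic and that the reinterpretation can happen at compile time:

```cpp
#include <cassert>

// Hypothetical helper: reads the decimal digits of `digits` as base 8, so
// octal(123) names the value that the C-style literal 0123 silently denotes.
constexpr unsigned long long octal(unsigned long long digits) {
    unsigned long long value = 0;
    unsigned long long place = 1;
    while (digits != 0) {
        const unsigned long long d = digits % 10;
        // 8 and 9 are not octal digits; during constant evaluation this
        // throw turns into a compile-time error.
        if (d > 7) throw "not an octal digit";
        value += d * place;
        place *= 8;
        digits /= 10;
    }
    return value;
}

static_assert(octal(123) == 83, "1*64 + 2*8 + 3");
static_assert(0123 == 83, "the leading zero silently selects base 8");
```

The explicit call makes the base visible at the use site, which is the whole argument for moving the feature into a library.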

 - Small feature: hexstring literals
   Small problem: can be better represented in a library function

What these removals all have in common is that the post-removal
experience is: you reach for the removed feature, you get an
error, you find out what to do instead, and then there are no
more problems for you. Yes, you're still moving towards
Charybdis with stuff like this, but the point of the myth isn't
"all movements in the direction of Charybdis are bad," as
those movements are still movements *away* from Scylla.

Removing HERE docs, though, makes the language permanently
more annoying to use for the task that would've benefited from
them. To the point that, rather than just use the intended
replacement, people might rather do something else entirely.
Someone might personally not like the look of \033 vs. \e, or
octal!123 vs. 0123, but the replacement doesn't make them work
any harder.

It's not a huge problem, but it's a difference between this
small deprecation and the previous ones.

 Now I understand that reviewers are debating whether it is a
 small feature ("I actually use these a lot") and whether the
 small problem isn't too small ("making D lexers still isn't
 hard"). That's what I like to see in the review, thanks
 especially to WebFreak and Adam D. Ruppe for their input on
 their VSCode and Vim highlighters, and thanks to you for your
 use cases.

 What I don't get is why this is called a "non-starter" by
 Andrei and a "debacle" / "misguided quest" by you. Is it such a
 ludicrous idea to deprecate this particular part of the
 language?

1. It's *because* the proposed change isn't that bad that it's
getting the responses it's getting, rather than complaints that
the proposed change is very bad and that HERE docs are
irreplaceable treasures. It wasn't until my post just now that
anyone took the time to say that HERE docs have any unique
advantages at all.

2. Andrei's response isn't just "non-starter" but also "HERE
documents have no impact on the language grammar."

Dec 04 2019

Kagamin <spam here.lot> writes:

On Tuesday, 3 December 2019 at 12:38:29 UTC, Andrei Alexandrescu 
wrote:
 This DIP is a non-starter. Here documents are easily and 
 effectively handled during lexing and have no impact on the 
 language grammar.

In a compiler.

Here's an implementation for bash heredoc strings, say something 
nice about it:

---
#define HERE_DELIM_MAX 256	// matches the "256" noted on Delimiter below

class HereDocCls {	// Class to manage HERE document elements
public:
	int State;		// 0: '<<' encountered
	// 1: collect the delimiter
	// 2: here doc text (lines after the delimiter)
	int Quote;		// the char after '<<'
	bool Quoted;		// true if Quote in ('\'','"','`')
	bool Indent;		// indented delimiter (for <<-)
	int DelimiterLength;	// strlen(Delimiter)
	char *Delimiter;	// the Delimiter, 256: sizeof PL_tokenbuf
	HereDocCls() {
		State = 0;
		Quote = 0;
		Quoted = false;
		Indent = false;
		DelimiterLength = 0;
		Delimiter = new char[HERE_DELIM_MAX];
		Delimiter[0] = '\0';
	}
	// Owns a raw buffer, so copying would cause a double delete.
	HereDocCls(const HereDocCls &) = delete;
	HereDocCls &operator=(const HereDocCls &) = delete;
	void Append(int ch) {
		// Leave room for the terminating '\0'; overlong delimiters are truncated.
		if (DelimiterLength < HERE_DELIM_MAX - 1) {
			Delimiter[DelimiterLength++] = static_cast<char>(ch);
			Delimiter[DelimiterLength] = '\0';
		}
	}
	~HereDocCls() {
		delete []Delimiter;
	}
};
HereDocCls HereDoc;
---

Dec 04 2019

Guillaume Piolat <first.last gmail.com> writes:

On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":

 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md


YES

I'm generally in favor of removing things. I think I've used 
token strings at times, but not the other variety discussed in 
the DIP and that I didn't know of.

It will break some DUB packages, and that's OK since we have 
SemVer.

Dec 03 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, December 3, 2019 2:03:44 AM MST Mike Parker via Digitalmars-d 
wrote:
 This is the feedback thread for the first round of Community
 Review for DIP 1026, "Deprecate Context-Sensitive String
 Literals":

 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

 All review-related feedback on and discussion of the DIP should
 occur in this thread. The review period will end at 11:59 PM ET
 on December 17, or when I make a post declaring it complete.

 At the end of Round 1, if further review is deemed necessary, the
 DIP will be scheduled for another round of Community Review.
 Otherwise, it will be queued for the Final Review and Formal
 Assessment.

 Anyone intending to post feedback in this thread is expected to
 be familiar with the reviewer guidelines:

 https://github.com/dlang/DIPs/blob/master/docs/guidelines-reviewers.md

 *Please stay on topic!*

 Thanks in advance to all who participate.

There are definitely people who use token strings in their code when writing
string mixins, because it makes it so that the code in the strings actually
gets syntax highlighting like normal code does instead of being displayed as
a string. I expect that a number of people would be quite unhappy to not be
able to do that anymore.

Personally, I never use token strings, and I'm not sure that I'd even know
about them if I hadn't worked on a D lexer several years ago. I also prefer
that strings look like strings even if they contain code, but I don't care
enough about that to try to get the feature removed, and I'm not sure that I
care much whether the DIP is accepted or not. However, there's no question
that some people think that they're very valuable when writing string
mixins.

- Jonathan M Davis

Dec 03 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis 
wrote:
 There are definitely people who use token strings in their code 
 when writing string mixins

Token strings are q{ }; this is about the delimited strings like
q"xxx .... xxx" and q"( lll )".

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis 
wrote:
 There are definitely people who use token strings in their code 
 when writing string mixins, because it makes it so that the 
 code in the strings actually gets syntax highlighting like 
 normal code does instead of being displayed as a string.

I don't propose deprecating token strings, only the identifier 
delimited ones, which get highlighted as strings.

```
string s = q{
this is fine
};

string t = q"EOS
this is not fine
EOS";
```

Dec 03 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, December 3, 2019 8:09:19 AM MST Dennis via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis

 wrote:
 There are definitely people who use token strings in their code
 when writing string mixins, because it makes it so that the
 code in the strings actually gets syntax highlighting like
 normal code does instead of being displayed as a string.

 I don't propose deprecating token strings, only the identifier
 delimited ones, which get highlighted as strings.

 ```
 string s = q{
 this is fine
 };

 string t = q"EOS
 this is not fine
 EOS";
 ```

Ah. Clearly, I glanced over it all too quickly. I confess that that
particular type of string literal seems useless to me. I don't think that
I've ever seen anyone use them, and I'd be even less interested in using
them than token strings. I don't feel particularly strongly about whether we
remove them from the language, but if we were talking about adding them, I'd
certainly be against it.

- Jonathan M Davis

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 03:09:19PM +0000, Dennis via Digitalmars-d wrote:
[...]
 I don't propose deprecating token strings, only the identifier
 delimited ones, which get highlighted as strings.
 
 ```
 string s = q{
 this is fine
 };
 
 string t = q"EOS
 this is not fine
 EOS";
 ```

The problem is that token strings require the contents to be *D tokens*.
So if I need to emit snippets of another language, I'm out of luck, and
have to resort to quoted strings and Leaning Toothpick Syndrome.

I oppose this DIP.

1) It puts undue focus on a marginal, non-intrusive language feature and
   makes it seem as if it's a primary cause of tooling problems (it does
   add some complexity, no doubt, but let's not make mountains out of
   molehills here);

2) It places the blame of the syntax highlighting issue at the wrong
   place: syntax highlighters should be fixed, not the other way round.

3) It does not adequately strive to understand why heredoc syntax was
   introduced in the first place, where/when it might be useful, and how
   to mitigate the problems heredoc syntax solves if we were to remove
   it;

4) It breaks a pretty long list of existing D projects, yet does not
   provide strong enough benefits to justify this breakage (doubly so
   for me, because I don't use syntax highlighters to begin with, so for
   me this is all loss and no gain);

5) The breakage does not unquestionably improve code; in fact, I can
   already see many cases for which it makes code *less* readable;

6) The amount of work it will take to rewrite heredoc literals far
   outweighs any small benefits this DIP might bring (and in my case,
   it's work for *no* benefit).


T

-- 
Claiming that your operating system is the best in the world because more
people use it is like saying McDonalds makes the best food in the world. --
Carl B. Constantine

Dec 03 2019

Elronnd <elronnd elronnd.net> writes:

On Tuesday, 3 December 2019 at 21:20:57 UTC, H. S. Teoh wrote:
 The problem is that token strings require the contents to be *D 
 tokens*. So if I need to emit snippets of another language, I'm 
 out of luck, and have to resort to quoted strings and Leaning 
 Toothpick Syndrome.

Bracket-delimited strings (q"[text]", allowing <>, [], (), and {} 
as delimiters) are still allowed and do not need to contain valid 
tokens.

Dec 03 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Dec 03, 2019 at 09:35:42PM +0000, Elronnd via Digitalmars-d wrote:
 On Tuesday, 3 December 2019 at 21:20:57 UTC, H. S. Teoh wrote:
 The problem is that token strings require the contents to be *D
 tokens*.  So if I need to emit snippets of another language, I'm out
 of luck, and have to resort to quoted strings and Leaning Toothpick
 Syndrome.

 
 Bracket-delimited strings (q"[text]", allowing <>, [], (), and {} as
 delimiters) are still allowed and do not need to contain valid tokens.

They still need to nest properly, though.  Generating BF snippets, for
example, wouldn't work.


T

-- 
English has the lovely word "defenestrate", meaning "to execute by throwing
someone out a window", or more recently "to remove Windows from a computer and
replace it with something useful". :-) -- John Cowan

Dec 03 2019

Kagamin <spam here.lot> writes:

On Tuesday, 3 December 2019 at 21:20:57 UTC, H. S. Teoh wrote:
 2) It places the blame of the syntax highlighting issue at the wrong
    place: syntax highlighters should be fixed, not the other way round.

It requires efficient memory management. Wait, it requires memory
management? Also the usual tradeoff between space, complexity and
time, maybe a hashtable and a CSPRNG. Usually delimited strings are
simply not implemented, as the only reasonable option, but then
people here say that such a highlighter "doesn't support D". So
it's not really a problem for the highlighter: delimited strings
simply don't exist there, and users can opt in by choosing a
different highlighter.

Dec 04 2019

Les De Ridder <les lesderid.net> writes:

On Tuesday, 3 December 2019 at 15:05:14 UTC, Jonathan M Davis 
wrote:
 [...]

 There are definitely people who use token strings in their code 
 when writing string mixins, because it makes it so that the 
 code in the strings actually gets syntax highlighting like 
 normal code does instead of being displayed as a string. I 
 expect that a number of people would be quite unhappy to not be 
 able to do that anymore.

This DIP explicitly doesn't deprecate token strings, only
identifier-delimited strings and character-delimited strings.

Dec 03 2019

aliak <something something.com> writes:

On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":

 https://github.com/dlang/DIPs/blob/a7199bcec2ca39b74739b165fc7b97afff9e29d1/DIPs/DIP1026.md

 [...]

1) Are there any examples of strings that don't have an in-source 
code workaround if this dip is accepted?

2) The link to Rosetta Code shows a lot of languages with
funky parsing, so I'm not sure that proves anything.

3) How much less complex does the parser actually get? Is it
trivial?

Dec 03 2019

Dennis <dkorpel gmail.com> writes:

On Tuesday, 3 December 2019 at 23:13:16 UTC, aliak wrote:
 1) Are there any examples of strings that don't have an 
 in-source code workaround if this dip is accepted?

Considering escape sequences such as "\x0B" and string 
concatenation with ~, any string literal can still be expressed. 
The most generic, least intrusive transformation I can think of
would be:
Given an identifier-delimited string, check which of < ( { [ has
the fewest mismatched brackets. Then convert the string literal
to a bracket-delimited string with all unmatched brackets
concatenated in:
```
q"EOS
((["`[<< { ((["`[<<
EOS"

// only one mismatching {, so it becomes

q"{((["`[<< }" ~ "{" ~ q"{ ((["`[<<}"
```

(This is a worst-case example; in practice I expect few
mismatched brackets and quotes/backticks in a string
literal.)
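The selection step of that transformation can be sketched in a few lines of C++: for each bracket pair, count the closers that have no opener plus the openers left unclosed, and pick the pair with the smallest count. The function names are hypothetical, and the sketch does not perform the actual splicing into `~`-concatenated pieces.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <utility>

// Counts brackets of one kind without a partner: closers seen while no opener
// is pending, plus openers still unclosed at the end.
int countMismatched(const std::string& s, char open, char close) {
    int depth = 0;
    int unmatchedClose = 0;
    for (char c : s) {
        if (c == open) {
            ++depth;
        } else if (c == close) {
            if (depth > 0) --depth;
            else ++unmatchedClose;
        }
    }
    return depth + unmatchedClose;
}

// Picks the bracket pair that would need the fewest spliced-in pieces.
// Ties go to the earlier pair in the list.
std::pair<char, char> pickDelimiter(const std::string& s) {
    const std::array<std::pair<char, char>, 4> pairs{
        {{'<', '>'}, {'(', ')'}, {'{', '}'}, {'[', ']'}}};
    std::pair<char, char> best = pairs[0];
    int bestCount = countMismatched(s, best.first, best.second);
    for (const auto& p : pairs) {
        const int n = countMismatched(s, p.first, p.second);
        if (n < bestCount) {
            best = p;
            bestCount = n;
        }
    }
    return best;
}
```

On the example string, `(`, `[` and `<` each have four unmatched brackets while `{` has one, so the sketch picks `{`, matching the conversion shown.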

 3) how much less complex does the parser actually get? Is it 
 trivial?

In dmd, not much; it would just make this function a bit
smaller:
https://github.com/dlang/dmd/blob/073b6861b1d1a9859a90e25c8d7f079b54280aca/src/dmd/lexer.d#L1477

For implementations of a D lexer in lexer/parser generators (e.g. 
http://dinosaur.compilertools.net/lex/index.html), it means only 
needing context-free constructs to express everything.

Dec 03 2019

Arun Chandrasekaran <aruncxy gmail.com> writes:

On Tuesday, 3 December 2019 at 09:03:44 UTC, Mike Parker wrote:
 This is the feedback thread for the first round of Community 
 Review for DIP 1026, "Deprecate Context-Sensitive String 
 Literals":

 [...]

We use this feature. We can fix the code, but the DIP doesn't 
state a convincing reason to remove this from the language.

Dec 03 2019

Walter Bright <newshound2 digitalmars.com> writes:

There are a lot of DIPs in the pipeline, and this looks highly unlikely to get 
traction, based on the comments. I suggest withdrawing it.

Dec 04 2019
