digitalmars.D.bugs - [Issue 3827] New: automatic joining of adjacent strings is bad

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3827

           Summary: automatic joining of adjacent strings is bad
           Product: D
           Version: 2.040
          Platform: All
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc


--- Comment #0 from bearophile_hugs eml.cc 2010-02-18 12:40:31 PST ---
import std.stdio;
void main() {
    string[] a = ["foo", "bar" "baz", "spam"];
    writeln(a);
}

This code prints:
foo barbaz spam

But probably the programmer meant to create an array with 4 strings.
D has the ~ concat operator, so to prevent possible programming bugs it's
better to remove the implicit concat of strings separated by whitespace.

Everywhere the programmer wants to concat strings the explicit concat operator
can be used:

string s = "this is a very long string that doesn't fit in" ~
           " a line";

The "Python Zen" has a rule that says:

Explicit is better than implicit.

The compiler can optimize the concat away at compile time.

C code ported to D that doesn't put a ~ just raises a compile time error that's
easy to understand and fix.
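
A minimal sketch of the difference, assuming a current D compiler. The names `joined`, `four` and `three` are made up for illustration and do not come from the report:

```d
import std.stdio;

// Explicit concatenation of two literals is folded at compile time,
// so the result can initialise an enum and be checked by static assert.
enum string joined = "bar" ~ "baz";
static assert(joined == "barbaz");

void main() {
    // Four separate elements: every literal is followed by a comma.
    string[] four = ["foo", "bar", "baz", "spam"];
    // Three elements: the join of "bar" and "baz" is spelled out with ~.
    string[] three = ["foo", "bar" ~ "baz", "spam"];

    assert(four.length == 4);
    assert(three.length == 3);
    assert(three[1] == "barbaz");
    writeln(four.length, " ", three.length);
}
```

With the explicit operator, a reader can tell from the source alone whether a join was intended.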

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 18 2010



--- Comment #1 from Alexey Ivanov <aifgi90 gmail.com> 2010-02-18 14:35:32 PST
---
Created an attachment (id=571)
patch for parse.c

Vote++ and patch

Feb 18 2010



--- Comment #2 from bearophile_hugs eml.cc 2010-02-18 14:55:40 PST ---
(In reply to comment #1)
 Created an attachment (id=571) [details]
 patch for parse.c
 
 Vote++ and patch

Thank you. But is DMD doing the joining with ~ at compile time? If not, then
you can add that optimization to your patch (if you are able to).

Feb 18 2010



--- Comment #3 from bearophile_hugs eml.cc 2010-02-18 15:03:33 PST ---
 Thank you. But is DMD doing the joining with ~ at compile time? If not, then
 you can add that optimization to your patch (if you are able to).

And if you think it's needed, you can add the clear error message I was
talking about :-)

Feb 18 2010


Alexey Ivanov <aifgi90 gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aifgi90 gmail.com


--- Comment #4 from Alexey Ivanov <aifgi90 gmail.com> 2010-02-28 09:36:58 PST
---
 Thank you. But is DMD doing the joining with ~ at compile time? If not, then
 you can add that optimization to your patch (if you are able to).

I think DMD already does the joining at compile time (constfold.c, from line
1387).

Feb 28 2010



--- Comment #5 from bearophile_hugs eml.cc 2010-06-20 16:19:15 PDT ---
The error message for the missing ~ can be something like this (adapted from
the "'l' suffix is deprecated, use 'L' instead" error message generated by the
usage of a 10l long literal):

adjacent string literals concatenation is deprecated, add ~ between them
instead.

Jun 20 2010


Ellery Newcomer <ellery-newcomer utulsa.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ellery-newcomer utulsa.edu


--- Comment #6 from Ellery Newcomer <ellery-newcomer utulsa.edu> 2010-06-20
16:29:07 PDT ---
(In reply to comment #0)
 The "Python Zen" has a rule that says:
 
 Explicit is better than implicit.

The Python compiler has a rule that says to do the exact same thing as what D
is doing. Your serve.

Jun 20 2010



--- Comment #7 from bearophile_hugs eml.cc 2010-06-20 16:51:06 PDT ---
I know Python, but I hope D will become better than Python on this syntax
detail.

Jun 20 2010



--- Comment #8 from bearophile_hugs eml.cc 2010-08-21 13:38:59 PDT ---
A particularly nice example of why untidy syntax easily leads to bugs (this
comes from two different sources of sloppiness of the D2 language):


enum string[5] data = ["green", "magenta", "blue" "red", "yellow"];
static assert(data[4] == "yellow"); // asserts
void main() {}

Aug 21 2010



--- Comment #9 from bearophile_hugs eml.cc 2010-11-10 18:17:17 PST ---
Another bug caused in my code by that anti-feature:


unittest {
    auto tests = [["", "0000"], ["12346", "0000"], ["he", "H000"],
                  ["soundex", "S532"], ["example", "E251"],
                  ["ciondecks", "C532"], ["ekzampul", "E251"],
                  ["resume", "R250"], ["Robert", "R163"],
                  ["Rupert", "R163"], ["Rubin" "R150"],
                  ["Ashcraft", "A226"], ["Ashcroft", "A226"]];
    foreach (pair; tests)
        assert(processit(pair[0]) == pair[1]);
}


That code compiles with no errors with DMD 2.050, and then causes a Range
violation at runtime because one of those arrays isn't a pair.
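
One possible guard against this kind of slip, sketched under the assumption that the table can be declared as an enum. The helper `allPairs` and the shortened table are hypothetical and not part of the code above:

```d
// Hypothetical table in the style of the unittest above.
enum string[][] tests = [["", "0000"], ["he", "H000"], ["Robert", "R163"]];

// Returns true when every row has exactly two entries.
bool allPairs(const string[][] rows) {
    foreach (row; rows)
        if (row.length != 2)
            return false;
    return true;
}

// Evaluated at compile time, so a malformed row stops the build
// instead of causing a range violation when the tests run.
static assert(allPairs(tests));

void main() {}
```

A row such as `["Rubin" ~ "R150"]` has length 1, so `allPairs` returns false for it and the static assert would fail.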

Nov 10 2010



--- Comment #10 from bearophile_hugs eml.cc 2010-11-10 18:21:06 PST ---
The C# language, which has a very refined design, rejects this code, showing
that it doesn't perform automatic joining of adjacent strings:


public class Test {
    public static void Main() {
        string s = "hello " "world";
    }
}


prog.cs(3,35): error CS1525: Unexpected symbol `world'
Compilation failed: 1 error(s), 0 warnings

Nov 10 2010



--- Comment #11 from bearophile_hugs eml.cc 2010-11-12 04:24:57 PST ---
Walter agrees:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=121830

Nov 12 2010



--- Comment #12 from bearophile_hugs eml.cc 2010-11-12 04:32:16 PST ---
A comment from Andrei Alexandrescu:
Walter, please don't forget to tweak the associativity rules: var ~ " literal "
~ " literal " concatenates literals first.

Nov 12 2010



--- Comment #13 from bearophile_hugs eml.cc 2010-11-13 19:19:03 PST ---
(In reply to comment #12)

A comment from Stewart Gordon:

 You mean make ~ right-associative?  I think this'll break more code than
 it fixes.

 But implementing a compiler optimisation so that var ~ ctc ~ ctc is
 processed as var ~ (ctc ~ ctc), _in those cases where they're
 equivalent_, would be sensible.

 ctc = compile-time constant

Nov 13 2010



--- Comment #14 from Ellery Newcomer <ellery-newcomer utulsa.edu> 2010-11-13
19:26:18 PST ---
you don't need to mess with associativity rules, you just need to be able to
handle two or three ast cases:

1. (~ str str)        ie  str ~ str
2. (~ (~ x str) str)  ie  x ~ str ~ str
3. (~ str (~ str x))  ie  str ~ (str ~ x)
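
The three cases can be sketched on a toy expression tree. This is only an illustration in D, not DMD's implementation: the classes `Expr`, `Str`, `Var`, `Cat` and the functions `fold` and `show` are hypothetical, and the sketch assumes every operand is a plain string, so regrouping cannot change the result:

```d
import std.stdio;

// Hypothetical miniature AST: a string literal, a named variable,
// or a concatenation of two sub-expressions.
class Expr {}
class Str : Expr { string value; this(string v) { value = v; } }
class Var : Expr { string name; this(string n) { name = n; } }
class Cat : Expr { Expr lhs, rhs; this(Expr l, Expr r) { lhs = l; rhs = r; } }

// Fold adjacent string literals for the three shapes listed above.
Expr fold(Expr e) {
    auto c = cast(Cat) e;
    if (c is null)
        return e; // literals and variables are left alone

    Expr l = fold(c.lhs);
    Expr r = fold(c.rhs);
    auto ls = cast(Str) l;
    auto rs = cast(Str) r;

    // Case 1: str ~ str
    if (ls !is null && rs !is null)
        return new Str(ls.value ~ rs.value);

    // Case 2: (x ~ str) ~ str becomes x ~ (str ~ str)
    if (auto lc = cast(Cat) l)
        if (auto inner = cast(Str) lc.rhs)
            if (rs !is null)
                return new Cat(lc.lhs, new Str(inner.value ~ rs.value));

    // Case 3: str ~ (str ~ x) becomes (str ~ str) ~ x
    if (auto rc = cast(Cat) r)
        if (auto inner = cast(Str) rc.lhs)
            if (ls !is null)
                return new Cat(new Str(ls.value ~ inner.value), rc.rhs);

    return new Cat(l, r);
}

// Render a tree back to source-like text for inspection.
string show(Expr e) {
    if (auto s = cast(Str) e) return `"` ~ s.value ~ `"`;
    if (auto v = cast(Var) e) return v.name;
    auto c = cast(Cat) e;
    assert(c !is null, "unknown node type");
    return "(" ~ show(c.lhs) ~ " ~ " ~ show(c.rhs) ~ ")";
}

void main() {
    auto x = new Var("x");
    // Case 2 input: (x ~ "a") ~ "b"
    Expr e = new Cat(new Cat(x, new Str("a")), new Str("b"));
    writeln(show(fold(e))); // prints (x ~ "ab")
}
```

The parse tree keeps its left-associative shape. Only the literal operands are merged, which is why no change to the associativity rules is needed.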

Nov 13 2010


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au


--- Comment #15 from Don <clugdbug yahoo.com.au> 2010-11-13 23:51:58 PST ---
(In reply to comment #14)
 you don't need to mess with associativity rules, you just need to be able to
 handle two or three ast cases:
 
 1. (~ str str)        ie  str ~ str
 2. (~ (~ x str) str)  ie  x ~ str ~ str
 3. (~ str (~ str x))  ie  str ~ (str ~ x)

Like this (optimize.c, line 1023):

Expression *CatExp::optimize(int result)
{   Expression *e;

    //printf("CatExp::optimize(%d) %s\n", result, toChars());
    e1 = e1->optimize(result);
    e2 = e2->optimize(result);
+    if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull)
+            && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull))
+    {
+        // Convert  (e ~ str) ~ str into  e ~ (str ~ str)
+        CatExp *ce = ((CatExp *)e1);
+        e1 = ce->e1;
+        ce->e1 = ce->e2;
+        ce->e2 = e2;
+        e2 = ce;
+    }
    e = Cat(type, e1, e2);
    if (e == EXP_CANT_INTERPRET)
        e = this;
    return e;
}

Nov 13 2010



--- Comment #16 from Don <clugdbug yahoo.com.au> 2010-11-13 23:58:35 PST ---
Sorry, missed out a line:

    if (e1->op == TOKcat && (e2->op == TOKstring || e2->op == TOKnull)
            && (((CatExp *)e1)->e2->op == TOKstring || ((CatExp *)e1)->e2->op == TOKnull))
    {
        // Convert  (e ~ str) ~ str into  e ~ (str ~ str)
        CatExp *ce = ((CatExp *)e1);
        e1 = ce->e1;
        ce->e1 = ce->e2;
        ce->e2 = e2;
        e2 = ce;
+        e2 = e2->optimize(result);
    }

Nov 13 2010


Stewart Gordon <smjg iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com


--- Comment #17 from Stewart Gordon <smjg iname.com> 2010-11-16 17:09:41 PST ---
(In reply to comment #5)
 The error message for the missing ~ can be something like this (adapted from
 the "'l' suffix is deprecated, use 'L' instead" error message generated by the
 usage of a 10l long literal):
 
 adjacent string literals concatenation is deprecated, add ~ between them
 instead.

Better watch out for cases where just adding ~ changes the behaviour.

For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that"
evaluate to different strings.

Not that there's any real use case for "this" "that" anyway.  And those rare
use cases it does have in D can be fixed by inserting the ~, though there may
be easier-to-miss cases of the above of which to be wary.

Nov 16 2010



--- Comment #18 from Stewart Gordon <smjg iname.com> 2010-11-16 17:15:03 PST ---
(In reply to comment #17)
 For example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that"
 evaluate to different strings.

Different string arrays even.

Nov 16 2010


nfxjfg gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nfxjfg gmail.com


--- Comment #19 from nfxjfg gmail.com 2010-11-16 19:01:06 PST ---
(In reply to comment #17)
 Not that there's any real use case for "this" "that" anyway.  And those rare
 use cases

I use automatic joining all the time for long string literals. I want them to
span multiple source lines without containing line breaks. No, not a rarely
used feature.

Nov 16 2010



--- Comment #20 from bearophile_hugs eml.cc 2010-11-16 19:38:56 PST ---
(In reply to comment #19)
 (In reply to comment #17)
 Not that there's any real use case for "this" "that" anyway.  And those rare
 use cases
 I use automatic joining all the time for long string literals. I want them to
 span multiple source lines without containing line breaks. No, not a rarely
 used feature.

Stewart Gordon was just talking about code like:

a ~ "this" "that"

where a is a string[].

To join multiple lines you may add a ~ at their end:

string text = "I use automatic joining all the time for long string literals. I want them to " ~
              "span multiple source lines without containing line breaks. " ~
              "No, not a rarely used feature.";

Nov 16 2010


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com


--- Comment #21 from Steven Schveighoffer <schveiguy yahoo.com> 2010-11-16
21:33:05 PST ---
(In reply to comment #17)
 (In reply to comment #5)
 The error message for the missing ~ can be something like this (adapted from
 the "'l' suffix is deprecated, use 'L' instead" error message generated by the
 usage of a 10l long literal):
 
 adjacent string literals concatenation is deprecated, add ~ between them
 instead.
 Better watch out for cases where just adding ~ changes the behaviour. For
 example, if a is a string[], then a ~ "this" "that" and a ~ "this" ~ "that"
 evaluate to different strings.

Doesn't this solve that problem?

a ~ ("this" ~ "that")

BTW, I don't expect very many cases like this (in fact, I bet there are none).
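
A small sketch of the two spellings, assuming `a` is a `string[]`. The helper names `appendTwo` and `appendJoined` are invented for the example:

```d
import std.stdio;

// Without brackets: parsed as (a ~ "this") ~ "that", which appends
// two separate elements to the array.
string[] appendTwo(string[] a) {
    return a ~ "this" ~ "that";
}

// With brackets: the literals are joined first, then the single
// resulting string is appended as one element.
string[] appendJoined(string[] a) {
    return a ~ ("this" ~ "that");
}

void main() {
    string[] a = ["x"];
    assert(appendTwo(a) == ["x", "this", "that"]);
    assert(appendJoined(a) == ["x", "thisthat"]);
    writeln(appendTwo(a).length, " ", appendJoined(a).length);
}
```

The bracketed form is the one that matches what the implicit joining produced, so it is the safe mechanical replacement when the left operand is an array of strings.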
Nov 16 2010



--- Comment #22 from Stewart Gordon <smjg iname.com> 2010-11-17 03:58:08 PST ---
(In reply to comment #21)
 doesn't this solve that problem? a ~ ("this" ~ "that")

It does. My point was that somebody might accidentally not add the brackets.

Nov 17 2010



--- Comment #23 from Sobirari Muhomori <dfj1esp02 sneakemail.com> 2010-11-17
12:04:03 PST ---
If constfold can access a's type, it can make the right decision.

Nov 17 2010



--- Comment #24 from bearophile_hugs eml.cc 2010-11-22 12:01:40 PST ---
A recent note by Walter:

 Andrei's right. This is not about making it right-associative. It is about
 defining in the language that:
 
     ((a ~ b) ~ c)
 
 is guaranteed to produce the same result as:
 
     (a ~ (b ~ c))
 
 Unfortunately, the language cannot make such a guarantee in the face of
 operator overloading. But it can do it for cases where operator overloading
 is not in play.
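
To make the distinction concrete, here is a hedged sketch. For built-in strings both groupings give the same result, while a user-defined `~` need not be associative. The struct `Sub` is a contrived, hypothetical type whose `~` subtracts, chosen only so that the two groupings differ:

```d
import std.stdio;

// Hypothetical type whose ~ is deliberately not associative.
struct Sub {
    int v;
    Sub opBinary(string op)(Sub rhs) const if (op == "~") {
        return Sub(v - rhs.v);
    }
}

void main() {
    // Built-in strings: regrouping does not change the result.
    string a = "a", b = "b", c = "c";
    assert((a ~ b) ~ c == a ~ (b ~ c));

    // Overloaded ~: (10 - 3) - 2 is 5, but 10 - (3 - 2) is 9.
    auto left = (Sub(10) ~ Sub(3)) ~ Sub(2);
    auto right = Sub(10) ~ (Sub(3) ~ Sub(2));
    assert(left == Sub(5));
    assert(right == Sub(9));
    writeln(left.v, " ", right.v);
}
```

This is why a regrouping optimisation can only be applied when the operands are known to be built-in strings.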

Nov 22 2010



--- Comment #25 from bearophile_hugs eml.cc 2011-03-20 04:29:08 PDT ---
See also:

http://stackoverflow.com/questions/2504536/why-allow-concatenation-of-string-literals

Mar 20 2011



--- Comment #26 from bearophile_hugs eml.cc 2012-03-10 17:31:34 PST ---
An example of the problems this avoids:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.announce&article_id=22649

Andrej Mitrovic:

 I see you are not the only one who started writing string array
 literals like this:
 
 enum PEGCode = grammarCode!(
      "Grammar <- S Definition+ EOI"
     ,"Definition <- RuleName Arrow Expression"
     ,"RuleName   <- Identifier>(ParamList?)"
     ,"Expression <- Sequence (OR Sequence)*"
 );
 
 IOW comma on the left side. I know it's not a style preference but
 actually a (unfortunate but needed) technique for avoiding bugs. :)

Mar 10 2012


Andrej Mitrovic <andrej.mitrovich gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrej.mitrovich gmail.com


--- Comment #27 from Andrej Mitrovic <andrej.mitrovich gmail.com> 2012-03-10
17:56:16 PST ---
(In reply to comment #26)
 enum PEGCode = grammarCode!(
      "Grammar <- S Definition+ EOI"
     ,"Definition <- RuleName Arrow Expression"
     ,"RuleName   <- Identifier>(ParamList?)"
     ,"Expression <- Sequence (OR Sequence)*"
 );

Note that this is Philippe Sigaud's code. So you can add him, and me, to the
list of people affected by this.

I'm doing string processing in D on a day-to-day basis, and whenever I have a
list of strings I eventually end up shooting myself in the foot because of a
missing comma. It's very easy (at least for clumsy me) to make the mistake.
E.g. writing some headers to ignore:

string[] ignoredHeaders = [
    "foo.bar"  // todo: have to fix this later
    "foo.do",  // todo: later
];

When I have comments next to the strings it makes it easy to miss the missing
comma, especially if the strings are of a different length.

Mar 10 2012