
digitalmars.D - Is implicit string literal concatenation a good thing?

reply Frank Benoit <keinfarbton googlemail.com> writes:
Find the bug:
    static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
        "assert", "auto", "body", "bool", "break", "byte", "case",
        "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
        "const", "continue", "creal", "dchar", "debug", "default",
        "delegate", "delete", "deprecated", "do", "double", "else",
        "enum", "export", "extern", "false", "final", "finally",
        "float", "for", "foreach", "foreach_reverse", "function",
        "goto", "idouble", "if", "ifloat", "import", "in", "inout",
        "int", "interface", "invariant", "ireal", "is", "lazy", "long",
        "mixin", "module", "new", "null", "out", "override", "package",
        "pragma", "private", "private:", "protected", "protected:",
        "public", "public:", "real", "return", "scope", "short",
        "static", "struct", "super", "switch", "synchronized",
        "template", "this", "throw", "true", "try", "typedef", "typeid",
        "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
        "ushort", "version", "void", "volatile", "wchar", "while",
        "with", "~this" ];

There is a comma missing : "uint" "ulong"
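
A minimal sketch of what the missing comma does, using a hypothetical four-entry excerpt of the list (the function name `keywordsWithBug` is made up for illustration). As far as I know, recent D compilers reject adjacent string literals outright, so the fusion is spelled here with an explicit `~` to keep the sketch compilable:

```d
import std.algorithm.searching : canFind;

// Hypothetical excerpt of the keyword list. The explicit ~ stands in for
// the missing comma between "uint" and "ulong".
string[] keywordsWithBug()
{
    return ["ubyte", "ucent", "uint" ~ "ulong", "union"];
}

void main()
{
    auto kw = keywordsWithBug();
    assert(kw.length == 4);           // one entry fewer than the five names written
    assert(kw.canFind("uintulong"));  // the two names were fused into one
    assert(!kw.canFind("uint"));      // neither original keyword is present
    assert(!kw.canFind("ulong"));
}
```

The array still compiles and still looks plausible, which is why the bug is easy to miss.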
Feb 22 2009
next sibling parent reply Brad Roberts <braddr puremagic.com> writes:
Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
         "const", "continue", "creal", "dchar", "debug", "default",
         "delegate", "delete", "deprecated", "do", "double", "else",
         "enum", "export", "extern", "false", "final", "finally",
         "float", "for", "foreach", "foreach_reverse", "function",
         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
         "mixin", "module", "new", "null", "out", "override", "package",
         "pragma", "private", "private:", "protected", "protected:",
         "public", "public:", "real", "return", "scope", "short",
         "static", "struct", "super", "switch", "synchronized",
         "template", "this", "throw", "true", "try", "typedef", "typeid",
         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
         "ushort", "version", "void", "volatile", "wchar", "while",
         "with", "~this" ];
 
 There is a comma missing : "uint" "ulong"

I have a personal style rule: if a list like that (be it function parameters, initializers, whatever) spans more than one line, it gets one element per line. I hate having to visually parse things, or play the re-wrap game as the lists change. Until now I hadn't really thought about the side benefit of making it easier to spot missing trailing commas.

Back in C and C++, with their preprocessor, merging adjacent string literals is very handy. In D it's only marginally so, but not completely useless. It can still be used to break a really long string literal into parts. There are other string boundary tokens in D which might well provide viable alternatives.

Just my two cents,
Brad
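
The one-element-per-line style could look roughly like this. The list name `UNSIGNED_TYPES` and the length check are assumptions added for illustration, not part of the original post:

```d
// One element per line: a missing comma stands out, and diffs stay small.
static immutable string[] UNSIGNED_TYPES = [
    "ubyte",
    "ucent",
    "uint",
    "ulong",
    "ushort",   // a trailing comma is accepted in D array literals
];

// Optional guard: fails to compile if two entries are ever fused or dropped.
static assert(UNSIGNED_TYPES.length == 5);

void main() {}
```

The `static assert` is a belt-and-braces extra; the layout alone already makes the missing comma much easier to see.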
Feb 22 2009
parent reply Christopher Wright <dhasenan gmail.com> writes:
Brad Roberts wrote:
 Back in c and c++, with it's pre-processor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There's other string boundary tokens in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals. In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.
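
As a sketch of the point about `~`, a long literal can be broken into parts at the cost of one extra character per line. The constant name `USAGE` and its text are hypothetical; declaring it as an `enum` forces the compiler to evaluate the expression at compile time:

```d
// A long literal split over several lines with the explicit operator.
enum string USAGE =
    "usage: tool [options] file\n" ~
    "  -v  verbose output\n" ~
    "  -h  show this help\n";

// Checked at compile time: the parts form one string.
static assert(USAGE ==
    "usage: tool [options] file\n  -v  verbose output\n  -h  show this help\n");

void main() {}
```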
Feb 22 2009
parent Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com> wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in c and c++, with it's pre-processor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There's other string boundary tokens in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals. In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.


I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to. But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time. (I think this is the case now, right?)

Yes, and because of CTFE, even complicated applications of ~ frequently don't involve any allocation. So your intuition was wrong! Implicit concatenation was probably one of the things which led to your false impression. So it may be bad in that respect, as well as bug-breeding.
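
A small sketch of the CTFE remark, assuming a hypothetical helper `repeatText`: the function uses `~=` in a loop, yet when its result initializes an `enum` the whole computation runs inside the compiler and only the finished string is used by the program.

```d
// Hypothetical helper: builds a string by repeated appending.
string repeatText(string s, size_t n)
{
    string result;
    foreach (i; 0 .. n)
        result ~= s;
    return result;
}

// Evaluated by CTFE because an enum initializer must be a compile-time constant.
enum string RULE = repeatText("-=", 3);
static assert(RULE == "-=-=-=");

void main()
{
    // The same function also works at run time, where ~= may allocate.
    assert(repeatText("ab", 2) == "abab");
}
```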
 
 --bb

Feb 22 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Frank Benoit Wrote:

     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
 ...
         "with", "~this" ];
 
 There is a comma missing : "uint" "ulong"

In such situations I often let the language split my string for me; it reduces noise:

    auto keywords = "abstract alias align asm assert auto body bool break byte case cast catch cdouble cent cfloat char class const continue creal dchar debug default delegate delete deprecated do double else enum export extern false final finally float for foreach foreach_reverse function goto idouble if ifloat import in inout int interface invariant ireal is lazy long mixin module new null out override package pragma private private: protected protected: public public: real return scope short static struct super switch synchronized template this throw true try typedef typeid typeof ubyte ucent uint ulong union unittest ushort version void volatile wchar while with ~this".split();

You can also put one keyword on each line, or put them in better formatted columns. If the strings may contain spaces, then I put each string on a different line and split according to the lines with std.string.splitlines() (or str.splitlines() in Python).

Implicit string literal concatenation is a bug-prone anti-feature, a relic of the C language, which doesn't have a nice string concatenation syntax. In D (and Python, etc.) it's bad. Months ago I suggested removing it and turning adjacent string literals into a syntax error (to "solve" the back-compatibility with ported C/C++ code).

Brad Roberts:
In D, it's only marginally so, but not completely useless.  It can still be
used to break a really long string literal into parts.<

In such situations you can put a ~ at the end of each part. Explicit is better than implicit :-) Bye, bearophile
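
A sketch of the splitting approach in current Phobos, on a shortened keyword string. The assumption here is today's spelling of the library calls, `std.array.split` and `std.string.splitLines`; the post itself uses the older name `splitlines`:

```d
import std.array : split;
import std.string : splitLines;

void main()
{
    // Whitespace-separated entries: no commas to forget.
    auto keywords = "abstract alias align asm assert auto".split();
    assert(keywords.length == 6);
    assert(keywords[0] == "abstract" && keywords[5] == "auto");

    // Entries that contain spaces: one per line, split on line breaks.
    auto phrases = "unsigned int\nlong double\nvoid".splitLines();
    assert(phrases == ["unsigned int", "long double", "void"]);
}
```

The trade-off is that the split happens when the code runs rather than being written out as an array literal.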
Feb 22 2009
parent Kagamin <spam here.lot> writes:
bearophile Wrote:

 Months ago I have suggested to remove it

Didn't find it in bugzilla. Reported as bug 2685.
Feb 24 2009
prev sibling next sibling parent Sergey Gromov <snake.scaly gmail.com> writes:
Sun, 22 Feb 2009 10:21:20 +0100, Frank Benoit wrote:

 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
         "const", "continue", "creal", "dchar", "debug", "default",
         "delegate", "delete", "deprecated", "do", "double", "else",
         "enum", "export", "extern", "false", "final", "finally",
         "float", "for", "foreach", "foreach_reverse", "function",
         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
         "mixin", "module", "new", "null", "out", "override", "package",
         "pragma", "private", "private:", "protected", "protected:",
         "public", "public:", "real", "return", "scope", "short",
         "static", "struct", "super", "switch", "synchronized",
         "template", "this", "throw", "true", "try", "typedef", "typeid",
         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
         "ushort", "version", "void", "volatile", "wchar", "while",
         "with", "~this" ];
 
 There is a comma missing : "uint" "ulong"

I agree this feature is dangerous and useless in D.
Feb 22 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Brad Roberts wrote:
 Back in c and c++, with it's pre-processor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There's other string boundary tokens in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals. In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.

I agree.
Feb 22 2009
prev sibling next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com> wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in c and c++, with it's pre-processor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There's other string boundary tokens in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals. In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.

I agree.

I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to. But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time. (I think this is the case now, right?)

--bb
Feb 22 2009
parent BCS <none anon.com> writes:
Hello Bill,
 I use this feature pretty frequently to break up long strings. I think
 I didn't use ~ for that because it makes me think an allocation might
 happen when it doesn't need to.
 

yah, the WILL-happen-at-compiletime bit is nice
 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.   (I think the is the case now,
 right?)

same here.
 
 --bb
 

Feb 22 2009
prev sibling next sibling parent reply Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Feb 22, 2009 at 12:51 PM, Bill Baxter <wbaxter gmail.com> wrote:
 I use this feature pretty frequently to break up long strings.
 I think I didn't use ~ for that because it makes me think an
 allocation might happen when it doesn't need to.

 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.   (I think the is the case now,
 right?)

Of course, it does it as a matter of constant folding, just like 3 + 4.
Feb 22 2009
next sibling parent reply BCS <none anon.com> writes:
Hello Jarrett,

 On Sun, Feb 22, 2009 at 12:51 PM, Bill Baxter <wbaxter gmail.com>
 wrote:
 
 I use this feature pretty frequently to break up long strings. I
 think I didn't use ~ for that because it makes me think an allocation
 might happen when it doesn't need to.
 
 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.   (I think the is the case now,
 right?)
 

 Of course, it does it as a matter of constant folding, just like 3 + 4.

IIRC DMD doesn't always do the constant folding (Decent has a post-processed view that shows this in some cases). For instance, IIRC it only folds the leftmost operands, so this:

    char[] foo = "foo";
    char[] bar = foo ~ "bar" ~ "baz";

doesn't get folded. And even if DMD were to start doing that one, there is no requirement that another compiler also do it.
Feb 22 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
BCS:

 IIRC DMD doesn't always do the constant folding (Decent has a post processed 
 view that shows this in some cases) For instance, IIRC it only does left 
 most so this:
 char[] foo = "foo";
 char[] bar = foo ~ "bar" ~ "baz"
 doesn't get folded. And even if DMD were to start doing that one, there is 
 no requirement that another compiler also do it.

If there are guarantees that "abc" "def" are folded at compile time, then the same guarantees can be specified for "abc" ~ "def". I can't see a problem.

I have also compiled this code with DMD:

    void main() {
        string foo = "foo";
        string bar = foo ~ "bar" ~ "baz";
    }

Resulting asm, no optimizations:

    L0:     push    EBP
            mov     EBP,ESP
            mov     EDX,FLAT:_DATA[0Ch]
            mov     EAX,FLAT:_DATA[08h]
            push    dword ptr FLAT:_DATA[01Ch]
            push    dword ptr FLAT:_DATA[018h]
            push    dword ptr FLAT:_DATA[02Ch]
            push    dword ptr FLAT:_DATA[028h]
            push    EDX
            push    EAX
            push    3
            mov     ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
            push    ECX
            call    near ptr __d_arraycatnT
            xor     EAX,EAX
            add     ESP,020h
            pop     EBP
            ret

Resulting asm, with optimizations:

    L0:     sub     ESP,0Ch
            mov     EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
            push    dword ptr FLAT:_DATA[01Ch]
            push    dword ptr FLAT:_DATA[018h]
            push    dword ptr FLAT:_DATA[02Ch]
            push    dword ptr FLAT:_DATA[028h]
            push    dword ptr FLAT:_DATA[0Ch]
            push    dword ptr FLAT:_DATA[08h]
            push    3
            push    EAX
            call    near ptr __d_arraycatnT
            add     ESP,020h
            add     ESP,0Ch
            xor     EAX,EAX
            ret

I can see just one arraycatn, so the two string literals are folded at compile time, I think.

Bye,
bearophile
Feb 22 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Denis Koroskin:
 bearophile:
 void main() {
     string foo = "foo";
     string bar = foo ~ "bar" ~ "baz";
 }

 Won't work. Imagine foo is a user-defined type with custom opCat:
     auto bar = foo ~ "123" ~ "456";
 compare to:
     std::cout << "123" << "456";

In this thread I was talking about the concat of true strings, not of generic objects:

    auto bar = foo ~ ("123" ~ "456");

Are you saying that the concat operation of
    "123" ~ "456"
has a different (invisible) "operator" precedence than:
    "123" "456" ?
If this is true, then the ~ isn't a fully drop-in replacement for the automatic concat of strings as done in C...

Bye,
bearophile
Feb 22 2009
parent Ellery Newcomer <ellery-newcomer utulsa.edu> writes:
bearophile wrote:
 Denis Koroskin:
 bearophile:
 void main() {
     string foo = "foo";
     string bar = foo ~ "bar" ~ "baz";
 }

auto bar = foo ~ "123" ~ "456"; compare to: std::cout << "123" << "456";

In this thread I was talking about the concat of true strings, not of generic objects: auto bar = foo ~ ("123" ~ "456"); Are you saying that the concat operation of "123" ~ "456" has a different (invisible) "operator" precedence of: "123" "456" ? If this is true, then the ~ isn't a fully drop-in replacement for the automatic concat of strings as done in C... Bye, bearophile

"123" "456" has the higher precedence
Feb 22 2009
prev sibling next sibling parent reply BCS <none anon.com> writes:
Hello bearophile,

 If there are guarantees that "abc" "def" are folded at compile time,
 then the same guarantees can be specified for "abc" ~ "def". I can't
 see a problem.

While it is not part of the spec, I do see a problem. If it were added....
 
 I have also compiled this code with DMD:
 
 void main() {
 string foo = "foo";
 string bar = foo ~ "bar" ~ "baz";
 }
 Resulting asm, no optimizations:
 
 L0:		push	EBP
 mov	EBP,ESP
 mov	EDX,FLAT:_DATA[0Ch]
 mov	EAX,FLAT:_DATA[08h]
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]

note 6 things
 push	EDX
 push	EAX
 push	3
 mov	ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	ECX
 call	near ptr __d_arraycatnT
 xor	EAX,EAX
 add	ESP,020h
 pop	EBP
 ret
 Resulting asm, with optimizations:
 
 L0:		sub	ESP,0Ch
 mov	EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]
 push	dword ptr FLAT:_DATA[0Ch]
 push	dword ptr FLAT:_DATA[08h]

again 6 things
 push	3

I think that is a varargs call
 push	EAX
 call	near ptr __d_arraycatnT
 add	ESP,020h
 add	ESP,0Ch
 xor	EAX,EAX
 ret
 I can see just one arraycatn, so the two string literals are folded at
 compile time, I think.
 
 Bye,
 bearophile

I think that DMD does some optimization for a~b~c etc. so that there is only one call for any number of chained ~ (array cat n). In this case I think it is doing that.
Feb 22 2009
parent Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 23 Feb 2009 03:48:17 +0000 (UTC), BCS wrote:

I think that DMD does some optimization for a~b~c etc. so that there is only one call for any number of chained ~ (array cat n). In this case I think it is doing that.

Sure enough, if you look into the compiled .obj you won't find "barbaz" there. All sub-strings are separate, regardless of the optimization options.
Feb 26 2009
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 26 Feb 2009 20:59:34 +0300, Sergey Gromov <snake.scaly gmail.com> wrote:

 Mon, 23 Feb 2009 03:48:17 +0000 (UTC), BCS wrote:

I think that DMD does some optimization for a~b~c etc. so that there is only one call for any number of chained ~ (array cat n). In this case I think it is doing that.

Sure enough, if you look into the compiled .obj you won't find "barbaz" there. All sub-strings are separate, regardless of the optimization options.

Here is a test:

    import std.stdio;

    void main()
    {
        string t1 = "bar1" ~ "baz1";
        string t2 = t1 ~ "bar2" ~ "baz2";
        string t3 = t1 ~ ("bar3" ~ "baz3");

        writefln(t1);
        writefln(t2);
        writefln(t3);
    }

The compiled test executable contains the strings bar1baz1 and bar3baz3. It is worth noting that declaring t1, t2 and t3 as const (i.e. "const string t1" etc.) makes the concatenations happen entirely at compile time.
Feb 26 2009
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 23 Feb 2009 04:18:35 +0300, bearophile <bearophileHUGS lycos.com> wrote:

 BCS:

 IIRC DMD doesn't always do the constant folding (Decent has a post  
 processed
 view that shows this in some cases) For instance, IIRC it only does left
 most so this:
 char[] foo = "foo";
 char[] bar = foo ~ "bar" ~ "baz"
 doesn't get folded. And even if DMD were to start doing that one, there  
 is
 no requirement that another compiler also do it.

If there are guarantees that "abc" "def" are folded at compile time, then the same guarantees can be specified for "abc" ~ "def". I can't see a problem. I have also compiled this code with DMD: void main() { string foo = "foo"; string bar = foo ~ "bar" ~ "baz"; }

Won't work. Imagine foo is a user-defined type with custom opCat:

    auto bar = foo ~ "123" ~ "456";

compare to:

    std::cout << "123" << "456";
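
To make the objection concrete, here is a sketch with a hypothetical user-defined type `Tracer` that counts how often its `~` operator runs. It uses the D2 spelling `opBinary!"~"` rather than the `opCat` named in the post, which is an assumption about the compiler version in use:

```d
// Hypothetical type whose ~ operator records how many times it was applied.
struct Tracer
{
    int calls;
    string text;

    Tracer opBinary(string op : "~")(string rhs) const
    {
        return Tracer(calls + 1, text ~ rhs);
    }
}

void main()
{
    auto foo = Tracer(0, "");

    // Left-associative: (foo ~ "123") ~ "456", so the user operator runs twice.
    auto chained = foo ~ "123" ~ "456";

    // Parenthesized: the two literals are joined first, the operator runs once.
    auto grouped = foo ~ ("123" ~ "456");

    assert(chained.text == "123456" && grouped.text == "123456");
    assert(chained.calls == 2);
    assert(grouped.calls == 1);
}
```

Because a user-defined operator may have side effects, a compiler cannot silently regroup the chained form, which is why the parentheses matter.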
Feb 22 2009
prev sibling parent Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Sun, Feb 22, 2009 at 9:29 PM, bearophile <bearophileHUGS lycos.com> wrote:

 Are you saying that the concat operation of
 "123" ~  "456"
 has a different (invisible) "operator" precedence of:
 "123" "456" ?
 If this is true, then the ~ isn't a fully drop-in replacement for the
automatic concat of strings as done in C...

Currently that's the case. But it's simply unspecified in the language specification, and there's no reason why the compiler can't turn:

    a = foo ~ "bar" ~ "baz";

into:

    a = foo ~ "barbaz";

FWIW the MiniD compiler does this already. It's just that DMD currently does concatenation constant folding in a simple manner that makes this kind of folding "invisible" to it. But when all the operands of the concatenations are strings - like when building up strings to be mixed-in, or in static/const variable initializers - then everything will obviously be folded at compile time.
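
A sketch of the mixin case mentioned above, with hypothetical names `NAME` and `answer`: the argument of a string mixin has to be known at compile time, so every `~` in it is necessarily evaluated by the compiler.

```d
enum string NAME = "answer";

// The concatenation is evaluated at compile time and the result is compiled
// as code; this declares a module-level variable: int answer = 6 * 7;
mixin("int " ~ NAME ~ " = 6 * 7;");

void main()
{
    assert(answer == 42);
}
```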
Feb 22 2009
prev sibling next sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Mon, Feb 23, 2009 at 3:42 AM, Don <nospam nospam.com> wrote:
 Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com>
 wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright
 <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in c and c++, with it's pre-processor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There's other string boundary tokens in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The only way to catenate strings is with strcat. That places the additional burden on programmers that they have to include string.h. For that reason, it makes sense to catenate adjacent string literals. In D, there's a compile time catenation operator that doesn't require libraries. So the catenation by association saves you only one character. I'd say that's useless.

I agree.

I use this feature pretty frequently to break up long strings. I think I didn't use ~ for that because it makes me think an allocation might happen when it doesn't need to. But after seeing the discussion here I'd be happy to switch to using "a"~"b" as long as it's guaranteed by the language that such strings will be concatenated at compile time. (I think the is the case now, right?)

Yes, and because of CTFE, even complicated applications of ~ frequently don't involve any allocation. So your intuition was wrong! Implicit concatenation was probably one of the things which led to your false impression. So it may be bad in that respect, as well as bug-breeding.

Well, like I said, I vaguely recalled that DMD would eliminate the alloc. But is it in the spec? Some other compiler might not implement that optimization. Or I might change from "foo"~"bar" to "foo"~runTimeVar at some point and not notice that I'd introduced an allocation because of that. So the benefit of "foo" "bar" there was that I could be absolutely sure, since it's in the spec, that it concatenates the strings at compile time. But I agree it's something that could be gotten rid of. --bb
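
One possible way to get that certainty back with `~`, sketched under the assumption that an `enum` declaration is acceptable at the use site (the names `FOLDED` and `atRunTime` are hypothetical): an `enum` initializer must be a compile-time constant, so swapping a literal for a run-time variable there stops compiling instead of silently allocating.

```d
// Must be computable at compile time, so no allocation can hide here.
enum string FOLDED = "foo" ~ "bar";
static assert(FOLDED == "foobar");

string atRunTime(string runTimeVar)
{
    // enum s = "foo" ~ runTimeVar;  // rejected: runTimeVar is not known at compile time
    return "foo" ~ runTimeVar;       // ordinary run-time concatenation
}

void main()
{
    assert(atRunTime("bar") == FOLDED);
}
```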
Feb 22 2009
prev sibling parent reply Miles <_______ _______.____> writes:
Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",

Out of curiosity, are you trying to create a D parser? Because "private:", "protected:", "public:" and "~this" are not keywords.
Feb 25 2009
parent Frank Benoit <keinfarbton googlemail.com> writes:
Miles schrieb:
 Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",

Out of curiosity, are you trying to create a D parser? Because "private:", "protected:", "public:" and "~this" are not keywords.

That's just a snippet I got from a dwt user. After pasting it into a Java source file, I got an error for the missing comma, and I found that interesting.
Feb 25 2009