digitalmars.D - Is implicit string literal concatenation a good thing?

Frank Benoit <keinfarbton googlemail.com> writes:

Find the bug:
    static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
        "assert", "auto", "body", "bool", "break", "byte", "case",
        "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
        "const", "continue", "creal", "dchar", "debug", "default",
        "delegate", "delete", "deprecated", "do", "double", "else",
        "enum", "export", "extern", "false", "final", "finally",
        "float", "for", "foreach", "foreach_reverse", "function",
        "goto", "idouble", "if", "ifloat", "import", "in", "inout",
        "int", "interface", "invariant", "ireal", "is", "lazy", "long",
        "mixin", "module", "new", "null", "out", "override", "package",
        "pragma", "private", "private:", "protected", "protected:",
        "public", "public:", "real", "return", "scope", "short",
        "static", "struct", "super", "switch", "synchronized",
        "template", "this", "throw", "true", "try", "typedef", "typeid",
        "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
        "ushort", "version", "void", "volatile", "wchar", "while",
        "with", "~this" ];

There is a comma missing: "uint" "ulong" (the two adjacent literals are silently merged into one "uintulong" entry).

Feb 22 2009

Brad Roberts <braddr puremagic.com> writes:

Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
         "const", "continue", "creal", "dchar", "debug", "default",
         "delegate", "delete", "deprecated", "do", "double", "else",
         "enum", "export", "extern", "false", "final", "finally",
         "float", "for", "foreach", "foreach_reverse", "function",
         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
         "mixin", "module", "new", "null", "out", "override", "package",
         "pragma", "private", "private:", "protected", "protected:",
         "public", "public:", "real", "return", "scope", "short",
         "static", "struct", "super", "switch", "synchronized",
         "template", "this", "throw", "true", "try", "typedef", "typeid",
         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
         "ushort", "version", "void", "volatile", "wchar", "while",
         "with", "~this" ];
 
 There is a comma missing: "uint" "ulong"

I have a personal style rule that says: if a list like that (be it
function parameters, initializers, whatever) is more than one line, it's
one element per line.  I hate having to visually parse things, or play
the re-wrap game as the lists change.  I hadn't really thought about,
until now, the side benefit of making it easier to spot missing trailing
commas.
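A minimal sketch of that one-element-per-line style in D, using a few entries from the list above (the name `integerKeywords` is made up for illustration). D accepts a trailing comma in array literals, so every line can end the same way, and a missing comma shows up as the one line that ends without it:

```d
// One element per line: a forgotten comma would be the only line in the
// block that does not end with one, which is easy to spot.
immutable string[] integerKeywords = [
    "byte",
    "ubyte",
    "short",
    "ushort",
    "int",
    "uint",
    "long",
    "ulong",   // D also accepts a trailing comma after the last element
];

void main()
{
    assert(integerKeywords.length == 8);
    assert(integerKeywords[5] == "uint" && integerKeywords[7] == "ulong");
}
```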

Back in C and C++, with their preprocessor, merging adjacent string
literals is very handy.  In D, it's only marginally so, but not
completely useless.  It can still be used to break a really long string
literal into parts.  There are other string delimiters in D which
might well provide viable alternatives.

Just my two cents,
Brad

Feb 22 2009

Christopher Wright <dhasenan gmail.com> writes:

Brad Roberts wrote:
 Back in C and C++, with their preprocessor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There are other string delimiters in D which
 might well provide viable alternatives.

In C and C++, there is no way to catenate strings at compile time. The 
only way to catenate strings is with strcat. That places the additional 
burden on programmers that they have to include string.h. For that 
reason, it makes sense to catenate adjacent string literals.

In D, there's a compile time catenation operator that doesn't require 
libraries. So the catenation by association saves you only one 
character. I'd say that's useless.
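A small sketch of that compile-time operator at work, assuming a D2 compiler (`greeting` is an illustrative name). `static assert` is checked during compilation, so this only compiles if `"Hello, " ~ "world"` is evaluated by the compiler:

```d
// Both operands are literals, so the compiler evaluates the ~ itself.
enum string greeting = "Hello, " ~ "world";

// Checked at compile time: this line would fail to compile if the
// concatenation above could not be computed during compilation.
static assert(greeting == "Hello, world");

void main()
{
    import std.stdio : writeln;
    writeln(greeting);
}
```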

Feb 22 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Brad Roberts wrote:
 Back in C and C++, with their preprocessor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There are other string delimiters in D which
 might well provide viable alternatives.

 In C and C++, there is no way to catenate strings at compile time. The  
 only way to catenate strings is with strcat. That places the additional  
 burden on programmers that they have to include string.h. For that  
 reason, it makes sense to catenate adjacent string literals.

 In D, there's a compile time catenation operator that doesn't require  
 libraries. So the catenation by association saves you only one  
 character. I'd say that's useless.

I agree.

Feb 22 2009

Bill Baxter <wbaxter gmail.com> writes:

On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com> wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in C and C++, with their preprocessor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There are other string delimiters in D which
 might well provide viable alternatives.

 In C and C++, there is no way to catenate strings at compile time. The
 only way to catenate strings is with strcat. That places the additional
 burden on programmers that they have to include string.h. For that reason,
 it makes sense to catenate adjacent string literals.

 In D, there's a compile time catenation operator that doesn't require
 libraries. So the catenation by association saves you only one character.
 I'd say that's useless.

 I agree.

I use this feature pretty frequently to break up long strings.
I think I didn't use ~ for that because it makes me think an
allocation might happen when it doesn't need to.

But after seeing the discussion here I'd be happy to switch to using
"a"~"b" as long as it's guaranteed by the language that such strings
will be concatenated at compile time.  (I think this is the case now,
right?)

--bb

Feb 22 2009

Don <nospam nospam.com> writes:

Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com> wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in C and C++, with their preprocessor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There are other string delimiters in D which
 might well provide viable alternatives.

 In C and C++, there is no way to catenate strings at compile time. The
 only way to catenate strings is with strcat. That places the additional
 burden on programmers that they have to include string.h. For that reason,
 it makes sense to catenate adjacent string literals.

 In D, there's a compile time catenation operator that doesn't require
 libraries. So the catenation by association saves you only one character.
 I'd say that's useless.

 I agree.

 
 I use this feature pretty frequently to break up long strings.
 I think I didn't use ~ for that because it makes me think an
 allocation might happen when it doesn't need to.
 
 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.  (I think this is the case now,
 right?)

Yes, and because of CTFE, even complicated applications of ~ frequently 
don't involve any allocation. So your intuition was wrong! Implicit 
concatenation was probably one of the things which led to your false
impression. So it may be bad in that respect, as well as bug-breeding.



 
 --bb

Feb 22 2009

Bill Baxter <wbaxter gmail.com> writes:

On Mon, Feb 23, 2009 at 3:42 AM, Don <nospam nospam.com> wrote:
 Bill Baxter wrote:
 On Sun, Feb 22, 2009 at 11:12 PM, Denis Koroskin <2korden gmail.com>
 wrote:
 On Sun, 22 Feb 2009 16:50:51 +0300, Christopher Wright
 <dhasenan gmail.com>
 wrote:

 Brad Roberts wrote:
 Back in C and C++, with their preprocessor, merging adjacent string
 literals is very handy.  In D, it's only marginally so, but not
 completely useless.  It can still be used to break a really long string
 literal into parts.  There are other string delimiters in D which
 might well provide viable alternatives.

 In C and C++, there is no way to catenate strings at compile time. The
 only way to catenate strings is with strcat. That places the additional
 burden on programmers that they have to include string.h. For that
 reason,
 it makes sense to catenate adjacent string literals.

 In D, there's a compile time catenation operator that doesn't require
 libraries. So the catenation by association saves you only one
 character.
 I'd say that's useless.

 I agree.

 I use this feature pretty frequently to break up long strings.
 I think I didn't use ~ for that because it makes me think an
 allocation might happen when it doesn't need to.

 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.  (I think this is the case now,
 right?)

 Yes, and because of CTFE, even complicated applications of ~ frequently
 don't involve any allocation. So your intuition was wrong! Implicit
 concatenation was probably one of the things which led to your false
 impression. So it may be bad in that respect, as well as bug-breeding.

Well, like I said, I vaguely recalled that DMD would eliminate the
alloc.  But is it in the spec?  Some other compiler might not
implement that optimization.  Or I might change from "foo"~"bar" to
"foo"~runTimeVar at some point and not notice that I'd introduced an
allocation because of that.  So the benefit of "foo" "bar" there was
that I could be absolutely sure, since it's in the spec, that it
concatenates the strings at compile time.
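One way to get that guarantee today, sketched under the assumption of a current D2 compiler: bind the concatenation to an `enum` (a manifest constant). An `enum` initializer must be evaluable at compile time, so changing an operand to a runtime variable becomes a compile error rather than a silent allocation. `header` and `runTimeVar` are hypothetical names:

```d
import std.stdio : writeln;

// An enum initializer must be evaluable at compile time, so this
// concatenation cannot turn into a hidden runtime allocation.
enum header = "foo" ~ "bar";

void main()
{
    string runTimeVar = "baz"; // hypothetical runtime value

    // enum broken = "foo" ~ runTimeVar; // would not compile: runTimeVar
    //                                   // cannot be read at compile time

    // Outside an enum, mixing in a runtime value is allowed and allocates.
    string combined = header ~ runTimeVar;
    writeln(combined);
}
```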

But I agree it's something that could be gotten rid of.

--bb

Feb 22 2009

BCS <none anon.com> writes:

Hello Bill,
 I use this feature pretty frequently to break up long strings. I think
 I didn't use ~ for that because it makes me think an allocation might
 happen when it doesn't need to.
 


Yeah, the WILL-happen-at-compile-time bit is nice.

 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.  (I think this is the case now,
 right?)

same here.

 
 --bb

Feb 22 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Sun, Feb 22, 2009 at 12:51 PM, Bill Baxter <wbaxter gmail.com> wrote:
 I use this feature pretty frequently to break up long strings.
 I think I didn't use ~ for that because it makes me think an
 allocation might happen when it doesn't need to.

 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.  (I think this is the case now,
 right?)

Of course, it does it as a matter of constant folding, just like 3 + 4.

Feb 22 2009

BCS <none anon.com> writes:

Hello Jarrett,

 On Sun, Feb 22, 2009 at 12:51 PM, Bill Baxter <wbaxter gmail.com>
 wrote:
 
 I use this feature pretty frequently to break up long strings. I
 think I didn't use ~ for that because it makes me think an allocation
 might happen when it doesn't need to.
 
 But after seeing the discussion here I'd be happy to switch to using
 "a"~"b" as long as it's guaranteed by the language that such strings
 will be concatenated at compile time.  (I think this is the case now,
 right?)
 

 Of course, it does it as a matter of constant folding, just like 3 +
 4.
 

IIRC DMD doesn't always do the constant folding (Decent has a post-processed
view that shows this in some cases). For instance, IIRC it only folds when the
constant operands come first (~ groups from the left), so this:

char[] foo = "foo";
char[] bar = foo ~ "bar" ~ "baz";

doesn't get folded. And even if DMD were to start doing that one, there is 
no requirement that another compiler also do it.
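Since `~` groups from the left, `foo ~ "bar" ~ "baz"` is parsed as `(foo ~ "bar") ~ "baz"`, and neither `~` has two constant operands. A minimal sketch of the usual workaround, parenthesizing the literals so they form a constant subexpression (`appendSuffix` is a hypothetical name):

```d
// Parenthesizing groups the two literals into a constant subexpression,
// which the compiler can fold to "barbaz" before the runtime ~ with foo.
string appendSuffix(string foo)
{
    return foo ~ ("bar" ~ "baz");
}

// The parenthesized part on its own is a compile-time constant.
static assert("bar" ~ "baz" == "barbaz");

void main()
{
    assert(appendSuffix("foo") == "foobarbaz");
}
```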

Feb 22 2009

bearophile <bearophileHUGS lycos.com> writes:

BCS:

 IIRC DMD doesn't always do the constant folding (Decent has a post-processed
 view that shows this in some cases). For instance, IIRC it only folds when the
 constant operands come first (~ groups from the left), so this:
 char[] foo = "foo";
 char[] bar = foo ~ "bar" ~ "baz";
 doesn't get folded. And even if DMD were to start doing that one, there is 
 no requirement that another compiler also do it.

If there are guarantees that "abc" "def" are folded at compile time, then the
same guarantees can be specified for "abc" ~ "def". I can't see a problem.

I have also compiled this code with DMD:

void main() {
    string foo = "foo";
    string bar = foo ~ "bar" ~ "baz";
}

Resulting asm, no optimizations:

L0:		push	EBP
		mov	EBP,ESP
		mov	EDX,FLAT:_DATA[0Ch]
		mov	EAX,FLAT:_DATA[08h]
		push	dword ptr FLAT:_DATA[01Ch]
		push	dword ptr FLAT:_DATA[018h]
		push	dword ptr FLAT:_DATA[02Ch]
		push	dword ptr FLAT:_DATA[028h]
		push	EDX
		push	EAX
		push	3
		mov	ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
		push	ECX
		call	near ptr __d_arraycatnT
		xor	EAX,EAX
		add	ESP,020h
		pop	EBP
		ret


Resulting asm, with optimizations:

L0:		sub	ESP,0Ch
		mov	EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
		push	dword ptr FLAT:_DATA[01Ch]
		push	dword ptr FLAT:_DATA[018h]
		push	dword ptr FLAT:_DATA[02Ch]
		push	dword ptr FLAT:_DATA[028h]
		push	dword ptr FLAT:_DATA[0Ch]
		push	dword ptr FLAT:_DATA[08h]
		push	3
		push	EAX
		call	near ptr __d_arraycatnT
		add	ESP,020h
		add	ESP,0Ch
		xor	EAX,EAX
		ret

I can see just one arraycatn, so the two string literals are folded at compile
time, I think.

Bye,
bearophile

Feb 22 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Mon, 23 Feb 2009 04:18:35 +0300, bearophile <bearophileHUGS lycos.com> wrote:

 BCS:

 IIRC DMD doesn't always do the constant folding (Decent has a post-processed
 view that shows this in some cases). For instance, IIRC it only folds when the
 constant operands come first (~ groups from the left), so this:
 char[] foo = "foo";
 char[] bar = foo ~ "bar" ~ "baz";
 doesn't get folded. And even if DMD were to start doing that one, there is
 no requirement that another compiler also do it.

 If there are guarantees that "abc" "def" are folded at compile time,  
 then the same guarantees can be specified for "abc" ~ "def". I can't see  
 a problem.

 I have also compiled this code with DMD:

 void main() {
     string foo = "foo";
     string bar = foo ~ "bar" ~ "baz";
 }

Won't work. Imagine foo is a user-defined type with a custom opCat:

auto bar = foo ~ "123" ~ "456";

compare to:

      std::cout << "123" << "456";
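The objection can be made concrete with a small sketch. `Recorder` is a hypothetical type whose `~` overload (spelled `opBinary!"~"` in current D, which replaced `opCat`) records each right-hand operand it receives. Because `~` groups from the left, the unparenthesized form calls the overload twice with separate literals, while the parenthesized form passes one already-joined literal; a compiler that silently rewrote one into the other would change what user code observes:

```d
// Hypothetical type with a user-defined ~ (the D2 replacement for opCat).
struct Recorder
{
    string[] pieces; // every right-hand operand passed to ~, in order

    Recorder opBinary(string op : "~")(string rhs)
    {
        return Recorder(pieces ~ rhs);
    }
}

void main()
{
    // Parsed as (Recorder() ~ "123") ~ "456": the overload runs twice.
    auto a = Recorder() ~ "123" ~ "456";
    // The literals are joined first, so the overload runs once.
    auto b = Recorder() ~ ("123" ~ "456");

    assert(a.pieces == ["123", "456"]);
    assert(b.pieces == ["123456"]);
}
```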

Feb 22 2009

bearophile <bearophileHUGS lycos.com> writes:

Denis Koroskin:
 bearophile:
 void main() {
     string foo = "foo";
     string bar = foo ~ "bar" ~ "baz";
 }

 
 Won't work. Imagine foo is a user-defined type with a custom opCat:
 auto bar = foo ~ "123" ~ "456";
 compare to:
       std::cout << "123" << "456";

In this thread I was talking about the concat of true strings, not of generic
objects:
auto bar = foo ~ ("123" ~ "456");

Are you saying that the concat operation of
"123" ~ "456"
has a different (invisible) "operator" precedence than:
"123" "456"?
If this is true, then the ~ isn't a fully drop-in replacement for the automatic
concat of strings as done in C...

Bye,
bearophile

Feb 22 2009

Ellery Newcomer <ellery-newcomer utulsa.edu> writes:

bearophile wrote:
 Denis Koroskin:
 bearophile:
 void main() {
     string foo = "foo";
     string bar = foo ~ "bar" ~ "baz";
 }

 Won't work. Imagine foo is a user-defined type with a custom opCat:
 auto bar = foo ~ "123" ~ "456";
 compare to:
       std::cout << "123" << "456";

 
 In this thread I was talking about the concat of true strings, not of generic
 objects:
 auto bar = foo ~ ("123" ~ "456");
 
 Are you saying that the concat operation of
 "123" ~ "456"
 has a different (invisible) "operator" precedence than:
 "123" "456"?
 If this is true, then the ~ isn't a fully drop-in replacement for the
 automatic concat of strings as done in C...
 
 Bye,
 bearophile

"123" "456" has the higher precedence

Feb 22 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Sun, Feb 22, 2009 at 9:29 PM, bearophile <bearophileHUGS lycos.com> wrote:

 Are you saying that the concat operation of
 "123" ~ "456"
 has a different (invisible) "operator" precedence than:
 "123" "456"?
 If this is true, then the ~ isn't a fully drop-in replacement for the
 automatic concat of strings as done in C...

Currently that's the case.  But it's simply unspecified in the
language specification, and there's no reason why the compiler can't
turn:

a = foo ~ "bar" ~ "baz";

into:

a = foo ~ "barbaz";

FWIW the MiniD compiler does this already.  It's just that DMD
currently does concatenation constant folding in a simple manner that
makes this kind of folding "invisible" to it.

But when all the operands of the concatenations are strings - like
when building up strings to be mixed-in, or in static/const variable
initializers - then everything will obviously be folded at compile
time.
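A minimal sketch of the mixin case, assuming a D2 compiler (the identifiers are illustrative). The argument to `mixin` must be known at compile time, so the whole `~` chain that builds it has to be folded by the compiler:

```d
import std.stdio : writeln;

// Every operand is a compile-time constant, so the ~ chain is folded
// into a single string before mixin() compiles it as a declaration.
enum string declName = "answer";
enum string decl = "immutable int " ~ declName ~ " = 42;";

mixin(decl); // declares: immutable int answer = 42;

void main()
{
    writeln(answer);
}
```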

Feb 22 2009

BCS <none anon.com> writes:

Hello bearophile,

 If there are guarantees that "abc" "def" are folded at compile time,
 then the same guarantees can be specified for "abc" ~ "def". I can't
 see a problem.

While it is not part of the spec, I do see a problem. If it were added...


 
 I have also compiled this code with DMD:
 
 void main() {
 string foo = "foo";
 string bar = foo ~ "bar" ~ "baz";
 }
 Resulting asm, no optimizations:
 
 L0:		push	EBP
 mov	EBP,ESP
 mov	EDX,FLAT:_DATA[0Ch]
 mov	EAX,FLAT:_DATA[08h]
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]

Note: 6 things pushed (three length/pointer pairs, i.e. three separate arrays)

 push	EDX
 push	EAX
 push	3
 mov	ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	ECX
 call	near ptr __d_arraycatnT
 xor	EAX,EAX
 add	ESP,020h
 pop	EBP
 ret
 Resulting asm, with optimizations:
 
 L0:		sub	ESP,0Ch
 mov	EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]
 push	dword ptr FLAT:_DATA[0Ch]
 push	dword ptr FLAT:_DATA[08h]


Again 6 things, i.e. three separate arrays


 push	3

I think that is a varargs call (3 being the number of arrays)

 push	EAX
 call	near ptr __d_arraycatnT
 add	ESP,020h
 add	ESP,0Ch
 xor	EAX,EAX
 ret
 I can see just one arraycatn, so the two string literals are folded at
 compile time, I think.
 
 Bye,
 bearophile


I think that DMD does some optimization for a~b~c etc. so that there is only
one call for any number of chained ~ (array cat n). In this case I think
that is all that is happening: the literals are still passed as separate arrays.

Feb 22 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 23 Feb 2009 03:48:17 +0000 (UTC), BCS wrote:

 Hello bearophile,
 
 If there are guarantees that "abc" "def" are folded at compile time,
 then the same guarantees can be specified for "abc" ~ "def". I can't
 see a problem.

 
 While it is not part of the spec, I do see a problem. If it were added...
 
 
 I have also compiled this code with DMD:
 
 void main() {
 string foo = "foo";
 string bar = foo ~ "bar" ~ "baz";
 }
 Resulting asm, no optimizations:
 
 L0:		push	EBP
 mov	EBP,ESP
 mov	EDX,FLAT:_DATA[0Ch]
 mov	EAX,FLAT:_DATA[08h]
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]

 
 note 6 things
 
 push	EDX
 push	EAX
 push	3
 mov	ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	ECX
 call	near ptr __d_arraycatnT
 xor	EAX,EAX
 add	ESP,020h
 pop	EBP
 ret
 Resulting asm, with optimizations:
 
 L0:		sub	ESP,0Ch
 mov	EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]
 push	dword ptr FLAT:_DATA[0Ch]
 push	dword ptr FLAT:_DATA[08h]

 
 again 6 things
 
 push	3

 
 I think that is a varargs call
 
 push	EAX
 call	near ptr __d_arraycatnT
 add	ESP,020h
 add	ESP,0Ch
 xor	EAX,EAX
 ret
 I can see just one arraycatn, so the two string literals are folded at
 compile time, I think.
 
 Bye,
 bearophile

 
 I think that DMD does some optimization for a~b~c etc. so that there is only 
 one call for any number of chained ~ (array cat n). In this case I think 
 it is doing that.

Sure enough, if you look into the compiled .obj you won't find
"barbaz" there.  All sub-strings are separate, regardless of the
optimization options.

Feb 26 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 26 Feb 2009 20:59:34 +0300, Sergey Gromov <snake.scaly gmail.com> wrote:

 Mon, 23 Feb 2009 03:48:17 +0000 (UTC), BCS wrote:

 Hello bearophile,

 If there are guarantees that "abc" "def" are folded at compile time,
 then the same guarantees can be specified for "abc" ~ "def". I can't
 see a problem.

 While it is not part of the spec, I do see a problem. If it were added...

 I have also compiled this code with DMD:

 void main() {
 string foo = "foo";
 string bar = foo ~ "bar" ~ "baz";
 }
 Resulting asm, no optimizations:

 L0:		push	EBP
 mov	EBP,ESP
 mov	EDX,FLAT:_DATA[0Ch]
 mov	EAX,FLAT:_DATA[08h]
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]

 note 6 things

 push	EDX
 push	EAX
 push	3
 mov	ECX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	ECX
 call	near ptr __d_arraycatnT
 xor	EAX,EAX
 add	ESP,020h
 pop	EBP
 ret
 Resulting asm, with optimizations:

 L0:		sub	ESP,0Ch
 mov	EAX,offset FLAT:_D11TypeInfo_Aa6__initZ
 push	dword ptr FLAT:_DATA[01Ch]
 push	dword ptr FLAT:_DATA[018h]
 push	dword ptr FLAT:_DATA[02Ch]
 push	dword ptr FLAT:_DATA[028h]
 push	dword ptr FLAT:_DATA[0Ch]
 push	dword ptr FLAT:_DATA[08h]

 again 6 things

 push	3

 I think that is a varargs call

 push	EAX
 call	near ptr __d_arraycatnT
 add	ESP,020h
 add	ESP,0Ch
 xor	EAX,EAX
 ret
 I can see just one arraycatn, so the two string literals are folded at
 compile time, I think.

 Bye,
 bearophile

 I think that DMD does some optimization for a~b~c etc. so that there is  
 only
 one call for any number of chained ~ (array cat n). In this case I think
 it is doing that.

 Sure enough, if you look into the compiled .obj you won't find
 "barbaz" there.  All sub-strings are separate, regardless of the
 optimization options.

Here is a test:

import std.stdio;

void main()
{
    string t1 = "bar1" ~ "baz1";
    string t2 = t1 ~ "bar2" ~ "baz2";
    string t3 = t1 ~ ("bar3" ~ "baz3");
    
    writefln(t1);
    writefln(t2);
    writefln(t3);
}

The compiled test executable contains the strings bar1baz1 and bar3baz3
(but not bar2baz2).

Worth noting: declaring t1, t2 and t3 as const (i.e. "const string t1",
etc.) makes the concatenations happen entirely at compile time.

Feb 26 2009

bearophile <bearophileHUGS lycos.com> writes:

Frank Benoit Wrote:

     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
 ...
         "with", "~this" ];
 
 There is a comma missing: "uint" "ulong"

In such situations I often let the language split the string for me; it reduces
noise:

auto keywords = "abstract alias align asm
                 assert auto body bool break byte case
                 cast catch cdouble cent cfloat char class
                 const continue creal dchar debug default
                 delegate delete deprecated do double else
                 enum export extern false final finally
                 float for foreach foreach_reverse function
                 goto idouble if ifloat import in inout
                 int interface invariant ireal is lazy long
                 mixin module new null out override package
                 pragma private private: protected protected:
                 public public: real return scope short
                 static struct super switch synchronized
                 template this throw true try typedef typeid
                 typeof ubyte ucent uint ulong union unittest
                 ushort version void volatile wchar while
                 with ~this".split();

You can also put one keyword for each line, or put them in better formatted
columns.

If the strings may contain spaces, then I put each string on its own line
and split on line breaks with std.string.splitlines() (or str.splitlines()
in Python).
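A sketch of that line-based approach in current D. It assumes today's Phobos spelling `std.string.splitLines` (the 2009 library used `splitlines`), and the helper name `linesToList` is hypothetical. Blank lines and indentation are dropped, so each entry may contain spaces:

```d
import std.algorithm.iteration : filter, map;
import std.array : array;
import std.string : splitLines, strip;

// Split a block of text into one entry per line, trimming indentation
// and skipping blank lines, so entries may themselves contain spaces.
string[] linesToList(string text)
{
    return text.splitLines()
        .map!(line => line.strip())
        .filter!(line => line.length > 0)
        .array;
}

void main()
{
    auto phrases = linesToList(`
        static if
        foreach_reverse
        scope(exit)
    `);
    assert(phrases == ["static if", "foreach_reverse", "scope(exit)"]);
}
```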

Implicit string literal concatenation is a bug-prone anti-feature, a relic of
the C language, which doesn't have a nice string concatenation syntax. In D
(and Python, etc.) it's bad.
Months ago I suggested removing it and turning adjacent string literals into
a syntax error (to "solve" the backward-compatibility problem with ported C/C++ code).

Brad Roberts:

In D, it's only marginally so, but not completely useless.  It can still be
used to break a really long string literal into parts.

In such situations you can put a ~ at the end of each part. Explicit is better
than implicit :-)
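A short sketch of that explicit style for a long literal; the usage text and the name `usage` are invented for illustration. With every piece a literal, the `~` chain is a constant expression, so it can initialize an `enum`:

```d
import std.stdio : write;

// Each piece is joined with an explicit ~, so the concatenation is
// visible in the source rather than implied by two literals sitting
// next to each other.
enum string usage =
    "Usage: tool [options] FILE\n" ~
    "  -h  show this help\n" ~
    "  -v  print more output\n";

void main()
{
    write(usage);
}
```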

Bye,
bearophile

Feb 22 2009

Kagamin <spam here.lot> writes:

bearophile Wrote:

 Months ago I suggested removing it

Didn't find it in bugzilla. Reported as bug 2685.

Feb 24 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Sun, 22 Feb 2009 10:21:20 +0100, Frank Benoit wrote:

 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",
         "assert", "auto", "body", "bool", "break", "byte", "case",
         "cast", "catch", "cdouble", "cent", "cfloat", "char", "class",
         "const", "continue", "creal", "dchar", "debug", "default",
         "delegate", "delete", "deprecated", "do", "double", "else",
         "enum", "export", "extern", "false", "final", "finally",
         "float", "for", "foreach", "foreach_reverse", "function",
         "goto", "idouble", "if", "ifloat", "import", "in", "inout",
         "int", "interface", "invariant", "ireal", "is", "lazy", "long",
         "mixin", "module", "new", "null", "out", "override", "package",
         "pragma", "private", "private:", "protected", "protected:",
         "public", "public:", "real", "return", "scope", "short",
         "static", "struct", "super", "switch", "synchronized",
         "template", "this", "throw", "true", "try", "typedef", "typeid",
         "typeof", "ubyte", "ucent", "uint" "ulong", "union", "unittest",
         "ushort", "version", "void", "volatile", "wchar", "while",
         "with", "~this" ];
 
 There is a comma missing: "uint" "ulong"

I agree this feature is dangerous and useless in D.

Feb 22 2009

Miles <_______ _______.____> writes:

Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",

Out of curiosity, are you trying to create a D parser? Because
"private:", "protected:", "public:" and "~this" are not keywords.

Feb 25 2009

Frank Benoit <keinfarbton googlemail.com> writes:

Miles schrieb:
 Frank Benoit wrote:
 Find the bug:
     static string[] KEYWORDS = [ "abstract", "alias", "align", "asm",

 
 Out of curiosity, are you trying to create a D parser? Because
 "private:", "protected:", "public:" and "~this" are not keywords.

That's just a snippet I got from a DWT user.
After pasting it into a Java source file, I got an error for the
missing comma, and I found that interesting.

Feb 25 2009
