digitalmars.D.bugs - [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8660

           Summary: Unclear semantics of array literals of char type, vs
                    string literals
           Product: D
           Version: D1 & D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: clugdbug yahoo.com.au



Array literals of char type have completely different semantics from string
literals. In module scope:

char[] x = ['a'];  // OK -- array literals can have an implicit .dup
char[] y = "b";    // illegal

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.
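
A minimal sketch of the trailing-\0 point, assuming a current D2 compiler with
Phobos. The helper name cLength is hypothetical and exists only for
illustration. It shows a literal being passed directly to a C function, and
the defensive route of terminating a concatenation result explicitly with
std.string.toStringz instead of relying on an unspecified terminator:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: length as C sees it, after explicit termination.
size_t cLength(string s)
{
    return strlen(s.toStringz);
}

void main()
{
    // A string literal converts implicitly to a C string; the '\0' that
    // follows it is not counted in .length.
    assert("abc".length == 3);
    assert(strlen("abc") == 3);

    // For a concatenation result, terminate explicitly rather than
    // assuming a '\0' is still present.
    string a = "ab";
    assert(cLength(a ~ "c") == 3);
}
```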

CTFE can use either, but it has to choose one. This leads to odd effects:

string foo(bool b) {
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

char[] x = foo(true);   // ok
char[] y = foo(false);  // rejected!

This is really bizarre because at run time, there is no difference between
foo(true) and foo(false). They both return a slice of something allocated on
the heap. I think x = foo(true) should be rejected as well, since it has an
implicit cast from immutable to mutable.
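
For comparison, a sketch of how the same function behaves when its result is
used inside a function body and not as a static initializer. It assumes a
current D2 compiler; the variable names are illustrative. Neither result
converts implicitly to char[] there, and an explicit .dup makes the copy
visible in the source:

```d
string foo(bool b)
{
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

void main()
{
    // Inside a function, neither result converts implicitly to char[].
    static assert(!__traits(compiles, { char[] z = foo(true); }));
    static assert(!__traits(compiles, { char[] z = foo(false); }));

    // An explicit .dup yields a mutable copy in both cases.
    char[] x = foo(true).dup;
    char[] y = foo(false).dup;
    x[0] = 'z';
    assert(x == "za");
    assert(y == "aa");
}
```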

I think the best way to clean up this mess would be to convert char[] array
literals into string literals whenever possible. This would mean that string
literals may occasionally be of *mutable* type! This would mean that whenever
they are assigned to a mutable variable, an implicit .dup gets added (just as
happens now with array literals). The trailing zero would not be duped.
i.e.:
A string literal of mutable type should behave the way a char[] array literal
behaves now.
A char[] array literal of immutable type should behave the way a string literal
does now.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 14 2012
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:

 Array literals of char type have completely different 
 semantics from string
 literals. In module scope:

 char[] x = ['a'];  // OK -- array literals can have an implicit 
 .dup
 char[] y = "b";    // illegal

 A second difference is that string literals have a trailing \0. 
 It's important
 for compatibility with C, but is barely mentioned in the spec. 
 The spec does
 not state if the trailing \0 is still present after operations 
 like
 concatenation.
I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars. So the assignment is legal (but kind of unsafe actually, if you ever leak x).

On the other hand, you can't bind y to an array of immutable chars, as that would subvert the type system.

This, on the other hand, is legal:
char[] y = "b".dup;

I do not know how to initialize a char[] on the stack though (apart from writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I don't know of any workaround.

I think a good solution would be to request the "m" prefix for literals, which would initialize them as "mutable":
x = m"some mutable string";
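
On the question of initializing a char[] on the stack, one possible approach
(a sketch, assuming a current D2 compiler) is to initialize a fixed-size array
from a string literal of matching length and then slice it. The helper
setFirst is hypothetical and only there to show mutation through the slice:

```d
// Hypothetical helper: overwrite the first character in place.
void setFirst(char[] s, char c)
{
    s[0] = c;
}

void main()
{
    // A fixed-size array lives on the stack and can be initialized from a
    // string literal with the same number of code units.
    char[5] buf = "hello";
    char[] x = buf[];        // mutable slice of the stack buffer
    setFirst(x, 'j');
    assert(buf[] == "jello");

    // Heap-allocated mutable copy of a literal.
    char[] y = "hello".dup;
    setFirst(y, 'c');
    assert(y == "cello");
}
```

The slice x must not outlive buf, which is the leak hazard mentioned above.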

 CTFE can use either, but it has to choose one. This leads to 
 odd effects:

 string foo(bool b) {
     string c = ['a'];
     string d = "a";
     if (b)
         return c ~ c;
     else
         return c ~ d;
 }

 char[] x = foo(true);   // ok
 char[] y = foo(false);  // rejected!

 This is really bizarre because at run time, there is no 
 difference between
 foo(true) and foo(false). They both return a slice of something 
 allocated on
 the heap. I think x = foo(true) should be rejected as well, it 
 has an implicit
 cast from immutable to mutable.
Good point. For anybody reading though, the actual code example should be
enum char[] x = foo(true);   // ok
enum char[] y = foo(false);  // rejected!
 I think the best way to clean up this mess would be to convert 
 char[] array
 literals into string literals whenever possible. This would 
 mean that string
 literals may occasionally be of *mutable* type! This would 
 mean that whenever
 they are assigned to a mutable variable, an implicit .dup gets 
 added (just as
 happens now with array literals). The trailing zero would not 
 be duped.
 ie:
 A string literal of mutable type should behave the way a 
 char[] array literal
 behaves now.
 A char[] array literal of immutable type should behave the way 
 a string literal
 does now.
I think this would work with my "m" suggestion
Sep 14 2012
Don Clugston <dac nospam.com> writes:
On 14/09/12 14:50, monarch_dodra wrote:
 On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:

 Array literals of char type have completely different semantics from
 string
 literals. In module scope:

 char[] x = ['a'];  // OK -- array literals can have an implicit .dup
 char[] y = "b";    // illegal

 A second difference is that string literals have a trailing \0. It's
 important
 for compatibility with C, but is barely mentioned in the spec. The
 spec does
 not state if the trailing \0 is still present after operations like
 concatenation.
 I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars.
I don't think you've looked at the compiler source code... The dup is in e2ir.c:4820.
 So the
 assignment is legal (but kind of unsafe actually, if you ever leak x).
Yes it's legal. In my view it is a design mistake in the language. The issue now is how to minimize the damage from it.
 On the other hand, you can't bind y to an array of immutable chars, as
 that would subvert the type system.

 This, on the other hand, is legal.
 char[] y = "b".dup;

 I do not know how to initialize a char[] on the stack though (apart
 from writing ['h', 'e', 'l', ... ]). If utf8 also gets involved, then I
 don't know of any workaround.

 I think a good solution would be to request the "m" prefix for literals,
 which would initialize them as "mutable":
 x = m"some mutable string";

 A second difference is that string literals have a trailing \0. It's
 important
 for compatibility with C, but is barely mentioned in the spec. The
 spec does
 not state if the trailing \0 is still present after operations like
 concatenation.

 CTFE can use either, but it has to choose one. This leads to odd effects:

 string foo(bool b) {
     string c = ['a'];
     string d = "a";
     if (b)
         return c ~ c;
     else
         return c ~ d;
 }

 char[] x = foo(true);   // ok
 char[] y = foo(false);  // rejected!

 This is really bizarre because at run time, there is no difference
 between
 foo(true) and foo(false). They both return a slice of something
 allocated on
 the heap. I think x = foo(true) should be rejected as well, it has an
 implicit
 cast from immutable to mutable.
 Good point. For anybody reading though, the actual code example should be
 enum char[] x = foo(true);   // ok
 enum char[] y = foo(false);  // rejected!
No it should not. The code example was correct. These are static variables.
 I think the best way to clean up this mess would be to convert char[]
 array
 literals into string literals whenever possible. This would mean that
 string
 literals may occasionally be of *mutable* type! This would mean that
 whenever
 they are assigned to a mutable variable, an implicit .dup gets added
 (just as
 happens now with array literals). The trailing zero would not be duped.
 ie:
 A string literal of mutable type should behave the way a char[] array
 literal
 behaves now.
 A char[] array literal of immutable type should behave the way a
 string literal
 does now.
 I think this would work with my "m" suggestion
Not necessary. This is only a question about what happens with the compiler internals.
Sep 14 2012
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 14 September 2012 at 15:00:29 UTC, Don Clugston wrote:
 On 14/09/12 14:50, monarch_dodra wrote:
 On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:

 Array literals of char type have completely different 
 semantics from
 string
 literals. In module scope:

 char[] x = ['a'];  // OK -- array literals can have an 
 implicit .dup
 char[] y = "b";    // illegal

 A second difference is that string literals have a trailing 
 \0. It's
 important
 for compatibility with C, but is barely mentioned in the 
 spec. The
 spec does
 not state if the trailing \0 is still present after 
 operations like
 concatenation.
 I think this is the normal behavior actually. When you write "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data. You are just letting x point to a stack allocated array of chars.
 I don't think you've looked at the compiler source code... The dup is in e2ir.c:4820.
 So the
 assignment is legal (but kind of unsafe actually, if you ever 
 leak x).
 Yes it's legal. In my view it is a design mistake in the language. The issue now is how to minimize the damage from it.
Thank you for taking the time to educate me. I still have a bit of trouble with static vs dynamic array initializations: things don't work quite as in C++, which is confusing me. I'll need to study a bit harder how array initializations work. Good news is I'm learning.

I think ALL my comments were wrong. In that case, you are right, since this is legal:
char[] x = "a".dup;
 Good point. For anybody reading though, the actual code 
 example should be
 enum char[] x = foo(true);   // ok
 enum char[] y = foo(false);  // rejected!
 No it should not. The code example was correct. These are static variables.
I hadn't thought of static variables: I placed your code in a main, and both lines produced a compilation error. The enums reproduced the issue for me, however.
 I think this would work with my "m" suggestion
 Not necessary. This is only a question about what happens with the compiler internals.
Yes.
Sep 14 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8660


timon.gehr gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr gmx.ch



I don't have a deep understanding of the DMD CTFE engine, but wouldn't it
suffice to do a conversion to a string literal if the type is immutable(char)[]
and to an array literal otherwise? This would only have to be done once
(recursively on the entire return value) as a final sanitizing step after the
CTFE execution has run to completion. This would make both lines illegal, as
you suggest.
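
The type-driven choice described above can be modelled outside the compiler.
This is a toy sketch, not DMD's actual code: LiteralKind and kindFor are
hypothetical names, and the recursive walk over the whole return value is
omitted. It only shows the decision rule "string literal if the type is
immutable(char)[], array literal otherwise":

```d
// Hypothetical model of the two representations a CTFE result could take.
enum LiteralKind { stringLiteral, arrayLiteral }

// Hypothetical helper: choose the representation from the static type alone.
LiteralKind kindFor(T)()
{
    static if (is(T == immutable(char)[]))
        return LiteralKind.stringLiteral;
    else
        return LiteralKind.arrayLiteral;
}

// Evaluated at compile time.
static assert(kindFor!string() == LiteralKind.stringLiteral);
static assert(kindFor!(char[])() == LiteralKind.arrayLiteral);
static assert(kindFor!(const(char)[])() == LiteralKind.arrayLiteral);

void main() {}
```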

Sep 14 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8660





 I don't have a deep understanding of the DMD CTFE engine, but wouldn't it
 suffice to do a conversion to a string literal if the type is immutable(char)[]
 and to an array literal otherwise? This would only have to be done once
 (recursively on the entire return value) as a final sanitizing step after the
 CTFE execution has run to completion. This would make both lines illegal, as
 you suggest.
Yes (in fact a sanitizing step already exists, that's where pointers are checked, for example). It wouldn't work for D1, though, which doesn't have immutable, and for which this compiles:

char [] s = "abc";
char [] t = ['a','b','c'];

(Yuck!)
Sep 17 2012