digitalmars.D.bugs - [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals
http://d.puremagic.com/issues/show_bug.cgi?id=8660

           Summary: Unclear semantics of array literals of char type, vs string literals
           Product: D
           Version: D1 & D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: clugdbug yahoo.com.au

--- Comment #0 from Don <clugdbug yahoo.com.au> 2012-09-14 04:28:17 PDT ---
Array literals of char type have completely different semantics from string
literals. In module scope:

char[] x = ['a']; // OK -- array literals can have an implicit .dup
char[] y = "b";   // illegal

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.

CTFE can use either, but it has to choose one. This leads to odd effects:

string foo(bool b) {
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

char[] x = foo(true);  // ok
char[] y = foo(false); // rejected!

This is really bizarre because at run time, there is no difference between
foo(true) and foo(false). They both return a slice of something allocated on
the heap. I think x = foo(true) should be rejected as well: it has an implicit
cast from immutable to mutable.

I think the best way to clean up this mess would be to convert char[] array
literals into string literals whenever possible. This would mean that string
literals may occasionally be of *mutable* type! This would mean that whenever
they are assigned to a mutable variable, an implicit .dup gets added (just as
happens now with array literals). The trailing zero would not be duped.
ie:
A string literal of mutable type should behave the way a char[] array literal
behaves now.
A char[] array literal of immutable type should behave the way a string
literal does now.
Sep 14 2012
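For reference, the two differences named in the report can be checked with a small D2 program. This is only a sketch and is not code from the thread: it probes the declarations inside a function with `__traits(compiles, ...)` rather than at module scope, and the pointer name `p` is made up for illustration.

```d
import std.stdio;

void main()
{
    // An array literal of chars is typed char[] and may initialise a mutable char[].
    static assert(is(typeof(['a']) == char[]));
    static assert(__traits(compiles, { char[] x = ['a']; }));

    // In D2 a string literal is typed string, i.e. immutable(char)[],
    // so it does not convert implicitly to char[].
    static assert(is(typeof("b") == string));
    static assert(!__traits(compiles, { char[] y = "b"; }));

    // A string literal is followed by a '\0' that is not counted in .length.
    // That terminator is what lets the literal be handed to C functions.
    const(char)* p = "abc";
    assert("abc".length == 3);
    assert(p[3] == '\0');

    writeln("all checks passed");
}
```

The check on `p[3]` only covers a literal used directly. Whether the terminator survives concatenation is exactly the part the report says the spec leaves open, so the sketch does not test it.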
On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
> --- Comment #0 from Don <clugdbug yahoo.com.au> 2012-09-14 04:28:17 PDT ---
> Array literals of char type have completely different semantics from
> string literals. In module scope:
>
> char[] x = ['a']; // OK -- array literals can have an implicit .dup
> char[] y = "b"; // illegal

I think this is the normal behavior actually. When you write
"char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any data.
You are just letting x point to a stack allocated array of chars.

So the assignment is legal (but kind of unsafe actually, if you ever leak x).

On the other hand, you can't bind y to an array of immutable chars, as that
would subvert the type system. This, on the other hand, is legal:

char[] y = "b".dup;

I do not know how to initialize a char[] on the stack though (apart from
writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I don't know
of any workaround.

I think a good solution would be to request the "m" prefix for literals, which
would initialize them as "mutable":

x = m"some mutable string";

> A second difference is that string literals have a trailing \0. It's
> important for compatibility with C, but is barely mentioned in the spec.
> The spec does not state if the trailing \0 is still present after
> operations like concatenation.
>
> CTFE can use either, but it has to choose one. This leads to odd effects:
>
> string foo(bool b) {
>     string c = ['a'];
>     string d = "a";
>     if (b)
>         return c ~ c;
>     else
>         return c ~ d;
> }
>
> char[] x = foo(true); // ok
> char[] y = foo(false); // rejected!
>
> This is really bizarre because at run time, there is no difference between
> foo(true) and foo(false). They both return a slice of something allocated
> on the heap. I think x = foo(true) should be rejected as well, it has an
> implicit cast from immutable to mutable.

Good point. For anybody reading though, the actual code example should be

enum char[] x = foo(true); // ok
enum char[] y = foo(false); // rejected!

> I think the best way to clean up this mess would be to convert char[] array
> literals into string literals whenever possible. This would mean that
> string literals may occasionally be of *mutable* type! This would mean
> that whenever they are assigned to a mutable variable, an implicit .dup
> gets added (just as happens now with array literals). The trailing zero
> would not be duped.
> ie:
> A string literal of mutable type should behave the way a char[] array
> literal behaves now.
> A char[] array literal of immutable type should behave the way a string
> literal does now.

I think this would work with my "m" suggestion
Sep 14 2012
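On the question of getting mutable character data from a literal, here is a short sketch of two options that exist in D2 today. It is not from the thread, and the variable names are illustrative. The first is the explicit `.dup` mentioned above. The second is a fixed-size array, which is stored in the enclosing stack frame and, as far as I know, accepts a string literal of exactly matching length as its initialiser.

```d
void main()
{
    // Explicit copy: .dup allocates a mutable char[] on the GC heap.
    char[] y = "b".dup;
    y[0] = 'c';
    assert(y == "c");

    // Fixed-size array: the five chars live in main's stack frame and are
    // initialised from a literal with the same number of code units.
    char[5] buf = "hello";
    buf[0] = 'j';
    assert(buf[] == "jello");

    // Lengths are counted in UTF-8 code units, not characters:
    // U+00E9 is encoded as two chars.
    char[] u = "\u00e9".dup;
    assert(u.length == 2);
}
```

The fixed-size form needs the code-unit count up front, which is awkward for non-ASCII text; `.dup` avoids that at the cost of a heap allocation.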
On 14/09/12 14:50, monarch_dodra wrote:
> I think this is the normal behavior actually. When you write
> "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any
> data. You are just letting x point to a stack allocated array of chars.

I don't think you've looked at the compiler source code...
The dup is in e2ir.c:4820.

> So the assignment is legal (but kind of unsafe actually, if you ever
> leak x).

Yes it's legal. In my view it is a design mistake in the language.
The issue now is how to minimize the damage from it.

> On the other hand, you can't bind y to an array of immutable chars, as
> that would subvert the type system. This, on the other hand, is legal:
>
> char[] y = "b".dup;
>
> I do not know how to initialize a char[] on the stack though (apart from
> writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I don't
> know of any workaround.
>
> I think a good solution would be to request the "m" prefix for literals,
> which would initialize them as "mutable":
>
> x = m"some mutable string";
>
> Good point. For anybody reading though, the actual code example should be
>
> enum char[] x = foo(true); // ok
> enum char[] y = foo(false); // rejected!

No it should not. The code example was correct. These are static variables.

> I think this would work with my "m" suggestion

Not necessary. This is only a question about what happens with the compiler
internals.
Sep 14 2012
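The implicit copy mentioned above can be observed from user code without reading the compiler source. The sketch below does not inspect e2ir.c and says nothing about how DMD implements the copy; it only shows the visible effect at run time. The helper name `make` is made up for illustration.

```d
// Hypothetical helper: returns an array literal as a mutable char[].
char[] make()
{
    // Every evaluation of the literal produces a fresh array.
    return ['a', 'b'];
}

void main()
{
    char[] first = make();
    char[] second = make();

    // The two results do not share storage...
    assert(first.ptr !is second.ptr);

    // ...so writing through one leaves the other untouched.
    first[0] = 'z';
    assert(first == "zb");
    assert(second == "ab");
}
```

This is why binding an array literal to `char[]` is type-safe, while binding a string literal to `char[]` would not be: the string literal's storage is shared and immutable.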
On Friday, 14 September 2012 at 15:00:29 UTC, Don Clugston wrote:
> On 14/09/12 14:50, monarch_dodra wrote:
>> I think this is the normal behavior actually. When you write
>> "char[] x = ['a'];", you are not actually "newing" (or "dup"-ing) any
>> data. You are just letting x point to a stack allocated array of chars.
>
> I don't think you've looked at the compiler source code...
> The dup is in e2ir.c:4820.
>
>> So the assignment is legal (but kind of unsafe actually, if you ever
>> leak x).
>
> Yes it's legal. In my view it is a design mistake in the language.
> The issue now is how to minimize the damage from it.

Thank you for taking the time to educate me. I still have a bit of trouble
with static vs dynamic array initializations: things don't work quite as in
C++, which is confusing me. I'll need to study a bit harder how array
initializations work. Good news is I'm learning.

I think ALL my comments were wrong. In that case, you are right, since:

char[] x = "a".dup;

is legal.

>> Good point. For anybody reading though, the actual code example should be
>>
>> enum char[] x = foo(true); // ok
>> enum char[] y = foo(false); // rejected!
>
> No it should not. The code example was correct. These are static variables.

I hadn't thought of static variables: I placed your code in a main, and both
produced a compilation error. The enums reproduced the issue for me however.

>> I think this would work with my "m" suggestion
>
> Not necessary. This is only a question about what happens with the compiler
> internals.

Yes.
Sep 14 2012
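The difference between the two placements can be sketched as follows. Inside a function, `foo(...)` is an ordinary run-time call whose result is a `string`, so neither assignment to `char[]` type-checks. The odd behaviour in the report needs compile-time evaluation, which is what module-scope initialisers and `enum` declarations trigger. The `pragma(msg, ...)` lines only print what the compiler in use decides. No expected output is given for them because it depends on the compiler version, and the thread reports "accepted" and "rejected" respectively for DMD at the time.

```d
string foo(bool b)
{
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

void main()
{
    // Local variables: run-time calls, plain string results, both rejected.
    static assert(!__traits(compiles, { char[] x = foo(true); }));
    static assert(!__traits(compiles, { char[] y = foo(false); }));

    // At run time both branches produce the same contents.
    assert(foo(true) == "aa");
    assert(foo(false) == "aa");

    // An enum forces CTFE, as a module-scope initialiser would.
    enum string viaCtfe = foo(false);
    static assert(viaCtfe == "aa");

    // Probe the two declarations from the thread; the answers are printed
    // during compilation and may differ between compiler releases.
    pragma(msg, __traits(compiles, { enum char[] x = foo(true); }));
    pragma(msg, __traits(compiles, { enum char[] y = foo(false); }));
}
```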
http://d.puremagic.com/issues/show_bug.cgi?id=8660

timon.gehr gmx.ch changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |timon.gehr gmx.ch

--- Comment #1 from timon.gehr gmx.ch 2012-09-14 08:42:57 PDT ---
I don't have a deep understanding of the DMD CTFE engine, but wouldn't it
suffice to do a conversion to a string literal if the type is
immutable(char)[] and to an array literal otherwise? This would only have to
be done once (recursively on the entire return value) as a final sanitizing
step after the CTFE execution has run to completion.

This would make both lines illegal, as you suggest.
Sep 14 2012
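The proposed sanitizing pass can be illustrated with a toy model. Everything in this sketch is hypothetical: `Kind`, `Node`, and `sanitize` are invented names, and DMD's real expression classes look nothing like this. The only point is the shape of the idea, a single recursive walk over the finished CTFE result that picks the literal kind from the static type alone.

```d
// Hypothetical, heavily simplified stand-in for a CTFE result tree.
enum Kind { stringLiteral, arrayLiteral, aggregate }

struct Node
{
    Kind kind;
    bool immutableChars; // true when the node's type is immutable(char)[]
    Node[] children;     // only used by aggregates
}

// One pass over the whole result, run after evaluation has finished.
void sanitize(ref Node n)
{
    if (n.kind == Kind.aggregate)
    {
        foreach (ref child; n.children)
            sanitize(child);
        return;
    }
    // The type decides the representation, not the way the value was built.
    n.kind = n.immutableChars ? Kind.stringLiteral : Kind.arrayLiteral;
}

void main()
{
    auto result = Node(Kind.aggregate, false, [
        Node(Kind.arrayLiteral, true),   // built like c ~ c, but typed string
        Node(Kind.stringLiteral, false), // built from a literal, typed char[]
    ]);
    sanitize(result);
    assert(result.children[0].kind == Kind.stringLiteral);
    assert(result.children[1].kind == Kind.arrayLiteral);
}
```

With a pass like this, the representation no longer depends on which branch of `foo` produced the value, so both module-scope assignments in the report would be treated the same way.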
http://d.puremagic.com/issues/show_bug.cgi?id=8660

--- Comment #2 from Don <clugdbug yahoo.com.au> 2012-09-17 08:28:43 PDT ---
(In reply to comment #1)
> I don't have a deep understanding of the DMD CTFE engine, but wouldn't it
> suffice to do a conversion to a string literal if the type is
> immutable(char)[] and to an array literal otherwise? This would only have
> to be done once (recursively on the entire return value) as a final
> sanitizing step after the CTFE execution has run to completion.
>
> This would make both lines illegal, as you suggest.

Yes (in fact a sanitizing step already exists, that's where pointers are
checked, for example).
It wouldn't work for D1, though, which doesn't have immutable, and for which
this compiles:

char [] s = "abc";
char [] t = ['a','b','c'];

(Yuck!)
Sep 17 2012