digitalmars.D.bugs - [Issue 8229] New: string literals are not zero-terminated during CTFE
- d-bugmail puremagic.com (20/20) Jun 11 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8229
- d-bugmail puremagic.com (35/35) Jun 12 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8229
- d-bugmail puremagic.com (19/57) Jun 12 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8229
- d-bugmail puremagic.com (27/63) Jun 13 2012 http://d.puremagic.com/issues/show_bug.cgi?id=8229
- d-bugmail puremagic.com (27/27) Sep 27 2013 http://d.puremagic.com/issues/show_bug.cgi?id=8229
- d-bugmail puremagic.com (9/9) Sep 28 2013 http://d.puremagic.com/issues/show_bug.cgi?id=8229
http://d.puremagic.com/issues/show_bug.cgi?id=8229

           Summary: string literals are not zero-terminated during CTFE
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: CTFE
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: timon.gehr gmx.ch

--- Comment #0 from timon.gehr gmx.ch 2012-06-11 15:56:58 PDT ---
DMD 2.059:

static assert(!(x){return *x;}("".ptr)); // error

The static assertion should pass.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 11 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8229

Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au

--- Comment #1 from Don <clugdbug yahoo.com.au> 2012-06-12 09:48:41 PDT ---
This behaviour is intentional. Pointer operations are strictly checked in CTFE.
It's the same as doing

int n = 0;
char c = ""[n];

which generates an array bounds error at runtime.

Is the terminating null character still in the spec? A long time ago it was in there, but now I can only find two references to it in the current spec (in 'arrays' and in 'interfacing to C'), and they both relate to printf. The most detailed is in 'interface to C', which states:

"string literals, when they are not part of an initializer to a larger data structure, have a '\0' character helpfully stored after the end of them."

which is pretty weird. These funky semantics would be difficult to implement in CTFE, and I doubt they are desirable. Here's an example:

const(char)[] foo(char[] s) { return "abc" ~ s; }

immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

bool baz()
{
    immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
    return true;
}
static assert(baz());

---> bar is zero-terminated, bar2 is not, even though they had the same assignment. When does this magical trailing zero get added?

I think you could reasonably interpret the spec as meaning that a trailing zero is added to the end of string literals by the linker, not by the compiler. It's only in CTFE that you can tell the difference.
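For reference, the run-time behaviour that the quoted spec sentence describes can be checked with a small sketch like the one below. The helper name `byteAfter` is made up for illustration and is not part of the report. The checks are run-time only and say nothing about what CTFE accepts.

```d
// Hypothetical helper: reads the byte stored just past the end of a slice.
// Only meaningful when the caller knows that byte exists, e.g. for a string literal.
char byteAfter(string s)
{
    return s.ptr[s.length];
}

unittest
{
    // The literal's '\0' is stored after the data but is not counted in .length.
    assert("abc".length == 3);
    assert(byteAfter("abc") == '\0');
    // Run-time analogue of the *"".ptr read in the original report.
    assert(byteAfter("") == '\0');
}
```

Compiling with `dmd -unittest -main -run file.d` should run the checks. The pointer read bypasses bounds checking, which is exactly the part CTFE is strict about.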
Jun 12 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8229

--- Comment #2 from timon.gehr gmx.ch 2012-06-12 10:55:45 PDT ---
(In reply to comment #1)
> This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> It's the same as doing
>
> int n = 0;
> char c = ""[n];
>
> which generates an array bounds error at runtime.

I think that would be stretching it too far. It is more like:

auto s = ['\0'];
auto q = s[0..0];
char c = *q.ptr;

Which works fine at runtime and during CTFE.

> Is the terminating null character still in the spec? A long time ago it was in
> there, but now I can only find two references to it in the current spec (in
> 'arrays' and in 'interfacing to C'), and they both relate to printf.
> The most detailed is in 'interface to C', which states:
>
> "string literals, when they are not part of an initializer to a larger data
> structure, have a '\0' character helpfully stored after the end of them."
>
> which is pretty weird. These funky semantics would be difficult to implement
> in CTFE,

I guess this is from D1 times, when string literals were static arrays, and doesn't apply anymore.

> and I doubt they are desirable. Here's an example:
>
> const(char)[] foo(char[] s) { return "abc" ~ s; }
>
> immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

Well, this is not specified afaics.

> bool baz()
> {
>     immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
>     return true;
> }
> static assert(baz());
>
> ---> bar is zero-terminated, bar2 is not, even though they had the same
> assignment. When does this magical trailing zero get added?

This is exactly the behavior that is observed at runtime. If it is undesirable, then that is a distinct issue that should be investigated. It would certainly be desirable to have consistent behavior at compile time and at runtime, but this is not a top-priority issue.

> I think you could reasonably interpret the spec as meaning that a trailing zero
> is added to the end of string literals by the linker, not by the compiler.
> It's only in CTFE that you can tell the difference.

In this case, the spec should definitely be fixed.
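The analogy in comment #2 can be written out as a self-contained sketch. The function name `readThroughEmptySlice` is an assumption added for illustration. The `static assert` relies on the comment's statement that this form is accepted during CTFE, so treat it as a check of that claim, not as a guarantee for every compiler version.

```d
// The slice q is empty, but q.ptr still points at s[0], which is a real element.
char readThroughEmptySlice()
{
    auto s = ['\0'];
    auto q = s[0 .. 0];
    return *q.ptr;
}

// Evaluated by CTFE; comment #2 reports that this works at compile time.
static assert(readThroughEmptySlice() == '\0');

unittest
{
    // Same read at run time.
    assert(readThroughEmptySlice() == '\0');
}
```

The difference from `*"".ptr` is that here the byte being read belongs to an array the interpreter knows about, whereas the literal's terminator lies outside its tracked length.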
Jun 12 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8229

--- Comment #3 from Don <clugdbug yahoo.com.au> 2012-06-13 01:44:42 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> > It's the same as doing
> >
> > int n = 0;
> > char c = ""[n];
> >
> > which generates an array bounds error at runtime.
>
> I think that would be stretching it too far. It is more like:
>
> auto s = ['\0'];
> auto q = s[0..0];
> char c = *q.ptr;

That's an interesting interpretation. It can't be true for D1, where string literals are fixed length arrays, but it could work for D2. In D1 it's more like:

struct S
{
    static char[3] s = ['a', 'b', 'c'];
    static char terminator = '\0';
}

And every mention of it in the spec dates from D1.

> > Is the terminating null character still in the spec? A long time ago it was in
> > there, but now I can only find two references to it in the current spec (in
> > 'arrays' and in 'interfacing to C'), and they both relate to printf.
> > The most detailed is in 'interface to C', which states:
> >
> > "string literals, when they are not part of an initializer to a larger data
> > structure, have a '\0' character helpfully stored after the end of them."
> >
> > which is pretty weird. These funky semantics would be difficult to implement
> > in CTFE,
>
> I guess this is from D1 times, when string literals were static arrays, and
> doesn't apply anymore.

Could be. So the few parts of the spec that mention it are horribly out-of-date. Though it also applies to assigning to fixed length arrays.

immutable(char)[3] s = "abc"; // Does this have a trailing zero?

> > and I doubt they are desirable. Here's an example:
> >
> > const(char)[] foo(char[] s) { return "abc" ~ s; }
> >
> > immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
>
> Well, this is not specified afaics.

Hmm, maybe it isn't. The spec says almost nothing about the whole thing. What I do know is that there is a lot of existing code that relies on this behaviour (especially, "abc" ~ "def" having a trailing zero). Pretty much the only thing the spec says is that you can use string literals with printf. Does TDPL mention it?

The spec definitely needs to be improved.
Jun 13 2012
http://d.puremagic.com/issues/show_bug.cgi?id=8229

Martin Nowak <code dawg.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code dawg.eu
           Severity|normal                      |major

--- Comment #4 from Martin Nowak <code dawg.eu> 2013-09-27 15:58:28 PDT ---
---
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

static const var = bug("foo");
---

I have a much bigger problem related to this. String literals resulting from CTFE are missing the terminating zero in the data segment. Whether or not the bug bites depends on the object layout and the virtual memory mapping, so this is pretty annoying because it works too often.

The underlying issue is that var is emitted to the object file from ArrayLiteralExp::toDt which doesn't perform the zero termination. Not sure if and at which stage this should be converted to a StringLiteralExp.
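Until the emission issue is resolved, one possible way to sidestep it is to make the terminator part of the string's own data instead of relying on the compiler to append it. The sketch below is an assumption added here, not something proposed in the thread; `withTerminator` and `varz` are hypothetical names.

```d
// Hypothetical helper: returns a copy of `a` whose last element is an explicit '\0'.
// The terminator is ordinary array data, so it does not depend on how the
// compiler emits the CTFE result into the object file.
string withTerminator(string a)
{
    auto buf = new char[a.length + 1];
    buf[0 .. a.length] = a[];
    buf[a.length] = '\0';
    return buf.idup;
}

static immutable varz = withTerminator("foo"); // evaluated by CTFE

static assert(varz.length == 4);        // the '\0' is counted in .length here
static assert(varz[0 .. 3] == "foo");
static assert(varz[3] == '\0');
```

With this layout, `varz.ptr` can be handed to C code, while D code that wants the text without the terminator can slice it as `varz[0 .. $ - 1]`. For strings built at run time, `std.string.toStringz` serves the same purpose.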
Sep 27 2013
http://d.puremagic.com/issues/show_bug.cgi?id=8229

--- Comment #5 from Martin Nowak <code dawg.eu> 2013-09-28 04:20:53 PDT ---
It is also a huge performance issue to use ArrayLiteralExp instead of StringLiteralExp during object emission because the compiler creates a list of 1-byte elements. If for example you generate a 5kB string in CTFE this induces a huge overhead.
Sep 28 2013