digitalmars.D.bugs - [Issue 8229] New: string literals are not zero-terminated during CTFE

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229

           Summary: string literals are not zero-terminated during CTFE
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: CTFE
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: timon.gehr gmx.ch


--- Comment #0 from timon.gehr gmx.ch 2012-06-11 15:56:58 PDT ---
DMD 2.059:

static assert(!(x){return *x;}("".ptr)); // error

The static assertion should pass.
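
For comparison, here is a minimal sketch of the same check done at run time
rather than in CTFE. The helper name firstCharIsZero is made up for this
illustration, and the sketch assumes the literal is stored with a trailing
'\0', as the spec text quoted later in this thread describes. It says nothing
about what CTFE does.

```d
// Hypothetical helper: the lambda from the report, written as a named function.
bool firstCharIsZero(const(char)* p)
{
    return *p == '\0';
}

void main()
{
    // At run time "".ptr points at the byte stored after the (empty) literal,
    // which is assumed here to be the '\0' terminator.
    assert(firstCharIsZero("".ptr));
}
```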

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 11 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au


--- Comment #1 from Don <clugdbug yahoo.com.au> 2012-06-12 09:48:41 PDT ---
This behaviour is intentional. Pointer operations are strictly checked in CTFE.
It's the same as doing 

int n = 0;
char c = ""[n];

which generates an array bounds error at runtime.
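
The run-time bounds error can be observed with a sketch like the following.
It assumes array bounds checking is enabled (the default for a build without
-release or -boundscheck=off); the helper name charAt is hypothetical.

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// Hypothetical helper: indexes the slice, so the access is bounds-checked.
char charAt(string s, size_t n)
{
    return s[n];
}

void main()
{
    // Index 0 is outside the bounds of an empty string, so the check fails
    // with a RangeError (or a subclass of it, depending on the druntime).
    assertThrown!RangeError(charAt("", 0));
}
```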

Is the terminating null character still in the spec? A long time ago it was in
there, but now I can only find two references to it in the current spec (in
'arrays' and in 'interfacing to C'), and they both relate to printf. 
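
As an illustration of the printf use case, here is a small sketch. It assumes
a string literal can be handed to C as-is, while a string built at run time
gets an explicit terminator through std.string.toStringz. The helper name
cLength is made up for this example.

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: length as seen by C, after adding a terminator.
size_t cLength(const(char)[] s)
{
    return strlen(toStringz(s));
}

void main()
{
    // A literal is passed to C directly; this relies on the stored '\0'.
    printf("%s\n", "hello".ptr);

    // Built at run time: no terminator is promised, so add one explicitly.
    auto built = "hel".dup ~ "lo";
    assert(cLength(built) == 5);
    printf("%s\n", toStringz(built));
}
```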

The most detailed is in 'interface to C', which states:
"string literals, when they are not part of an initializer to a larger data
structure, have a '\0' character helpfully stored after the end of them."

which is pretty weird. These funky semantics would be difficult to implement in
CTFE, and I doubt they are desirable. Here's an example:

string foo(string s) { return "abc" ~ s; }

immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

bool baz()
{
    immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.

    return true;
}
static assert(baz());

---> bar is zero-terminated, bar2 is not, even though they had the same
assignment. When does this magical trailing zero get added?

I think you could reasonably interpret the spec as meaning that a trailing zero
is added to the end of string literals by the linker, not by the compiler. It's
only in CTFE that you can tell the difference.

Jun 12 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229



--- Comment #2 from timon.gehr gmx.ch 2012-06-12 10:55:45 PDT ---
(In reply to comment #1)
> This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> It's the same as doing
>
> int n = 0;
> char c = ""[n];
>
> which generates an array bounds error at runtime.

I think that would be stretching it too far. It is more like:

auto s = ['\0'];
auto q = s[0..0];
char c = *q.ptr;

which works fine at runtime and during CTFE.

> Is the terminating null character still in the spec? A long time ago it was in
> there, but now I can only find two references to it in the current spec (in
> 'arrays' and in 'interfacing to C'), and they both relate to printf.
>
> The most detailed is in 'interface to C', which states:
> "string literals, when they are not part of an initializer to a larger data
> structure, have a '\0' character helpfully stored after the end of them."
>
> which is pretty weird. These funky semantics would be difficult to implement in
> CTFE,

I guess this is from D1 times, when string literals were static arrays, and
doesn't apply anymore.

> and I doubt they are desirable. Here's an example:
>
> string foo(string s) { return "abc" ~ s; }
>
> immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE

Well, this is not specified afaics.

> bool baz()
> {
>     immutable bar2 = foo("xyz"); // local variable, so isn't a string literal.
>
>     return true;
> }
> static assert(baz());
>
> ---> bar is zero-terminated, bar2 is not, even though they had the same
> assignment. When does this magical trailing zero get added?

This is exactly the behavior that is observed at runtime. If it is undesirable,
then that is a distinct issue that should be investigated. It would certainly be
desirable to have consistent behavior at compile time and at runtime, but this
is not a top-priority issue.

> I think you could reasonably interpret the spec as meaning that a trailing zero
> is added to the end of string literals by the linker, not by the compiler. It's
> only in CTFE that you can tell the difference.

In this case, the spec should definitely be fixed.

Jun 12 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229



--- Comment #3 from Don <clugdbug yahoo.com.au> 2012-06-13 01:44:42 PDT ---
(In reply to comment #2)
> (In reply to comment #1)
> > This behaviour is intentional. Pointer operations are strictly checked in CTFE.
> > It's the same as doing
> >
> > int n = 0;
> > char c = ""[n];
> >
> > which generates an array bounds error at runtime.
>
> I think that would be stretching it too far. It is more like:
>
> auto s = ['\0'];
> auto q = s[0..0];
> char c = *q.ptr;

That's an interesting interpretation. It can't be true for D1, where string
literals are fixed length arrays, but it could work for D2. In D1 it's more
like:

struct S
{
    static char[3] s = ['a', 'b', 'c'];
    static char terminator = '\0';
}

And every mention of it in the spec dates from D1.

> > Is the terminating null character still in the spec? A long time ago it was in
> > there, but now I can only find two references to it in the current spec (in
> > 'arrays' and in 'interfacing to C'), and they both relate to printf.
> >
> > The most detailed is in 'interface to C', which states:
> > "string literals, when they are not part of an initializer to a larger data
> > structure, have a '\0' character helpfully stored after the end of them."
> >
> > which is pretty weird. These funky semantics would be difficult to implement in
> > CTFE,
>
> I guess this is from D1 times, when string literals were static arrays, and
> doesn't apply anymore.

Could be. So the few parts of the spec that mention it are horribly
out-of-date. Though it also applies to assigning to fixed length arrays.

immutable(char)[3] s = "abc"; // Does this have a trailing zero?

> > and I doubt they are desirable. Here's an example:
> >
> > string foo(string s) { return "abc" ~ s; }
> >
> > immutable bar = foo("xyz"); // becomes a string literal when it leaves CTFE
>
> Well, this is not specified afaics.

Hmm, maybe it isn't. The spec says almost nothing about the whole thing.
What I do know is that there is a lot of existing code that relies on this
behaviour (especially, "abc" ~ "def" having a trailing zero).

Pretty much the only thing the spec says is that you can use string literals
with printf. Does TDPL mention it? The spec definitely needs to be improved.

Jun 13 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229


Martin Nowak <code dawg.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |code dawg.eu
           Severity|normal                      |major


--- Comment #4 from Martin Nowak <code dawg.eu> 2013-09-27 15:58:28 PDT ---
---
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

static const var = bug("foo");
---

I have a much bigger problem related to this.
String literals resulting from CTFE are missing the terminating zero in the
data segment. Whether or not the bug bites depends on the object layout and the
virtual memory mapping, so this is pretty annoying because it works too often.
The underlying issue is that var is emitted to the object file from
ArrayLiteralExp::toDt which doesn't perform the zero termination.
Not sure if and at which stage this should be converted to a StringLiteralExp.
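
Until that is resolved, one possible workaround is to make the terminator part
of the array data itself, so that it no longer matters which code path emits
the constant. The following is only a sketch under that assumption; the names
varz, cstr and dstr are made up for this illustration.

```d
import core.stdc.string : strlen;

// Same function as in the test case above.
string bug(string a)
{
    char[] buf;
    buf.length = a.length;
    buf[0 .. a.length] = a[];
    return cast(string)buf[];
}

// The '\0' is an ordinary element of the array here, so it is emitted
// together with the rest of the data.
static immutable varz = bug("foo") ~ '\0';

// Hypothetical accessors: a pointer for C, a slice without the zero for D.
const(char)* cstr() { return varz.ptr; }
string dstr() { return varz[0 .. $ - 1]; }

void main()
{
    assert(varz.length == 4 && varz[$ - 1] == '\0');
    assert(strlen(cstr()) == 3);
    assert(dstr() == "foo");
}
```

The cost is that the D-side slice has to drop the last element explicitly.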

Sep 27 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=8229



--- Comment #5 from Martin Nowak <code dawg.eu> 2013-09-28 04:20:53 PDT ---
It is also a huge performance issue to use ArrayLiteralExp instead of
StringLiteralExp during object emission because the compiler creates a list of
1-byte elements. If for example you generate a 5kB string in CTFE this induces
a huge overhead.

Sep 28 2013