digitalmars.D - String literals

"Kalle A. Sandstrom" <ksandstr iki.fi> writes:
While thinking about D's differing use of the ``const'' keyword (which
is apparently more similar to C# than C or C++) I came to write a small
test program, which I've included below.

import std.stdio;

char[] get_string() {
	return "testing, testing";
}

int main(char[][] args)
{
	char[] z = get_string();
	writef("first: '%s'\n", z);
	z[3] = 'X';
	writef("second: '%s'\n", z);
	z = get_string();
	writef("third: '%s'\n", z);
	return 0;
}

Compiled with GDC 0.15 built on GCC 3.4.4, this produces code that
crashes after the first call to writef, at the assignment to z[3]. The
apparent reason is that string literals are included in the text segment
of ELF binaries and are thus read-only. In C++ (AFAIK) this is made
explicit to the programmer by string literals being of type
'const char *', causing rather significant warnings to be printed when
compilation of code like this is attempted.

However, since D doesn't have a C-like concept of constness, this
doesn't so much as pop a warning. (Personally, I'd have expected some
sort of a clever copy-on-write semantic to be applied in the subscript
assignment; this would have been appropriately D-ish.) This behaviour
leads to the interesting (in the Chinese proverb sense) situation where
there are char arrays that can be modified and char arrays that cannot;
furthermore, there is no way[1] to distinguish between the two!

I'm pretty sure that this qualifies as a language design bug, or
alternatively a compiler implementation bug if a COW semantic was
defined. In the former case, would it be too much to consider the
addition of a Java/C#-ish "string" type as part of the language?



[1] besides looking at the address of the first element of such an array
    and trying to figure out whether it falls in the text segment or not.
    This would be non-portable to say the least.

-- 
Kalle A. Sandstro"m                                        ksandstr iki.fi
DB9D 0C39:              F4FF 4535 B501 4C79 B1DF  03F6 27D1 BF12 DB9D 0C39
void *truth = &truth;                              http://iki.fi/ksandstr/
Aug 04 2005
Derek Parnell <derek psych.ward> writes:
On Thu, 4 Aug 2005 23:28:20 +0300, Kalle A. Sandstrom wrote:

> While thinking about D's differing use of the ``const'' keyword (which
> is apparently more similar to C# than C or C++) I came to write a small
> test program, which I've included below.
>
> import std.stdio;
>
> char[] get_string() {
> 	return "testing, testing";
> }
>
> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';
> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
>
> [...]
>
> [1] besides looking at the address of the first element of such an array
>     and trying to figure out whether it falls in the text segment or not.
>     This would be non-portable to say the least.

Yes. String literals are write-protected on Linux and unprotected on Windows. The same code above does not crash on Windows.

-- 
Derek Parnell
Melbourne, Australia
5/08/2005 7:18:43 AM
Aug 04 2005
Anders F Björklund <afb algonet.se> writes:
Kalle A. Sandstrom wrote:

> int main(char[][] args)
> {
> 	char[] z = get_string();
> 	writef("first: '%s'\n", z);
> 	z[3] = 'X';

Since you don't "own" z here, you are supposed to .dup it first... (CoW)

> 	writef("second: '%s'\n", z);
> 	z = get_string();
> 	writef("third: '%s'\n", z);
> 	return 0;
> }
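
A minimal sketch of the .dup advice above, written for the D of this thread. It assumes a compiler of that era (such as the GDC 0.15 mentioned in the original post), where a string literal converts to a plain char[]. The only change to the original test program is the copy taken before the write; nothing else is new.

```d
import std.stdio;

char[] get_string() {
	// Still returns a reference to the literal, which may live in
	// read-only memory.
	return "testing, testing";
}

int main(char[][] args)
{
	// The caller does not own the literal, so take a private copy
	// before modifying it.
	char[] z = get_string().dup;
	writef("first: '%s'\n", z);
	z[3] = 'X';			// writes to the copy, not to the literal
	writef("second: '%s'\n", z);	// second: 'tesXing, testing'
	z = get_string();
	writef("third: '%s'\n", z);	// third: 'testing, testing'
	return 0;
}
```

The cost is one allocation per copy, and the convention depends on the programmer remembering it; nothing in the compiler checks that the copy was made.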

[...]
> I'm pretty sure that this qualifies as a language design bug, or
> alternatively a compiler implementation bug if a COW semantic was
> defined. In the former case, would it be too much to consider the
> addition of a Java/C#-ish "string" type as part of the language?

It's a language design "bug" if you want to call it that, and a source of much debate regarding adding such a "readonly" attribute back to D...

Walter has said that he doesn't want a string type, preferring char[] (or wchar[] or dchar[], but that's another discussion, regarding UTF).

Meanwhile, as you said, there is no way to distinguish between the two.

--anders
Aug 05 2005