digitalmars.D - To dup or not to dup?

Jürgen Herz <juergen jherz.redirectme.net> writes:
Hi,

I'm absolutely new to D and about to see if D is a language I could like.
On http://www.digitalmars.com/d/cppstrings.html I found

char[] s1 = "hello world";
char[] s2 = "goodbye      ".dup;
s2[8..13] = s1[6..11];		// s2 is "goodbye world"

The .dup is needed because string literals are read-only in D, the .dup
will create a copy that is writable.

The same with
char []a = "Txst";
a[1] = 'a';

I find that interesting because here it also works (compiles and runs)
without .dup (DMD 1.007). Has the language changed since writing that
web page?

Jürgen
Feb 25 2007
Johan Granberg <lijat.meREM OVE.gmail.com> writes:
You should not write to string literals, but whether the compiler enforces that is a quality-of-implementation issue.
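For readers on a current D2 compiler (an assumption; the thread itself is about DMD 1.007), here is a minimal sketch of the copy-before-write pattern from the cppstrings page. The helper name `overwriteAt` is made up for illustration. As far as I know, D2 types string literals as `immutable(char)[]`, so a missing `.dup` is rejected at compile time there instead of being left to the platform.

```d
import std.stdio;

// Hypothetical helper: returns a writable copy of `base` with `patch`
// copied over the characters starting at index `at`.
char[] overwriteAt(const(char)[] base, size_t at, const(char)[] patch)
{
    char[] result = base.dup;                  // writable copy; the literal stays untouched
    result[at .. at + patch.length] = patch[]; // element-wise copy into the slice
    return result;
}

void main()
{
    // Assumption: D2 semantics. Binding a literal to a mutable char[]
    // without .dup does not compile.
    static assert(!__traits(compiles, { char[] a = "Txst"; }));

    string s1 = "hello world";
    char[] s2 = overwriteAt("goodbye      ", 8, s1[6 .. 11]);
    assert(s2 == "goodbye world");
    writeln(s2);
}
```

The write only ever touches the copy made by `.dup`, so the behaviour does not depend on whether the platform maps literals into read-only memory.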
Feb 25 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Let me guess, you're running Windows. (checks post header) Yep.
Segfaults for me (running Linux).

This isn't enforced on Windows. I'm not sure whose fault that is, but I assume Microsoft :P.
Before you think I'm a MS basher, I'd like to mention that IIRC it's not enforced for C either on Windows: if you initialize char* with a string literal and try to modify it -- which is illegal in C for the very same reason -- it won't complain.

Just because it doesn't crash on your computer doesn't mean it's legal or that it'll work on every computer (or even on your computer with a different compiler, for that matter -- though in this case it probably will if it's because of Windows).
Feb 25 2007
Jürgen Herz <juergen jherz.redirectme.net> writes:
Frits van Bommel wrote:
 Jürgen Herz wrote:
 The same with
 char []a = "Txst";
 a[1] = 'a';
 
 I find that interesting because here it also works (compiles and runs)
 without .dup (DMD 1.007). Has the language changed since writing that
 web page?
Let me guess, you're running Windows. (checks post header) Yep. Segfaults for me (running Linux).
That's right. I've now also set up dmd on Linux and it segfaults here too.
 This isn't enforced on Windows. I'm not sure whose fault that is, but I 
 assume Microsoft :P.
 Before you think I'm a MS basher, I'd like to mention that IIRC it's not 
 enforced for C either on Windows: if you initialize char* with a string 
 literal and try to modify it -- which is illegal in C for the very same 
 reason -- it won't complain.
It also isn't enforced for C on Linux (gcc) though it also doesn't crash. Anyways, that's C and D (resp. dmd) could do better.
 Just because it doesn't crash on your computer doesn't mean it's legal 
 or that it'll work on every computer (or even on your computer with a 
 different compiler, for that matter -- though in this case it probably 
 will if it's because of Windows).
I understand that it is and why it is illegal. And crashing on Linux I'm relieved seeing the "right" consequence of doing illegal things.

But I'm not convinced of the compiler. In my point of view a language and a compiler should catch as many programming errors at the earliest point possible, that is at compile time.

What I started with was to find out if there's a const in D. Well, there is, but ...

To me it looks very inconsistent:
  char []a = "Test";
  a[1] = 'x';
segfaults while
  const char []a = "Test";
  a[1] = 'x';
is caught at compile time with "Error: string literals are immutable".
Interestingly
  const char []a = "Test";
  a[1..2] = "x";
compiles without warnings though still segfaults.

Even worse:
  const char []a = "Test";
  test(a);

  void test(char []s)
  {
    s[1] = 'x';
  }
Not only does there seem to be no way to declare a function argument const, a non-const argument removes const without warnings - and promptly segfaults in test.

On Windows the result of that code is very interesting. Since it doesn't segfault, one can printf a and s after manipulation. And it's "Txst" in test() and "Test" outside.

BTW
  const char []a = "Test".dup;
gives
  consttest.d(3): Error: cannot evaluate _adDupT(&_D12TypeInfo_G4a6__initZ,"Test") at compile time.

What does it want to tell me?

Jürgen
Feb 26 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Jürgen Herz wrote:
 Frits van Bommel wrote:
 This isn't enforced on Windows. I'm not sure whose fault that is, but I 
 assume Microsoft :P.
 Before you think I'm a MS basher, I'd like to mention that IIRC it's not 
 enforced for C either on Windows: if you initialize char* with a string 
 literal and try to modify it -- which is illegal in C for the very same 
 reason -- it won't complain.
It also isn't enforced for C on Linux (gcc) though it also doesn't crash. Anyways, that's C and D (resp. dmd) could do better.
With my Linux gcc it *does* crash:
---
urxae urxae:~/tmp$ cat test.c
int main() {
    char* str = "test";
    str[0] = 'b';
}
urxae urxae:~/tmp$ gcc test.c -o test
urxae urxae:~/tmp$ ./test
Segmentation fault (core dumped)
---
Though it isn't enforced by the compiler.
 Just because it doesn't crash on your computer doesn't mean it's legal 
 or that it'll work on every computer (or even on your computer with a 
 different compiler, for that matter -- though in this case it probably 
 will if it's because of Windows).
I understand that it is and why it is illegal. And crashing on Linux I'm relieved seeing the "right" consequence of doing illegal things.
Indeed.
 But I'm not convinced of the compiler. In my point of view a language
 and a compiler should catch as many programming errors at the earliest
 point possible, that is at compile time.
Yes, this is usually a good idea.
 What I started with was to find out if there's a const in D. Well, there is,
 but ...
There are plans to add a better concept of non-modifiability to D, but last I heard Walter and Andrei are still working out the details of how it will work.
 To me it looks very inconsistent:
   char []a = "Test";
   a[1] = 'x';
 segfaults while
   const char []a = "Test";
   a[1] = 'x';
 is caught at compile time with "Error: string literals are immutable".
 Interestingly
   const char []a = "Test";
   a[1..2] = "x";
 compiles without warnings though still segfaults.
Current const support is indeed a bit weak.
 Even worse:
   const char []a = "Test";
   test(a);
 
   void test(char []s)
   {
     s[1] = 'x';
   }
 Not only does there seem to be no way to declare a function argument const,
 a non-const argument removes const without warnings - and promptly
 segfaults in test.
This is one of the issues that should be addressed by the changes I mentioned above.
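A sketch of what compile-time enforcement can look like, assuming a current D2 compiler rather than the DMD 1.007 under discussion. `countChar` and `patch` are hypothetical helpers made up for this example; the `static assert` lines check that the problematic statements from the quoted post are rejected at compile time under that assumption.

```d
import std.stdio;

// Hypothetical helper: takes a read-only view, so it accepts mutable,
// const and immutable character arrays alike.
size_t countChar(const(char)[] s, char c)
{
    size_t n = 0;
    foreach (ch; s)
        if (ch == c)
            ++n;
    return n;
}

// Hypothetical helper: needs a mutable array because it writes to it.
void patch(char[] s)
{
    s[1] = 'x';
}

void main()
{
    const(char)[] a = "Test";

    // Writing through a const view is rejected at compile time,
    // for single elements and for slices.
    static assert(!__traits(compiles, a[1] = 'x'));
    static assert(!__traits(compiles, a[1 .. 2] = "x"));

    // const is not silently dropped when calling a function that wants char[].
    static assert(!__traits(compiles, patch(a)));

    // An explicit copy is mutable and leaves the original untouched.
    char[] b = a.dup;
    patch(b);
    assert(b == "Txst");
    assert(a == "Test");
    assert(countChar(a, 't') == 1);
    writeln(a, " ", b);
}
```

Declaring the parameter as `const(char)[]` is the part that answers the "no way to declare a function argument const" complaint: the function promises not to write, and callers can pass literals without copying.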
 On Windows the result of that code is very interesting. Since it doesn't
 segfault, one can printf a and s after manipulation. And it's "Txst" in
 test() and "Test" outside.
Yes, it allows it to be modified, which is why it doesn't crash. I'm not sure why this is, but my guess is that either PE (the file format of Windows executables) doesn't allow you to specify "this part should be in read-only memory" or Windows doesn't follow those directions. I *do* know that ELF (the typical format of Linux executables) allows you to specify that and it is enforced when possible (i.e. when possible on the processor, which means "if everything on that memory page is read-only" on x86 processors)
 BTW
   const char []a = "Test".dup;
 gives
   consttest.d(3): Error: cannot evaluate
 _adDupT(&_D12TypeInfo_G4a6__initZ,"Test") at compile time.
 
 What does it want to tell me?
It wants to tell you that the result of '"Test".dup' cannot be evaluated at compile time. In some cases, 'const' really _means_ constant in D, the value needs to be available at compile time (or actually at link time in many cases).
Feb 26 2007
Lionello Lunesu <lio lunesu.remove.com> writes:
Frits van Bommel wrote:
It wants to tell you that the result of '"Test".dup' cannot be evaluated at compile time. In some cases, 'const' really _means_ constant in D, the value needs to be available at compile time (or actually at link time in many cases).
Strange, from the DMD changelog:
* The .dup property is now allowed for compile time function execution.

Why is it complaining about that .dup?

L.
Feb 26 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Lionello Lunesu wrote:
 Frits van Bommel wrote:
 Jürgen Herz wrote:
 BTW
   const char []a = "Test".dup;
 gives
   consttest.d(3): Error: cannot evaluate
 _adDupT(&_D12TypeInfo_G4a6__initZ,"Test") at compile time.

 What does it want to tell me?
It wants to tell you that the result of '"Test".dup' cannot be evaluated at compile time. In some cases, 'const' really _means_ constant in D, the value needs to be available at compile time (or actually at link time in many cases).
Strange, from the DMD changelog:
* The .dup property is now allowed for compile time function execution.
Why is it complaining about that .dup?
Because it's not constant enough to allow 'const', apparently. If you remove the 'const' it works just fine.
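To make the compile-time versus run-time distinction concrete, here is a small sketch. It assumes a current D2 compiler, not DMD 1.007, and `patched` is a hypothetical function invented for the example. In D2, as far as I know, `enum` is the way to demand a compile-time value, while a local `const` is simply initialised at run time.

```d
// Hypothetical helper, usable both at run time and at compile time.
string patched()
{
    char[] s = "Test".dup; // writable copy of the literal
    s[1] = 'x';
    return s.idup;         // immutable copy handed to the caller
}

// enum forces the call to be evaluated at compile time.
enum string atCompileTime = patched();
static assert(atCompileTime == "Txst");

void main()
{
    // A local const is initialised at run time, so .dup is fine here.
    const char[] a = "Test".dup;
    assert(a == "Test");

    // The same function gives the same answer when called at run time.
    assert(patched() == atCompileTime);
}
```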
Feb 26 2007
Jürgen Herz <juergen jherz.redirectme.net> writes:
Frits van Bommel wrote:
 Jürgen Herz wrote:
 It also isn't enforced for C on Linux (gcc) though it also doesn't
 crash. Anyways, that's C and D (resp. dmd) could do better.
With my Linux gcc it *does* crash:
---
urxae urxae:~/tmp$ cat test.c
int main() {
    char* str = "test";
    str[0] = 'b';
}
urxae urxae:~/tmp$ gcc test.c -o test
urxae urxae:~/tmp$ ./test
Segmentation fault (core dumped)
---
Uh, sorry. I mixed up C and D and just wrote char a[] = "Test"; in the C test. That's different from []a on D of course.
 There are plans to add a better concept of non-modifiability to D,
With a 1.0 release out, I was under the impression such basics would already have been done. But if they're not, I'm glad to hear they're being worked on.
 On Windows the result of that code is very interesting. Since it doesn't
 segfault, one can printf a and s after manipulation. And it's "Txst" in
 test() and "Test" outside.
Yes, it allows it to be modified, which is why it doesn't crash.
Yes, but what surprises me is that s and a seem to point to different memory ("seem" because I didn't look at the actual addresses; I'll have to make up for that when I'm on Windows again), since the string a was unchanged even after s was changed in test().

Jürgen
Feb 27 2007
"Saaa" <empty needmail.com> writes:
 I understand that it is and why it is illegal. And crashing on Linux I'm
 relieved seeing the "right" consequence of doing illegal things.
I don't :) Could anybody pls explain it? What is wrong with changing a part of an array? or better, what is the difference between a string literal and an array of chars?
Feb 26 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Saaa wrote:
 I understand that it is and why it is illegal. And crashing on Linux I'm
 relieved seeing the "right" consequence of doing illegal things.
I don't :) Could anybody pls explain it? What is wrong with changing a part of an array? or better, what is the difference between a string literal and an array of chars?
A string literal directly references the memory in which it is stored in the loaded executable. That same piece of memory can be used by every reference to that string literal. So something like:
---
import std.stdio;

void main()
{
    char[] test = "Hi";
    test[1] = 'o';
    writefln("%s", "Hi");
}
---
may print "Ho" if the string is allowed to be modified. That's also dependent on how well the compiler optimizes this stuff, though.
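The sharing can be observed without writing to a literal at all. The following is only a sketch, assuming a current D2 compiler; `sameStorage` is a hypothetical helper, and whether two identical literals end up at the same address is up to the compiler and linker, so that result is printed and not asserted.

```d
import std.stdio;

// Hypothetical helper: true when both slices start at the same address.
bool sameStorage(const(char)[] x, const(char)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    string a = "Hi";
    string b = "Hi";

    // Implementation-dependent: identical literals may or may not be merged.
    writeln("literals share storage: ", sameStorage(a, b));

    // A .dup copy always gets its own memory, so changing it cannot
    // affect any other use of the literal.
    char[] copy = a.dup;
    assert(!sameStorage(copy, a));
    copy[1] = 'o';
    assert(copy == "Ho");
    assert(a == "Hi" && b == "Hi");
}
```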
Feb 26 2007